I want to use a headless CMS with Jekyll as the source of my blog posts. There aren’t many plugins that aim to facilitate this.
There is a WordPress jekyll-import tool, but it is intended for a one-time import of WordPress content into Markdown files inside your Jekyll project, not for a continual build process that sources all content from an API.
The Jekyll Engine
Jekyll posts are just a natively supported form of Jekyll collection. The Jekyll documentation even states that if you configure your “collections” to load from a different directory (the collections_dir setting), you will need to move your _posts and _drafts folders under that directory as well.
Jekyll builds your site through a process that involves the following steps:
- Read - Reads data from directories/files into the Jekyll::Site object
- Generate - Runs each of the Generators defined by plugins you’ve installed or coded yourself
- Render - Renders content in memory (markdown converted to HTML, SASS converted to CSS, etc.)
- Cleanup - Removes orphaned files and empty directories in destination
- Write - Writes static files, pages, and posts to the build directory (e.g. _site)
The process starts by “reading” data from files into Ruby objects that are stored inside the Jekyll::Site object. This includes pages, posts, collections, and data.
Next, it runs the generators defined by Jekyll plugins you install or write yourself.
Then it goes through a rendering process, where the content loaded from Markdown files is converted to HTML. For instance, in each Jekyll::Document object used to represent a blog post, the raw Markdown is stored in the ‘content’ attribute, while the rendered HTML is stored in the ‘output’ attribute.
The cleanup step removes orphaned files and empty directories from the site destination directory.
Lastly, all the rendered content is written to actual files under the destination directory (_site by default).
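To make the ordering concrete, here is a toy model in plain Ruby that runs without Jekyll. MiniSite, MiniDoc, and the one-line “render” step are hypothetical stand-ins, not Jekyll’s classes; the sketch only illustrates why data injected by a generator shows up in the rendered output: generators run after read and before render.

```ruby
# Toy model of Jekyll's build phases. MiniSite and MiniDoc are hypothetical
# stand-ins for illustration, not Jekyll classes.
MiniDoc = Struct.new(:name, :content, :data, :output)

class MiniSite
  attr_reader :docs, :written

  def initialize(sources, generators)
    @sources = sources        # name => raw "Markdown" string
    @generators = generators  # callables that receive the site
  end

  # Same phase order as Jekyll::Site#process
  def process
    reset
    read
    generate
    render
    cleanup
    write
  end

  def reset
    @docs = []
    @written = {}
  end

  # Read: load raw content into in-memory objects
  def read
    @docs = @sources.map { |name, text| MiniDoc.new(name, text, {}, nil) }
  end

  # Generate: plugins may change documents before anything is rendered
  def generate
    @generators.each { |generator| generator.call(self) }
  end

  # Render: fill `output` from `content` (stand-in for Markdown -> HTML)
  def render
    @docs.each do |doc|
      title = doc.data.fetch('title', doc.name)
      doc.output = "<h1>#{title}</h1><p>#{doc.content.strip}</p>"
    end
  end

  # Cleanup: nothing to prune in this toy model
  def cleanup; end

  # Write: collect outputs in a Hash instead of writing files
  def write
    @docs.each { |doc| @written[doc.name] = doc.output }
  end
end

add_title = ->(site) { site.docs.each { |doc| doc.data['title'] = doc.name.upcase } }
site = MiniSite.new({ 'hello' => "Hello world\n" }, [add_title])
site.process
site.written['hello'] # => "<h1>HELLO</h1><p>Hello world</p>"
```

Note that `content` keeps the raw text while `output` holds the rendered result, mirroring the Jekyll::Document attributes described above.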
Here’s how I inspected objects in memory through each step of the site building process using IRB.
```ruby
require 'jekyll'

options = {
  "source"      => File.expand_path("."),
  "destination" => File.expand_path("./_site"),
  "incremental" => true,
  "profile"     => true,
  "watch"       => true,
  "serving"     => true,
}

# merge build options with configuration data
options = Jekyll.configuration(options)

# initialize the site object
site = Jekyll::Site.new(options)
site.class
# => Jekyll::Site

# initialize attribute defaults
site.reset

# read data from directories/files
site.read

# inspect posts collection
site.collections['posts'].class
# => Jekyll::Collection
site.collections['posts'].docs.count
# => 163
example_doc = site.collections['posts'].docs[0]
# => #<Jekyll::Document _posts/2004-09-09-bug-tracking.md collection=posts>
example_doc.path
# => "/Developer/redconfetti.github.io/_posts/2004-09-09-bug-tracking.md"
example_doc.type
# => :posts
example_doc.data
# => {
#   "draft"=>false,
#   "categories"=>["php"],
#   "layout"=>"post",
#   "published"=>true,
#   "title"=>"PHP/MySQL Bug Tracking",
#   "author"=>"maxwell keyes",
#   "date"=>2004-09-09 16:51:00 -0400,
#   "comments"=>true,
#   "tags"=>["bug tracking"],
#   "slug"=>"bug-tracking",
#   "ext"=>".md",
#   "excerpt"=><Jekyll::Excerpt id=/2004/09/bug-tracking#excerpt>
# }
example_doc.data["permalink"]
# => nil

# content holds the raw Markdown string
example_doc.content
# => "For anyone who needs a free web based Bug Tracking system programmed
#    using\nPHP/MySQL, check out Flyspray.\n"
example_doc.output
# => nil

# run the generators from installed or custom plugins (the Generate step)
site.generate

# After file content is loaded into Jekyll::Site, it is rendered from Markdown
# to actual HTML using site.render
site.render
# => nil
example_doc.output
# => "<!DOCTYPE html>\n<html>\n <head>\n <meta charset=\"utf-8\">\n
#    <meta name=\"viewport\" content=\"width=device-width initial-scale=1\" />\n
#    <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">\n\n
#    <title>PHP/MySQL Bug Tracking</title>
#    ...
#    <footer class=\"footer\">\n <span class=\"footer__copyright\">
#    © 2023 Jason Miller. All rights reserved.</span>\n</footer>\n\n
#    </body>\n</html>\n"

# remove orphaned files and empty directories from the destination
site.cleanup

# write the rendered files to _site
site.write
```
Custom Post Approach
I tried writing a Jekyll generator plugin with a class that inherits from Jekyll::Document and patches various methods, so that it could be used without sourcing data from a Markdown file under the _posts directory. I was not able to get this to work without errors and complications.
Instead, it’s better to create a custom page template, as suggested by the Jekyll Generators documentation, along with a generator that locates that page by name and simply adds the custom data to the page’s data hash.
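The page-injection pattern is small enough to sketch without Jekyll. In this minimal sketch, FakePage is a hypothetical stand-in for Jekyll::Page (which likewise exposes `name` and a `data` hash), and inject_page_data is a hypothetical helper; in a real plugin its body would live inside a Jekyll::Generator#generate(site) method and search site.pages.

```ruby
# FakePage is a hypothetical stand-in for Jekyll::Page, which also exposes
# `name` (the file name) and a `data` hash (front matter plus injected values).
FakePage = Struct.new(:name, :data)

# Hypothetical helper mirroring what a generator does with site.pages:
# find the template page by file name and merge extra data into it.
def inject_page_data(pages, page_name, extra)
  page = pages.find { |p| p.name == page_name }
  # Fail loudly; otherwise a missing page surfaces as a NoMethodError on nil
  raise ArgumentError, "page #{page_name} not found" unless page

  page.data.merge!(extra)
  page
end

pages = [
  FakePage.new('index.html', { 'title' => 'Home' }),
  FakePage.new('wp_posts_page.html', { 'title' => 'Blog' })
]
inject_page_data(pages, 'wp_posts_page.html', { 'post_feed' => ['First post'] })
pages[1].data['post_feed'] # => ["First post"]
```

Keys added to a page’s data hash this way are available to the template as page.<key>, e.g. page.post_feed.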
Liquid Drops
The only complication that could not be avoided is that the Liquid templating system used by Jekyll will raise errors if you use custom-defined objects in your page template (e.g. undefined method 'to_liquid').
If the objects you are injecting into your page and iterating over are not basic Ruby types (strings, numbers, arrays, hashes, and so on), you’ll need to make sure they inherit from Liquid::Drop (or otherwise implement to_liquid).
See Liquid Drops
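The other route implied above is to hand Liquid only basic types. A minimal sketch, assuming your items are plain Ruby objects: convert each one to a Hash with String keys before putting it in page.data. Liquid resolves item.title against a Hash by its String key, so Symbol keys would not be found. FeedItem and to_liquid_hashes are hypothetical names used only for illustration.

```ruby
# FeedItem is a hypothetical plain Ruby object; Liquid cannot render it
# directly because it does not respond to #to_liquid.
FeedItem = Struct.new(:title, :content, :link)

# Hypothetical helper: convert items into Hashes with String keys, a type
# Liquid already knows how to render (item.title looks up "title").
def to_liquid_hashes(items)
  items.map { |item| item.to_h.transform_keys(&:to_s) }
end

items = [FeedItem.new('Hello', '<p>Hi</p>', 'https://example.com/hello')]
to_liquid_hashes(items).first['title'] # => "Hello"
```

Nested custom objects would need the same conversion, which is where a Liquid::Drop subclass usually becomes the tidier choice.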
WordPress RSS Feed Example
Here’s the code for a simple Jekyll generator that retrieves posts from an RSS feed hosted on WordPress.com.
```ruby
# Gemfile

# XML to Hash translator
# https://github.com/savonrb/nori
gem 'nori', '~> 2.6'

# Nokogiri (Nori's default XML parser)
# https://github.com/sparklemotion/nokogiri
gem 'nokogiri', '~> 1.15'

# Pin Jekyll Sass Converter to 2.x to avoid deprecation warnings
gem 'jekyll-sass-converter', '~> 2.2'
```

```yaml
# _config.yml
wp_posts_page:
  title: 'Blog'
  layout: 'wp_posts_page'
  feed_url: 'https://redconfetti.wordpress.com/feed/'
```

```ruby
# _plugins/wordpress_posts.rb
require "net/http"
require "uri"
require "nori"

module WordpressPosts
  class Generator < Jekyll::Generator
    def generate(site)
      @post_page_config = site.config['wp_posts_page']
      raise 'Missing Wordpress configuration in _config.yml' unless @post_page_config

      page_layout = @post_page_config['layout']
      page_title = @post_page_config['title']
      page_slug = page_title.strip
                            .downcase
                            .gsub(/[\s\.\/\\]/, '-')
                            .gsub(/[^\w-]/, '')
                            .gsub(/[-_]{2,}/, '-')
                            .gsub(/^[-_]/, '')
                            .gsub(/[-_]$/, '')
      feed_url = @post_page_config['feed_url']
      post_feed = WordpressFeed.new(feed_url)

      # get template (must be a page in site.pages, not a layout)
      posts_page = site.pages.find { |page| page.name == 'wp_posts_page.html' }
      raise 'Could not find wp_posts_page.html in site.pages' unless posts_page

      # expose the feed items to the template as page.post_feed
      posts_page.data['post_feed'] = post_feed.items
    end
  end
end

class WordpressFeed
  attr_accessor :rss_url

  def initialize(rss_url)
    self.rss_url = rss_url
  end

  # <channel> element of the parsed feed, or an empty Hash
  def rss_channel
    @channel ||= begin
      rss = hash_data['rss'] || {}
      rss['channel'] || {}
    end
  end

  # Nori returns a Hash for a single <item> and an Array for several,
  # so wrap and flatten to always iterate over an Array
  def items
    @items ||= [rss_channel['item']].flatten.compact.collect { |item| ItemDrop.new(item) }
  end

  private

  # Parsed feed as a Hash. Plain Ruby has no String#blank? (that is
  # ActiveSupport), so check for an empty response explicitly.
  def hash_data
    @hash_data ||= xml_string.to_s.strip.empty? ? {} : Nori.new.parse(xml_string)
  end

  def xml_string
    @xml_string ||= Net::HTTP.get(URI(rss_url))
  end

  # Liquid::Drop subclass so templates can call item.title and item.content
  class ItemDrop < Liquid::Drop
    attr_accessor :feed_item

    def initialize(feed_item)
      self.feed_item = feed_item || {}
    end

    def content
      feed_item['content:encoded']
    end

    def title
      feed_item['title']
    end
  end
end
```
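The wrap-and-flatten step in WordpressFeed#items exists because XML-to-hash translators such as Nori return a single Hash when a feed contains one <item> and an Array when it contains several. The sketch below isolates that normalization so it can be checked without a network call; normalize_items is a hypothetical helper name and the sample hashes are made up.

```ruby
# Hypothetical helper: always return an Array of item Hashes, whether the
# parsed channel holds no <item>, one <item> (a Hash) or several (an Array).
def normalize_items(channel)
  return [] unless channel.is_a?(Hash)

  # Wrapping in an Array and flattening turns a lone Hash into [hash] and
  # leaves an Array of Hashes unchanged; compact drops a missing 'item' key.
  [channel['item']].flatten.compact
end

one  = { 'item' => { 'title' => 'Only post' } }
many = { 'item' => [{ 'title' => 'First' }, { 'title' => 'Second' }] }

normalize_items(one).map { |i| i['title'] }  # => ["Only post"]
normalize_items(many).map { |i| i['title'] } # => ["First", "Second"]
normalize_items(nil)                         # => []
normalize_items({})                          # => []
```

Kernel#Array would not work as a shortcut here: Array(hash) converts a Hash into an array of key/value pairs instead of wrapping it.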
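Page template (wp_posts_page.html). Because the generator looks this file up in site.pages, it has to be a regular page in the site source directory; files under _layouts are loaded as layouts, not pages.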
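```html
---
layout: page
title: Blog
permalink: /blog/
---
<ul style="list-style-type: none; padding: 0; margin: 0;">
  {% for item in page.post_feed %}
    <li>
      <h2>{{ item.title }}</h2>
      {{ item.content }}
    </li>
  {% endfor %}
</ul>
```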
Storyblok
The above approach isn’t ideal, because WordPress.com embeds oversized images in the posts. Also, the free WordPress.com hosting account limits the feed to 350 posts.
Hopefully the above example provides you with enough understanding to obtain content from any external data source and embed it into your custom Jekyll pages.
If you want a free CMS for your own site, consider Storyblok. The article “Add a headless CMS to Jekyll” gives good instructions on using the Storyblok Ruby gem with Jekyll. With Storyblok you can define your own schema for the objects you embed in your pages, and nest objects inside the content of other objects.