author | Kenny Ballou <kballou@devnulllabs.io> | 2019-03-03 16:25:27 -0700 |
---|---|---|
committer | Kenny Ballou <kballou@devnulllabs.io> | 2019-03-03 17:00:15 -0700 |
commit | 2f0cf6f713b21ee2819d3bdc9d1ef0676d098239 (patch) | |
tree | 88bf351fbe3921b9e2bfb62358ac1560a5fb0358 /posts | |
parent | 50df8db548ea70142a87a263398a35b0c65a899c (diff) | |
post: add static-site generation
Signed-off-by: Kenny Ballou <kballou@devnulllabs.io>
Diffstat (limited to 'posts')
-rw-r--r-- | posts/static-site-generation.org | 339 |
1 files changed, 339 insertions, 0 deletions
diff --git a/posts/static-site-generation.org b/posts/static-site-generation.org
new file mode 100644
index 0000000..721af3b
--- /dev/null
+++ b/posts/static-site-generation.org
@@ -0,0 +1,339 @@
+#+TITLE: (New) Static Site Generation
+#+DESCRIPTION: Migration from Hugo to org-mode, some scripts, and pandoc
+#+TAGS: Emacs
+#+TAGS: Org-mode
+#+TAGS: GNU/Linux
+#+TAGS: Bash
+#+TAGS: Make
+#+TAGS: Pandoc
+#+DATE: 2019-03-03
+#+SLUG: static-site-generation
+#+LINK: blog-git https://git.devnulllabs.io/blog.kennyballou.com.git/
+#+LINK: golang https://golang.org
+#+LINK: hugo https://gohugo.io/
+#+LINK: wiki-markdown https://en.wikipedia.org/wiki/Markdown
+#+LINK: org-mode https://org-mode.org
+#+LINK: org-manual https://orgmode.org/manual/
+#+LINK: org-mode-publish https://orgmode.org/manual/Publishing.html#Publishing
+#+LINK: wiki-rst https://en.wikipedia.org/wiki/ReStructuredText
+#+LINK: justin-abrah-org-publish https://justin.abrah.ms/emacs/orgmode_static_site_generator.html
+#+LINK: panchekha-org-publish https://pavpanchekha.com/blog/org-mode-publish.html
+#+LINK: evenchick-org-publish https://www.evenchick.com/blog/blogging-with-org-mode.html
+#+LINK: ogbe-org-publish https://ogbe.net/blog/blogging_with_org.html
+#+LINK: pandoc https://pandoc.org
+
+#+BEGIN_PREVIEW
+For a few years, I've been using [[hugo][Hugo]] for blog generation. Recently,
+I decided I wanted to take static site generation in a different
+direction. Specifically, I wanted to use a different source markup and I
+wanted to write my own tool set for generating the actual HTML.
+#+END_PREVIEW
+
+We'll walk through the motivation for changing the content to a different
+format and changing the generation process to a completely custom set of
+scripts.
+
+** Motivation
+
+When I first sat down to build a blog 5 years ago, I had a pretty basic set of
+requirements.
+
+- Lightweight
+
+- Native Markdown Support
+
+- Minimal Dependencies
+
+[[hugo][Hugo]] met all of these requirements quite well. The templating engine
+is fairly simple; it supports Markdown; and it's written in [[golang][Go]],
+so only the built artifact is necessary for site generation.
+
+If it fits so well, why change?
+
+It worked well for what I was asking. However, as I wrote more and time went
+on, the features of [[hugo][Hugo]] became more and more complex and created a
+mismatch with how I wanted to express the text in the markup. I felt this was
+going in a direction I neither needed nor wanted. More specifically, the issues
+surfaced mostly on the [[wiki-markdown][Markdown]] side.
+[[wiki-markdown][Markdown]] simply lacks some features in its markup, which are
+worked around with inline blocks of HTML and other hacks that are standard in
+only some specific "flavor" or translator implementation. [[hugo][Hugo]]
+attempts to patch this over with its ~shortcode templates~; however, these
+still felt unnatural.
+
+The final nail was discovering [[org-mode][Org-mode]]. I liked the weight of
+[[wiki-markdown][Markdown]], but I didn't like its lack of features when
+needed. I liked [[wiki-rst][reStructuredText]]; however, I felt it was always
+too heavy for the documents I was working on.
+
+In finally giving Emacs a full try (another blog post), I discovered
+[[org-mode][Org-mode]]. It was the exact middle weight I was looking for
+between [[wiki-markdown][Markdown]] and [[wiki-rst][reST]].
+
+Thus, I was in search of a new tool for generating static HTML from a set of
+source files written in [[org-mode][Org-mode]].
+
+[[org-mode][Org-mode]] (within Emacs) has a native publish mode, and I had
+discovered [[justin-abrah-org-publish][several]]
+[[panchekha-org-publish][posts]] on how [[ogbe-org-publish][people]] are doing
+[[evenchick-org-publish][exactly]] this.
+However, [[org-mode-publish][Org-publish]] isn't exactly what I was looking for.
+
+Therefore, let's revise the current list of requirements:
+
+- Makefile driven
+
+- Does not require Emacs to generate
+
+That is, I wanted a ~Makefile~ that could generate the site contents.
+Furthermore, and no less importantly, I wanted the ~Makefile~ to not include
+lines like ~emacs --quick --batch ...~. This obviously creates a bit of a
+challenge since [[org-mode][Org-mode]] is an Emacs mode.
+
+I decided I could probably generate the content myself with a few scripts and
+invocations of [[pandoc][Pandoc]].
+
+** High-Level Implementation
+
+The core of the implementation of the new site generation is blog posts written
+in [[org-mode][Org-mode]], processed by several shell scripts, using
+[[pandoc][Pandoc]] to perform the translation from raw [[org-mode][Org-mode]]
+markup to HTML, all of which is orchestrated by a ~Makefile~.
+
+I'm not going to extol [[org-mode][Org-mode]]'s capabilities in this post.
+There are plenty of resources on it already, none a greater authority than the
+[[org-manual][Org-mode Manual]] itself.
+
+There are, in fact, some limitations of [[org-mode][Org-mode]] due to the
+choice of not allowing the generation to include ~Emacs~ itself.
+
+Along the tour of the implementation, it's important to note that a guiding
+principle in the conversion was not breaking existing links. That is, I was
+and am satisfied with the folders and slug usage for posts and I didn't want
+the new version to break existing links.
+
+** Detailed Implementation
+
+The easy part is generating each post. This is simply an ~index.html~ in the
+correct folder. The majority of the complexities stem from the summaries and
+main ~index~ page.
+
+*** Post Content Generation
+
+To generate a blog post's ~index.html~ page, we consider the following ~make~
+target:
+
+#+BEGIN_SRC makefile
+blog_dir = $(shell $(SCRIPTS_DIR)/org-get-slug.sh $(1))
+TEMPLATE_FILES:=$(wildcard templates/*.html)
+
+define BLOG_BUILD_DEF
+$(BUILD_DIR)$(call blog_dir,$T):
+	mkdir -p $$@
+$(BUILD_DIR)$(call blog_dir,$T)/index.html: $T \
+        $(TEMPLATE_FILES) \
+        Makefile \
+        | $(BUILD_DIR)$(call blog_dir,$T)
+	$(SCRIPTS_DIR)/generate_post_html.sh $$< > $$@
+endef
+
+$(foreach T,$(POSTS_ORG_INPUT),$(eval $(BLOG_BUILD_DEF)))
+#+END_SRC
+
+This definition is fairly opaque as written. However, the definition will
+expand for each post when the ~foreach~ macro expands. For example, when run,
+the following targets will be defined for this post:
+
+#+BEGIN_SRC makefile
+$(BUILD_DIR)/blog/2019/03/static-site-generation:
+	mkdir -p $@
+$(BUILD_DIR)/blog/2019/03/static-site-generation/index.html: posts/static-site-generation.org \
+        $(TEMPLATE_FILES) \
+        Makefile \
+        | $(BUILD_DIR)/blog/2019/03/static-site-generation
+	$(SCRIPTS_DIR)/generate_post_html.sh $< > $@
+#+END_SRC
+
+This will create the correct directory for each post, e.g.,
+~/blog/2019/03/static-site-generation~, and place the translated HTML into this
+directory as ~index.html~.
+
+#+BEGIN_QUOTE
+Note: the expansion doesn't actually contain a literal ~$(TEMPLATE_FILES)~.
+During the expansion of the definition, the variable ~$(TEMPLATE_FILES)~ is
+expanded as well. This is acceptable, however, since it's a static list of
+files and has no bearing on which post's target is being expanded.
+#+END_QUOTE
+
+The ~generate_post_html.sh~ script is fairly basic:
+
+#+BEGIN_SRC bash
+#!/usr/bin/env bash
+# Generate HTML for blog post
+
+ORGIN=${1}
+PROJ_ROOT=$(git rev-parse --show-toplevel)
+source ${PROJ_ROOT}/scripts/site-templates.sh
+source ${PROJ_ROOT}/scripts/org-metadata.sh
+DISPLAY_DATE=$(date -d ${DATE} +'%a %b %d, %Y')
+SORT_DATE=$(date -d ${DATE} +'%Y%t%m%t%d%t')
+
+cat ${HTML_HEADER_FILE}
+cat ${HTML_SUB_HEADER_FILE}
+echo -n "<h1 class=\"title\">${TITLE}</h1>"
+echo -n "<div class=\"post-meta\">"
+echo -n '<ul class="tags"><li><i class="fa fa-tags"></i></li>'
+echo -n "${TAGS}" | awk '{ printf "<li>%s</li>", $0}'
+echo -n '</ul>'
+echo -n "<h4>${DISPLAY_DATE}</h4></div>"
+pandoc --from org \
+       --to html \
+       ${ORGIN}
+cat ${HTML_FOOTER_FILE}
+
+#+END_SRC
+
+The ~org-metadata.sh~ script reads the [[org-mode][Org-mode]] preamble (lines
+starting with ~#+~) and puts the values into variables available to the other
+scripts. For example, ~TITLE~, ~DATE~, and ~TAGS~ are pulled out and used to
+generate the title section of each post. Furthermore, some templates are
+pulled in to generate the headers and footers of each page. The templates are
+written directly in HTML and really serve only to simplify each page with
+otherwise largely duplicated content.
+
+*** Summary Page Generation
+
+The summary page is a bit more involved to generate. A few questions had to be
+answered before it was possible: how to generate the summary text? And how
+to sort and order posts?
+
+To answer the first question, I dug into how [[hugo][Hugo]] was generating
+these summaries. It turns out, it really only takes the first couple hundred
+characters and calls it the "summary". This depends largely on the content of
+each post to actually describe the post in the first couple hundred characters.
+Obviously, this led to some awkward results, especially with links and section
+headings mixed in.
+
+To achieve similar results, it /would/ be fairly easy to write a script to
+simply take the first few hundred characters after the preamble and output this
+into something to be collected for the summary page. However, a better
+solution is available since we are taking full control over the generation
+process. Namely, we can put the preview content into a specific
+[[org-mode][Org-mode]] block to be parsed out and used explicitly for this
+purpose. If the summary for a post is only a sentence or two, the summary
+generation process won't start reading extra text; if the summary requires
+a little more detail, it won't be cut short by an arbitrary read limit.
+
+To generate the preview content, the ~generate_post_preview.sh~ script is used:
+
+#+BEGIN_SRC bash
+#!/usr/bin/env bash
+# Generate the intermediate Org-mode preview content for a post
+
+ORGIN=${1}
+PROJ_ROOT=$(git rev-parse --show-toplevel)
+
+source ${PROJ_ROOT}/scripts/org-metadata.sh
+
+echo "${LINKS}"
+echo "${PREVIEW}"
+#+END_SRC
+
+The ~LINKS~ variable is included in this file because we are generating an
+intermediate file for [[pandoc][Pandoc]] to generate the summary content.
+Without the ~LINKS~, any links included in the preview section would be broken.
+
+The second question actually turns out to be pretty easy in practice: we parse
+the ~#+DATE:~ line from the preamble and prepend it to the summary content.
+
+From the ~org-metadata.sh~ script:
+
+#+BEGIN_SRC bash file:org-metadata.sh
+ORGIN=${1}
+DATE=$(awk -F': ' '/^#\+DATE:/ { printf "%s", $2}' ${ORGIN})
+#+END_SRC
+
+Then, from the ~generate_post_summary_html.sh~ script:
+
+#+BEGIN_SRC bash file: generate_post_summary_html.sh
+#!/usr/bin/env bash
+# Generate HTML post summary tags
+
+ORGIN=${1}
+GENERATED_PREVIEW_FILE=${2}
+PROJ_ROOT=$(git rev-parse --show-toplevel)
+
+source ${PROJ_ROOT}/scripts/org-metadata.sh
+DISPLAY_DATE=$(date -d ${DATE} +'%a %b %d, %Y')
+SORT_DATE=$(date -d ${DATE} +'%Y%t%m%t%d%t')
+PREVIEW_CONTENT=$(cat ${GENERATED_PREVIEW_FILE} | pandoc -f org -t html)
+
+echo -n "${SORT_DATE}"
+echo -n '<article class="post"><header>'
+echo -n "<h2><a href=\"${SLUG}\">${TITLE}</a></h2>"
+echo -n "<div class=\"post-meta\">${DISPLAY_DATE}</div></header>"
+echo -n "<blockquote>$(echo ${PREVIEW_CONTENT})</blockquote>"
+echo -n '<ul class="tags"><li><i class="fa fa-tags"></i></li>'
+echo -n "${TAGS}" | awk '{ printf "<li>%s</li>", $0}'
+echo -n '</ul>'
+echo -n '<footer>'
+echo -n "<a href=\"${SLUG}\">Read More</a>"
+echo -n "</footer>"
+echo ""
+#+END_SRC
+
+Finally, this is all put together with the ~generate_index_html.sh~ script:
+
+#+BEGIN_SRC bash
+#!/usr/bin/env bash
+# Generate index.html page
+
+INPUT_FILES=${@}
+PROJ_ROOT=$(git rev-parse --show-toplevel)
+source ${PROJ_ROOT}/scripts/site-templates.sh
+
+cat "${HTML_HEADER_FILE}"
+echo "<body>"
+cat "${HTML_SUB_HEADER_FILE}"
+cat ${INPUT_FILES} | sort -r -n -k1 -k2 -k3 | awk -F'\t' '{print $4}'
+echo "</body>"
+cat "${HTML_FOOTER_FILE}"
+#+END_SRC
+
+Specifically, the following line is of interest with respect to properly
+sorting:
+
+#+BEGIN_SRC bash
+cat ${INPUT_FILES} | sort -r -n -k1 -k2 -k3 | awk -F'\t' '{print $4}'
+#+END_SRC
+
+This sorts the post summaries newest-first by the tab-separated date fields
+prepended earlier, then keeps only the HTML for the ~index.html~ page.
+
+*** RSS/XML Generation
+
+I also wanted to keep the RSS/XML feeds going.
+However, as it turns out, generating the RSS feed was achieved by performing
+essentially the same steps used for generating the summary ~index.html~ page.
+
+** Future Work
+
+There is a fairly obvious limitation of the summary page generation, but only
+really obvious if I write more content. There was, and still is, no archive
+page. Moreover, _all_ posts are put into the ~index.html~ summary page.
+If/when more posts are written and published, a solution for the first page
+will be necessary. However, this would be necessary regardless of whether the
+blog is generated using [[hugo][Hugo]] or generated via the new process.
+
+** Parting Thoughts
+
+Like many projects, this was started because I personally was dissatisfied with
+the current state of options. That said, I did not write these
+scripts to be used directly by someone else. I'm not sure I would necessarily
+recommend this approach to someone else, unless, of course, they wanted to do
+it to learn or to otherwise take control of their content. Still, I hope
+this captures the essence of the scripts, their major functions, and the
+motivations behind them. The scripts are available, WITHOUT WARRANTY, under
+the [[gnu-gpl][GNU General Public License (version 3)]].
+
+If you have questions or comments, feel free to reach out to me.