
Adding Categories to Saaze

This blog runs on Simplified Saaze. Almost every blog post contains categories and tags. Previously, however, these categories and tags were not linked together in any way, i.e., there was no way to find all blog posts relating to one specific category.

Task at hand: Add categories and tags to Simplified Saaze.

When I imported my content from WordPress I also imported categories and tags. However, they were just shown at the bottom of each post. There was no connection between different posts having the same category or tag.

It turns out that categories and tags can be added to Simplified Saaze without changing a single line in Saaze. It is sufficient to add a simple Perl script and change templates.

1. Overview. The overall architecture is as given below.

flowchart LR
    subgraph MAIN
        direction TB
        A[php saaze] --> B["TemplateManager::renderEntry()"]
        B --> C[category.php]
        C --> D[[index.html]]
        B --> E[tag.php]
        E --> F[[index.html]]
    end
    subgraph Perl
        direction LR
        G["blogcategory *.md"] --> H[[cat_and_tag.json]]
    end
    Perl --> MAIN

A separate program generates a JSON file with categories and tags, which is then picked up by the template PHP code. Categories and tags are kept together in this single file. The JSON looks something like this:

{
        "categories": {
                "Android": [
                        "<a href=\"../../blog/2013/03-13-screenshots-on-nexus-4-android-4-x\">2013-03-13: Screenshots on Nexus 4 (Android 4.x)</a>",
                        "<a href=\"../../blog/2013/08-04-google-now-emergency-alert\">2013-08-04: Google Now Emergency Alert</a>",
    ...
        "tags": {
                "/etc/shells": [
                        "<a href=\"../../blog/2015/10-05-linux-pam-and-etcshells\">2015-10-05: Linux pam and /etc/shells</a>"
                ],
                "2GB": [
                        "<a href=\"../../blog/2014/07-05-splitting-large-files-on-microsoft-windows\">2014-07-05: Splitting Large Files on Microsoft Windows</a>"
                ],
    ...
}

This JSON file needs to be regenerated whenever new categories or tags show up, for example, when a new blog post is written or when the frontmatter of an existing post is changed. In all other cases this JSON file stays untouched. It is similar to the way indexes are used in TeX or LaTeX.
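To illustrate the shape of this file, here is a minimal Python sketch (not part of Simplified Saaze) that parses a shortened inline sample with the same structure and looks up the links stored under one category. The function name posts_for and the trimmed sample are assumptions made for illustration only.

```python
import json

# Shortened sample in the same shape as cat_and_tag.json (illustrative only)
SAMPLE = r'''
{
    "categories": {
        "Android": [
            "<a href=\"../../blog/2013/03-13-screenshots-on-nexus-4-android-4-x\">2013-03-13: Screenshots on Nexus 4 (Android 4.x)</a>"
        ]
    },
    "tags": {
        "2GB": [
            "<a href=\"../../blog/2014/07-05-splitting-large-files-on-microsoft-windows\">2014-07-05: Splitting Large Files on Microsoft Windows</a>"
        ]
    }
}
'''

def posts_for(data: dict, kind: str, name: str) -> list:
    """Return the HTML links stored under data[kind][name], or [] if absent."""
    return data.get(kind, {}).get(name, [])

if __name__ == "__main__":
    data = json.loads(SAMPLE)
    for link in posts_for(data, "categories", "Android"):
        print(link)
```

Each key maps directly to a ready-made list of links, so a lookup needs no further processing of the Markdown files.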

2. Generate JSON file. Generating the JSON file from the frontmatter information in every Markdown file can be done by any program. I chose to write it in 50 lines of Perl. The script expects the path of each Markdown file, including its directory, as a command-line argument. The frontmatter of each Markdown file is then parsed for date, title, categories, and tags. With these values the script populates two hashes, called %cat and %tag. The key of each hash is a category or a tag, respectively. Each value is an array, and each array element is an HTML link pointing to the post in which the category or tag occurs.

Finally the two hashes are printed out in JSON format.

#!/bin/perl -W
# Read frontmatter of Markdown file and write categories and tags to single JSON file
# Call like this:
#     blogcategory `find blog -name \*.md` > cat_and_tag.json
#
# Elmar Klausmeier, 22-Mar-2022: listSplit(), readMkd()
# Elmar Klausmeier, 01-Apr-2022: date, @ARGV, references in listSplit()
# Elmar Klausmeier, 02-Apr-2022: write JSON
# Elmar Klausmeier, 03-Apr-2022: JSON now in sub prtInnerJSON for both cat's & tags


use strict;
my (%cat, %tag);	# each is hash of array of strings

sub listSplit(@) {
    my $s = $_[0];
    $s =~ s/^\s*\[\s*"//;	# strip ["
    $s =~ s/"\s*\]//;	# strip "]
    return split(/",\s*"/,$s);
}


sub readMkd(@) {	# read Markdown file and put categories and tags in hashes
    my $fname = $_[0];
    #printf("fname=%s\n",$fname);
    open(F,"<$fname") || die("Cannot open $fname");
    my ($threedash,$draft,$title,$date,@catArr,@tagArr) = (0,0,"","",(),());
    while (<F>) {
        chomp;
        s/\s+$//;	# rtrim
        if (/^\-\-\-\s*$/) { last if (++$threedash >= 2); }
        elsif (/^title:/) { $title = substr($_,6); $title =~ s/^\s*"//; $title =~ s/"$//; }
        elsif (/^date:/) { $date = substr($_,5); $date =~ s/^\s*"//; $date = substr($date,0,10); }
        elsif (/^draft:\s*true/) { $draft = 1; last; }
        elsif (/^categories:/) { @catArr = listSplit(substr($_,11)); }
        elsif (/^tags:/) { @tagArr = listSplit(substr($_,5)); }
    }
    close(F) || die("Cannot close $fname");
    return if ($draft == 1);
    $fname =~ s/\.md$//;
    my $url = "<a href=\\\"../../$fname\\\">$date: $title</a>";
    foreach (@catArr) { push $cat{$_}->@*, $url; }
    foreach (@tagArr) { push $tag{$_}->@*, $url; }
}


sub prtInnerJSON(@) {
    my $href = $_[0];	# hash reference
    my $n = keys %{$href};	# for comma at end of list
    foreach my $key (sort keys %{$href}) {
        print "\t\t\"$key\": [\n";
        my $m = scalar @{ $href->{$key} };	# for comma at end of list
        foreach (sort @{ $href->{$key} }) {
            printf("\t\t\t\"%s\"%s\n", $_, --$m ? "," : "");
        }
        printf("\t\t]%s\n", --$n ? "," : "");
    }
}


foreach my $file (@ARGV) {	# plain loop; <@ARGV> would treat the arguments as glob patterns
    readMkd($file);
}

# Write the two hashes in JSON format
print "{\n\t\"categories\": {\n";
prtInnerJSON(\%cat);
print "\t},\n\t\"tags\": {\n";
prtInnerJSON(\%tag);
print "\t}\n}\n";

Running the above Perl script blogcategory in my content directory on ca. 400 blog posts takes 30 ms on an Intel i5-4250U, clocking 2.6 GHz.

$ time blogcategory `find . -name \*.md` > cat_and_tag.json
        real 0.03s
        user 0.03s
        sys 0
        swapped 0
        total space 0
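Since the JSON file can be produced by any program, the same idea can also be sketched in Python. This is not a port of the Perl script, only a minimal illustration under assumptions: the frontmatter sits between two --- lines, and categories and tags are written as JSON-style lists such as ["Android", "Linux"]. The function names parse_frontmatter and build_index, and the sample post content, are hypothetical. One design difference is that the json module does all escaping, so quotes in titles or tag names need no manual handling.

```python
import json
import re

def parse_frontmatter(text: str) -> dict:
    """Extract title, date, draft flag, categories and tags from a post's text."""
    meta = {"title": "", "date": "", "draft": False, "categories": [], "tags": []}
    dashes = 0
    for line in text.splitlines():
        line = line.rstrip()
        if re.fullmatch(r"---\s*", line):
            dashes += 1
            if dashes >= 2:          # second '---' ends the frontmatter
                break
        elif line.startswith("title:"):
            meta["title"] = line[6:].strip().strip('"')
        elif line.startswith("date:"):
            meta["date"] = line[5:].strip().strip('"')[:10]   # keep YYYY-MM-DD
        elif re.match(r"draft:\s*true", line):
            meta["draft"] = True
        elif line.startswith("categories:"):
            meta["categories"] = json.loads(line[11:])   # assumes a JSON-style list
        elif line.startswith("tags:"):
            meta["tags"] = json.loads(line[5:])
    return meta

def build_index(files: dict) -> dict:
    """files maps a path such as 'blog/2015/post.md' to that file's text."""
    index = {"categories": {}, "tags": {}}
    for path, text in files.items():
        meta = parse_frontmatter(text)
        if meta["draft"]:
            continue                 # drafts are skipped
        stem = path[:-3] if path.endswith(".md") else path
        link = f'<a href="../../{stem}">{meta["date"]}: {meta["title"]}</a>'
        for kind in ("categories", "tags"):
            for name in meta[kind]:
                index[kind].setdefault(name, []).append(link)
    for kind in index:               # sort the links within each entry
        for links in index[kind].values():
            links.sort()
    return index

if __name__ == "__main__":
    # Made-up sample post; a real run would read the Markdown files from disk.
    sample = {
        "blog/2015/10-05-linux-pam-and-etcshells.md":
            '---\ntitle: "Linux pam and /etc/shells"\n'
            'date: "2015-10-05 12:00:00"\n'
            'categories: ["Linux"]\ntags: ["/etc/shells"]\n---\nBody text\n',
    }
    print(json.dumps(build_index(sample), indent="\t", sort_keys=True))
```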

3. Templates. In Simplified Saaze, template files are just ordinary PHP files. The template categories.php for categories is shown below. It relies on a feature of Saaze and Simplified Saaze: every Markdown file can have its own template file, i.e., the template keyword in the frontmatter specifies which template to use. If no template keyword is found, then entry is used for blog posts and index is used for index pages.

The PHP code uses require to load a routine that reads the JSON file. It then simply prints the content of the JSON in table and/or list form using prtCatOrTag().

<?php require SAAZE_PATH . "/templates/top-layout.php"; ?>
<?php require SAAZE_PATH . "/content/read_cattag_json.php"; ?>

    <div class=blogarea>
<p class=dimmedColor><?= date('jS F Y', strtotime($entry['date'])) ?></p>
<h1><?= $entry['title'] ?></h1>	

<div>
<?php prtCatOrTag($GLOBALS['cat_and_tag']['categories']); ?>
</div>
    </div>

<?php require SAAZE_PATH . "/templates/bottom-layout.php"; ?>

The PHP file that reads the JSON, read_cattag_json.php, is shown below. It is 40 lines of PHP: it reads the JSON file with file_get_contents() and feeds the resulting string to json_decode(). If either step fails, the script exits with code 81 or 82, respectively.

<?php
// read JSON file and store it in GLOBALS
if (!array_key_exists('cat_and_tag',$GLOBALS)) {
    $cat_and_tag_json = @file_get_contents(SAAZE_PATH . "/content/cat_and_tag.json");
    if ($cat_and_tag_json === false)
        exit(81);
    if (($GLOBALS['cat_and_tag'] = json_decode($cat_and_tag_json,true)) === null)
        exit(82);


    function prtCatOrTag(array $hash) {	// hash contains either categories or tags
        ...
    }
}
?>
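The body of prtCatOrTag() is not shown here. Purely as an illustration of what "printing in list form" can mean, and explicitly not the author's PHP code, the following Python sketch renders such a mapping as a nested HTML list. The function name render_list is hypothetical.

```python
import html

def render_list(mapping: dict) -> str:
    """Render {name: [html_link, ...]} as a nested HTML list, names sorted."""
    parts = ["<ul>"]
    for name in sorted(mapping):
        parts.append(f"<li>{html.escape(name)}<ul>")
        # The links are already HTML fragments, so they are emitted unescaped.
        parts.extend(f"<li>{link}</li>" for link in mapping[name])
        parts.append("</ul></li>")
    parts.append("</ul>")
    return "\n".join(parts)

if __name__ == "__main__":
    tags = {
        "2GB": [
            '<a href="../../blog/2014/07-05-splitting-large-files-on-microsoft-windows">'
            '2014-07-05: Splitting Large Files on Microsoft Windows</a>'
        ]
    }
    print(render_list(tags))
```

Because the JSON already stores finished anchor elements, the rendering step only has to wrap them in list markup.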

Added 13-Aug-2022: Simplified Saaze now has a new command-line argument -t, which generates cat_and_tag.json on the fly. The Perl program above is therefore still valid, but no longer required.