30th May 2021

Generate RSS from Markdown

For this blog I wanted an RSS feed. Saaze by default does not provide this functionality. Saaze is supposed to be "stupidly simple" by design, which I consider a plus.

Luckily, generating an RSS feed is simple. It starts with a header of fixed XML. Then each post is printed as a so-called "item" with

  1. link / URL
  2. publication date
  3. title
  4. an excerpt or even the full blog post

Finally come the required closing XML tags. That's it.
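To make the four parts of an item concrete, here is a minimal sketch of building one item as a string. The names xml_escape and rss_item, the hash keys, and the example.com URL are hypothetical and chosen only for this illustration. Unlike mkdwnrss, the sketch escapes the text instead of wrapping it in CDATA.

```perl
use strict;
use warnings;

# Escape the five characters that are special in XML text.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;     # must come first, otherwise entities get escaped twice
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&apos;/g;
    return $s;
}

# Build one <item> from the keys link, date, title, excerpt
# (hypothetical key names, used only in this sketch).
sub rss_item {
    my (%p) = @_;
    return "<item>\n"
        . "\t<link>" . xml_escape($p{link}) . "</link>\n"
        . "\t<pubDate>" . xml_escape($p{date}) . "</pubDate>\n"
        . "\t<title>" . xml_escape($p{title}) . "</title>\n"
        . "\t<description>" . xml_escape($p{excerpt}) . "</description>\n"
        . "</item>\n";
}

print rss_item(
    link    => 'https://example.com/blog/post/',
    date    => 'Sun, 30 May 2021 20:00:00 +0200',
    title   => 'Tom & Jerry',
    excerpt => 'A <short> excerpt.',
);
```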

Taking this information directly from the Markdown file and its frontmatter seems to be the easiest approach. For example, the frontmatter for this blog post is:

---
date: "2021-05-30 20:00:00"
title: "Generate RSS from Markdown"
draft: false
categories: ["www"]
tags: ["RSS", "feed", "Markdown"]
author: "Elmar Klausmeier"
prismjs: true
---
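Reading such frontmatter takes only a few lines. The following is a minimal sketch, not the code used by mkdwnrss: the function name frontmatter is hypothetical, and it only handles simple one-line `key: value` pairs, so lists such as tags stay raw strings.

```perl
use strict;
use warnings;

# Parse the frontmatter of a Markdown text into a hash.
# Only one-line `key: "value"` or `key: value` entries are handled.
sub frontmatter {
    my ($text) = @_;
    my %fm;
    my $sep = 0;                       # number of "---" separators seen so far
    for my $line (split /\n/, $text) {
        if ($line =~ /^---\s*$/) {
            last if ++$sep == 2;       # second separator ends the frontmatter
            next;
        }
        next unless $sep == 1;         # ignore anything before the first separator
        if ($line =~ /^(\w+):\s*(.*?)\s*$/) {
            my ($key, $val) = ($1, $2);
            $val =~ s/^"(.*)"$/$1/;    # strip surrounding double quotes
            $fm{$key} = $val;
        }
    }
    return %fm;
}

my %fm = frontmatter(<<'EOT');
---
date: "2021-05-30 20:00:00"
title: "Generate RSS from Markdown"
draft: false
---
Body text.
EOT

print "$fm{title} ($fm{date})\n";
```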

The Perl script mkdwnrss below implements this. As input it expects the blog posts that should be part of the RSS feed, so usually you will "generate" the list of files. Implementing this in PHP would be equally simple.

The excerpt is built from the first non-empty lines of Markdown: the script takes the first 8 of them and then keeps adding lines until the excerpt is at least 500 characters long.
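The excerpt rule can be looked at in isolation. The sketch below mirrors the condition used in mkdwnrss (append a line while the line counter is below 9 or the excerpt is shorter than 500 characters), but wraps it in a hypothetical function excerpt with the two limits as parameters so that it is easy to try out.

```perl
use strict;
use warnings;

# Join non-empty lines into an excerpt. A line is appended as long as
# the line counter is below $min_lines OR the excerpt is still shorter
# than $min_chars; collecting stops once both limits are reached.
sub excerpt {
    my ($min_lines, $min_chars, @lines) = @_;
    my ($cnt, $out) = (0, "");
    for my $line (@lines) {
        next if length($line) == 0;        # skip empty lines
        if ($cnt++ == 0) {
            $out = $line;                  # first line starts the excerpt
        } elsif ($cnt < $min_lines || length($out) < $min_chars) {
            $out .= " " . $line;
        } else {
            last;
        }
    }
    return $out;
}

# Small limits to show the effect: prints "a b"
print excerpt(2, 3, "a", "", "b", "c"), "\n";
```

Calling excerpt(9, 500, @lines) corresponds to the limits used in the script.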

#!/bin/perl -W
# Create RSS XML file ("feed") based on Markdown files
#
# Input: List of Markdown files (order of files determines order of <item>)
# Output: RSS (description with the first lines of Markdown as excerpt)
#
# Example:
#      mkdwnrss `find blog/2021 -type f | sort -r`

use strict;

my $dt = localtime();
print <<"EOT";
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
    <title>Elmar Klausmeier's Blog</title>
    <description>Elmar Klausmeier's Blog</description>
    <lastBuildDate>$dt</lastBuildDate>
    <link>https://eklausmeier.goip.de</link>
    <atom:link href="https://eklausmeier.goip.de/feed.xml" rel="self" type="application/rss+xml" />
    <generator>mkdwnrss</generator>

EOT


sub item(@) {
    my $f = $_[0];
    open(F, '<', $f) || die("Cannot open $f: $!");

    my $link = $f;
    $link =~ s/\.md$/\//;
    print "\t<item>\n"
    . "\t\t<link>https://eklausmeier.goip.de/$link</link>\n"
    . "\t\t<guid>https://eklausmeier.goip.de/$link</guid>\n";

    my ($sep,$linecnt,$excerpt) = (0,0,"");
    while (<F>) {
        chomp;
        if (/^\-\-\-$/) { $sep++ ; next; }
        if ($sep == 1) {
            if (/^title:\s+"(.+)"$/) {
                printf("\t\t<title>%s</title>\n",$1);
            } elsif (/^date:\s+"(.+)"$/) {
                printf("\t\t<pubDate>%s</pubDate>\n",$1);
            }
        } elsif ($sep >= 2) {
            next if (length($_) == 0);
            if ($linecnt++ == 0) {
                print "\t\t<description><![CDATA[";
                $excerpt = $_;
            } elsif ($linecnt < 9 || length($excerpt) < 500) {
                $excerpt .= " " . $_;
            } else {
                last;
            }
        }
    }
    print $excerpt . "]]></description>\n" if ($linecnt > 0);
    print "\t</item>\n";

    close(F) || die("Cannot close $f");
}


# Process the files in the order given on the command line
for my $file (@ARGV) {
    item($file);
}


print "</channel>\n</rss>\n";

The source code for mkdwnrss is on GitHub.

During development I checked that my RSS looks similar to the RSS feed produced by WordPress. I also consulted Alex Le's blog post on RSS feeds: Create An RSS Feed From Scratch.

Added 08-Jul-2021: When I checked the feed with the W3C Feed Validation Service, the dates and descriptions were marked as non-compliant. This is now corrected, and the check reports: Valid RSS
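RSS 2.0 expects dates in RFC 822 format, while the frontmatter stores them as "2021-05-30 20:00:00". One possible conversion is sketched below; it is not necessarily the fix that went into mkdwnrss. The function name rfc822 is hypothetical, and the time zone offset is passed in as an assumption because the frontmatter does not record one.

```perl
use strict;
use warnings;
use Time::Piece;

# Convert a frontmatter date like "2021-05-30 20:00:00" into an
# RFC 822 date. $zone (e.g. "+0200") is an assumption supplied by
# the caller; it is appended as-is and not used for any conversion.
sub rfc822 {
    my ($date, $zone) = @_;
    my $t = Time::Piece->strptime($date, "%Y-%m-%d %H:%M:%S");
    return sprintf("%s, %02d %s %04d %02d:%02d:%02d %s",
        $t->wdayname, $t->mday, $t->monname, $t->year,
        $t->hour, $t->min, $t->sec, $zone);
}

# prints "Sun, 30 May 2021 20:00:00 +0200"
print rfc822("2021-05-30 20:00:00", "+0200"), "\n";
```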

Added 17-Jan-2022: Also see Generate RSS from HTML.