
Ampersands in Markdown URLs

In this blog I reference web-page and image URLs that contain ampersands ("&"), for example https://www.amazon.com/s?k=Richard+Stevens&ref=nb_sb_noss. Unfortunately, Markdown as specified by John Gruber does not pass these ampersands through unchanged: it converts each one to &amp;. This can be checked with John Gruber's "dingus" web form.

When official Markdown is given the input

abc [uvw](../iop&a) xyz

the "dingus" produces

<p>abc <a href="../iop&amp;a">uvw</a> xyz</p>

I consider this to be an error. On 12-May-2021 I wrote an e-mail to John Gruber, but I have not received a reply to date. So apparently he won't fix it.

To keep ampersands in URLs intact, I added a special rule to the Saaze extension MathParser. The logic is as follows:

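private function amplink($html) {
    // Attribute prefixes to scan for; only absolute http(s) URLs are handled
    $begintag = array(" href=\"http", " src=\"http");
    $i = 0;    // number of links processed, only used by the debug output below
    foreach($begintag as $tag) {
        $last = 0;
        for(;;) {
            $start = strpos($html,$tag,$last);
            if ($start === false) break;
            $last = $start + strlen($tag);    // continue behind the opening quote
            $end = strpos($html,"\"",$last);    // closing quote of the attribute value
            if ($end === false) break;
            $link = substr($html,$start,$end-$start);
            $link = str_replace("&amp;","&",$link);    // undo the escaping inside the URL
            $html = substr_replace($html, $link, $start, $end-$start);
            ++$i;
        }
    }
    //printf("\t\tamplink() changed %d times\n",$i);
    return $html;
}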

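The function has to handle both the href= and the src= case. It only touches absolute URLs starting with http, so a relative link such as ../iop&a from the example above keeps its &amp;.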
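The same idea can also be sketched with a regular expression instead of the strpos() loop. This is only an illustration, not the code used in Saaze: the function name amplink_regex and the example.com URL are made up for this example.

```php
<?php
// Hypothetical free-function variant of amplink(), for illustration only.
function amplink_regex(string $html): string {
    // Match ` href="http...` or ` src="http...` up to, but not including,
    // the closing double quote of the attribute value.
    $result = preg_replace_callback(
        '/( (?:href|src)="http[^"]*)/',
        function (array $m): string {
            // Turn &amp; back into & inside the matched URL only.
            return str_replace('&amp;', '&', $m[1]);
        },
        $html
    );
    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html;
}

$in = '<p>abc <a href="https://example.com/s?k=x&amp;ref=y">uvw</a> x &amp; y</p>';
echo amplink_regex($in), "\n";
```

Only the ampersand inside the href value is changed; the &amp; in the running text and any relative link stay as they are.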

Added 30-May-2023: Checking my website with the W3C validator, I discovered that I had made a mistake. John Gruber is right, and I am wrong. According to the W3C:

In both SGML and XML, the ampersand character ("&") declares the beginning of an entity reference (e.g., &reg; for the registered trademark symbol "®"). Unfortunately, many HTML user agents have silently ignored incorrect usage of the ampersand character in HTML documents - treating ampersands that do not look like entity references as literal ampersands. XML-based user agents will not tolerate this incorrect usage, and any document that uses an ampersand incorrectly will not be "valid", and consequently will not conform to this specification. In order to ensure that documents are compatible with historical HTML user agents and XML-based user agents, ampersands used in a document that are to be treated as literal characters must be expressed themselves as an entity reference (e.g. "&amp;"). For example, when the href attribute of the a element refers to a CGI script that takes parameters, it must be expressed as http://my.site.dom/cgi-bin/myscript.pl?class=guest&amp;name=user rather than as http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user.

I found this reference via the article "How to use ampersands in HTML: to encode or not to encode?"
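The rule quoted from the W3C can be tried out in PHP. The following is a minimal sketch, assuming the Amazon URL from the beginning of this post: htmlspecialchars() produces the form that belongs into the href attribute, and html_entity_decode() stands in for what a browser does when it reads the attribute back.

```php
<?php
$url = 'https://www.amazon.com/s?k=Richard+Stevens&ref=nb_sb_noss';

// Writing the URL into an attribute: "&" has to become "&amp;".
$attr = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
echo '<a href="' . $attr . '">Richard Stevens</a>', "\n";

// Decoding the attribute value gives the original URL again,
// so the escaped form does not change where the link points to.
$decoded = html_entity_decode($attr, ENT_QUOTES, 'UTF-8');
var_dump($decoded === $url);
```

So the &amp; that Markdown writes into the link is the valid form, and the link still leads to the URL with the plain ampersand.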