11th July 2021

Calling MD4C from PHP via FFI

1. Problem statement. Converting Markdown to HTML is a central task of every static site generator. I use Saaze, and I measured that roughly 60% of the overall runtime is spent on this conversion. I have written on Saaze here and here. Converting my roughly 320 posts took two seconds. When my machine is fully loaded with other computations, for example astrophysical computations, building the static site takes four seconds. More than 60% of that total runtime is spent in the toHtml() routine alone.

PHP offers FFI, the Foreign Function Interface, since version 7.4, so it is available in PHP 8. It was inspired by LuaJIT FFI. PHP FFI, authored by Dmitry Stogov, is a very easy to use interface to C routines. Although writing a PHP extension is quite easy, calling C routines via FFI is dead simple. Hence, it was natural to replace the PHP implementation of toHtml() with MD4C.

FFI has to be enabled in php.ini, see for example PHP extension seg-faulting.
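
A minimal sketch of the relevant php.ini lines, assuming the FFI extension is built as a shared module. Whether the extension line is needed, and where php.ini lives, depends on the distribution:

```ini
; load the FFI extension (not needed if it is compiled into PHP)
extension=ffi
; allow FFI everywhere; the other accepted values are "false" and "preload"
ffi.enable=true
```

The default value of ffi.enable is preload, which permits FFI only in the CLI and in preloaded scripts. As the Saaze build is started from the command line, loading the extension may already be enough.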

2. C library. MD4C is a C library and auxiliary stand-alone executable to convert Markdown to HTML. It was written by Martin Mitas. It is installed on many Linux distributions by default, as it is used in Qt.

MD4C is very fast. It is faster than cmark, in many cases 2 to 5 times faster. In the table below the factor ranges from about 1.4 (long-block-oneline.md) to about 7.7 (many-emphasis.md). See "Why is MD4C so fast?".

| Test name | Simple input | MD4C (seconds) | Cmark (seconds) |
|---|---|---|---|
| cmark-benchinput.md | (benchmark from CMark) | 0.3650 | 0.7060 |
| long-block-multiline.md | "foo\n" * 1000000 | 0.0400 | 0.2300 |
| long-block-oneline.md | "foo " * 10 * 1000000 | 0.0700 | 0.1000 |
| many-atx-headers.md | "###### foo\n" * 1000000 | 0.0900 | 0.4670 |
| many-blanks.md | "\n" * 10 * 1000000 | 0.0700 | 0.3110 |
| many-emphasis.md | "*foo* " * 1000000 | 0.1100 | 0.8460 |
| many-fenced-code-blocks.md | "~~~\nfoo\n~~~\n\n" * 1000000 | 0.1600 | 0.4010 |
| many-links.md | "[a](/url) " * 1000000 | 0.2100 | 0.5110 |
| many-paragraphs.md | "foo\n\n" * 1000000 | 0.0900 | 0.4860 |

Here is another speed comparison between cmark, md4c and commonmark.js:

| Implementation | Time (sec) |
|---|---|
| commonmark.js | 0.59 |
| cmark | 0.12 |
| md4c | 0.04 |

3. PHP and C code. The PHP code that replaces the call to toHtml() is therefore:

$ffi = FFI::cdef("char *md4c_toHtml(const char*);","/srv/http/php_md4c_toHtml.so");
$html = FFI::string( $ffi->md4c_toHtml($markdown) );

To test the call to md4c_toHtml(), use the PHP program below with the name of a Markdown file as first argument:

<?php
    $ffi = FFI::cdef("char *md4c_toHtml(const char*);","/srv/http/php_md4c_toHtml.so");
    printf("argv1 = %s\n", $argv[1]);
    $markdown = file_get_contents($argv[1]);
    $html = FFI::string( $ffi->md4c_toHtml($markdown) );
    printf("%s", $html);

The routine md4c_toHtml() is:

/* Provide md4c to PHP via FFI
   Copied many portions from Martin Mitas:
       https://github.com/mity/md4c/blob/master/md2html/md2html.c

   Compile like this:
       cc -fPIC -Wall -O2 -shared php_md4c_toHtml.c -o php_md4c_toHtml.so -lmd4c-html

   This routine is not thread-safe. For threading we would either need to pass
   a thread ID or use a mutex to guard the static/global mbuf.
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <md4c-html.h>



struct membuffer {
    char* data;
    size_t asize;
    size_t size;
};



static void membuf_init(struct membuffer* buf, MD_SIZE new_asize) {
    buf->size = 0;
    buf->asize = new_asize;
    if ((buf->data = malloc(buf->asize)) == NULL) {
        fprintf(stderr, "membuf_init: malloc() failed.\n");
        exit(1);
    }
}



static void membuf_grow(struct membuffer* buf, size_t new_asize) {
    buf->data = realloc(buf->data, new_asize);
    if(buf->data == NULL) {
        fprintf(stderr, "membuf_grow: realloc() failed.\n");
        exit(1);
    }
    buf->asize = new_asize;
}



static void membuf_append(struct membuffer* buf, const char* data, MD_SIZE size) {
    if(buf->asize < buf->size + size)
        membuf_grow(buf, buf->size + buf->size / 2 + size);
    memcpy(buf->data + buf->size, data, size);
    buf->size += size;
}



static void process_output(const MD_CHAR* text, MD_SIZE size, void* userdata) {
    membuf_append((struct membuffer*) userdata, text, size);
}



static struct membuffer mbuf = { NULL, 0, 0 };


char *md4c_toHtml(const char *markdown) {	// return HTML string
    int ret;
    if (mbuf.asize == 0) membuf_init(&mbuf,16777216);	// allocate 16 MiB once, reused across calls

    mbuf.size = 0;	// prepare for next call
    ret = md_html(markdown,strlen(markdown),process_output,&mbuf,MD_DIALECT_GITHUB,0);
    membuf_append(&mbuf,"\0",1); // make it a null-terminated C string, so PHP can deduce length
    if (ret < 0) return "<br>- - - Error in Markdown - - -<br>\n";

    return mbuf.data;
}
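
The buffer helpers above call exit(1) when an allocation fails. Inside a shared library loaded via FFI this terminates the whole PHP process. Below is a sketch of an append routine that reports the failure to its caller instead. The name membuf_append_checked is hypothetical; it is neither part of MD4C nor of the code above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the membuffer used by md4c_toHtml(). */
struct membuffer {
    char*  data;    /* start of the allocation, NULL while empty */
    size_t asize;   /* bytes allocated */
    size_t size;    /* bytes in use */
};

/* Hypothetical replacement for membuf_append(): returns 0 on success and
   -1 on failure instead of calling exit(). On failure buf is unchanged. */
int membuf_append_checked(struct membuffer* buf, const char* data, size_t size) {
    size_t needed, new_asize;
    char* p;

    assert(buf != NULL);
    if (size > SIZE_MAX - buf->size)        /* size_t would overflow */
        return -1;
    needed = buf->size + size;

    if (needed > buf->asize) {
        new_asize = needed + buf->size / 2; /* same growth rule as membuf_append() */
        if (new_asize < needed)             /* wrapped around: fall back to exact fit */
            new_asize = needed;
        p = realloc(buf->data, new_asize);  /* realloc(NULL, n) behaves like malloc(n) */
        if (p == NULL)
            return -1;                      /* the old buf->data is still valid */
        buf->data = p;
        buf->asize = new_asize;
    }
    if (size > 0)
        memcpy(buf->data + buf->size, data, size);
    buf->size += size;
    return 0;
}
```

With such a routine, process_output() could record a failure in the userdata, and md4c_toHtml() could then return its error string, leaving the decision on how to proceed to the PHP side.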

4. Application range. Any PHP-based static site generator would therefore profit from using MD4C, as would any PHP-based CMS that employs Markdown. For a list of generators see Jamstack.

  1. Jigsaw, based on Blade templates like Saaze
  2. Statamic, commercial license
  3. Stati by Jonathan Foucher, Jekyll compatible
  4. Saaze
  5. Pico CMS, flat file CMS using Twig templates, not a static site generator
  6. Grav, a flat-file CMS
  7. Sculpin, static site generator using Twig templates

5. Benchmarks. Benchmarks were run on a fully loaded machine:

    1[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]   Tasks: 116, 352 thr; 8 running
    2[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]   Load average: 7.54 7.64 7.59 
    3[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]   Uptime: 14 days, 07:28:59
    4[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
    5[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
    6[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
    7[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
    8[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
  Mem[||||||||||||||||||||||||||||||||||||||||||||||||||  35.6G/60.8G]
  Swp[                                                          0K/0K]

    PID USER      PRI  NI  VIRT   RES   SHR S CPU%▽MEM%   TIME+  Command
 449817 edh        20   0 45.7G 34.8G  112M S 793. 55.9     936h /usr/bin/python /usr/bin/ipython -i script/playground.py                          
 449911 edh        20   0 45.7G 34.8G  112M R 99.8 55.9     101h /usr/bin/python /usr/bin/ipython -i script/playground.py
 449913 edh        20   0 45.7G 34.8G  112M R 99.8 55.9     101h /usr/bin/python /usr/bin/ipython -i script/playground.py
 449918 edh        20   0 45.7G 34.8G  112M R 99.8 55.9     120h /usr/bin/python /usr/bin/ipython -i script/playground.py
 449909 edh        20   0 45.7G 34.8G  112M R 99.1 55.9     102h /usr/bin/python /usr/bin/ipython -i script/playground.py
 449914 edh        20   0 45.7G 34.8G  112M R 99.1 55.9     101h /usr/bin/python /usr/bin/ipython -i script/playground.py
 449915 edh        20   0 45.7G 34.8G  112M R 99.1 55.9     101h /usr/bin/python /usr/bin/ipython -i script/playground.py
 449912 edh        20   0 45.7G 34.8G  112M R 98.4 55.9     102h /usr/bin/python /usr/bin/ipython -i script/playground.py
 449910 edh        20   0 45.7G 34.8G  112M R 97.8 55.9     101h /usr/bin/python /usr/bin/ipython -i script/playground.py
 565502 klm        20   0 41.6G  313M  151M S  2.0  0.5  0:58.04 /usr/lib/brave-bin/brave --type=renderer --field-trial-handle=5497587067748927688,
 563438 klm        20   0 1379M  107M 71204 S  0.7  0.2  0:07.56 /usr/lib/Xorg vt7 -displayfd 3 -auth /run/user/1000/gdm/Xauthority -nolisten tcp -
 563557 klm        20   0 1138M  385M  182M S  0.7  0.6  0:06.74 /usr/lib/brave-bin/brave
 564279 klm        20   0  848M 69248 55692 S  0.7  0.1  0:02.30 /usr/lib/brave-bin/brave --type=utility --utility-sub-type=audio.mojom.AudioServic
 564290 klm         9 -11  671M 13736  9412 S  0.7  0.0  0:05.52 /usr/bin/pulseaudio --daemonize=no --log-target=journal
 565585 klm        20   0 41.6G  313M  151M S  0.7  0.5  0:05.11 /usr/lib/brave-bin/brave --type=renderer --field-trial-handle=5497587067748927688,
 566103 klm        20   0  9772  5356  3536 R  0.7  0.0  0:00.73 htop
      1 root       20   0  169M  7760  4664 S  0.0  0.0  1:48.04 /sbin/init

In my case, using Saaze on a heavily loaded machine, runtimes previously were:

$ time php saaze build
Building static site in /home/klm/tmp/sndsaaze/build...
        execute(): filePath()=/home/klm/tmp/sndsaaze/content/blog.yml, entries=1, totalPages=11, entries_per_page=30
        execute(): filePath()=/home/klm/tmp/sndsaaze/content/music.yml, entries=1, totalPages=1, entries_per_page=30
        execute(): filePath()=/home/klm/tmp/sndsaaze/content/pages.yml, entries=1, totalPages=1, entries_per_page=30
Finished creating 3 collections and 315 entries (3.46 secs / 10.79MB), md2html=2.1529788970947, MathParser=0.13162446022034
        real 3.58s
        user 2.92s
        sys 0
        swapped 0
        total space 0

The Markdown to HTML conversion in PHP takes roughly 2.15 seconds, which is about 62% of the 3.46 seconds in total. The MathParser for handling Twitter, YouTube, etc. needs an extra 0.13 seconds.

With MD4C called via FFI the runtime went down:

$ time php saaze build
Building static site in /home/klm/tmp/sndsaaze/build...
        execute(): filePath()=/home/klm/tmp/sndsaaze/content/blog.yml, entries=1, totalPages=11, entries_per_page=30
        execute(): filePath()=/home/klm/tmp/sndsaaze/content/music.yml, entries=1, totalPages=1, entries_per_page=30
        execute(): filePath()=/home/klm/tmp/sndsaaze/content/pages.yml, entries=1, totalPages=1, entries_per_page=30
Finished creating 3 collections and 315 entries (1.99 secs / 10.29MB), md2html=0.27019762992859, MathParser=0.13629722595215
        real 2.12s
        user 1.27s
        sys 0
        swapped 0
        total space 0

The time for MD4C is roughly 0.27 seconds for 315 entries. That is almost 8 times faster than before.
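
The factor can be checked directly from the two "Finished creating" lines above. With the md2html values rounded to 2.153 and 0.270 seconds:

```latex
\begin{align*}
\text{speedup of the Markdown conversion} &= \frac{2.153\,\text{s}}{0.270\,\text{s}} \approx 7.97 \\
\text{share of the build before} &= \frac{2.153\,\text{s}}{3.46\,\text{s}} \approx 62\% \\
\text{share of the build after} &= \frac{0.270\,\text{s}}{1.99\,\text{s}} \approx 14\% \\
\text{speedup of the whole build} &= \frac{3.46\,\text{s}}{1.99\,\text{s}} \approx 1.74
\end{align*}
```

So the Markdown conversion no longer dominates: after the change, the rest of the build accounts for most of the runtime.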