John Leidegren John Leidegren - 3 months ago 42
Markdown Question

How would you go about parsing Markdown?

You can find the syntax here.

The source that follows with the download is written in Perl, which I have no intentions of honoring. It is riddled with regular expressions, and it relies on MD5 hashes to escape certain characters. Something is just wrong about that!

I'm about to hard code a parser for Markdown. What is experience with this?

If you don't have anything meaningful to say about the actual parsing of Markdown, spare me the time. (This might sound harsh, but yes, I'm looking for insight, not a solution, that is, a third-party library).

To help a bit with the answers, regular expressions are meant to identify patterns! NOT to parse an entire grammar. That people consider doing so is foobar.

  • If you think about Markdown, it's fundamentally based around the concept of paragraphs.

  • As such, a reasonable approach might be to split the input into paragraphs.

  • There are many kinds of paragraphs, for example, heading, text, list, blockquote, and code.

  • The challenge is thus to identify these paragraphs and in what context they occur.

I'll be back with a solution, once I find it's worthy to be shared.

Answer Source

The only markdown implementation I know of, that uses an actual parser, is Jon MacFarleane’s peg-markdown. Its parser is based on a Parsing Expression Grammar parser generator called peg.

EDIT: Mauricio Fernandez recently released his Simple Markup Markdown parser, which he wrote as part of his OcsiBlog Weblog Engine. Because the parser is written in OCaml, it is extremely simple and short (268 SLOC for the parser, 43 SLOC for the HTML emitter), yet blazingly fast (20% faster than discount (written in hand-optimized C) and sixhundred times faster than BlueCloth (Ruby)), despite the fact that it isn't even optimized for performance yet. Because it is only intended for internal use by Mauricio himself for his weblog, there are a few deviations from the official Markdown specification, but Mauricio has created a branch which reverts most of those changes.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download