Paul Rougieux - 1 month ago
Markdown Question

How to pass the correct encoding parameter to markdown_py?

I'm using markdown_py to convert Markdown documents to HTML because I like the table of contents extension.

I sometimes use accented characters. My source file is encoded as UTF-8, so this shouldn't be a problem, but the output is garbled.
For example, if I run

echo "é & à" | markdown_py -e utf-8 > output.html


I see this in output.html

[Image: screenshot of the garbled output]

Answer

You need to tell the browser which encoding the output is using.

Markdown outputs only an HTML fragment, not an entire document. When a browser receives only an HTML fragment, it has to guess at the missing pieces, and here it appears to have guessed the encoding incorrectly. Wrap your output in the appropriate tags, being sure to include a meta tag that declares the encoding.
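To see what a wrong guess looks like, here is a small Python sketch that decodes UTF-8 bytes with a different encoding. The choice of Windows-1252 (`cp1252`) as the browser's fallback is an assumption for illustration; the question does not say which encoding the browser picked, and `misread_utf8` is a hypothetical helper name.

```python
def misread_utf8(text: str, guessed_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode it with a wrongly guessed encoding."""
    utf8_bytes = text.encode("utf-8")
    # Each accented character is two bytes in UTF-8; a single-byte
    # encoding turns each of those bytes into its own character.
    return utf8_bytes.decode(guessed_encoding)


original = "é & à"
garbled = misread_utf8(original)

# repr() makes invisible characters such as the no-break space visible.
print(repr(original))
print(repr(garbled))
```

Under that assumption, "é" comes out as "Ã©", which is the typical look of UTF-8 text read as a single-byte Western encoding. The bytes in the file are fine; only the interpretation is wrong.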

<!DOCTYPE html>
<html>

<head>
  <meta charset="UTF-8">
  <title>Page Title</title>
</head>

<body>
  {{ Markdown output goes here }}
</body>

</html>
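If you would rather not paste the fragment in by hand, the wrapping can be scripted. The following is a minimal sketch, not part of Python-Markdown: `wrap_fragment`, `wrap_file`, and the file names in the usage note are hypothetical, and it assumes the fragment was saved as UTF-8.

```python
from html import escape
from pathlib import Path

# Same skeleton as the template above, with two placeholders.
TEMPLATE = """<!DOCTYPE html>
<html>

<head>
  <meta charset="UTF-8">
  <title>{title}</title>
</head>

<body>
{body}
</body>

</html>
"""


def wrap_fragment(fragment: str, title: str = "Page Title") -> str:
    """Return a complete HTML document containing the given fragment."""
    # The title is plain text, so escape it; the fragment is already HTML.
    return TEMPLATE.format(title=escape(title), body=fragment.strip())


def wrap_file(src: str, dst: str, title: str = "Page Title") -> None:
    """Read a UTF-8 HTML fragment from src and write a full document to dst."""
    fragment = Path(src).read_text(encoding="utf-8")
    # Write with the same encoding that the meta tag declares.
    Path(dst).write_text(wrap_fragment(fragment, title), encoding="utf-8")
```

For example, after running `echo "é & à" | markdown_py -e utf-8 > fragment.html`, calling `wrap_file("fragment.html", "output.html")` should produce a page that the browser reads as UTF-8.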

It is not possible for Python-Markdown to output anything other than an HTML fragment. A feature request was rejected with the following explanation:

Markdown produces an (X)HTML fragment. It was never intended to do any more than that. If you want a complete HTML document, then it is your responsibility to wrap it. This has been the behavior of the original Markdown implementation from the beginning, which we mirror with no intention of changing.

More details can be found in the discussion of the feature request. In short, this is the normal, expected behavior of most Markdown implementations. Adding such a feature would require the tool to be more than a Markdown parser, which it is not. For simple tasks, a quick shell script may serve you just fine. For more complex jobs, a static site generator is a more appropriate tool.

The project's documentation doesn't directly address this. However, it does list its first goal as:

Maintain a Python 2 and Python 3 library ... as an implementation of the markdown parser that follows the syntax rules and the behavior of the original (markdown.pl) implementation as reasonably as possible.

In fact, the behavior in this respect is exactly the same as the reference implementation (markdown.pl). As Python-Markdown's documentation only covers how to use the library and differences in behavior, there is no discussion of a feature that behaves the same.