sbay2 - 1 year ago
Ruby Question

Format Email Reply in Ruby

I am making my own email client in Ruby. It can currently parse/read in messages. It can also create a reply to a message, set the headers, and send the message to the original sender.

How do I add the original quoted message to the reply?

How should I go about formatting the original message in the reply? Is there a best practice or format? MIME/RFC? I know there should be one string for HTML and one for plain text; I'm just unsure how to build these strings.

Right now my replies don't include the original message below, which makes them hard to understand on their own.

Answer

Composing email replies is quite a challenge, especially at the very beginning, when you have no clue where to start.

Recently I've had to compose such emails and send them programmatically. The first thing I did was look at how other email clients, like Thunderbird, do it. That takes some experimentation and patience, though.

The overall structure of the message I used was heavily based on this Stack Overflow answer:
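The linked answer's layout isn't reproduced here. As a rough, stdlib-only sketch of one common structure (an assumption, not that answer's code), a reply carrying both versions is a multipart/alternative body with the text/plain part first and the text/html part last, since RFC 2046 treats the last alternative as the preferred one. The `build_reply` helper is hypothetical; it assumes ASCII-only header values (a non-ASCII subject would need RFC 2047 encoding) and leaves out inline images, which would need a multipart/related wrapper around the HTML part. In practice the `mail` gem takes care of boundaries, encodings and line endings for you.

```ruby
require 'securerandom'

# Hypothetical helper: returns the raw reply (headers + body) as a String.
# Only the headers relevant here are shown; From, Date, Message-ID etc. are
# assumed to be added the way the question already does it.
# references: the original's References header plus its Message-ID.
def build_reply(to:, subject:, in_reply_to:, references:, text:, html:)
  boundary = "alt-#{SecureRandom.hex(12)}"
  # Base64 keeps any UTF-8 content 7-bit safe; pack('m') wraps its output
  # into short lines, which are converted to CRLF line endings here.
  encode = ->(s) { [s.encode('UTF-8')].pack('m').gsub("\n", "\r\n") }

  headers = [
    "To: #{to}",
    "Subject: #{subject}",
    "In-Reply-To: #{in_reply_to}",
    "References: #{references}",
    'MIME-Version: 1.0',
    %(Content-Type: multipart/alternative; boundary="#{boundary}")
  ]

  # Plain text first, HTML last: the last alternative is the preferred one.
  parts = [['text/plain', text], ['text/html', html]].map do |type, body|
    "--#{boundary}\r\n" \
      "Content-Type: #{type}; charset=UTF-8\r\n" \
      "Content-Transfer-Encoding: base64\r\n\r\n" \
      "#{encode.call(body)}"
  end

  "#{headers.join("\r\n")}\r\n\r\n#{parts.join}--#{boundary}--\r\n"
end
```

The `text` and `html` arguments are exactly the two strings the question asks about; the rest of this answer is about how to build them.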

1. HTML part

Note that you have a few options: compose either an HTML fragment (the contents of a typical <body> tag) or a whole HTML document (with <html>, <head> and <body> tags). I took a look at how Thunderbird does it. It turns out it creates the whole document, which is generated roughly like this:

  1. Create the HTML document
  2. Add the meta information <meta content="text/html; charset=utf-8" http-equiv="Content-Type"> in the <head> section (replacing the charset with the one you prefer)
  3. In the <body> section, add the HTML fragment you composed, then the caption of the quote (like: "<div>Few days ago, John Smith wrote: </div>"), and then the <blockquote> block right after it: <blockquote cite="" type="cite">. Note that the cite attribute holds the message ID of the original message.

And here's the part which I don't really like about Thunderbird:

  4. Copy the HTML content of the original message and paste it into the <blockquote> block.

Thunderbird doesn't really check whether the copied HTML is a fragment or a document. If it is a document, it strips the <html> and <head> tags but leaves their content. As a result, you can see <style> and <title> tags from the <head> section of the original message sitting in the <body> of the new message. That's messy.

Additionally, Thunderbird doesn't cope with global styling. You can easily compose a tricky mail that uses global styling (a <style> block) instead of inline styles; when the recipient starts composing a reply, that styling bleeds over the entire message.

You can do the same thing. It doesn't really hurt anybody; these are quirks that usually don't show up in typical mails. Plus it's easy. Or you can go a bit further and clean up this mess.
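As a rough sketch of the simple route, the Thunderbird-style layout from the steps above can be assembled with plain string interpolation. The helper name and its parameters are hypothetical, and filling the cite attribute with a mid: URL built from the original Message-ID (RFC 2392) is an assumption about what goes there. The original HTML is pasted verbatim, quirks included; only the caption is escaped, because it usually contains the sender's address in angle brackets.

```ruby
require 'cgi'

# Hypothetical helper mirroring the Thunderbird-style layout described above.
# reply_fragment and original_html are HTML strings; caption is plain text.
def build_reply_html(reply_fragment:, caption:, original_message_id:, original_html:)
  # Assumption: cite carries the original Message-ID as a mid: URL
  # (RFC 2392), with the surrounding angle brackets removed.
  cite = "mid:#{original_message_id.delete('<>')}"

  <<~HTML
    <html>
    <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
    </head>
    <body>
    #{reply_fragment}
    <div>#{CGI.escapeHTML(caption)}</div>
    <blockquote cite="#{CGI.escapeHTML(cite)}" type="cite">
    #{original_html}
    </blockquote>
    </body>
    </html>
  HTML
end
```

If `original_html` is a full document, this reproduces exactly the stray <head> content described above, which is what the Nokogiri cleanup below avoids.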

First, you need an HTML parser. I'm using Nokogiri, and the way I use it is like this:

  1. Nokogiri automatically wraps any fragment in a full HTML document, so there's no need to handle fragments and documents separately
  2. Find the <body> tag in the document and copy its contents
  3. Delete any <style> tags you find
  4. Copy the result where it's needed

It would roughly look like this:

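require 'nokogiri'

# original_html: the HTML of the message being replied to
doc = Nokogiri::HTML.parse(original_html)
body = doc.at_css('body')
# Remove <style> tags so the original's global CSS can't leak into the reply
body.css('style').each(&:remove)

puts body.inner_html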

Nokogiri has one more benefit: if you have any inline images in the HTML message, you can easily find them, replace their URLs with the "cid:..." scheme and add the images as inline attachments.

2. Plain text part

Right, and there's also the plain-text version of the mail in the multipart/alternative part. The most crucial step here is converting arbitrary HTML to plain text. That's even trickier than composing the HTML part: after all, you'd have to write a simplistic rendering engine (just like every web browser has). There might be gems just for that; unfortunately, I couldn't find any at the time.

A few bullet points to get you started, though:

  • All line breaks (\r\n or \n) should be replaced with a single space
  • Runs of whitespace should be collapsed to a single space (unless they're non-breaking spaces)
  • Some tags keep their contents while others don't (like <b> or <div> vs <style> or <script>)
  • Some tags require one or more line breaks after them (<br> and block tags like <p> and <div> being examples)
  • You'd have to properly format tables: compute the width of each column, take colspans and rowspans into account, pad the cell contents with spaces to align them, etc.
  • You'd have to find an alternative markup for <b>, <i>, ... tags (like surrounding them with asterisks or whatnot)
  • You can also format the headings: <h1>, <h2>, ... tags by adding lines of dashes or asterisks below and/or above them
  • You'd have to properly format <a> tags, i.e. convert them into a format like: Stack Overflow site [URL]
  • You'd have to discard the <img> tags and perhaps replace them with the alternative text, if present
  • You'd also have to decode HTML entities (&gt; and the like). If you're not using Nokogiri, the HTMLEntities gem might help here

The list can go on and on. Of course it's needless

There are some libraries and projects on the Internet that do this; however, they're not written in Ruby and/or they're missing some of the features listed above. Examples being:

Once you have that out of the way, the structure of the text/plain part is practically the same as that of the HTML part: your reply goes first, then the quote caption, then the cited message. The cited message is usually formatted so that each line is preceded by the '>' character. Now, there's the question of what exactly you should paste in there.
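A minimal plain-Ruby sketch of that layout. The helper names are hypothetical, and the handling of already-quoted lines is an assumption: a bare '>' is prepended to lines that already start with '>' (giving '>>'), which is one common convention; some clients write '> >' instead. Lines are joined with "\n" here and would be converted to CRLF when the message is written out.

```ruby
# Prefixes every line of the original text with a quote marker.
def quote_lines(text)
  text.split(/\r?\n/, -1).map do |line|
    case line
    when /\A>/ then ">#{line}"  # already quoted: deepen the nesting
    when ''    then '>'         # empty line: no trailing space
    else            "> #{line}"
    end
  end.join("\n")
end

# Reply first, then the caption, then the quoted original.
def build_reply_text(reply, caption, original_text)
  [reply.rstrip, '', caption, quote_lines(original_text.rstrip)].join("\n")
end
```

Quoting such a reply again turns '>' into '>>' and '>>' into '>>>', which is how the nesting builds up over a long conversation.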

The first option is to convert the HTML part of the original message (using the methods above) and paste it as the quoted message. The second is to use the text/plain part of the original message (if it exists) and paste it without any conversion whatsoever. The latter has the benefit that the '>' characters from a long conversation accumulate over time in a tree-like manner. It also preserves any plain-text formatting the sender may have assembled by hand, so the quote is more faithful.

3. Summary

Depending on your actual needs and the level of quality you want to achieve, composing such a mail can range from easy or tricky to hard, especially if you have to code all of it yourself. If you happen to find Ruby gems that help with at least some of these tasks, don't hesitate to use them.

Composing the HTML part can be as easy as pasting HTML pieces into each other, preferably with some tags stripped beforehand. Composing the plain-text part can be as easy as deleting a few tags entirely (<head>, <script>, <style>, ...), stripping all remaining tags while keeping their content, and decoding all HTML entities, in that order.

Deleting HTML tags can be done with a regular expression, but it's strongly discouraged and is considered a tool in a poor man's toolbox. So I'd suggest using Nokogiri or anything similar for that purpose.

And while it wasn't really part of the question, I have to stress one aspect of writing an email client: always sanitize your HTML messages, especially the ones you receive. Nothing good comes from suspicious-looking iframes or scripts in incoming mail which, if not blocked or filtered by spam filters right away, might be part of an XSS attack. The Sanitize gem might prove useful here.