Carlos Vergara - 6 months ago
PHP Question

How do I extract e-mail attachments from raw e-mail without IMAP functions?

The title pretty much says it all, but I'll try to flesh out the issue a bit.

A PHP application of mine needs to read e-mails from a socket (this was a requirement) and then use some of those e-mails (the ones carrying an API token) as articles in the application (it's a CMS).

I've been able to get the reading part kind of going, but now we're stuck on parsing them. Concretely, an e-mail I receive will 99% of the time look like this:

MIME-Version: 1.0\r\n
Received: by {ip_number} with {protocol}; {iso_date}\r\n
Date: {iso_date}\r\n
Delivered-To: {destination}\r\n
Message-ID: {sample_message_id}\r\n
Subject: {subject}\r\n
From: {sender}\r\n
To: {destination}\r\n
Content-Type: multipart/mixed; boundary={sample_boundary}\r\n
\r\n
--{sample_boundary}\r\n
Content-Type: multipart/alternative; boundary={sample_boundary_2}\r\n
\r\n
--{sample_boundary_2}\r\n
Content-Type: text/plain; charset={charset}\r\n
\r\n
{file_content}\r\n
--\r\n
{signature}\r\n
\r\n
--{sample_boundary_2}\r\n
Content-Type: text/html; charset={charset}\r\n
\r\n
{content_html}\r\n
{signature_html}\r\n
--{sample_boundary_2}--\r\n
--{sample_boundary}\r\n
Content-Type: image/jpeg; name="{file_name}"\r\n
Content-Disposition: attachment; filename="{file_name}"\r\n
Content-Transfer-Encoding: base64\r\n
X-Attachment-Id: {sample_attachment_id}\r\n
\r\n
{base64_file_contents}\r\n
--{sample_boundary}--\r\n


And while I've been trying to regex them out, I simply haven't been able to. The fact that standard e-mails should end their lines in \r\n but some end them in a bare \n, combined with the nesting, is too much for me to handle.
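One way to take the line-ending problem out of the picture is to normalize the raw message before parsing anything else. The following is only a sketch, assuming the whole message is already in a string; the helper names normalizeLineEndings and splitHeadersAndBody are made up for illustration and do not come from any library.

```php
<?php
// Hypothetical helper: turn a lone \n or \r into \r\n so later code
// only has to deal with one kind of line ending.
function normalizeLineEndings($raw) {
    return preg_replace('/\r\n|\r|\n/', "\r\n", $raw);
}

// Hypothetical helper: the header block ends at the first blank line.
// Returns array(headerBlock, body).
function splitHeadersAndBody($raw) {
    $pieces = explode("\r\n\r\n", normalizeLineEndings($raw), 2);
    return array($pieces[0], isset($pieces[1]) ? $pieces[1] : '');
}

// Usage: works the same whether the input used \n or \r\n.
list($headers, $body) = splitHeadersAndBody("Subject: hi\nTo: someone@example.com\n\nHello\n");
```

Normalizing rewrites every line break in the message, which is harmless for base64 or quoted-printable parts but would alter a part sent with a raw binary encoding.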

There's a library on PHPClasses that splits e-mails into MIME parts (along with a bunch of other things), written by some Manuel Lemos guy who clearly knew what he was doing, since it's really efficient and returns nicely formatted, parsed results, but it doesn't cut it for me.

The library itself consists of 2500+ lines of unintelligible gibberish I can't make any sense of. It is written in 3 different camelCases with assorted indentation styles, and the mix of different types of ifs (if():, if() and if(){}) and loops (for(;;), for(){} and for():) does not make it much simpler.

Could anyone please give me a hand here?

Thank you very much!

-- Edited to add

Following Sjoern's advice I started building a solution to my own question (thanks!!). I'm still open to more suggestions though; surely there are better ways of doing it.

class MimePartsParser {
    // True when the chunk starts with a Content-Type header (any line ending, any case).
    protected function hasContentType($string) {
        return stripos(ltrim($string), 'content-type:') === 0;
    }

    // True when the chunk carries a Content-Transfer-Encoding header.
    protected function hasTransferEncoding($string) {
        return stripos($string, 'Content-Transfer-Encoding:') !== false;
    }

    // Return the first boundary parameter (quoted or unquoted), or null if there is none.
    protected function getBoundary($from) {
        if (preg_match('/boundary="?(?P<boundary>[^"\s;]+)"?/i', $from, $matches)) {
            return $matches['boundary'];
        }
        return null;
    }

    // Trim whitespace and the "--" left over from the delimiter that followed this part.
    protected function cleanMimePart($msg) {
        $msg = trim($msg);
        if (substr($msg, -2) === '--') {
            $msg = substr($msg, 0, -2);
        }
        return trim($msg);
    }

    protected function parseMessage($msg) {
        $parts = array();
        $boundary = $this->getBoundary($msg);
        if ($boundary !== null) {
            // Multipart: split on the boundary and recurse into each chunk.
            foreach (explode($boundary, $msg) as $chunk) {
                $sub = $this->parseMessage($chunk);
                if ($sub) {
                    $parts[] = $sub;
                }
            }
        } elseif ($this->hasContentType($msg) && $this->hasTransferEncoding($msg)) {
            // Leaf part: keep it only if it has both headers.
            $parts[] = $this->cleanMimePart($msg);
        }
        return $parts;
    }

    // Collapse the nested result into a flat list of parts.
    protected function flattenArray($array) {
        $flat = array();
        foreach (new RecursiveIteratorIterator(new RecursiveArrayIterator($array)) as $item) {
            $flat[] = $item;
        }
        return $flat;
    }

    public function parse($string) {
        return $this->flattenArray($this->parseMessage($string));
    }
}

/* Usage example */
$mimeParser = new MimePartsParser;
var_dump($mimeParser->parse(file_get_contents('sample.txt')));

Answer

Write a function that parses a message and calls itself recursively.

First, parse the whole message. If you encounter this:

Content-Type: multipart/mixed; boundary={sample_boundary}

Split the message on {sample_boundary}. Then parse each submessage.

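function parseMessage($message) {
    // Determine the split: look for a boundary parameter (quoted or not).
    // Without one, this is a leaf part and is returned as-is.
    if (!preg_match('/boundary="?([^"\s;]+)"?/i', $message, $m)) {
        return $message;
    }
    $messages = explode($m[1], $message);
    $result = array();
    foreach ($messages as $message) {
        // Each chunk may itself be multipart, so parse it the same way.
        $result[] = parseMessage($message);
    }
    return $result;
}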
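To go from "split and recurse" to actual attachments, each part also needs its headers read and its body decoded. Below is a rough sketch of that, not a complete MIME parser. The function names (splitEntity, headerParam, extractAttachments) are invented for this example, and it assumes that attachments are marked with Content-Disposition: attachment and that no boundary string is a prefix of another.

```php
<?php
// Sketch only: all function names here are invented for this example.

// Split a raw entity into a lower-cased header map and a body string.
function splitEntity($raw) {
    $raw = preg_replace('/\r\n|\r|\n/', "\r\n", $raw);          // normalize line endings
    if (strpos($raw, "\r\n") === 0) {
        return array(array(), substr($raw, 2));                 // part without headers
    }
    $pieces = explode("\r\n\r\n", $raw, 2);
    $unfolded = preg_replace('/\r\n[ \t]+/', ' ', $pieces[0]);  // unfold continued header lines
    $headers = array();
    foreach (explode("\r\n", $unfolded) as $line) {
        if (strpos($line, ':') !== false) {
            list($name, $value) = explode(':', $line, 2);
            $headers[strtolower(trim($name))] = trim($value);
        }
    }
    return array($headers, isset($pieces[1]) ? $pieces[1] : '');
}

// Read one parameter such as boundary=... or filename="..." from a header value.
function headerParam($value, $param) {
    $pattern = '/\b' . preg_quote($param, '/') . '=(?:"([^"]*)"|([^\s;]+))/i';
    if (preg_match($pattern, $value, $m)) {
        return isset($m[2]) ? $m[2] : $m[1];   // $m[2] is set only for the unquoted form
    }
    return null;
}

// Walk the MIME tree and collect every part whose disposition is "attachment".
function extractAttachments($raw) {
    list($headers, $body) = splitEntity($raw);
    $type = isset($headers['content-type']) ? $headers['content-type'] : 'text/plain';
    $boundary = headerParam($type, 'boundary');

    if (stripos($type, 'multipart/') === 0 && $boundary !== null) {
        $found = array();
        // Delimiter lines look like "--boundary"; the closing one is "--boundary--".
        $chunks = explode('--' . $boundary, $body);
        array_shift($chunks);                  // drop the preamble before the first delimiter
        foreach ($chunks as $chunk) {
            if (substr($chunk, 0, 2) === '--') {
                break;                         // closing delimiter reached, the rest is epilogue
            }
            // Remove the rest of the delimiter line and the line break before the next one.
            $chunk = preg_replace('/^[ \t]*\r\n|\r\n\z/', '', $chunk);
            $found = array_merge($found, extractAttachments($chunk));
        }
        return $found;
    }

    $disposition = isset($headers['content-disposition']) ? $headers['content-disposition'] : '';
    if (stripos($disposition, 'attachment') !== 0) {
        return array();                        // not an attachment, skip it
    }
    $encoding = isset($headers['content-transfer-encoding'])
        ? strtolower($headers['content-transfer-encoding']) : '7bit';
    if ($encoding === 'base64') {
        $body = base64_decode($body);
    } elseif ($encoding === 'quoted-printable') {
        $body = quoted_printable_decode($body);
    }
    return array(array(
        'filename' => headerParam($disposition, 'filename'),
        'type'     => $type,
        'data'     => $body,
    ));
}
```

Calling extractAttachments() on the raw message string should give back a flat array with one entry per attachment, each holding the filename, the Content-Type value and the decoded bytes, so something like file_put_contents($a['filename'], $a['data']) can follow. Splitting on "--" plus the boundary (rather than on the bare boundary) keeps the boundary= parameter in the headers from being mistaken for a delimiter.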