Khurshid Alam - 4 months ago
PHP Question

Extract plain-text xhtml data in php (regex)?

I was automating Remember The Milk tasks with Zapier, which triggers whenever anything changes in the Atom feed. The problem is that Zapier sends the XHTML-formatted data as plain text, which I am catching using php://input:


<?php
$xhtml = file_get_contents('php://input');
?>


The raw data looks like this:

@class: rtm_due
span: [{'#text': 'Due:', '@class': 'rtm_due_title'}, {'#text': 'Sat 16 Jul 16', '@class': 'rtm_due_value'}]

@class: rtm_priority
span: [{'#text': 'Priority:', '@class': 'rtm_priority_title'}, {'#text': '1', '@class': 'rtm_priority_value'}]

@class: rtm_tags
span: [{'#text': 'Tags:', '@class': 'rtm_tags_title'}, {'#text': 'gcal-work, github', '@class': 'rtm_tags_value'}]

@class: rtm_location
span: [{'#text': 'Location:', '@class': 'rtm_location_title'}, {'#text': 'none', '@class': 'rtm_location_value'}]

@class: rtm_list
span: [{'#text': 'List:', '@class': 'rtm_list_title'}, {'#text': 'Work', '@class': 'rtm_list_value'}]


Let's say I want to extract the due date Sat 16 Jul 16 under @class: rtm_due. How can I extract this? Will regex (preg_match) be any help? If so, how?

Answer

One option, admittedly a roundabout one, is a function that combines regex with a loop to fetch the data you need. Consider the function below. Although it may look convoluted, it does not limit you to the date value: you also get access to all the key-value pairs in the file, in case you need them at some point.

    <?php
        $file   = __DIR__ . "/file.txt";   //<== THE NAME OF THE FILE CONTAINING YOUR DATA


        /*************** BEGIN FUNCTIONS ***************/
        function parseFile($file){
            $arrFileContent    = [];

            // IF THE FILE DOES NOT EXIST RETURN NULL 
            if(!file_exists($file)){
                return null;
            }
            // GET THE DATA FROM THE FILE & STORE IT IN A VARIABLE
            $strFileDataContent = file_get_contents($file);

            // IF THE FILE CONTAINS NOTHING RETURN NULL AS WELL  
            if(empty($strFileDataContent)){
                return null;
            }

            // SPLIT THE CONTENTS OF THE FILE (STRING) AT THE END OF EACH LINE
            // THUS CREATING AN ARRAY OF LINES OF TEXT-DATA
            $arrFileDataLines   = explode("\n", $strFileDataContent);

            // LOOP THROUGH THE ARRAY PRODUCED ABOVE & PERFORM SOME PATTERN MATCHING
            // AND TEXT EXTRACTION WITHIN THE LOOP
            $rxClass    = "#(^@class:)(\s*)(.*$)#i";
            $rxSpan     = "#(^span:)(\s*)?(.+$)#si";
            $keyA       = null;    // KEY OF THE MOST RECENT "@class:" LINE

            foreach($arrFileDataLines as $lineData){
                // DROP A TRAILING "\r" LEFT OVER FROM WINDOWS LINE ENDINGS
                $strKeyInfo = rtrim($lineData, "\r");

                if(preg_match($rxClass, $strKeyInfo, $matches)) {
                    $val    = $matches[3];
                    $keyA   = str_replace("rtm_", "", $val);
                    if (!array_key_exists($keyA, $arrFileContent)) {
                        $arrFileContent[$keyA] = $val;
                    }
                }
                // A "span:" LINE IS ONLY MEANINGFUL AFTER A "@class:" LINE
                if($keyA !== null && preg_match($rxSpan, $strKeyInfo, $matches2)) {
                    $keyB   = $keyA ."Data";
                    if (!array_key_exists($keyB, $arrFileContent)) {
                        $arrFileContent[$keyB] = parseSpanValues($matches2[3], $keyA);
                    }
                }
            }
            return $arrFileContent;
        }

        function parseSpanValues($spanData, $prefix){
            $arrSpanData    = explode(", ",  preg_replace("#[\{\}\[\]\"\'\#\@]#", "", $spanData));
            $objSpanData    = new stdClass();

            foreach($arrSpanData as $iKey=>$spanVal){
                $arrSplit   = preg_split("#\:\s?#", $spanVal);
                $key        = "text";

                if($iKey == 0){
                    $key    = "{$prefix}Text";
                }else if($iKey == 1){
                    $key    = "{$prefix}TextClass";
                }else if($iKey == 2){
                    $key    = "{$prefix}Value";
                }else if($iKey == 3){
                    $key    = "{$prefix}ValueClass";
                }
                if(isset($arrSplit[1])){
                    $objSpanData->$key  = $arrSplit[1];
                }
            }
            return $objSpanData;
        }
        /*************** END OF FUNCTIONS ***************/



        var_dump(parseFile($file));

This produces something like:

        array (size=10)
          'due' => string 'rtm_due' (length=7)
          'dueData' => 
            object(stdClass)[1]
              public 'dueText' => string 'Due' (length=3)
              public 'dueTextClass' => string 'rtm_due_title' (length=13)
              public 'dueValue' => string 'Sat 16 Jul 16' (length=13)
              public 'dueValueClass' => string 'rtm_due_value' (length=13)
          'priority' => string 'rtm_priority' (length=12)
          'priorityData' => 
            object(stdClass)[2]
              public 'priorityText' => string 'Priority' (length=8)
              public 'priorityTextClass' => string 'rtm_priority_title' (length=18)
              public 'priorityValue' => string '1' (length=1)
              public 'priorityValueClass' => string 'rtm_priority_value' (length=18)
          'tags' => string 'rtm_tags' (length=8)
          'tagsData' => 
            object(stdClass)[3]
              public 'tagsText' => string 'Tags' (length=4)
              public 'tagsTextClass' => string 'rtm_tags_title' (length=14)
              public 'tagsValue' => string 'gcal-work' (length=9)
              public 'text' => string 'rtm_tags_value' (length=14)
          'location' => string 'rtm_location' (length=12)
          'locationData' => 
            object(stdClass)[4]
              public 'locationText' => string 'Location' (length=8)
              public 'locationTextClass' => string 'rtm_location_title' (length=18)
              public 'locationValue' => string 'none' (length=4)
              public 'locationValueClass' => string 'rtm_location_value' (length=18)
          'list' => string 'rtm_list' (length=8)
          'listData' => 
            object(stdClass)[5]
              public 'listText' => string 'List' (length=4)
              public 'listTextClass' => string 'rtm_list_title' (length=14)
              public 'listValue' => string 'Work' (length=4)
              public 'listValueClass' => string 'rtm_list_value' (length=14)

As it stands, to get the due date from the dueData element of the array, you can do something like this:

    <?php
        $data         = parseFile($file);
        $dueDateValue = $data['dueData']->dueValue;

        var_dump($dueDateValue);  // PRODUCES:: 'Sat 16 Jul 16'
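
If you only need one value, a single preg_match call may be enough. Note that parseSpanValues() splits on ", ", so a value that itself contains a comma (the tags gcal-work, github) is cut short, as tagsValue in the dump above shows. The sketch below avoids this by matching the quoted '#text' directly. The names extractRtmValue and $raw are hypothetical; it assumes every span has the shape {'#text': '...', '@class': '..._value'} and that the text contains no single quote. In the webhook, $raw would be the string read from php://input, as in the question.

```php
<?php
// Hypothetical helper (not part of the functions above): returns the text of
// the span whose '@class' is "<class>_value", or null when there is no match.
function extractRtmValue(string $raw, string $class): ?string
{
    // Assumption: the '#text' entry comes right before its '@class' entry
    // and the text itself never contains a single quote.
    $pattern = "/'#text':\s*'([^']*)',\s*'@class':\s*'"
             . preg_quote($class, '/') . "_value'/";

    return preg_match($pattern, $raw, $m) === 1 ? $m[1] : null;
}

// Sample data copied from the question; stands in for the request body.
$raw = <<<'TXT'
@class: rtm_due
span: [{'#text': 'Due:', '@class': 'rtm_due_title'}, {'#text': 'Sat 16 Jul 16', '@class': 'rtm_due_value'}]

@class: rtm_tags
span: [{'#text': 'Tags:', '@class': 'rtm_tags_title'}, {'#text': 'gcal-work, github', '@class': 'rtm_tags_value'}]
TXT;

var_dump(extractRtmValue($raw, 'rtm_due'));   // 'Sat 16 Jul 16'
var_dump(extractRtmValue($raw, 'rtm_tags'));  // 'gcal-work, github'
```

This trades the full key-value array for a much smaller piece of code, so it is probably only worth it when you know in advance which class you are after.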

Hope this gives you at least a rough idea of how to adapt it on your own.

Cheers & Good Luck!!!