Brad Brad - 3 months ago 7
PHP Question

PHP - Create a new array using list of array keys in dot notation

I'm building a REST API in which all data is returned as JSON. Every request is funneled through a single function that sets the HTTP status code, the headers, and the returned messages or data. Users can also add a ?fields= parameter to specify which fields they want returned (e.g. ?fields=id,hostnames,ip_addresses); if the parameter is not present, they get all the data. The field filtering is done inside the function mentioned earlier that sets the headers, data, messages, etc. What I want is to let the user specify field names in dot notation, so they can select fields below the top level. For example, I have a structure like this:

{
"id": "8a2b449111b449409c465c66254c6fcc",
"hostnames": [
"webapp1-sfo",
"webapp1-sfo.example.com"
],
"ip_addresses": [
"12.26.16.10",
"ee80::ae56:2dff:fd89:7868"
],
"environment": "Production",
"data_center": "sfo",
"business_unit": "Operations",
"hardware_type": "Server",
"currently_in_maintenance": false,
"history": [
{
"id": 58,
"time_start_utc": "2013-01-27 00:40:00",
"time_end_utc": "2013-01-27 01:45:00",
"ticket_number": "CHG123456",
"reason": "January production maintenance",
"links": [
{
"rel": "self",
"link": "https://localhost/api/v1/maintenances/58"
}
]
},
{
"id": 104,
"time_start_utc": "2013-02-25 14:36:00",
"time_end_utc": "2013-02-25 18:36:00",
"ticket_number": "CHG456789",
"reason": "February production maintenance",
"links": [
{
"rel": "self",
"link": "https://localhost/api/v1/maintenances/104"
}
]
},
{
"id": 143,
"time_start_utc": "2013-03-17 00:30:00",
"time_end_utc": "2013-03-17 01:55:00",
"ticket_number": "CHG789123",
"reason": "March production maintenance",
"links": [
{
"rel": "self",
"link": "https://localhost/api/v1/maintenances/143"
}
]
}
]
}


Using this function, I can pull out top-level fields (where $mData is the data structure above and $sParams is the string of fields requested by the user):

private function removeFields($mData, $sParams){
    $clone = $mData; // Keep a copy of the original data
    $fields = explode(',', $sParams);

    // Remove fields not requested by the user

    foreach($mData as $key => $value){
        if(!in_array((string)$key, $fields)){
            unset($mData[$key]);
        }
    }

    // If no fields remain, restore the original data
    // Chances are the user made a typo in the fields list

    if(count($mData) == 0){
        $mData = $clone;
    }

    return $mData;
}


Note: $sParams comes in as a string, exactly as provided by the user (a comma-separated list of the fields they want to see).

So ?fields=hostnames,history would return:

{
"hostnames": [
"webapp1-sfo",
"webapp1-sfo.example.com"
],
"history": [
{
"id": 58,
"time_start_utc": "2013-01-27 00:40:00",
"time_end_utc": "2013-01-27 01:45:00",
"ticket_number": "CHG123456",
"reason": "January production maintenance",
"links": [
{
"rel": "self",
"link": "https://localhost/api/v1/maintenances/58"
}
]
},
{
"id": 104,
"time_start_utc": "2013-02-25 14:36:00",
"time_end_utc": "2013-02-25 18:36:00",
"ticket_number": "CHG456789",
"reason": "February production maintenance",
"links": [
{
"rel": "self",
"link": "https://localhost/api/v1/maintenances/104"
}
]
},
{
"id": 143,
"time_start_utc": "2013-03-17 00:30:00",
"time_end_utc": "2013-03-17 01:55:00",
"ticket_number": "CHG789123",
"reason": "March production maintenance",
"links": [
{
"rel": "self",
"link": "https://localhost/api/v1/maintenances/143"
}
]
}
]
}


But if I want to return just the ticket_number field from history, I want the user to be able to request ?fields=history.ticket_number. If they want both the ticket number and the link, they could request ?fields=history.ticket_number,history.links.link, which would return:

{
"history": [
{
"ticket_number": "CHG123456",
"links": [
{
"link": "https://localhost/api/v1/maintenances/58"
}
]
},
{
"ticket_number": "CHG456789",
"links": [
{
"link": "https://localhost/api/v1/maintenances/104"
}
]
},
{
"ticket_number": "CHG789123",
"links": [
{
"link": "https://localhost/api/v1/maintenances/143"
}
]
}
]
}


I've tried many different dot-notation array access methods from Stack Overflow, but they all break when the value of history is a numeric array. For instance, with the methods I've found so far, I would need something like this to get the output above (which obviously is not good, especially when you have hundreds of records):

?fields=history.0.ticket_number,history.0.links.0.link,history.1.ticket_number,history.1.links.0.link,history.2.ticket_number,history.2.links.0.link


I'm also looking for something dynamic and recursive, since each API endpoint returns a different data structure. For instance, a request for a collection returns a numeric array filled with associative arrays (in JSON terms, an array of objects), and some of those objects may themselves contain arrays, numeric or associative.

Thanks in advance

P.S. - I don't really care if the code creates a new data array containing the requested data or directly manipulates the original data (as it does in my removeFields() function).

UPDATE: I've created a PHPFiddle that should hopefully show the issue I've been running into. http://phpfiddle.org/main/code/tw1i-qu7s

Answer

Thanks for your tips and help on this. I came up with a solution this morning that works in every case I have tested so far. It may not be super elegant, but it does what I need. I flatten the array, using dot notation for the keys of the flattened array. I then take each requested field and build a regex from it (basically replacing each "." with a pattern that also allows an optional numeric index, to catch numeric arrays), test each flattened key against the regexes, and keep the keys that match. Finally, I re-expand the flattened array back into a multi-dimensional array.

The flattened array turns into this:

Array
(
    [id] => 8a2b449111b449409c465c66254c6fcc
    [hostnames.0] => webapp1-sfo
    [hostnames.1] => webapp1-sfo.example.com
    [ip_addresses.0] => 12.26.16.10
    [ip_addresses.1] => ee80::ae56:2dff:fd89:7868
    [environment] => Production
    [data_center] => sfo
    [business_unit] => Operations
    [hardware_type] => Server
    [currently_in_maintenance] => 
    [history.0.id] => 58
    [history.0.time_start_utc] => 2013-01-27 00:40:00
    [history.0.time_end_utc] => 2013-01-27 01:45:00
    [history.0.ticket_number] => CHG123456
    [history.0.reason] => January production maintenance
    [history.0.links.0.rel] => self
    [history.0.links.0.link] => https://localhost/api/v1/maintenances/58
    [history.1.id] => 104
    [history.1.time_start_utc] => 2013-02-25 14:36:00
    [history.1.time_end_utc] => 2013-02-25 18:36:00
    [history.1.ticket_number] => CHG456789
    [history.1.reason] => February production maintenance
    [history.1.links.0.rel] => self
    [history.1.links.0.link] => https://localhost/api/v1/maintenances/104
    [history.2.id] => 143
    [history.2.time_start_utc] => 2013-03-17 00:30:00
    [history.2.time_end_utc] => 2013-03-17 01:55:00
    [history.2.ticket_number] => CHG789123
    [history.2.reason] => March production maintenance
    [history.2.links.0.rel] => self
    [history.2.links.0.link] => https://localhost/api/v1/maintenances/143
)
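
To make the regex step concrete, here is a minimal sketch of how a single requested field could be turned into a pattern and tested against flattened keys like the ones above. The helper name fieldToRegex() is hypothetical and is not part of the code in this answer. The sketch also assumes you want field names escaped with preg_quote() and matched as whole names, so it is slightly stricter than a plain prefix match.

```php
<?php
// Hypothetical helper (an illustration, not the answer's own code):
// turn a dot-notated field into a regex that also accepts numeric
// indexes before and between the segments.
function fieldToRegex($field)
{
    $segments = array_map(
        function ($segment) {
            return preg_quote($segment, '/');
        },
        explode('.', $field)
    );

    // "(?:\d+\.)?" is an optional numeric index followed by a dot.
    // The trailing "(?:\.|$)" requires the last segment to be a whole name,
    // followed either by a dot (deeper keys) or by the end of the key.
    return '/^(?:\d+\.)?' . implode('\.(?:\d+\.)?', $segments) . '(?:\.|$)/';
}

// A few flattened keys taken from the listing above
$keys = array(
    'id',
    'history.0.id',
    'history.0.ticket_number',
    'history.1.links.0.link',
);

// Keep only the keys that belong to the requested field
$pattern = fieldToRegex('history.id');
$matches = array_values(preg_grep($pattern, $keys));

print_r($matches);
```

Here only history.0.id survives the filter: the top-level id is rejected because the pattern is anchored at the start of the key, and the other history keys are rejected because their last segment is not id.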

Below are the two functions for flattening and expanding the array:

function flattenArray($aArrayToFlatten, $sSeparator = '.', $sParentKey = NULL){
    if(!is_array($aArrayToFlatten)){
        return $aArrayToFlatten;
    }
    $_flattened = array();

    // Rewrite keys

    foreach($aArrayToFlatten as $key => $value){
        if($sParentKey !== NULL){
            $key = $sParentKey . $sSeparator . $key;
        }
        $_flattened[$key] = flattenArray($value, $sSeparator, $key);
    }

    // Flatten

    $flattened = array();
    foreach($_flattened as $key => $value){
        if(is_array($value)){
            $flattened = array_merge($flattened, $value);
        }else{
            $flattened[$key] = $value;
        }
    }

    return $flattened;
}

function expandArray($aFlattenedArray, $sSeparator = '.'){
    $result = array();
    foreach($aFlattenedArray as $key => $val){
        $keyParts = explode($sSeparator, $key);
        $currentArray = &$result;
        for($i = 0; $i < count($keyParts) - 1; $i++){
            if(!isset($currentArray[$keyParts[$i]])){
                $currentArray[$keyParts[$i]] = array();
            }
            $currentArray = &$currentArray[$keyParts[$i]];
        }
        $currentArray[$keyParts[count($keyParts)-1]] = $val;
    }

    return $result;
}

Example:

$mData = json_decode('{ "id": "8a2b449111b449409c465c66254c6fcc", "hostnames": [ "webapp1-sfo", "webapp1-sfo.example.com" ], "ip_addresses": [ "12.26.16.10", "ee80::ae56:2dff:fd89:7868" ], "environment": "Production", "data_center": "sfo", "business_unit": "Operations", "hardware_type": "Server", "currently_in_maintenance": false, "history": [ { "id": 58, "time_start_utc": "2013-01-27 00:40:00", "time_end_utc": "2013-01-27 01:45:00", "ticket_number": "CHG123456", "reason": "January production maintenance", "links": [ { "rel": "self", "link": "https:\/\/localhost\/api\/v1\/maintenances\/58" } ] }, { "id": 104, "time_start_utc": "2013-02-25 14:36:00", "time_end_utc": "2013-02-25 18:36:00", "ticket_number": "CHG456789", "reason": "February production maintenance", "links": [ { "rel": "self", "link": "https:\/\/localhost\/api\/v1\/maintenances\/104" } ] }, { "id": 143, "time_start_utc": "2013-03-17 00:30:00", "time_end_utc": "2013-03-17 01:55:00", "ticket_number": "CHG789123", "reason": "March production maintenance", "links": [ { "rel": "self", "link": "https:\/\/localhost\/api\/v1\/maintenances\/143" } ] } ] }', TRUE);

print_r($mData);   // Original Data

$fields = array("id", "hostnames", "history.id", "history.links.link");
$regexFields = array();

// Build regular expressions for each of the requested fields

foreach($fields as $dotNotatedFieldName){

    // Escape regex metacharacters in the user-supplied field name

    $quoted = preg_quote($dotNotatedFieldName, '/');

    // Requested field has a dot in it -- it's not a top-level field
    // It may be part of a collection (0.fieldname.levelTwo, 1.fieldName.levelTwo,...) or be a collection (fieldName.0.levelTwo, fieldName.1.levelTwo, ...)
    // The trailing (\.|$) stops "history.id" from also matching "history.0.id_other"

    if(preg_match('/\./', $dotNotatedFieldName)){
        $regexFields[] = '^\d*\.?' . str_replace('\.', '\.\d*\.?', $quoted) . '(\.|$)';

    // Requested field does not have a dot in it -- it's a top-level field
    // It may be part of a collection (0.fieldname, 1.fieldName,...) or be a collection (fieldName.0, fieldName.1, ...)

    }else{
        $regexFields[] = '^\d*\.?' . $quoted . '(\.|$)';
    }
}

// Flatten the array

$flattened = flattenArray($mData);

// Test each flattened key against each regular expression and remove those that don't match

foreach($flattened as $key => $value){
    $matchFound = FALSE;

    foreach($regexFields as $regex){
        if(preg_match('/' . $regex . '/', $key)){
            $matchFound = TRUE;
            break;
        }
    }

    if($matchFound === FALSE){
        unset($flattened[$key]);
    }

}

// Expand the array

$mData = expandArray($flattened);

print_r(json_encode($mData));  // New Data

Which outputs the following JSON:

{
   "id": "8a2b449111b449409c465c66254c6fcc",
   "hostnames": [
      "webapp1-sfo",
      "webapp1-sfo.example.com"
   ],
   "history": [
      {
         "id": 58,
         "links": [
            {
               "link": "https://localhost/api/v1/maintenances/58"
            }
         ]
      },
      {
         "id": 104,
         "links": [
            {
               "link": "https://localhost/api/v1/maintenances/104"
            }
         ]
      },
      {
         "id": 143,
         "links": [
            {
               "link": "https://localhost/api/v1/maintenances/143"
            }
         ]
      }
   ]
}
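
For comparison, the same filtering can also be done recursively, without flattening or regular expressions. The following is only a sketch under a few assumptions, not the method used above: the function name filterFields() is hypothetical, the data is assumed to come from json_decode(..., TRUE), and any array with sequential numeric keys is treated as a list whose elements all receive the same field paths. The sample data is an abbreviated version of the structure from the question.

```php
<?php
// Hypothetical recursive filter (a sketch, not the flatten/regex code above).
// $paths is a list of dot-notated field names, e.g. array('history.id').
function filterFields($data, array $paths)
{
    if (!is_array($data)) {
        return $data;
    }

    // A list (sequential numeric keys): apply the same paths to every element.
    // Scalar elements are returned unchanged by the check above.
    if ($data !== array() && array_keys($data) === range(0, count($data) - 1)) {
        $result = array();
        foreach ($data as $item) {
            $result[] = filterFields($item, $paths);
        }
        return $result;
    }

    // Group the requested paths by their first segment.
    // TRUE means "keep the whole field"; an array holds the remaining sub-paths.
    $wanted = array();
    foreach ($paths as $path) {
        $parts = explode('.', $path, 2);
        $head  = $parts[0];
        if (!isset($parts[1])) {
            $wanted[$head] = true;
        } elseif (!isset($wanted[$head])) {
            $wanted[$head] = array($parts[1]);
        } elseif (is_array($wanted[$head])) {
            $wanted[$head][] = $parts[1];
        }
    }

    $result = array();
    foreach ($data as $key => $value) {
        if (!isset($wanted[$key])) {
            continue; // field was not requested
        }
        if ($wanted[$key] === true) {
            $result[$key] = $value; // whole field requested
        } elseif (is_array($value)) {
            $result[$key] = filterFields($value, $wanted[$key]);
        }
        // A sub-path into a scalar value cannot match, so the field is skipped.
    }

    return $result;
}

// Abbreviated sample data based on the structure in the question
$data = json_decode('{
    "id": "8a2b449111b449409c465c66254c6fcc",
    "hostnames": ["webapp1-sfo", "webapp1-sfo.example.com"],
    "history": [
        {
            "id": 58,
            "ticket_number": "CHG123456",
            "links": [{"rel": "self", "link": "https://localhost/api/v1/maintenances/58"}]
        }
    ]
}', true);

$fields   = explode(',', 'hostnames,history.ticket_number,history.links.link');
$filtered = filterFields($data, $fields);

echo json_encode($filtered, JSON_UNESCAPED_SLASHES);
```

The trade-off is that this version never builds the flattened key list, so it cannot be combined with pattern matching on full key paths; it only follows the requested paths segment by segment.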