koc koc - 1 month ago 10
PHP Question

Splitting a string by multiple separators with preg_match in PHP

There is a string consisting of at most three parts: Writer, Director, and Producer. Let's call them "categories". Each category consists of two parts separated by a colon, Label : Names, where Label is one of the category names above and Names is a list of names separated by slashes. E.g.:

Writer : Jeffrey Schenck / Peter Sullivan / Director : Brian Trenchard-Smith / jack / Producer : smith


I want to split the string into the category names and their name lists with the preg_match function. Here is what I have so far:

$pattern = '/Writer : (?P<Writer>[\s\S]+?)Director : (?P<Director>[\s\S]+?)Producer : (?P<Producer>[\s\S]+)/';
$sentence = 'Writer : Jeffrey Schenck / Peter Sullivan / Director : Brian Trenchard-Smith / jack / Producer : smith';
preg_match($pattern, $sentence, $matches);

foreach($matches as $cat => $match) {
// Do more
// echo "<b>" . $cat . "</b>" . $match . "<br />";
}


The script works well if all three categories are present in the string. It fails if at least one of them is missing.
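
The failure can be reproduced with a minimal check: every label in the pattern is mandatory, so preg_match returns 0 as soon as one of them is absent. The shortened sample string and the variable names below are assumptions made for illustration.

```php
<?php
$pattern = '/Writer : (?P<Writer>[\s\S]+?)Director : (?P<Director>[\s\S]+?)Producer : (?P<Producer>[\s\S]+)/';

// All three categories are present: the pattern matches.
$full = 'Writer : Jeffrey Schenck / Peter Sullivan / Director : Brian Trenchard-Smith / jack / Producer : smith';
$fullResult = preg_match($pattern, $full, $m1);

// Director is missing: the literal "Director : " cannot be found, so nothing matches.
$partial = 'Writer : Jeffrey Schenck / Peter Sullivan / Producer : smith';
$partialResult = preg_match($pattern, $partial, $m2);

var_dump($fullResult, $partialResult, $m2);
```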

Answer

One way is to create optional groups with the well-known ? quantifier:

$pattern = '/^' .
  '(?:Writer *: *(?P<Writer>[^:]+))?' .
  '(?:Director *: *(?P<Director>[^:]+))?' .
  '(?:Producer *: *(?P<Producer>[^:]+))?' .
  '$/';
preg_match($pattern, $sentence, $matches);

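where (?:...) creates a non-capturing group. A category that is missing from the input yields an empty string for its group, or no entry at all if no later group matched. Note that the output array is indexed by both numeric positions and names, e.g.: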

Array
(
    [0] => Writer : Jeffrey Schenck / Peter Sullivan / Director : Brian Trenchard-Smith / jack / Producer : smith
    [Writer] => Jeffrey Schenck / Peter Sullivan / 
    [1] => Jeffrey Schenck / Peter Sullivan / 
    [Director] => Brian Trenchard-Smith / jack / 
    [2] => Brian Trenchard-Smith / jack / 
    [Producer] => smith
    [3] => smith
)
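
To check the missing-category case from the question, the sketch below applies the same pattern to a sentence without a Director part, keeps only the named keys, and splits each list on slashes. The helper name parseCredits and the cleanup steps are illustrative assumptions, not part of the approach above, and the pattern still assumes that names never contain a colon.

```php
<?php
// Hypothetical helper: parse a credits sentence with the optional-group pattern.
function parseCredits(string $sentence): array
{
    $pattern = '/^' .
        '(?:Writer *: *(?P<Writer>[^:]+))?' .
        '(?:Director *: *(?P<Director>[^:]+))?' .
        '(?:Producer *: *(?P<Producer>[^:]+))?' .
        '$/';

    if (!preg_match($pattern, $sentence, $matches)) {
        return [];
    }

    $result = [];
    foreach (['Writer', 'Director', 'Producer'] as $label) {
        // A skipped group is either absent from $matches or an empty string.
        if (!isset($matches[$label]) || $matches[$label] === '') {
            continue;
        }
        // Split on slashes, trim surrounding whitespace, and drop the empty
        // item left behind by a trailing " / ".
        $result[$label] = preg_split(
            '~\s*/\s*~',
            trim($matches[$label]),
            -1,
            PREG_SPLIT_NO_EMPTY
        );
    }
    return $result;
}

print_r(parseCredits('Writer : Jeffrey Schenck / Peter Sullivan / Producer : smith'));
```

With this input the result should contain only the Writer and Producer keys, each holding a list of names.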

Another way is to use preg_match_all with extra processing:

$pattern = '/(?<=:)[^:]+/';
if (preg_match_all($pattern, $sentence, $matches)) {
  $keys = ['Writer', 'Director', 'Producer'];
  $a = [];
  // Keys are assigned by position; the isset() checks are skipped for clarity's sake
  foreach ($matches[0] as $i => $value) {
    $a[$keys[$i]] = $value;
  }

  print_r($a);
}

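where (?<=:) is a positive lookbehind assertion for the : character. In this case, the resulting array is indexed by names only. Note, however, that each value keeps its leading space and the label that follows it, and that the keys are assigned by position, so a missing category puts the names under the wrong key: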

Array
(
    [Writer] =>  Jeffrey Schenck / Peter Sullivan / Director 
    [Director] =>  Brian Trenchard-Smith / jack / Producer 
    [Producer] =>  smith
)
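
Since the lookbehind pattern neither stops at the next label nor knows which label it followed, a variant that captures the label together with its names may cope better with a missing category. The following is a minimal sketch under that assumption; the function name splitCredits, the pattern, and the PREG_SET_ORDER loop are illustrative and not part of the approach above.

```php
<?php
// Hypothetical helper: capture every "Label : Names" pair found in the sentence.
function splitCredits(string $sentence): array
{
    // Match a label, then lazily consume its names up to the next label
    // (optionally preceded by a slash) or the end of the string.
    $pattern = '~(?P<label>Writer|Director|Producer)\s*:\s*(?P<names>.*?)'
             . '(?=\s*/?\s*(?:Writer|Director|Producer)\s*:|$)~';

    $credits = [];
    if (preg_match_all($pattern, $sentence, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Strip any trailing separator or whitespace.
            $credits[$m['label']] = rtrim($m['names'], ' /');
        }
    }
    return $credits;
}

print_r(splitCredits('Writer : Jeffrey Schenck / Peter Sullivan / Producer : smith'));
```

Here each key comes from the matched text itself, so the order and presence of the categories should no longer matter.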