arvanaghi arvanaghi - 1 month ago 6
PHP Question

Regex to accept any set of characters, and then remember and find that same set that was provided

I want regex to accept any number of characters, and then remember that exact set of characters and then look for it later in the line.

For example, if the regex saw the line begin with 'TheseCharacters', I would want it to match the line only if 'TheseCharacters' occurs again later in the line.

Examples (all these would match):

TheseCharacters, I really enjoy TheseCharacters.


Dog1, My favorite word is Dog1.


The following would not match:

Cakeman, oh I enjoy cakeboy.


Is this outside the scope of regex, or is there a way to dynamically do this?

Answer

It is a little hard to tell what you are trying to do, but from what I understand, you could use grouping and backreferences to accomplish this. Something like this:

<?php
$pattern = '/^(\b\w+\b).*\b\1\b.*/i';

//should match
$string = "TheseCharacters, I really enjoy TheseCharacters";
$result = preg_match($pattern, $string, $matches);
echo "String 1 matches {$result} times: ".print_r($matches,true)."\n";

//matches only because of the case-insensitive (i) flag; the case is not an exact match
$string = "TheseCharacters, I really enjoy thesecharacters";
$result = preg_match($pattern, $string, $matches);
echo "String 1 matches {$result} times: ".print_r($matches,true)."\n";

//should match, doesn't require TheseCharacters to be at the end of the string.
$string = "TheseCharacters, I really enjoy TheseCharacters and some others";
$result = preg_match($pattern, $string, $matches);
echo "String 2 matches {$result} times: ".print_r($matches,true)."\n";

//no match, TheseCharacters has been changed to TheseLetters
$string = "TheseCharacters, I really enjoy TheseLetters";
$result = preg_match($pattern, $string, $matches);
echo "String 3 matches {$result} times: ".print_r($matches,true)."\n";

//no match, additional letters have been appended to TheseCharacters
$string = "TheseCharacters, I really enjoy TheseCharactersasdf";
$result = preg_match($pattern, $string, $matches);
echo "String 4 matches {$result} times: ".print_r($matches,true)."\n";

which produces this output:

String 1 matches 1 times: Array
(
    [0] => TheseCharacters, I really enjoy TheseCharacters
    [1] => TheseCharacters
)

String 1 matches 1 times: Array
(
    [0] => TheseCharacters, I really enjoy thesecharacters
    [1] => TheseCharacters
)

String 2 matches 1 times: Array
(
    [0] => TheseCharacters, I really enjoy TheseCharacters and some others
    [1] => TheseCharacters
)

String 3 matches 0 times: Array
(
)

String 4 matches 0 times: Array
(
)

Demo: https://3v4l.org/upNhm

An explanation of the pattern is here: https://regex101.com/r/DuTbyn/2

And it's not really a "variable" that is being stored. It is a capture group, which you can reference later on by its group number. The pattern first captures the leading run of letters, digits, or underscores at the start of the string (^(\b\w+\b)). It then allows any number of characters (.*) and requires whatever the first group captured to appear again as a whole word (\b\1\b). The entire matched string will be available in $matches[0] and the repeated word will be available in $matches[1].
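If you want the exact-case behaviour from the question, the same capture group and backreference can be wrapped in a small helper. This is only a sketch: findRepeatedFirstWord and its $ignoreCase parameter are names I made up for illustration, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the leading
// word if it occurs again later in the line, or null if it does not.
function findRepeatedFirstWord(string $line, bool $ignoreCase = false): ?string
{
    // ^(\w+)\b  captures the whole first word as group 1
    // .*        skips any characters in between
    // \b\1\b    requires the captured text again, as a whole word
    $pattern = '/^(\w+)\b.*\b\1\b/' . ($ignoreCase ? 'i' : '');

    return preg_match($pattern, $line, $matches) === 1 ? $matches[1] : null;
}

var_dump(findRepeatedFirstWord('Dog1, My favorite word is Dog1.'));      // string(4) "Dog1"
var_dump(findRepeatedFirstWord('Cakeman, oh I enjoy cakeboy.'));         // NULL
var_dump(findRepeatedFirstWord('TheseCharacters, I really enjoy thesecharacters'));       // NULL
var_dump(findRepeatedFirstWord('TheseCharacters, I really enjoy thesecharacters', true)); // string(15) "TheseCharacters"
```

Leaving out the i flag by default makes the comparison case-sensitive, so 'thesecharacters' only counts as a repeat when you explicitly ask for case-insensitive matching.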

Without knowing more about what you are trying to do, this is pretty much the only way with a single regex. Another option is to split the line into an array of individual words and simply use array_count_values to get a count of each word.
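A rough sketch of that word-counting alternative, assuming words are separated by any non-word characters. The function name firstWordCount is hypothetical, and note that array_count_values compares case-sensitively.

```php
<?php
// Hypothetical helper: returns how many times the first word of the line
// occurs in the whole line (0 if the line contains no words).
function firstWordCount(string $line): int
{
    // Split on runs of non-word characters and drop empty pieces.
    $words = preg_split('/\W+/', $line, -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return 0;
    }

    // Map each distinct word to its number of occurrences.
    $counts = array_count_values($words);

    return $counts[$words[0]];
}

var_dump(firstWordCount('TheseCharacters, I really enjoy TheseCharacters.')); // int(2)
var_dump(firstWordCount('Cakeman, oh I enjoy cakeboy.'));                     // int(1)
```

A result of 2 or more means the first word shows up again somewhere later in the line, which is the same condition the backreference pattern checks.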
