Chris - 4 months ago
PHP Question

PHP regex - replace but get numeric value from replaced string

I have some HTML that contains multiple pairs of HTML comments, and between each START and END comment is a form. I am trying to use preg_replace to replace each comment pair, together with the form inside it, with a tag of the form [CONTACT_FORM_X], where X is the numeric ID of the form.

$str = 'blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_1] -->some form goes here<!-- CONTACT FORM END 1 --> blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_2] -->another form goes here<!-- CONTACT FORM END 2 -->';

$replace = preg_replace('/<!-- CONTACT FORM START \[CONTACT_FORM_\d\] -->.*<!-- CONTACT FORM END \d -->/', '[CONTACT_FORM_X]', $str);
echo $replace;


So:

<!-- CONTACT FORM START [CONTACT_FORM_1] -->some form goes here<!-- CONTACT FORM END 1 -->


Should be replaced entirely with [CONTACT_FORM_1]

And:

<!-- CONTACT FORM START [CONTACT_FORM_2] -->another form goes here<!-- CONTACT FORM END 2 -->


Should be replaced entirely with [CONTACT_FORM_2]

If I run my code above, I get:

blah blah blah [CONTACT_FORM_X]


So my questions are:


  1. How can I get the value matched by \d and use it in place of the X in my preg_replace replacement?

  2. My code only seems to replace one of the forms rather than both. How can I adapt preg_replace to replace every occurrence?


Answer

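preg_replace already replaces all occurrences: its optional $limit argument defaults to -1, meaning no limit. The problem is that .* is greedy. It matches from the first <!-- CONTACT FORM START [CONTACT_FORM_1] --> all the way to the last <!-- CONTACT FORM END \d -->, so both forms end up inside a single match and only one replacement is made. Adding ? makes it lazy (.*?), so each match stops at the first END comment it reaches. To capture a value, wrap it in parentheses, e.g. (\d); the captured digit is then available in the replacement string as $1.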
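To see the greedy/lazy difference directly, here is a small sketch that runs both variants against the sample string from the question with preg_match_all and counts the matches. The variable names are just for illustration.

```php
<?php
// Sample input copied from the question.
$str = 'blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_1] -->some form goes here<!-- CONTACT FORM END 1 --> blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_2] -->another form goes here<!-- CONTACT FORM END 2 -->';

// Greedy .* runs on to the LAST end comment, so both forms fall into one match.
$greedy = '/<!-- CONTACT FORM START \[CONTACT_FORM_(\d)\] -->.*<!-- CONTACT FORM END \d -->/';
// Lazy .*? stops at the FIRST end comment, giving one match per form.
$lazy = '/<!-- CONTACT FORM START \[CONTACT_FORM_(\d)\] -->.*?<!-- CONTACT FORM END \d -->/';

$greedyCount = preg_match_all($greedy, $str, $greedyMatches);
$lazyCount = preg_match_all($lazy, $str, $lazyMatches);

echo $greedyCount, PHP_EOL;                  // 1
echo $lazyCount, PHP_EOL;                    // 2
echo implode(',', $lazyMatches[1]), PHP_EOL; // 1,2 (the digits captured by (\d))
```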

So try:

.*?<!-- CONTACT FORM START \[CONTACT_FORM_(\d)\] -->.*?<!-- CONTACT FORM END \d -->

Or, if you want to be sure the closing comment belongs to the same form, use a backreference (\1 must match whatever (\d) captured):

.*?<!-- CONTACT FORM START \[CONTACT_FORM_(\d)\] -->.*?<!-- CONTACT FORM END \1 -->
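Where the backreference matters is malformed input. The sketch below uses a made-up string in which form 1's END comment is missing. Without \1, form 1's START pairs with form 2's END and swallows form 2. With \1, form 1 is left alone and only form 2 is replaced.

```php
<?php
// Hypothetical broken input: form 1 has no END comment.
$broken = '<!-- CONTACT FORM START [CONTACT_FORM_1] -->form one'
        . '<!-- CONTACT FORM START [CONTACT_FORM_2] -->form two<!-- CONTACT FORM END 2 -->';

// Without a backreference, any END comment closes the block.
$loose  = '/<!-- CONTACT FORM START \[CONTACT_FORM_(\d)\] -->.*?<!-- CONTACT FORM END \d -->/';
// With \1, the END number must equal the captured START number.
$strict = '/<!-- CONTACT FORM START \[CONTACT_FORM_(\d)\] -->.*?<!-- CONTACT FORM END \1 -->/';

$looseOut  = preg_replace($loose, '[CONTACT_FORM_$1]', $broken);
$strictOut = preg_replace($strict, '[CONTACT_FORM_$1]', $broken);

echo $looseOut, PHP_EOL;  // [CONTACT_FORM_1]
echo $strictOut, PHP_EOL; // <!-- CONTACT FORM START [CONTACT_FORM_1] -->form one[CONTACT_FORM_2]
```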

Remove the leading .*? if the content before each form should be kept. I wasn't sure what the intent was there: I read "Should be replaced entirely with [CONTACT_FORM_2]" as meaning that only the tags should remain.
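If the surrounding text should stay, a sketch of that variant follows. It drops the leading .*? and uses the \1 backreference so each START pairs with its own END. It uses the question's sample string.

```php
<?php
// Sample input copied from the question.
$str = 'blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_1] -->some form goes here<!-- CONTACT FORM END 1 --> blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_2] -->another form goes here<!-- CONTACT FORM END 2 -->';

// No leading .*?, so text outside the comment pairs is left untouched.
// \1 makes each END comment pair with its own START comment.
$pattern = '/<!-- CONTACT FORM START \[CONTACT_FORM_(\d)\] -->.*?<!-- CONTACT FORM END \1 -->/';
$replace = preg_replace($pattern, '[CONTACT_FORM_$1]', $str);

echo $replace, PHP_EOL;
// blah blah blah [CONTACT_FORM_1] blah blah blah [CONTACT_FORM_2]
```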

Regex demo: https://regex101.com/r/kS2nK6/1

PHP Usage:

<?php
$str = 'blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_1] -->some form goes here<!-- CONTACT FORM END 1 --> blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_2] -->another form goes here<!-- CONTACT FORM END 2 -->';

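// Lazy .*? stops at the first END comment; (\d) captures the form ID,
// which the replacement string refers to as $1.
$replace = preg_replace('/.*?<!-- CONTACT FORM START \[CONTACT_FORM_(\d)\] -->.*?<!-- CONTACT FORM END \d -->/', '[CONTACT_FORM_$1]', $str);
echo $replace; // [CONTACT_FORM_1][CONTACT_FORM_2]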

PHP Demo: https://eval.in/611232
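The patterns above assume each form sits on one line and has a single-digit ID. Real HTML forms usually span several lines, and by default . does not match a newline. Below is a sketch that assumes multi-line forms and possibly multi-digit IDs. It adds the s modifier and uses \d+. It also uses preg_replace_callback, which hands you the captured ID as a PHP value in case you need to do more with it than paste it back. The input string and the $ids array are hypothetical.

```php
<?php
// Hypothetical multi-line input with a two-digit form ID.
$html = "intro\n"
      . "<!-- CONTACT FORM START [CONTACT_FORM_12] -->\n<form>\n  <input name=\"email\">\n</form>\n<!-- CONTACT FORM END 12 -->\n"
      . "outro";

$ids = []; // collects every form ID that was replaced

// s: let . match newlines; \d+: allow multi-digit IDs; \1: END must match START.
$pattern = '/<!-- CONTACT FORM START \[CONTACT_FORM_(\d+)\] -->.*?<!-- CONTACT FORM END \1 -->/s';

$result = preg_replace_callback(
    $pattern,
    function (array $m) use (&$ids): string {
        $ids[] = (int) $m[1]; // the captured ID as an integer
        return '[CONTACT_FORM_' . $m[1] . ']';
    },
    $html
);

echo $result, PHP_EOL;            // the form block is now [CONTACT_FORM_12]
echo implode(',', $ids), PHP_EOL; // 12
```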