developer033 - 7 months ago
Java Question

Matches lookbehind / ahead multiple times

Code:

public static void main(String[] args) {
    String mainTag = "HI";
    String replaceTag = "667";
    String text = "92<HI=/><z==//HIb><cHIhi> ";
    System.out.println(strFormatted(mainTag, replaceTag, text));

    mainTag = "aBc";
    replaceTag = "923";
    text = "<dont replacethis>abcabc< abcabcde >";
    System.out.println(strFormatted(mainTag, replaceTag, text));
}

private static String strFormatted(String mainTag, String replaceTag, String text) {
    return text.replaceAll("(?i)(?<=<)" + mainTag + "(?=.*>)", replaceTag);
}


So, I want to replace mainTag (a variable) with replaceTag (a variable), but only inside tags (<...>).

In the example above I want to replace every occurrence of the mainTag HI (case insensitive) inside <...> with 667, but my code only replaces the first occurrence.

Examples:



92<HI=/><z==//HIb><cHIhi>


Expected output:

92<667=/><z==//667b><c667667>


(mainTag = "HI", replaceTag = "667")

<dont replacethis>abcabc<abcabcde>


Expected output:

<dont replacethis>abcabc<923923de>


(mainTag = "aBc", replaceTag = "923")

Note: My code is wrong not only because it replaces only once, but also because it only works if mainTag comes immediately after the "<". In other words, the lookbehind only covers one specific situation.

Answer

You only need a lookahead here. The idea is to match every mainTag that is followed by a > with no < or > in between (and then, optionally, by further matching <...> pairs), and replace it with replaceTag. The following regex would work:

text.replaceAll("(?i)" + mainTag + "(?=[^<>]*>(?:[^<>]*<[^<>]*>)*[^<>]*)", replaceTag);

Explanation:

(?i)               # Ignore Case
mainTag            # Match mainTag
(?=                # which is followed by
    [^<>]*         # Some 0 or more characters which are not < or >
    >              # Closing bracket (this ensures mainTag is inside a bracket pair)
    (?:            # Start a group (to match pair of bracket)
        [^<>]*     # non-bracket characters
        <          # Start a bracket 
        [^<>]*     # non-bracket characters
        >          # End the bracket
    )*             # Match the pair 0 or more times.
    [^<>]*         # Non-bracket characters 0 or more times.
)
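
To try the pattern explained above end to end, here is a minimal sketch. It is not part of the original answer: the class name TagReplaceDemo and the method replaceInsideTags are made up for illustration, and the Pattern.quote / Matcher.quoteReplacement calls are an added assumption that mainTag and replaceTag should be treated as literal text rather than as regex or replacement syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagReplaceDemo {
    // Lookahead from the answer: the match must be followed by a '>' with no
    // '<' or '>' in between, then optionally by further <...> pairs.
    private static final String INSIDE_TAG =
            "(?=[^<>]*>(?:[^<>]*<[^<>]*>)*[^<>]*)";

    // Hypothetical helper: replaces mainTag (ignoring case) inside <...>.
    static String replaceInsideTags(String mainTag, String replaceTag, String text) {
        // Assumption: both arguments are plain text, so they are escaped here.
        String regex = "(?i)" + Pattern.quote(mainTag) + INSIDE_TAG;
        return text.replaceAll(regex, Matcher.quoteReplacement(replaceTag));
    }

    public static void main(String[] args) {
        System.out.println(replaceInsideTags("HI", "667", "92<HI=/><z==//HIb><cHIhi>"));
        // prints 92<667=/><z==//667b><c667667>
        System.out.println(replaceInsideTags("aBc", "923", "<dont replacethis>abcabc<abcabcde>"));
        // prints <dont replacethis>abcabc<923923de>
    }
}
```

Without the quoting, a mainTag such as "a.c" would be read as a regex, and a replaceTag containing "$" or "\" would be read as replacement syntax.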

The regex above assumes that brackets are always balanced. For unbalanced brackets it might give unexpected results, but then regex is not really the right tool for that job.
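
If unbalanced input is a real concern, one non-regex option is a single pass over the characters that tracks whether the current position is inside a bracket pair. The sketch below is only an illustration, not something from the original answer: TagScanDemo and replaceInsideTags are hypothetical names, and it assumes that mainTag itself contains no < or > and that a < only opens a tag when a > follows before the next <.

```java
class TagScanDemo {
    // Hypothetical helper: replaces mainTag (ignoring case) only between a '<'
    // and the '>' that closes it. Assumes mainTag contains no '<' or '>'.
    static String replaceInsideTags(String mainTag, String replaceTag, String text) {
        if (mainTag.isEmpty()) {
            return text;
        }
        StringBuilder out = new StringBuilder(text.length());
        boolean inside = false;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '<') {
                int close = text.indexOf('>', i + 1);
                int open = text.indexOf('<', i + 1);
                // Treat this '<' as a tag start only if a '>' comes before the next '<'.
                inside = close != -1 && (open == -1 || close < open);
            } else if (c == '>') {
                inside = false;
            } else if (inside && text.regionMatches(true, i, mainTag, 0, mainTag.length())) {
                out.append(replaceTag);
                i += mainTag.length();
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceInsideTags("HI", "667", "92<HI=/><z==//HIb><cHIhi>"));
        // prints 92<667=/><z==//667b><c667667>
        System.out.println(replaceInsideTags("HI", "667", "a HI > b <HI"));
        // prints a HI > b <HI (neither HI sits inside a complete <...> pair)
    }
}
```

The trade-off is more code in exchange for explicit, easy-to-change rules about what counts as being inside a tag.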

Alternatively, a regex as simple as this would also work fine:

text.replaceAll("(?i)" + mainTag + "(?=[^<>]*>)", replaceTag);

Which one to use depends on your use case. This one does not check for balanced brackets. You can try the second one first; if it covers all your scenarios, it is the better choice.
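
A quick way to choose between the two patterns is to run both on the same inputs, including an unbalanced one. The sketch below does that; the class name LookaheadCompare, the constant names and the sample inputs are illustrative assumptions, and mainTag is concatenated unescaped as in the answer. On these inputs both patterns give the same result: the longer lookahead is not anchored to the end of the input, so everything after its first > can match zero characters.

```java
class LookaheadCompare {
    // The two lookaheads from the answer.
    static final String WITH_PAIRS = "(?=[^<>]*>(?:[^<>]*<[^<>]*>)*[^<>]*)";
    static final String SIMPLE = "(?=[^<>]*>)";

    // Hypothetical helper: applies one lookahead variant, ignoring case.
    static String replace(String lookahead, String mainTag, String replaceTag, String text) {
        return text.replaceAll("(?i)" + mainTag + lookahead, replaceTag);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "92<HI=/><z==//HIb><cHIhi>", // balanced brackets
            "a HI > b <HI"               // unbalanced brackets
        };
        for (String input : inputs) {
            System.out.println(replace(WITH_PAIRS, "HI", "667", input));
            System.out.println(replace(SIMPLE, "HI", "667", input));
        }
        // prints 92<667=/><z==//667b><c667667> twice,
        // then   a 667 > b <HI twice
    }
}
```

Note how the unbalanced input is handled: the first HI is replaced because a > follows it, even though no < precedes it, while the HI after the unclosed < is left alone.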
