developer033 developer033 - 1 year ago 47
Java Question

Match lookbehind / lookahead multiple times


public static void main(String[] args) {
    String mainTag = "HI";
    String replaceTag = "667";
    String text = "92<HI=/><z==//HIb><cHIhi> ";
    System.out.println(strFormatted(mainTag, replaceTag, text));

    mainTag = "aBc";
    replaceTag = "923";
    text = "<dont replacethis>abcabc< abcabcde >";
    System.out.println(strFormatted(mainTag, replaceTag, text));
}

private static String strFormatted(String mainTag, String replaceTag, String text) {
    // Current attempt: only matches mainTag directly after "<"
    return text.replaceAll("(?i)(?<=<)" + mainTag + "(?=.*>)", replaceTag);
}

So, I want to replace mainTag (variable) with replaceTag (variable) only inside tags (<>).

In the example above I want to replace mainTag (case insensitive) in all occurrences inside <>, but my code only replaces the first occurrence.



Example input:

92<HI=/><z==//HIb><cHIhi>

Expected output:

92<667=/><z==//667b><c667667>

(mainTag = "HI", replaceTag = "667")

Example input:

<dont replacethis>abcabc<abcabcde>

Expected output:

<dont replacethis>abcabc<923923de>

(mainTag = "aBc", replaceTag = "923")

Note: My code is wrong not only because it replaces only once, but also because it only works if mainTag immediately follows the "<". In other words, the lookbehind only covers one specific situation.


You just need a lookahead here. The idea is to find every mainTag that is followed by a >, and then by matching pairs of <>, and replace it with replaceTag. The following regex would work:

text.replaceAll("(?i)" + mainTag + "(?=[^<>]*>(?:[^<>]*<[^<>]*>)*[^<>]*$)", replaceTag);

Explanation:

(?i)               # Ignore case
mainTag            # Match mainTag
(?=                # which is followed by
    [^<>]*         # 0 or more characters that are not < or >
    >              # A closing bracket (this ensures mainTag sits inside a bracket pair)
    (?:            # Start a group (to match a pair of brackets)
        [^<>]*     # Non-bracket characters
        <          # An opening bracket
        [^<>]*     # Non-bracket characters
        >          # A closing bracket
    )*             # Match the pair 0 or more times
    [^<>]*         # Non-bracket characters, 0 or more times
    $              # End of input (without this anchor the pair group has no effect)
)                  # End of the lookahead

The above regex assumes that brackets are always balanced. For unbalanced brackets, it might give unexpected results. But then, regex is not really the right tool for such a job.
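
To make the balanced-brackets caveat concrete, here is a minimal sketch that compares two lookaheads on an input with an unclosed bracket. It assumes the pair-matching lookahead is anchored at the end of the input with $; the class name UnbalancedBracketsSketch, the constants and the helper replace are hypothetical names used only for illustration, and Pattern.quote / Matcher.quoteReplacement are added on the assumption that mainTag and replaceTag should be treated as plain text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnbalancedBracketsSketch {
    // Assumption: lookahead anchored with $, so every later bracket must pair up.
    static final String BALANCED = "(?=[^<>]*>(?:[^<>]*<[^<>]*>)*[^<>]*$)";
    // Lookahead that only checks for the nearest closing bracket.
    static final String SIMPLE = "(?=[^<>]*>)";

    // Hypothetical helper: case-insensitive replace of mainTag where the lookahead holds.
    static String replace(String mainTag, String replaceTag, String text, String lookahead) {
        Pattern p = Pattern.compile(Pattern.quote(mainTag) + lookahead, Pattern.CASE_INSENSITIVE);
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(replaceTag));
    }

    public static void main(String[] args) {
        String unbalanced = "<HI> <oops"; // the second bracket is never closed
        // The anchored version finds no match, because the tail is not balanced.
        System.out.println(replace("HI", "667", unbalanced, BALANCED));
        // The simple version still replaces the HI inside the first tag.
        System.out.println(replace("HI", "667", unbalanced, SIMPLE));
    }
}
```

With a single unclosed bracket later in the text, the anchored form leaves even well-formed tags untouched, which is the kind of surprise the caveat above is warning about.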

Otherwise, a regex as simple as this would also work fine:

mainTag + "(?=[^<>]*>)"

Which one to use depends on your use case. This one doesn't worry about balanced brackets. You can try the second one first; if it fits all your scenarios, it's the better choice.
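
As a rough, self-contained sketch of the simpler lookahead applied to the two inputs from the question: the class name TagReplaceSketch and the method replaceInTags are hypothetical, and the use of Pattern.quote and Matcher.quoteReplacement is an added assumption so that regex metacharacters in mainTag or replaceTag are treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagReplaceSketch {
    // Replaces every case-insensitive occurrence of mainTag that is followed by a '>'
    // with no '<' or '>' in between, i.e. an occurrence that sits inside <...>.
    static String replaceInTags(String mainTag, String replaceTag, String text) {
        Pattern p = Pattern.compile(Pattern.quote(mainTag) + "(?=[^<>]*>)",
                                    Pattern.CASE_INSENSITIVE);
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(replaceTag));
    }

    public static void main(String[] args) {
        // Both HI and hi inside the tags are replaced.
        System.out.println(replaceInTags("HI", "667", "92<HI=/><z==//HIb><cHIhi> "));
        // The abcabc outside the tags is left alone.
        System.out.println(replaceInTags("aBc", "923", "<dont replacethis>abcabc< abcabcde >"));
    }
}
```

Because the lookahead consumes nothing, consecutive occurrences such as HIhi or abcabc are each matched in turn, which addresses the "only replaces the first occurrence" problem from the question.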