dyesdyes - 5 months ago
Java Question

Find chars in string that are not between double quotes

I want to find the occurrences of one or more specific characters in a String, but occurrences that lie between double quotes must be ignored:

Example:

"this is \"my\" example string"


If you look for the char 'm', it should return only the index of the 'm' in "example", because the other 'm' is between double quotes.

Another example:

"th\"i\"s \"is\" \"my\" example string"


I'm expecting something like:

public List<Integer> getOccurrenceStartIndexesThatAreNotBetweenQuotes(String snippet,String stringToFind);


One "naive" way would be to:


  • get all the start indexes of stringToFind in snippet

  • get all the indexes of the quotes in snippet

  • Depending on the start index of stringToFind, and because you have the positions of the quotes, you can tell whether a match is between quotes or not.
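The three steps above might be sketched roughly as follows. This is only an illustration under some assumptions that the question does not state: quotes are never escaped inside the snippet, a match is judged by its start index alone, stringToFind itself contains no double quote, and an unmatched trailing quote puts everything after it "between quotes". The class name NaiveQuoteSearch is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class NaiveQuoteSearch {
    static List<Integer> getOccurrenceStartIndexesThatAreNotBetweenQuotes(
            String snippet, String stringToFind) {
        List<Integer> result = new ArrayList<>();
        if (stringToFind.isEmpty()) {
            return result; // nothing sensible to report for an empty search string
        }

        // Collect the index of every double quote in the snippet.
        List<Integer> quoteIndexes = new ArrayList<>();
        for (int i = 0; i < snippet.length(); i++) {
            if (snippet.charAt(i) == '"') {
                quoteIndexes.add(i);
            }
        }

        // Walk over every start index of stringToFind (overlapping matches included).
        int idx = snippet.indexOf(stringToFind);
        while (idx >= 0) {
            // Count the quotes located before this match.
            int quotesBefore = 0;
            for (int q : quoteIndexes) {
                if (q >= idx) {
                    break;
                }
                quotesBefore++;
            }
            // An even count means every quote opened so far has been closed again.
            if (quotesBefore % 2 == 0) {
                result.add(idx);
            }
            idx = snippet.indexOf(stringToFind, idx + 1);
        }
        return result;
    }
}
```

Because it relies on String.indexOf rather than a regex, characters such as + or ( in stringToFind need no escaping here.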



Is there a better way to do this?

EDIT:

What do I want to retrieve? The indexes of the matches.

A few things:


  • The string to search in can contain several quoted segments: "th\"i\"s \"is\" \"my\" example string"

  • In the string : "th\"i\"s \"is\" \"my\" example string", "i", "is" and "my" are between quotes.

  • The strings are not limited to letters and digits; they can also contain characters such as ';:()_-=+[]{} etc.


Answer

Here's one solution:

Algorithm:

  1. Find all the "Dead Zone" regions within the String (i.e. regions that are off limits because they are within quotes)
  2. Find all the regions where the String contains the search string in question (hitZones in the code).
  3. Retain only the regions in the hitZones that are not contained in any deadZones. I will leave this part to you :)

import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindStrings
{
    // Just a simple model class for regions
    static class Pair
    {
        int s = 0;
        int e = 0;

        public Pair (int s, int e)
        {
            this.s = s;
            this.e = e;
        }

        public String toString ()
        {
            return "[" + s + ", " + e + "]";
        }
    }

    public static void main(String[] args)
    {
        String search = "other";

        String str = "this is \"my\" example other string. And \"my other\" this is my str in no quotes.";

        // A dead zone is an opening quote, any run of non-quote chars, and the closing quote
        Pattern p = Pattern.compile("\"([^\"]*)\"");
        Matcher m = p.matcher(str);

        List<Pair> deadZones = new ArrayList<Pair>();
        while (m.find())
        {
            int s = m.start();
            int e = m.end();
            deadZones.add(new Pair(s, e - 1));
        }

        List<Pair> hitZones = new ArrayList<Pair>();
        // Quote the search string so characters such as ( ) + [ ] are matched literally
        p = Pattern.compile(Pattern.quote(search));
        m = p.matcher(str);
        while (m.find())
        {
            int s = m.start();
            int e = m.end();
            hitZones.add(new Pair(s, e - 1));
        }

        System.out.println(deadZones);
        System.out.println(hitZones);
    }
}

Note: The s component of every Pair in hitZones that does not fall within any of the deadZones is ultimately what you want.
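For reference, here is one way step 3 could be filled in. Treat it as a sketch rather than the definitive version: the class ZoneFilter and the helpers startsOutsideQuotes and overlapsAny are names invented for this example, regions are stored as {start, endExclusive} arrays instead of Pair, and a hit is dropped as soon as it overlaps a dead zone at all (quote characters included). As in the code above, escaped quotes are not handled and an unmatched quote does not open a dead zone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ZoneFilter {
    // One quoted region, including both quote characters.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    static List<Integer> startsOutsideQuotes(String str, String search) {
        // Step 1: dead zones as {start, endExclusive}.
        List<int[]> deadZones = new ArrayList<>();
        Matcher m = QUOTED.matcher(str);
        while (m.find()) {
            deadZones.add(new int[] {m.start(), m.end()});
        }

        List<Integer> result = new ArrayList<>();
        if (search.isEmpty()) {
            return result; // an empty pattern would match at every position
        }

        // Step 2: hits, with the search string matched literally.
        m = Pattern.compile(Pattern.quote(search)).matcher(str);
        while (m.find()) {
            // Step 3: keep the start index only if the hit touches no dead zone.
            if (!overlapsAny(deadZones, m.start(), m.end())) {
                result.add(m.start());
            }
        }
        return result;
    }

    // True if the half-open range [start, end) intersects any zone.
    private static boolean overlapsAny(List<int[]> zones, int start, int end) {
        for (int[] z : zones) {
            if (start < z[1] && z[0] < end) {
                return true;
            }
        }
        return false;
    }
}
```

With the str and search values from main above, this keeps the first "other" and discards the one inside "my other". Note that Matcher.find does not report overlapping hits, so searching "aa" in "aaa" yields a single match.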
