Magesh Kumaar - 2 months ago
Java Question

Regex to extract values from Insert Query

Let's say I have this query


I'm using the following method to extract just the values:

public String extractValues(String queryLine) {
    // Take everything between the first '(' and the last ')'
    return queryLine.substring(queryLine.indexOf('(') + 1, queryLine.lastIndexOf(')'));
}

But later I realised that the INSERT query can take the following form:

INSERT INTO `TABLENAME` VALUES(xxx,yyy,zzz),(aaa,bbb,ccc);

How can I extract the values now?

I'm thinking maybe
is the way to go and split to occur on
but that does not seem right. What if one of the values contains those characters? So I am not sure.

Please guide.

Case I'm struggling to handle

INSERT INTO `TABLENAME` VALUES(xxx,yyy,zzz),(aaa,'bb(,)b',ccc);

Required output:
xxx,yyy,zzz and aaa,'bb(,)b',ccc

P.S.: The commas in between the values do not matter.


Recognizing a certain pattern in a language that supports char literals (which must be excluded from the recognition) is generally solved with this scheme:

  • find the pattern you're searching for inside char literals and substitute it with a placeholder; for example 'b),(bb' should become 'b###bb'
  • perform the recognition on the modified string
  • restore the pattern inside char literals, so 'b###bb' should become 'b),(bb' again
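
The three steps above can be sketched as a small helper. This is only a minimal illustration, not a full SQL parser: the class name MaskedSplitSketch and the method splitValueGroups are hypothetical, and it assumes that literals are single-quoted, contain no escaped quotes, and that the placeholder ### never occurs in the real data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaskedSplitSketch {

    // Assumption: this marker never occurs in the real data.
    private static final String PLACEHOLDER = "###";
    private static final String SEPARATOR = "),(";

    // Splits a values list such as "a,b),(c,'d),(e'" on "),(" outside single-quoted literals.
    static List<String> splitValueGroups(String valuesList) {
        // Step 1: mask the separator inside every single-quoted literal.
        // Assumption: literals do not contain escaped quotes ('' or \').
        Matcher literal = Pattern.compile("'[^']*'").matcher(valuesList);
        StringBuffer masked = new StringBuffer();
        while (literal.find()) {
            String safe = literal.group().replace(SEPARATOR, PLACEHOLDER);
            literal.appendReplacement(masked, Matcher.quoteReplacement(safe));
        }
        literal.appendTail(masked);

        // Step 2: split on the separator, which now only occurs between groups.
        String[] parts = masked.toString().split(Pattern.quote(SEPARATOR));

        // Step 3: restore the original text inside the literals.
        List<String> groups = new ArrayList<>();
        for (String part : parts) {
            groups.add(part.replace(PLACEHOLDER, SEPARATOR));
        }
        return groups;
    }

    public static void main(String[] args) {
        String valuesList = "xxx,yyy,zzz),(aaa,'bb),(b',ccc),(aaa,bbb,ccc";
        for (String group : splitValueGroups(valuesList)) {
            System.out.println(group);
        }
    }
}
```

Masking whole literals first means that several separators inside one literal, or several literals in one statement, are handled by the same loop.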

I don't know how to accomplish this with a single statement or a single regular expression. Just for the sake of argument, I wrote the following code.

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ValuesListParse {

  public static void main(String[] args) {

    String stmt = " INSERT INTO `TABLENAME` VALUES(xxx,yyy,zzz),(aaa,'bb),(b',ccc),(aaa,bbb,ccc); ";
    System.out.printf("%s\n", stmt);

    //extract values list
    stmt = stmt.split("(?i)\\s+values\\s*\\(|\\);")[1];
    System.out.printf("values list: %s\n", stmt);

    // identify the pattern inside '' (a char literal) and replace it with another pattern: the 2nd group is what we want
    /* UPDATE: modified the regexp from ('.*)(\\)\\,\\()(.*') to ('\\w*)(\\)\\,\\()+(\\w*') to manage multiple literals containing the pattern */
    Matcher m = Pattern.compile("('\\w*)(\\)\\,\\()+(\\w*')").matcher(stmt);
    while (m.find()) {
        // "###" has the same length as "),(", so the matcher's offsets stay valid
        stmt = stmt.substring(0, m.start(2)) + "###" + stmt.substring(m.end(2), stmt.length());
        System.out.printf("values list with pattern subst inside char literal: %s\n", stmt);
    }

    // split multiple values
    String[] multiValues = stmt.split("\\)\\,\\(");

    // adjust char literal
    for (int i = 0; i < multiValues.length ; i++) {
        multiValues[i] = multiValues[i].replaceAll("###", "),(");