Anmol_Sharma Anmol_Sharma - 3 months ago 12
Python Question

Regular expression for anything except ']]' characters

I had posted a question earlier but it wasn't very clear to understand so it here goes again:

I have a string which looks like this :

{
"1000":[ [some whitespace and nonwhitespace characters],
[some whitespace and nonwhitespace characters],
....
[some whitespace and nonwhitespace characters]],

"1001":[ [some whitespace and nonwhitespace characters],
[some whitespace and nonwhitespace characters],
....
[some whitespace and nonwhitespace characters]],
...
}


and I want to extract a record like the one shown below using regex :

"1000":[ [some whitespace and nonwhitespace characters],
[some whitespace and nonwhitespace characters],
....
[some whitespace and nonwhitespace characters]]


I'm doing this in python using re module

Now for this I have in my mind the pattern :

' "[0-9]{4}":(anything except ]] ) '


but I can't figure out what will be the pattern for anything except ']]'

could anyone help?

Answer

A regex solution can be achieved using something like:

\d{4}":(.*?)]]

But you really don't want to use regex here if your string is a valid JSON. It's very natural for Python to work with JSONs. Assuming your data is:

data = {key1: [[str1], [str2], ...], ...}

You can simply grab the value of key1 by accessing the corresponding key:

data[key1]

this will give you: