M.D - 2 months ago
Python Question

Extracting string before the quotations

I am trying to parse a PDF in Python and extract the strings that appear in quotation marks. I can extract the quoted text, but I also want to extract the name that comes before the opening quotation mark.
For example, consider this:

Ziblatt, Daniel. 2004. "Rethinking the Origins of Federalism: Puzzle, Theory, and Evidence from Nineteenth-Century Europe,"

I am able to extract everything in quotation marks, but I want the name to be extracted as well.
This is the code I am using. Please help.

import re

def quotes(x):
    # Match a double quote, any run of non-quote characters, and the closing quote.
    quoted = re.compile('"[^"]*"')
    for value in quoted.findall(x):
        print(value)

Answer

Adding a capture group for the text before the opening double quote should work:

import re

def quotes(x):
    # With one capture group, findall() returns only the text before the quoted string.
    quoted = re.compile('(.+)"[^"]+"')
    for value in quoted.findall(x):
        print(value.strip())

I get this output:

>>> quotes(text)
Ziblatt, Daniel. 2004.
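
Since the question asks for the name as well as the quoted text, one option is to capture both parts in the same pattern. This is only a minimal sketch, assuming each citation sits on a single line and looks like the example in the question. The names `CITATION` and `name_and_quote` are made up for illustration and are not part of the answer above.

```python
import re

# Group 1: text before the opening quote (kept on one line).
# Group 2: text between the double quotes.
CITATION = re.compile(r'([^"\n]+)"([^"]*)"')

def name_and_quote(text):
    """Return a list of (prefix, quoted) pairs, one per quoted string found."""
    return [(prefix.strip(), quoted) for prefix, quoted in CITATION.findall(text)]

text = ('Ziblatt, Daniel. 2004. "Rethinking the Origins of Federalism: '
        'Puzzle, Theory, and Evidence from Nineteenth-Century Europe,"')

for prefix, quoted in name_and_quote(text):
    print(prefix)
    print(quoted)
```

Using `[^"\n]+` instead of `.+` for the prefix keeps the match from running past an earlier quoted string when a line contains more than one.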