I am trying to parse a PDF in Python and extract the strings that appear in quotation marks. I can extract the quoted text, but I also want to extract the name that comes before the opening quotation mark. Here is a sample line:
Ziblatt, Daniel. 2004. "Rethinking the Origins of Federalism: Puzzle, Theory, and Evidence from Nineteenth-Century Europe,"
The quoted title comes out fine; what I am missing is the name before it.
This is the code I am using. Please help:
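    import re
    quoted = re.compile('"[^"]*"')  # a double quote, any non-quote characters, a closing quote
    for value in quoted.findall(x):  # x holds the text extracted from the PDF
        print(value)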
Capturing everything before the opening double quote in a group should work:
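    import re

    def quotes(x):
        # Group 1 captures the text before the quoted part; the quoted part itself is matched but not captured
        quoted = re.compile('(.+)"[^"]+"')
        for value in quoted.findall(x):
            print(value.strip())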
I get this output:

    >>> quotes(text)
    Ziblatt, Daniel. 2004.
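If you want the name and the title in a single pass, you can give each part its own group. The sketch below assumes every entry follows the same `Last, First. YYYY. "Title,"` layout as the sample line, one entry per line. `CITATION` and `extract_citations` are illustrative names, not part of any library. Entries that deviate from that layout (no year, or a title that the PDF extractor splits across lines) will not match unless you adjust the pattern.

```python
import re

# Assumed layout (based only on the sample line): Last, First. YYYY. "Title,"
# Named groups: author (text before ". YYYY."), year (four digits), title (inside the quotes).
# re.MULTILINE lets ^ match at the start of every line of the extracted text.
CITATION = re.compile(
    r'^(?P<author>[^"\n]+?)\.\s+(?P<year>\d{4})\.\s+"(?P<title>[^"]+)"',
    re.MULTILINE,
)


def extract_citations(text):
    """Return a list of (author, year, title) tuples for each matching line."""
    results = []
    for m in CITATION.finditer(text):
        # Drop the trailing comma or period that sits inside the closing quote
        title = m.group("title").rstrip(",. ")
        results.append((m.group("author").strip(), m.group("year"), title))
    return results


sample = ('Ziblatt, Daniel. 2004. "Rethinking the Origins of Federalism: Puzzle, '
          'Theory, and Evidence from Nineteenth-Century Europe,"')
print(extract_citations(sample))
# [('Ziblatt, Daniel', '2004', 'Rethinking the Origins of Federalism: Puzzle, Theory, and Evidence from Nineteenth-Century Europe')]
```

Note that the answer's `(.+)` is greedy. On a line containing two quoted strings, it will swallow the first one. The lazy `+?` above stops at the first `. YYYY.` it finds instead.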