I need to extract the data within double quotes from a string.
<a href="Networking-denial-of-service.aspx">Next Page →</a>
atag = '<a href="Networking-denial-of-service.aspx">Next Page →</a>'
start = 0
end = 0
for i in range(len(atag)):
    if atag[i] == '"' and start==0:
        start = i
    elif atag[i] == '"' and end==0:
        end = i
nxtlink = atag[start+1:end]
I am taking the question exactly as written - how to get data between two double quotes. I agree with the comments that an HTMLParser might be better...
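For completeness, here is a minimal sketch of the parser route, assuming Python 3 (where the module is `html.parser`). The class name `HrefCollector` is my own hypothetical helper, not something from the question.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes already stripped
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


atag = '<a href="Networking-denial-of-service.aspx">Next Page →</a>'
collector = HrefCollector()
collector.feed(atag)
collector.close()
nxtlink = collector.hrefs[0]
print(nxtlink)
```

The parser hands back the attribute value directly, so it should keep working if the tag gains other quoted attributes ahead of `href`.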
A regular expression might help, particularly if you want to find more than one quoted string. For example, this is one possible approach:
import re

string_with_quotes = 'Some "text" "with inverted commas"\n "some text \n with a line break"'

# Ignore case not needed here, but can be useful.
Find_double_quotes = re.compile('"([^"]*)"', re.DOTALL|re.MULTILINE|re.IGNORECASE)

list_of_quotes = Find_double_quotes.findall(string_with_quotes)

print(list_of_quotes)
# ['text', 'with inverted commas', 'some text \n with a line break']
If you have an odd number of double quotes, then the last double quote is ignored. If none are found, then an empty list is produced.
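A quick way to check those two edge cases, and to try the pattern on the tag from the question, is sketched below. I have dropped the flags here; that is my own simplification, since the pattern contains no `.`, `^`, `$`, or letters for them to act on.

```python
import re

# Same pattern as in the answer, compiled without flags (an assumption of this sketch)
find_double_quotes = re.compile(r'"([^"]*)"')

# Odd number of quotes: the trailing unmatched quote is ignored
odd = find_double_quotes.findall('one "two" three "four')

# No quotes at all: findall returns an empty list
none_found = find_double_quotes.findall('no quotes here')

# The tag from the question
atag = '<a href="Networking-denial-of-service.aspx">Next Page →</a>'
links = find_double_quotes.findall(atag)

print(odd)         # ['two']
print(none_found)  # []
print(links)       # ['Networking-denial-of-service.aspx']
```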
http://www.regular-expressions.info/ is really good for learning regular expressions
The question "Regex - Does not contain certain Characters" showed me how to exclude a character from a match.
https://docs.python.org/2/library/re.html#re.MULTILINE tells you what re.MULTILINE and re.DOTALL (underneath) do.
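As a rough illustration of what those two flags change, here is a small sketch. The sample text and variable names are made up for the example.

```python
import re

text = 'first line\nsecond line'

# Without DOTALL, '.' stops at a newline; with DOTALL it matches across it
dot_default = re.findall(r'first.*', text)
dot_all = re.findall(r'first.*', text, re.DOTALL)

# Without MULTILINE, '^' matches only at the start of the string;
# with MULTILINE it also matches right after each newline
caret_default = re.findall(r'^\w+', text)
caret_multi = re.findall(r'^\w+', text, re.MULTILINE)

print(dot_default)    # ['first line']
print(dot_all)        # ['first line\nsecond line']
print(caret_default)  # ['first']
print(caret_multi)    # ['first', 'second']
```

A negated class such as `[^"]` already matches newlines on its own, which is why the quoted text with a line break is found by the pattern above.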