I am trying to parse some docstrings.
An example docstring is:
Test if a column field is larger than a given value
This function can also be called as an operator using the '>' syntax

Arguments:
    - DbColumn self
    - string or float value: the value to compare to
        in case of string: lexicographic comparison
        in case of float: numeric comparison
Returns:
    DbWhere object
m = re.search('(.*)(Arguments:.*)(Returns:.*)', s, re.DOTALL)
re.search('^(.*?)(Arguments:.*?)?(Returns:.*)?$', s, re.DOTALL)
Just make the second and third groups optional by appending a ?, and make the quantifiers of the first two groups non-greedy by (again) appending a ? to them (yes, confusing).
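As a quick check of why the optional groups matter, here is a minimal sketch comparing the two patterns on a docstring that has an Arguments section but no Returns section. The sample string is made up for illustration and is not the docstring from the question.

```python
import re

# Hypothetical docstring: an Arguments section, but no Returns section.
s = "Compare a column to a value\nArguments:\n - DbColumn self"

# Pattern from the question: every section is mandatory.
strict = re.search(r'(.*)(Arguments:.*)(Returns:.*)', s, re.DOTALL)

# Revised pattern: the Arguments and Returns groups are optional.
relaxed = re.search(r'^(.*?)(Arguments:.*?)?(Returns:.*)?$', s, re.DOTALL)

print(strict)            # no match at all, because "Returns:" is missing
print(relaxed.groups())  # the missing section shows up as None
```

A group that did not take part in the match is reported as None, so the caller can tell a missing section apart from an empty one.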
Also, if you use the non-greedy quantifier on the first group of the pattern, it will match the shortest possible substring, which for .* is the empty string. You can overcome this by adding the end-of-string anchor ($) at the end of the pattern, which forces the first group to match as few characters as possible while still satisfying the whole pattern: the entire string when there are no Arguments and no Returns sections, and everything before those sections when they are present.
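The effect of the trailing $ is easiest to see on a string with no sections at all. This is a small sketch with a made-up two-line string. Note that without re.MULTILINE, $ also matches just before a final newline, so the sample deliberately has no trailing newline.

```python
import re

# Hypothetical docstring with neither an Arguments nor a Returns section.
s = "Line one of the summary\nLine two of the summary"

# Without the anchor, the lazy first group is satisfied by the empty string.
without_anchor = re.search(r'^(.*?)(Arguments:.*?)?(Returns:.*)?', s, re.DOTALL)

# With the anchor, the first group has to grow until the end of the string.
with_anchor = re.search(r'^(.*?)(Arguments:.*?)?(Returns:.*)?$', s, re.DOTALL)

print(repr(without_anchor.group(1)))
print(repr(with_anchor.group(1)))
```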
Edit: OK, if you just want to capture the text after the Arguments: and Returns: tokens, you'll have to tuck in a couple more groups. We're not going to use all of the groups, so naming them (with the (?P<name>...) notation; another question mark, argh!) is starting to make sense:
>>> m = re.search('^(?P<description>.*?)(Arguments:(?P<arguments>.*?))?(Returns:(?P<returns>.*))?$', s, re.DOTALL)
>>> m.groupdict()['description']
"Test if a column field is larger than a given value\n This function can also be called as an operator using the '>' syntax\n\n "
>>> m.groupdict()['arguments']
'\n - DbColumn self\n - string or float value: the value to compare to\n in case of string: lexicographic comparison\n in case of float: numeric comparison\n '
>>> m.groupdict()['returns']
'\n DbWhere object'
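If you need this for many docstrings, the named-group pattern could be wrapped in a small helper. The sketch below is one way to do it, not the only one: the names DOCSTRING_RE and split_docstring are invented for this example, and stripping the surrounding whitespace is an extra convenience that the session above does not do.

```python
import re

# Same pattern as above, compiled once and split across lines for readability.
DOCSTRING_RE = re.compile(
    r'^(?P<description>.*?)'
    r'(Arguments:(?P<arguments>.*?))?'
    r'(Returns:(?P<returns>.*))?$',
    re.DOTALL,
)


def split_docstring(doc):
    """Return a dict with 'description', 'arguments' and 'returns' keys.

    A section that is absent from the docstring maps to None; text that is
    present is returned with leading and trailing whitespace removed.
    """
    # The lazy first group can always grow to cover the whole string,
    # so the pattern matches any input and m is never None.
    m = DOCSTRING_RE.search(doc)
    return {
        name: (text.strip() if text is not None else None)
        for name, text in m.groupdict().items()
    }


# Hypothetical input, shortened from the example in the question.
doc = "Test if a column field is larger\nArguments:\n - DbColumn self\nReturns:\n DbWhere object"
parts = split_docstring(doc)
print(parts['description'])
print(parts['arguments'])
print(parts['returns'])
```

Because groupdict() only contains the named groups, the two unnamed wrapper groups around the section headers do not clutter the result.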