sam sam - 1 year ago 64
Python Question

Parsing a Python file with re

I have a Python file like this:

import os
class test():

    def __init__(self):
        pass

    def add(num1, num2):
        return num1+num2

I am reading this file into a string like this:

with open('', 'r') as myfile:
    data = myfile.read()

print(data)

Now data contains the whole file as a single string, including the newlines.
I need to find the lines that start with class or def.

For example, I need the output to be printed as:

class test():
def __init__(self):
def add(num1, num2):

How can I process this using regular expressions?

Answer Source

If you want to follow a regex approach, use

re.findall(r'(?m)^[ \t]*((?:class|def)[ \t].*)', data)

or

re.findall(r'^[ \t]*((?:class|def)[ \t].*)', data, flags=re.M)


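The key point is that ^ must act as a start-of-line anchor, so either the inline (?m) modifier at the start of the pattern or the re.M flag is required. After that, [ \t]* matches any leading horizontal whitespace, (?:class|def) matches either keyword, [ \t] requires a space or tab right after it (so an identifier such as default does not match), and .* matches the rest of the line, i.e. zero or more characters other than a newline. Only the part from the keyword onward is inside the capturing group, so re.findall returns the lines without their indentation.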
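To make the pieces easier to see, here is a minimal sketch of the same pattern written with re.VERBOSE, plus a comparison against a call without re.M. The names DEFINITION_RE and sample are made up for this illustration and are not part of the original answer.

```python
import re

# The same pattern as above, spelled out with re.VERBOSE so each piece can be
# commented. Whitespace inside a character class such as [ \t] stays literal
# in verbose mode.
DEFINITION_RE = re.compile(
    r"""
    ^                 # start of a line (needs re.MULTILINE)
    [ \t]*            # optional indentation: spaces or tabs
    (                 # group 1: the text that findall returns
      (?:class|def)   # either keyword
      [ \t]           # followed by one space or tab
      .*              # rest of the line, up to the newline
    )
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical sample input; "default = 1" is there to show a non-match.
sample = "import os\nclass test():\n    def add(num1, num2):\n        default = 1\n"

with_flag = DEFINITION_RE.findall(sample)
# Without re.M, ^ only matches at the very start of the string.
without_flag = re.findall(r'^[ \t]*((?:class|def)[ \t].*)', sample)

print(with_flag)     # ['class test():', 'def add(num1, num2):']
print(without_flag)  # []
```

The second call returns an empty list because the string starts with import, and without re.M the anchor never gets another chance to match on later lines.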

If you also need to handle Unicode whitespace, replace [ \t] with [^\S\r\n\f\v]. In Python 2 this also requires the re.UNICODE flag; in Python 3, str patterns are Unicode-aware by default, so the flag is redundant.
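As a rough illustration of the difference, the sketch below runs both variants on a line indented with a no-break space (U+00A0). It assumes Python 3, where \S is Unicode-aware for str patterns. The names ASCII_INDENT, UNICODE_INDENT and sample are hypothetical.

```python
import re

# Variant from the answer: only space and tab count as indentation.
ASCII_INDENT = re.compile(r'^[ \t]*((?:class|def)[ \t].*)', re.M)
# Variant with [^\S\r\n\f\v]: any whitespace except CR, LF, FF and VT.
UNICODE_INDENT = re.compile(
    r'^[^\S\r\n\f\v]*((?:class|def)[^\S\r\n\f\v].*)', re.M
)

# Hypothetical sample: the first line is indented with a no-break space.
sample = "\u00a0def first():\n    def second():\n"

ascii_hits = ASCII_INDENT.findall(sample)
unicode_hits = UNICODE_INDENT.findall(sample)

print(ascii_hits)    # ['def second():']
print(unicode_hits)  # ['def first():', 'def second():']
```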

Python demo:

import re
p = re.compile(r'^[ \t]*((?:class|def)[ \t].*)', re.MULTILINE)
s = " \n\nimport os\nclass test():\n\n    def __init__(self):\n        pass\n\n    def add(num1, num2):\n        return num1+num2"
print(p.findall(s))
# => ['class test():', 'def __init__(self):', 'def add(num1, num2):']