sam sam - 4 months ago 7
Python Question

Parsing a Python file with re

I have a Python file like this:

test.py

import os
class test():

    def __init__(self):
        pass

    def add(num1, num2):
        return num1+num2


I read this file into a string like this:

with open('test.py', 'r') as myfile:
    data = myfile.read()

print(data)


Now data holds the whole file as a single string, newlines included.
I need to find the lines that start a class or a def.

For example, I need the output to be printed as:

class test():
def __init__(self):
def add(num1, num2):


How can I process this using regular expressions?

Answer

If you want to follow a regex approach, use

re.findall(r'(?m)^[ \t]*((?:class|def)[ \t].*)', data)

or

re.findall(r'^[ \t]*((?:class|def)[ \t].*)', data, flags=re.M)


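The point is that ^ must act as a start-of-line anchor, so either (?m) at the start of the pattern or the re.M flag is required. The pattern then matches optional horizontal whitespace ([ \t]*), either class or def ((?:class|def)), one space or tab ([ \t]), and finally zero or more characters other than a newline (.*). Because the keyword and the rest of the line sit inside the only capturing group, re.findall returns each line without its leading indentation.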
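To make the pieces of the pattern easier to see, here is a sketch of the same expression written with re.VERBOSE so that every part carries its own comment. The names PATTERN and sample are only illustrative, and the sample string is a shortened version of the file from the question.

```python
import re

# Same pattern as above, split over several lines with re.VERBOSE.
# Whitespace inside a character class stays literal in verbose mode,
# so [ \t] still means "space or tab".
PATTERN = re.compile(r"""
    ^                 # start of a line (requires re.MULTILINE)
    [ \t]*            # optional indentation: spaces or tabs
    (                 # group 1: the part that findall returns
      (?:class|def)   # the keyword, in a non-capturing group
      [ \t]           # one space or tab after the keyword
      .*              # the rest of the line, up to the newline
    )
""", re.MULTILINE | re.VERBOSE)

sample = "import os\nclass test():\n\n    def __init__(self):\n        pass\n"
matches = PATTERN.findall(sample)
print(matches)
# => ['class test():', 'def __init__(self):']
```

Requiring [ \t] right after the keyword is what keeps names such as define or classes from matching.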

If you also need to handle Unicode whitespace, replace [ \t] with [^\S\r\n\f\v]. In Python 2 this also requires the re.UNICODE flag; in Python 3, str patterns are Unicode-aware by default.
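A small comparison may help show what the Unicode-aware class changes. This is a sketch assuming Python 3, where no extra flag is needed; the sample text is contrived, with its second line indented by U+3000 (ideographic space) characters, and the variable names are made up for the illustration.

```python
import re

# ASCII-only version: indentation may be spaces or tabs.
ascii_only = re.compile(r'^[ \t]*((?:class|def)[ \t].*)', re.MULTILINE)

# Unicode-aware version: any whitespace except \r, \n, \f and \v.
unicode_ws = re.compile(
    r'^[^\S\r\n\f\v]*((?:class|def)[^\S\r\n\f\v].*)', re.MULTILINE
)

# Contrived sample: the second line is indented with two U+3000 characters.
sample = "class test():\n\u3000\u3000def add(num1, num2):\n"

ascii_result = ascii_only.findall(sample)
unicode_result = unicode_ws.findall(sample)

print(ascii_result)    # => ['class test():']
print(unicode_result)  # => ['class test():', 'def add(num1, num2):']
```

The ASCII-only pattern skips the second line because U+3000 is neither a space nor a tab, while the negated class accepts it as horizontal whitespace.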

Python demo:

import re
p = re.compile(r'^[ \t]*((?:class|def)[ \t].*)', re.MULTILINE)
s = "test.py \n\nimport os\nclass test():\n\n    def __init__(self):\n        pass\n\n    def add(num1, num2):\n        return num1+num2"
print(p.findall(s))
# => ['class test():', 'def __init__(self):', 'def add(num1, num2):']
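To connect this back to the question, which reads the source from a file, here is a minimal end-to-end sketch. The helper find_definitions is a hypothetical name, and the demo writes the question's sample to a temporary file instead of assuming that test.py exists on disk.

```python
import os
import re
import tempfile

DEF_OR_CLASS = re.compile(r'^[ \t]*((?:class|def)[ \t].*)', re.MULTILINE)


def find_definitions(path):
    """Return the class/def header lines of the file at path (hypothetical helper)."""
    with open(path, 'r') as source_file:
        data = source_file.read()
    return DEF_OR_CLASS.findall(data)


# Demo input: the file from the question, written to a temporary file.
source = (
    "import os\n"
    "class test():\n"
    "\n"
    "    def __init__(self):\n"
    "        pass\n"
    "\n"
    "    def add(num1, num2):\n"
    "        return num1+num2\n"
)
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as tmp:
    tmp.write(source)
    tmp_path = tmp.name

try:
    headers = find_definitions(tmp_path)
finally:
    os.remove(tmp_path)  # clean up the temporary file

for header in headers:
    print(header)
# class test():
# def __init__(self):
# def add(num1, num2):
```

One caveat with any line-based regex: a line inside a multi-line string or docstring that happens to start with "def " or "class " would be reported as well, since the pattern has no notion of Python syntax.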