I have a data set, and I want to grab certain parts of it: the blocks whose first line starts with a word matching
regex = re.compile(r'\A([A-Z][a-z][A-Z]\w*[-]\w*')
AbD000000-10
DeD000000-10
888888-10
-------------------------------------------------------------------------------
AbD000000-10
Issue 1
Issue 2 Q Q Q
ID: 2 MsEhdiehsla2 MsEhasdhsla2 hiGndiehsla2
ID: 3
-------------------------------------------------------------------------------
888888-10
Q Q Q
ID: 2 MsEhdiehsla2 MsEhasdhsla2 hiGndiehsla2
ID: 3
-------------------------------------------------------------------------------
DeD000000-10
Issue 1
Issue 2 Q Q Q
ID: 2 MsEhdiehsla2 MsEhasdhsla2 hiGndiehsla2
ID: 3
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
AbD000000-10
Issue 1
Issue 2 Q Q Q
ID: 2 MsEhdiehsla2 MsEhasdhsla2 hiGndiehsla2
ID: 3
-------------------------------------------------------------------------------
DeD000000-10
Issue 1
Issue 2 Q Q Q
ID: 2 MsEhdiehsla2 MsEhasdhsla2 hiGndiehsla2
ID: 3
-------------------------------------------------------------------------------
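I think your regex is broken: the opening parenthesis is never closed, so re.compile raises an error, and that \A doesn't belong, because it anchors the pattern to the very start of the whole input and can never match a header further down.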
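To see both problems concretely, here is a small sketch. The sample string `text` is made up for illustration. The unclosed group makes `re.compile` raise, and even with the parenthesis fixed, `\A` only lets the pattern match at the very start of the string, while `^` with `re.MULTILINE` matches at the start of any line.

```python
import re

# The pattern from the question: the "(" is never closed, so compiling fails.
try:
    re.compile(r'\A([A-Z][a-z][A-Z]\w*[-]\w*')
    compiled_ok = True
except re.error:
    compiled_ok = False

# Hypothetical sample: the matching header is on the second line.
text = "888888-10\nAbD000000-10\n"

# \A anchors to the start of the whole string, so the second line is never found.
anchored = re.findall(r'\A[A-Z][a-z][A-Z]\w*-\w*', text)

# ^ with re.MULTILINE anchors to the start of every line instead.
per_line = re.findall(r'^[A-Z][a-z][A-Z]\w*-\w*', text, flags=re.MULTILINE)

print(compiled_ok, anchored, per_line)
```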
In this approach, I assume that the separator will always be the same. I assume you don't want to break the blocks down any further. This grabs only the blocks you want. You can format them however is convenient (including printing the separator back out when you print the blocks).
import re

# First word of a block: upper, lower, upper, word characters, a hyphen, more word characters.
r = re.compile(r'[A-Z][a-z][A-Z]\w*-\w*')
# A separator is a whole line of dashes wrapped in '#', whatever its exact length.
sep = re.compile(r'^#-+#$', re.MULTILINE)
input_text = """
#------------------------------------------------------------------------------#
AbD000000-10
Issue 1
Issue 2 Q Q Q
ID: 2 MsEhdiehsla2 MsEhasdhsla2 hiGndiehsla2
ID: 3
#------------------------------------------------------------------------------#
888888-10
Q Q Q
ID: 2 MsEhdiehsla2 MsEhasdhsla2 hiGndiehsla2
ID: 3
#------------------------------------------------------------------------------#
DeD000000-10
Issue 1
Issue 2 Q Q Q
ID: 2 MsEhdiehsla2 MsEhasdhsla2 hiGndiehsla2
ID: 3
#------------------------------------------------------------------------------#
"""
# Split into blocks and strip the surrounding newlines.
s = [x.strip() for x in sep.split(input_text)]
# Keep only the blocks whose first word matches the pattern.
keep = [x for x in s if r.match(x)]
for v in keep:
    print(v)
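If you want the output to look like the original file, with the separator printed back around the kept blocks, something along these lines should work. This is a sketch only: `SEP`, `keep_blocks`, `render`, and the shortened sample text are names and data I made up, and it assumes every separator is a line of dashes wrapped in `#`.

```python
import re

HEADER = re.compile(r'[A-Z][a-z][A-Z]\w*-\w*')
SEP_LINE = re.compile(r'^#-+#$', re.MULTILINE)
SEP = "#" + "-" * 78 + "#"  # separator to print back out (the length is an assumption)


def keep_blocks(text):
    """Return the stripped blocks whose first word matches HEADER."""
    blocks = (b.strip() for b in SEP_LINE.split(text))
    return [b for b in blocks if HEADER.match(b)]


def render(blocks):
    """Join blocks with SEP before, between, and after them."""
    if not blocks:
        return ""
    return SEP + "\n" + ("\n" + SEP + "\n").join(blocks) + "\n" + SEP


# Shortened, hypothetical sample in the same layout as the real data.
sample = "\n".join([
    SEP, "AbD000000-10", "Issue 1",
    SEP, "888888-10", "Q Q Q",
    SEP, "DeD000000-10", "ID: 3",
    SEP,
])

kept = keep_blocks(sample)
print(render(kept))
```

Stripping each block before matching means `HEADER.match` only has to look at the first word, so no anchor is needed in the pattern itself.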
Really, though, if you can help it, it would be better to consume this data from a cleaner source. If this is a log file, you may not have much control over it. But if you can, see if you can get the data in a cleaner format (CSV, maybe?).
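For comparison, here is roughly what the same filter looks like if the data arrives as CSV. The column names (`record_id`, `issue`) and the rows are purely hypothetical, since I don't know what your source could export.

```python
import csv
import io
import re

HEADER = re.compile(r'[A-Z][a-z][A-Z]\w*-\w*')

# Hypothetical CSV export: one row per issue, keyed by the record id.
csv_text = """record_id,issue
AbD000000-10,Issue 1
888888-10,Q Q Q
DeD000000-10,Issue 2
"""

# Keep only the rows whose record id matches the header pattern.
rows = [row for row in csv.DictReader(io.StringIO(csv_text))
        if HEADER.match(row["record_id"])]

for row in rows:
    print(row["record_id"], row["issue"])
```

With one record per row there is nothing to split, so the separator handling disappears entirely.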