ukbaz ukbaz - 1 month ago 7
Python Question

Log file to Pandas Dataframe

I have log files, which have many lines in the form of :

LogLevel [13/10/2015 00:30:00.650] [Message Text]


My goal is to convert each line in the log file into a nice Data frame. I have tired to do that, by splitting the lines on the [ character, however I am still not getting a neat dataframe.

My code:

level = []
time = []
text = []

with open(filename) as inf:
for line in inf:
parts = line.split('[')
if len(parts) > 1:
level = parts[0]
time = parts[1]
text = parts[2]
print (parts[0],parts[1],parts[2])

s1 = pd.Series({'Level':level, 'Time': time, 'Text':text})
df = pd.DataFrame(s1).reset_index()


Heres my printed Data frame:

Info 10/08/16 10:56:09.843] In Function CCatalinaPrinter::ItemDescription()]

Info 10/08/16 10:56:09.843] Sending UPC Description Message ]


How can I improve this to strip the whitespace and the other ']' character

Thank you

Answer

You can use read_csv with separator \s*\[ - whitespaces with [:

import pandas as pd
from pandas.compat import StringIO

temp=u"""LogLevel    [13/10/2015 00:30:00.650]  [Message Text]
LogLevel    [13/10/2015 00:30:00.650]  [Message Text]
LogLevel    [13/10/2015 00:30:00.650]  [Message Text]
LogLevel    [13/10/2015 00:30:00.650]  [Message Text]"""
#after testing replace StringIO(temp) to filename
df = pd.read_csv(StringIO(temp), sep="\s*\[", names=['Level','Time','Text'], engine='python')

Then remove ] by strip and convert column Time to_datetime:

df.Time = pd.to_datetime(df.Time.str.strip(']'), format='%d/%m/%Y %H:%M:%S.%f')
df.Text = df.Text.str.strip(']')

print (df)
      Level                    Time          Text
0  LogLevel 2015-10-13 00:30:00.650  Message Text
1  LogLevel 2015-10-13 00:30:00.650  Message Text
2  LogLevel 2015-10-13 00:30:00.650  Message Text
3  LogLevel 2015-10-13 00:30:00.650  Message Text

print (df.dtypes)
Level            object
Time     datetime64[ns]
Text             object
dtype: object