transposed messenger - 3 months ago
Python Question

What's a flexible way to represent log data in Python?

I have data of the form:

8/23/2016,

2:00pm-5:00pm; something1, something2, something3

And I would like to analyze the data and create graphs like "Total hours by month", "Frequency of something1 by time of day", etc. What would be the best way to work with this data to make the analysis part (with matplotlib) easier? My current idea is making a namedtuple from the collections library:

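>>> from collections import namedtuple
>>> from datetime import datetime
>>> Date = namedtuple('Date', ['date', 'hours', 'activities'])
>>> aug23 = Date(datetime.now().strftime("%m%d"), hours=[14, 15, 16], activities=['running', 'jumping', 'sprinting'])
>>> aug23
Date(date='0823', hours=[14, 15, 16], activities=['running', 'jumping', 'sprinting'])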


All the data is less than a year old, so the year isn't an important data attribute. There are some troubles with the data being in am/pm time, but I have some functions that convert to military time and find the amount of time elapsed in a given entry. Furthermore, I'd like to make at least some basic comparison between the activities. I came across fuzzywuzzy. Would that be appropriate for semantic similarity? Thanks for any help or ideas!

Answer

IMO, the best way to store this data is in a matrix. It is also very simple to generate one from the form your data is already in.

Matrix

A matrix comes from linear algebra and is a two-dimensional representation of data but, to keep it simple, let's just say that in Python, a matrix is simply a list of lists.

This is how it would look in Python:

matrix = [[1,2,3] , [4,5,6] , [7,8,9]]

In "real life" it would look something like this:

matrix =
1 2 3
4 5 6
7 8 9

You can access data in the matrix very easily by using simple indexing. As you can see, each number has (y,x) coordinates (read from top to bottom and from left to right), and these are exactly the indices you use to access that element.

For example, to access number 8, you would use (2,1). In Python, it would look like this:

matrix[2][1]

Notes: Remember, it's matrix[y][x]. Also remember, Python starts counting from 0.
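
To make the (y,x) indexing concrete, here is a small runnable sketch using the same 3x3 matrix. The variable names (row, value, column) are just for illustration.

```python
# A 3x3 matrix stored as a list of lists (one inner list per row).
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

row = matrix[2]        # y = 2 selects the third row: [7, 8, 9]
value = matrix[2][1]   # y = 2, x = 1 selects the number 8

# A column is not stored as one list, so it is collected row by row.
column = [r[1] for r in matrix]   # x = 1 in every row: [2, 5, 8]

print(row, value, column)
```

Note that a whole row comes for free (it is one inner list), while a column needs a loop or a list comprehension.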


Analyzing Your Data

You can store your data in the matrix in any way you want. I suggest:

yourData =
date time something1 something2 something3
date time something1 something2 something3
date time something1 something2 something3

Which would look like this in Python:

yourData = [[date1,time,something1,something2,something3] , [date2,time,something1,something2,something3] , [date3,time,something1,something2,something3]]

Or, to make it prettier (and easier to read):

yourData = [[date1,time,something1,something2,something3],
            [date2,time,something1,something2,something3],
            [date3,time,something1,something2,something3]]

As you can see, the data for every date is stored at the same y coordinate (one row per date), and the specific pieces of information about that date are stored at the different x values.

For example, to access something3 of date3 you would simply use:

yourData[2][4]
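
Here is a minimal sketch of the same idea with concrete values. The sample rows are made up for illustration (only the first one follows the entry in the question), and the frequency count at the end is just one possible way to start on "frequency of something1" style questions.

```python
from collections import Counter

# Hypothetical sample rows: [date, time range, activity1, activity2, activity3]
your_data = [
    ["8/23/2016", "2:00pm-5:00pm", "running", "jumping", "sprinting"],
    ["8/24/2016", "9:00am-10:30am", "running", "stretching", "jumping"],
    ["9/02/2016", "1:00pm-2:00pm", "swimming", "running", "resting"],
]

third_activity = your_data[2][4]        # third row, fifth column
dates = [row[0] for row in your_data]   # first column: every date

# Count how often each activity appears across all rows (columns 2 onward).
activity_counts = Counter(a for row in your_data for a in row[2:])

print(third_activity, dates, activity_counts.most_common(2))
```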

Generating a Matrix from the data form you have

Your data is in CSV form (sort of). That means that it is separated by commas; in fact, CSV stands for "Comma Separated Values".

This makes it very easy to generate a matrix from the form your data is in. I suggest you use a for loop to do so: read each entry, split it on the separators, and append the pieces as one row.
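
A rough sketch of such a loop is below. It assumes each entry is a date line ending in a comma followed by a "start-end; activity, activity, ..." line, as in the question, and that no entry crosses midnight. The names raw_log and parse_log and the sample entries are made up for illustration.

```python
from datetime import datetime

# Hypothetical raw log in the layout from the question.
raw_log = """\
8/23/2016,
2:00pm-5:00pm; running, jumping, sprinting
8/24/2016,
9:00am-10:30am; running, stretching
"""

def parse_log(text):
    """Return a list of rows: [date, start, end, hours, activity, ...]."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    matrix = []
    # Walk the lines two at a time: a date line, then a details line.
    for i in range(0, len(lines) - 1, 2):
        date = datetime.strptime(lines[i].rstrip(","), "%m/%d/%Y").date()
        time_range, activities = lines[i + 1].split(";", 1)
        start_text, end_text = time_range.split("-")
        start = datetime.strptime(start_text.strip(), "%I:%M%p")
        end = datetime.strptime(end_text.strip(), "%I:%M%p")
        hours = (end - start).total_seconds() / 3600
        row = [date, start.time(), end.time(), hours]
        row.extend(a.strip() for a in activities.split(","))
        matrix.append(row)
    return matrix

rows = parse_log(raw_log)

# "Total hours by month": sum column 3 (hours), grouped by the month in column 0.
hours_by_month = {}
for row in rows:
    month = row[0].month
    hours_by_month[month] = hours_by_month.get(month, 0) + row[3]

print(hours_by_month)
```

Rows can end up with different lengths when entries list different numbers of activities, so code that reads the activity columns should slice from index 4 onward rather than assume a fixed width. The hours_by_month dictionary can be passed to matplotlib as two lists (its keys and its values) for a bar chart.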


Sources

Definition of matrix: https://www.quora.com/What-is-the-difference-between-an-array-and-a-matrix