lupejuares lupejuares - 8 days ago 6
Python Question

matplotlib graphing data problems

I just finished the prediction step for a project I'm working on. I'd like to visualize the results with a graph, but I'm having trouble finding a suitable one because the data is pretty big. My code and an example of the results for one column are at the bottom. I'd like to graph a single column first to see how it works. I tried a bar graph, but it comes out kind of weird: just one solid blue bar. So I'm not sure which kinds of graphs are good for this sort of data.

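import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# reading in the test and target data
# training and test must match column wise
train = pd.read_csv('C:/Users/Michael/Desktop/train.csv/train.csv', parse_dates=['Dates'])
test = pd.read_csv('C:/Users/Michael/Desktop/test.csv/test.csv', parse_dates=['Dates'])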

# TRAINING data
#Convert crime labels to numbers
df_crime = preprocessing.LabelEncoder()
crime = df_crime.fit_transform(train.Category)
#Get binarized weekdays, districts, and hours using dummy variables
days = pd.get_dummies(train.DayOfWeek)
district = pd.get_dummies(train.PdDistrict)
hour = train.Dates.dt.hour
hour = pd.get_dummies(hour)
#Build new array
train_data = pd.concat([hour, days, district], axis=1)
train_data['crime']=crime
#train_data.head()

#Repeat for test data
days = pd.get_dummies(test.DayOfWeek)
district = pd.get_dummies(test.PdDistrict)

hour = test.Dates.dt.hour
hour = pd.get_dummies(hour)

test_data = pd.concat([hour, days, district], axis=1)

features = ['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday',
'Wednesday', 'BAYVIEW', 'CENTRAL', 'INGLESIDE', 'MISSION',
'NORTHERN', 'PARK', 'RICHMOND', 'SOUTHERN', 'TARAVAL', 'TENDERLOIN']

training, testing = train_test_split(train_data, train_size=.60)





# BernoulliNB
# fit on the 60% training split, then evaluate on the held-out 40%
model_B = BernoulliNB()
model_B.fit(training[features], training['crime'])
predicted2 = model_B.predict_proba(testing[features])
log_loss(testing['crime'], predicted2)
# predict on the test data, using the Bernoulli model
predicted3 = model_B.predict_proba(test_data[features])

#Write results
result=pd.DataFrame(predicted3, columns=df_crime.classes_)

# this is an example of one of the columns that I would like to graph
result['SUICIDE']
0 0.000432
1 0.000432
2 0.000760
3 0.000903
4 0.000903
5 0.001089
6 0.000903
7 0.000903
8 0.000550
9 0.000744
10 0.000903
11 0.000550
12 0.000550
13 0.000744
14 0.000744
15 0.000219
16 0.001089
17 0.000903
18 0.000760
19 0.000760
20 0.000760
21 0.000550
22 0.000744
23 0.000903
24 0.000760
25 0.000787
26 0.000760
27 0.000265
28 0.000903
29 0.001089

Answer

You're quite vague about what you expect the output to look like, but I think you should check out the seaborn package, particularly the tutorial section on visualising univariate distributions (histograms, kernel density estimates and rug plots). It should give you a few examples and ideas for visualising your outputs.
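As a minimal sketch of that idea (not necessarily what the tutorial shows), a histogram groups the values into bins, so each bar stands for many rows. A chart that draws one bar per row tends to merge into a solid block once there are many rows. The example uses plain matplotlib and only the 30 sample values printed in the question; the bin count of 10 and the output file name `suicide_hist.png` are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# The 30 sample probabilities printed in the question for result['SUICIDE'].
suicide = [
    0.000432, 0.000432, 0.000760, 0.000903, 0.000903, 0.001089,
    0.000903, 0.000903, 0.000550, 0.000744, 0.000903, 0.000550,
    0.000550, 0.000744, 0.000744, 0.000219, 0.001089, 0.000903,
    0.000760, 0.000760, 0.000760, 0.000550, 0.000744, 0.000903,
    0.000760, 0.000787, 0.000760, 0.000265, 0.000903, 0.001089,
]

fig, ax = plt.subplots(figsize=(6, 4))

# Bin the probabilities: each bar counts the rows that fall in one interval.
# bins=10 is an arbitrary choice for this sketch.
counts, bin_edges, _ = ax.hist(suicide, bins=10)

ax.set_xlabel("Predicted probability of SUICIDE")
ax.set_ylabel("Number of rows")
ax.set_title("Distribution of predicted probabilities")
fig.tight_layout()
fig.savefig("suicide_hist.png")  # hypothetical output file name
```

With the full results, the column `result['SUICIDE']` could be passed to `ax.hist` in place of the list. The number of bars is set by `bins` rather than by the number of rows, so the plot should stay readable however large the data is.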
