Roy Roy - 17 days ago 6
Python Question

Python - Writing data to a data frame is rewriting the empty sets from the next available location

I have a large text file describing almost 600 properties, and I am trying to convert it into a CSV file. I load the text file, build a data frame from it, and write the data frame out as CSV. Below is a sample of my data with 2 properties. Some records have missing values; here, Price Per Quarter is missing from the 2nd property:

Name of the Property : North Kensington Upcycling Store and Cafe
Availability : Now
Area : 1,200 sqft
Retail Type : No
Bar & Restaurant Type : No
Event Type : Yes
Shop Share Type : No
Unique Type : No
Price Per Day : £360
Price Per Week : £1,260
Price Per Month : £5,460
Price Per Quarter : £16,380
Price Per Year : £65,520
[Latitude, Longitude] : [51.5235108631773, -0.206594467163086]
Name of the Property : Old Charlton Pub
Availability : Now
Area : 1,250 sqft
Retail Type : No
Bar & Restaurant Type : Yes
Event Type : No
Shop Share Type : No
Unique Type : No
Price Per Day : £70
Price Per Week : £490
Price Per Month : £2,129
Price Per Year : £25,550
[Latitude, Longitude] : [51.4926332979245, 0.0449645519256592]


This is the code that I wrote -

import pandas
import csv

txt_file = r"sa4.txt"
txt = open(txt_file, "r")
txt_string = txt.read()
txt_lines = txt_string.split("\n")
txt_dict = {}

for txt_line in txt_lines:
    k,v = txt_line.split(":")
    k = k.strip()
    v = v.strip()
    if k in txt_dict:
        list = txt_dict.get(k)
    else:
        list = []
    list.append(v)
    txt_dict[k]=list
print(df)
df.to_csv("MYFILE2.csv")


And this is my output CSV file (picture). I don't know why the Price Per Quarter value for the 2nd property was taken from the 5th property's Price Per Quarter, which is the next available value. It should be NULL, but it has become £135000. Can anyone see the problem in my code? Thanks in advance.

Answer

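That is simply because your lists have no placeholder for Price Per Quarter when it is missing. Each key's list only grows when that key appears in the file, so the next property that does have a Price Per Quarter fills the gap left by the 2nd one. To solve it, you have to keep track of the property record number. Something like this should work for you: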

import pandas as pd

txt_file = r"sa4.txt"
with open(txt_file, "r") as txt:
    txt_lines = txt.read().split("\n")
df = pd.DataFrame()
idx = -1 # This will make sense in the `if` block below

for txt_line in txt_lines:
    if ":" not in txt_line:
        continue # Skip blank lines, e.g. the one after the final newline
    k,v = txt_line.split(":", 1) # Split on the first colon only
    k = k.strip()
    v = v.strip()
    if k == 'Name of the Property':
        idx += 1 # Now, idx will be 0 for the first run
    df.loc[idx, k] = v
print(df)
df.to_csv("MYFILE2.csv")
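
As an alternative, here is a minimal sketch that collects one dict per property and builds the data frame once at the end, which avoids growing the frame cell by cell with `df.loc` (that can get slow for large files). The helper name `parse_properties` and the inline `sample` text are assumptions for illustration only; the sample is a shortened copy of the two records from the question, and in practice you would pass the open file instead.

```python
import io

import pandas as pd


def parse_properties(lines, first_key="Name of the Property"):
    """Group 'key : value' lines into one dict per property record.

    Hypothetical helper, not part of pandas. A new record starts
    whenever `first_key` is seen.
    """
    records = []
    for line in lines:
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # blank line or no "key : value" separator
        key, value = key.strip(), value.strip()
        if key == first_key or not records:
            records.append({})  # start a new property record
        records[-1][key] = value
    return records


# Shortened sample taken from the question; the 2nd property
# has no "Price Per Quarter" line.
sample = """\
Name of the Property : North Kensington Upcycling Store and Cafe
Price Per Month : £5,460
Price Per Quarter : £16,380
Price Per Year : £65,520
Name of the Property : Old Charlton Pub
Price Per Month : £2,129
Price Per Year : £25,550
"""

# Keys missing from a record become NaN in that row.
df = pd.DataFrame(parse_properties(io.StringIO(sample)))
print(df)
```

With the real file, something like `with open("sa4.txt") as f: df = pd.DataFrame(parse_properties(f))` followed by `df.to_csv("MYFILE2.csv")` should give the same layout, with the missing Price Per Quarter left empty in the CSV.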