I recently started a job as an ETL developer, and as part of an exercise I am extracting data from a text file containing raw data. The raw data looks like the image below.
[Image: My Raw Data]
Now I want to add delimiters to my data file. Basically after every line, I want to add a comma (
with open ('new_locations.txt', 'w') as output:
    with open('locations.txt', 'r') as input:
        for line in input:
            new_line = line+','
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3724: character maps to <undefined>
Note: The characters in raw data are not all ASCII characters. Some are Latin characters as well.
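When you open a file in Python 3 in "text" mode, reading and writing convert between the bytes in the file and Python (Unicode) strings. The default encoding is platform dependent: it is usually UTF-8 on Linux and macOS, but on Windows it is typically a legacy code page such as cp1252. The 'charmap' codec in your error suggests such a code page is being used, and byte 0x81 has no character assigned in cp1252, hence the failure.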
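To see which encoding your platform picks, and why byte 0x81 trips it up, something like the following sketch may help. It assumes the default on your machine is cp1252, which I can't confirm from the question; the variable names are just for illustration.

```python
import locale

# The encoding open() falls back to in text mode when none is given.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Byte 0x81 is unassigned in cp1252 (assumed here to be the Windows default),
# so decoding it fails with the same 'charmap' error as in the question.
raw = b"\x81"
try:
    raw.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False

# latin-1 assigns a character to every byte value, so this always succeeds.
latin1_text = raw.decode("latin_1")
```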
If your file uses latin-1 encoding, you should open it with
with open('locations.txt', 'r', encoding='latin_1') as input:
You should probably pass the same encoding when opening the output file if you want the output to be in latin-1 as well.
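Putting both pieces together, a minimal sketch could look like this. The function name `add_trailing_commas` is mine, and latin-1 is an assumption about your file. I also strip the newline before appending the comma, since `line + ','` would put the comma at the start of the next line; adjust that if it isn't what you want.

```python
def add_trailing_commas(src_path, dst_path, encoding="latin_1"):
    """Copy src_path to dst_path, appending a comma to the end of every line."""
    # Use the same explicit encoding for reading and writing so neither side
    # depends on the platform default.
    with open(src_path, "r", encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            # Strip the newline first so the comma lands at the end of the
            # line rather than at the start of the next one.
            dst.write(line.rstrip("\n") + ",\n")
```

With your file names this would be called as `add_trailing_commas('locations.txt', 'new_locations.txt')`.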
In the longer term, you should probably consider converting all your data files to a Unicode encoding such as UTF-8.
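A one-off conversion could be sketched as below, again assuming the source really is latin-1. The helper name `convert_to_utf8` is hypothetical. Be aware that latin-1 decoding never raises an error, so a wrong guess about the source encoding produces garbled text silently rather than an exception.

```python
def convert_to_utf8(src_path, dst_path, source_encoding="latin_1"):
    """Re-encode a text file from source_encoding to UTF-8."""
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```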