Ryan.Patrick Ryan.Patrick - 4 months ago 12
Python Question

Rewriting .csv data to file creates artifacted first column in python

I'm taking data in csv from multiple REST endpoints, trying to add column headers to the data, and change the delimiters in the data from semicolons (;) to commas (,) so they can be formatted correctly by excel.

For note: x in the code below is a list of the link URIs and the ellipsis replaces the code used to discover and pull the links.

from bs4 import BeautifulSoup
import openpyxl
from html.parser import HTMLParser
from urllib.request import urlopen
from urllib import parse
import logging
import csv
...
...
for link in links:
linkContent = urlopen(link)
htmlBytes = linkContent.read()
htmlString = htmlBytes.decode("utf-8")
targetFile = open(str(x[link]) + '.csv', 'w')
targetFile.write('Date;StartTime;EndTime;Environment;Domain;DeployID;Module;Status;BuildVersion;DeployType;DeployStart\n')
targetFile.write(htmlString)
targetFile.close()

for file in x:
with open(str(x[file]) + '.csv', newline='') as csvFile:
reader = csv.reader(csvFile,delimiter=";")
data = [line for line in csvFile]

datalen= len(data)
for i in range(datalen):
data[i] = data[i].replace(";", ",")
data[i] = data[i].replace(" ","")

with open(str(x[file]) + '.csv', 'w') as csvFile:
w=csv.writer(csvFile,delimiter=',',quoting=csv.QUOTE_NONE,escapechar=' ')
w.writerows([data])


Unfortunately this produces an unnecessary comma before every row following the first and I'm not sure why.

i.e.

Date ,StartTime ,EndTime ...

, 2016-07-12 ,03:11 ,03:21, ...

This has excel display all blanks for the first row, and starts displaying the data in the row after. Essentially shifting all data one row to the right.

There's probably a few unnecessary parameters and/or arguments in the code from my trying to get it to parse correctly.

Thanks in advance.

Answer

Manually adding , to the file will cause column alignment problems as the manually inserted ,s will clash with the ones inserted by the csv.writer as delimiter.

Changing the delimiter from ; to , will not require an extra replace(";", ","). It is sufficient to change the delimiters in the csv.reader and writer objects.

The following should do what you intend:

for file in x:
    with open(str(x[file]) + '.csv', newline='') as csvFile:
        reader = csv.reader(csvFile, delimiter=";")
        data = [row for row in reader]
        #                      ^^^^^^ use reader not csvFile

    with open(str(x[file]) + '.csv', 'w') as csvFile:
        w=csv.writer(csvFile, delimiter=',', quoting=csv.QUOTE_NONE,escapechar=' ')
        w.writerows(data)