DBS DBS - 7 months ago 11

Python: Append first match of filename to a row in csv file

Update for clarity: I'm trying to append the value of the first file name match to a csv file. I would like to append the first fname whose file_label2 match is used to set the found value in the Suggested Label row. This information is retrieved from GitHub using github3.py.

In the code I have below, I do not receive an error, but I don't think it's the right way to accomplish getting the first file name match.

Sample output returned from GitHub:

PR Number: 123
Login: dbs
Files:
files/file-folder/media/figure01
file_label2 = figure01
files/file-folder/jsfile-to-checkin
file_label2 = jsfile
Suggested Label: Value1
PR Number: 567
Login: dba
Files:
files/file-folder/media/figure01
file_label2 = figure01
files/file-folder/csfile-to-checkin
file_label2 = csfile
Suggested Label: Value2


Desired csv output:

PR Number, Login, First File Found, Suggested Label
123,dbs,files/file-folder/jsfile-to-checkin, Value1
567,dba,files/file-folder/csfile-to-checkin, Value2


Lists used to match the fname prefix after the file name is split:

list1=["jsfile","csfile"]
list2=["css","html"]


Code:

with open(inputFile,'w') as f:
    for prs in repo.pull_requests():
        getlabels = repo.issue(prs.number).as_dict()

        labels = [labels['name'] for labels in getlabels['labels']]
        tags = ["Bug", "Blocked", "Investigate"]
        enterprisetag = [tagsvalue for tagsvalue in labels if tagsvalue in tags]
        found = "No file match"
        if enterprisetag:
            pass
        else:
            f.write("PR Number: %s" %getlabels['number'] + '\n' + "Login: %s" %getlabels['user']['login'] + '\n' + "Files: \n")
            for data in repo.pull_request(prs.number).files():
                fname, extname = os.path.splitext(data.filename)
                f.write(fname+'\n')
                file_label = fname.rsplit('/',1)[-1]
                if file_label.count("-") == 1:
                    file_label2 = file_label.split("-")[0]
                    f.write("file_label2: %s" %file_label2 + '\n')
                else:
                    file_label2 = "-".join(file_label.split("-",2)[:2])
                    f.write("file_label2: %s" %file_label2 + '\n')

                if [emlabel for emlabel in list1 if emlabel in file_label2]:
                    found = "Value1"
                    break
                elif [mk_label for mk_label in list2 if mk_label in file_label2]:
                    found = "Value2"
                    break
                else:
                    found = (str(None))

            f.write("Suggested Label: %s" %found + '\n')

prNum, login, firstFileFound, label = None,None,None,None
multiLineFlag = False

with open(outputFile, 'w') as w:
    w.write("PR Number, Login, First File Found, Suggested Label\n")
    for line in open(inputFile):
        line = line.strip()
        if multiLineFlag and not(firstFileFound):
            if line.startswith('file_label') and any(fileType in line for fileType in enterprise_mobility + marketplace + modern_apps + pnp + tdc + tdc_abr + unlock_insights):
                firstFileFound = prevLine
                multiLineFlag = False
            else:
                prevLine = line

        if not multiLineFlag:
            if line.startswith('PR Number: '):
                prNum = line[len('PR Number: '):]
            elif line.startswith('Login: '):
                login = line[len('Login: '):]
            elif line.startswith('Suggested Label: '):
                label = line[len('Suggested Label: '):]

            elif line.startswith('Files:'):
                multiLineFlag = True

        if all([prNum, login, firstFileFound, label]):
            w.write("%s,%s,%s,%s\n" %(prNum, login, firstFileFound, label))
            prNum, login, firstFileFound, label = None,None,None,None

Answer

The general idea: each record's data is spread over several lines, so you scan line by line for the individual properties. Once all of them have been found, you write the row and start over with the next record.

prNum, login, firstFileFound, label = None,None,None,None
multiLineFlag = False
labelFound = False
prevLine = None  # last non-matching line seen inside a Files: section
list1 = ["jsfile","csfile"]
inputFile = '' # Provide your input filename here
outputFile = '' # Provide your output filename here
with open(inputFile) as src, open(outputFile, 'w') as w:
    w.write("PR Number, Login, First File Found, Suggested Label\n")
    for line in src:
        line = line.strip()
        # Inside a Files: section, look for the first file_label line matching list1
        if multiLineFlag and not firstFileFound:
            if line.startswith('file_label') and any(fileType in line for fileType in list1):
                firstFileFound = prevLine  # the path is the line just before its label
                multiLineFlag = False
            else:
                prevLine = line

        if not multiLineFlag:
            if line.startswith('PR Number:'):
                prNum = line[len('PR Number: '):]
            elif line.startswith('Login:'):
                login = line[len('Login: '):]
            elif line.startswith('Suggested Label:'):
                labelFound = True  # tracked separately because the label may be empty
                label = line[len('Suggested Label: '):]
                print("label is %s" % label)

            elif line.startswith('Files:'):
                multiLineFlag = True

        # Once every field has been seen, write the row and reset for the next record
        if all([prNum, login, firstFileFound, labelFound]):
            w.write("%s,%s,%s,%s\n" %(prNum, login, firstFileFound, label))
            prNum, login, firstFileFound, label = None,None,None,None
            labelFound = False

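The code above works only if a few assumptions about your data hold: every record lists PR Number, Login, Files and Suggested Label in that order, and at least one file_label2 line in each record matches an entry in list1. If a record has no match, multiLineFlag stays set and the fields of the following record are not read correctly.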

So, for an input file that looks like:

PR Number: 123
Login: dbs
Files:
files/file-folder/media/figure01
file_label2 = figure01
files/file-folder/jsfile-to-checkin
file_label2 = jsfile
Suggested Label: Value1
PR Number: 423
Login: ddo
Files:
files/file-folder/media/figure01
file_label2 = figure01
files/file-folder/csfile2-to-checkin
file_label2 = csfile
Suggested Label:
PR Number: 567
Login: dba
Files:
files/file-folder/media/figure01
file_label2 = figure01
files/file-folder/csfile-to-checkin
file_label2 = csfile
Suggested Label: Value2

this will return:

PR Number, Login, First File Found, Suggested Label
123,dbs,files/file-folder/jsfile-to-checkin,Value1
423,ddo,files/file-folder/csfile2-to-checkin,
567,dba,files/file-folder/csfile-to-checkin,Value2
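
The rows above are built with plain string formatting, which is fine for these paths. If a file name could ever contain a comma, the row would become ambiguous. A minimal sketch of one way around that, assuming Python 3 and the standard csv module; the helper name rows_to_csv and the second sample row are hypothetical and not part of the original code:

```python
import csv
import io

def rows_to_csv(rows):
    """Return CSV text for (pr_number, login, first_file, label) tuples."""
    buf = io.StringIO()
    # lineterminator="\n" keeps the output identical on every platform
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["PR Number", "Login", "First File Found", "Suggested Label"])
    for row in rows:
        writer.writerow(row)  # fields containing commas are quoted automatically
    return buf.getvalue()

csv_text = rows_to_csv([
    ("123", "dbs", "files/file-folder/jsfile-to-checkin", "Value1"),
    ("999", "someone", "files/odd,name-to-checkin", "Value2"),  # hypothetical row
])
print(csv_text)
```

To write to a real file instead of a string, the same writer can wrap a file opened with open(outputFile, 'w', newline='').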

Adjustments may be necessary to cover for edge conditions.
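
One such edge condition is a record in which no file matches. The sketch below is a hedged variant, not the code from the answer: it starts a new record at every PR Number line, so a record without a match cannot leak into the next one. The function name parse_records, the dictionary keys and PR 777 in the sample are assumptions made for illustration; the "No file match" placeholder is borrowed from the question's code.

```python
NO_MATCH = "No file match"  # placeholder taken from the question's code

def parse_records(lines, prefixes):
    """Yield one dict per record; 'file' is the first path whose label matches."""
    record, prev_path = None, None
    for raw in lines:
        line = raw.strip()
        if line.startswith("PR Number:"):
            if record is not None:
                yield record  # the previous record is complete
            record = {"pr": line[len("PR Number:"):].strip(),
                      "login": "", "file": NO_MATCH, "label": ""}
            prev_path = None
        elif record is None or not line or line.startswith("Files:"):
            continue  # nothing to collect from these lines
        elif line.startswith("Login:"):
            record["login"] = line[len("Login:"):].strip()
        elif line.startswith("Suggested Label:"):
            record["label"] = line[len("Suggested Label:"):].strip()
        elif line.startswith("file_label"):
            # Keep only the first match; prev_path is the path this label belongs to
            if record["file"] == NO_MATCH and prev_path and any(p in line for p in prefixes):
                record["file"] = prev_path
        else:
            prev_path = line  # any other line is treated as a file path
    if record is not None:
        yield record

sample = """PR Number: 123
Login: dbs
Files:
files/file-folder/media/figure01
file_label2 = figure01
files/file-folder/jsfile-to-checkin
file_label2 = jsfile
Suggested Label: Value1
PR Number: 777
Login: nobody
Files:
files/file-folder/media/figure02
file_label2 = figure02
Suggested Label: None
PR Number: 567
Login: dba
Files:
files/file-folder/csfile-to-checkin
file_label2 = csfile
Suggested Label: Value2
"""

records = list(parse_records(sample.splitlines(), ["jsfile", "csfile"]))
for r in records:
    print("%s,%s,%s,%s" % (r["pr"], r["login"], r["file"], r["label"]))
```

Collecting each record into a dictionary before writing also makes it easy to decide later whether unmatched records should be written with the placeholder or skipped.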