Gaius Augustus - 1 month ago

Python3 - Loop over rows, then some columns to print text

I have a pandas dataframe that looks something like this:

            0                   1                   2              3             4
0  UICEX_0001   path/to/bam_T.bam   path/to/bam_N.bam     chr1:10000  chr4:4958392
1  UICEX_0002  path/to/bam_T2.bam  path/to/bam_N2.bam  chr54:4958392           NaN


I am trying to loop through each row and print text that another program will read (an IGV batch script). For each row I need to print the first three columns (with some other text), then go through the remaining columns and print something different depending on whether each value is NaN.

This mostly works:

Current code

def CreateIGVBatchScript(x):
    for row in x.iterrows():
        print("\nnew")
        sample = x[0]
        bamt = x[1]
        bamn = x[2]
        print("\nload", bamt.to_string(index=False), "\nload", bamn.to_string(index=False))
        for col in range(3, len(x.columns)):
            position = x[col]
            if position.isnull().values.any():
                print("\n")
            else:
                position = position.to_string(index=False)
                print("\ngoto ", position, "\ncollapse\nsnapshot ", sample.to_string(index=False), "_", position,".png\n")

CreateIGVBatchScript(data)


But the output looks like this:

Actual Output

new
load path/to/bam_T.bam
path/to/bam_T2.bam
load path/to/bam_N.bam
path/to/bam_N2.bam

goto chr1:10000
chr54:4958392
collapse
snapshot UICEX_0001 <-- ISSUE: it's printing both rows at the same time
UICEX_0002 _ chr1:10000
chr54:4958392 .png

new

load path/to/bam_T.bam
path/to/bam_T2.bam
load path/to/bam_N.bam
path/to/bam_N2.bam

goto chr1:10000
chr54:4958392
collapse
snapshot UICEX_0001 <-- ISSUE: it's printing both rows at the same time
UICEX_0002 _ chr1:10000
chr54:4958392 .png


The "new" line is fine, but the load, goto and snapshot lines print the values from every row at once, and the whole block is repeated once per row. I can't figure out how to fix this. Here's what I want one of those blocks to look like:
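A small experiment (a sketch on a made-up two-row frame) shows where the extra rows come from: each item produced by iterrows() is an (index, row) tuple, but the loop body above indexes x instead, so x[0] is the whole of column 0 on every pass.

```python
import pandas as pd

df = pd.DataFrame([
    ["UICEX_0001", "chr1:10000"],
    ["UICEX_0002", "chr54:4958392"],
])

for item in df.iterrows():
    label, row = item            # each item is a (row label, Series) tuple
    whole_column = df[0]         # what the original loop body read: all rows of column 0
    this_row_value = row[0]      # what it should read: column 0 of the current row only
    print(label, len(whole_column), this_row_value)
# 0 2 UICEX_0001
# 1 2 UICEX_0002
```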

Partial Wanted Output

goto chr1:10000
collapse
snapshot UICEX_0001_chr1:10000.png
goto chr4:4958392
collapse
snapshot UICEX_0001_chr4:4958392.png


Extra information
I'm adapting this from an R script as a way to learn Python. Here's the R code, in case it helps:

CreateIGVBatchScript <- function(x){
  for(i in 1:nrow(x)){
    cat("\nnew")
    sample = as.character(x[i, 1])
    bamt = as.character(x[i, 2])
    bamn = as.character(x[i, 3])
    cat("\nload",bamt,"\nload",bamn)
    for(j in 4:ncol(x)){
      if(x[i, j] == "" | is.na(x[i, j])){ cat("\n") }
      else{
        cat("\ngoto ", as.character(x[i, j]),"\ncollapse\nsnapshot ", sample, "_", x[i,j],".png\n", sep = "")
      }
    }
  }
  cat("\nexit")
}
CreateIGVBatchScript(data)
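For comparison, a fairly literal translation of the R loop is possible with positional indexing: R's x[i, j] (1-based) corresponds roughly to pandas x.iloc[i - 1, j - 1] (0-based). This sketch is an alternative to iterrows(), not the approach used in the answer; the function name create_igv_batch_script_positional is hypothetical, it returns the text instead of printing it, and, like the R version, it treats both NaN and empty strings as missing.

```python
import pandas as pd

def create_igv_batch_script_positional(x: pd.DataFrame) -> str:
    """Hypothetical literal port of the R function; returns the script text."""
    parts = []
    for i in range(len(x)):                      # R: for(i in 1:nrow(x))
        parts.append("\nnew")
        sample = str(x.iloc[i, 0])               # R: as.character(x[i, 1])
        bamt = str(x.iloc[i, 1])                 # R: as.character(x[i, 2])
        bamn = str(x.iloc[i, 2])                 # R: as.character(x[i, 3])
        # R's cat() separates its arguments with a space by default
        parts.append(f"\nload {bamt} \nload {bamn}")
        for j in range(3, x.shape[1]):           # R: for(j in 4:ncol(x))
            value = x.iloc[i, j]
            if pd.isna(value) or value == "":   # R: x[i, j] == "" | is.na(x[i, j])
                parts.append("\n")
            else:
                parts.append(f"\ngoto {value}\ncollapse\nsnapshot {sample}_{value}.png\n")
    parts.append("\nexit")
    return "".join(parts)

# Usage on a one-row frame whose last position is missing
df = pd.DataFrame([["S1", "t.bam", "n.bam", "chr1:100", None]])
script = create_igv_batch_script_positional(df)
print(script)
```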

Answer

I've figured out the answer. There were two problems:

  1. I was using iterrows() incorrectly.

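iterrows() yields an (index, row) pair for each row, where row is a Series holding only that row's values. My original loop assigned the whole pair to row and then ignored it, reading x[0], x[1] and so on from the DataFrame instead. x[0] is the entire column 0, which is why every row's value was printed at once. Unpacking the pair and indexing into row fixes this: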

for index, row in x.iterrows():
    sample = row[0]

stores the current row's value from column 0 in sample.
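To see what iterrows() hands back, here is a sketch on a made-up two-column frame:

```python
import pandas as pd

df = pd.DataFrame([["UICEX_0001", "path/to/bam_T.bam"],
                   ["UICEX_0002", "path/to/bam_T2.bam"]])

for index, row in df.iterrows():
    # index is the row label; row is a Series indexed by the column labels 0 and 1
    print(index, type(row).__name__, row[0], row[1])
# 0 Series UICEX_0001 path/to/bam_T.bam
# 1 Series UICEX_0002 path/to/bam_T2.bam
```

The pandas documentation notes that the row Series may be a copy rather than a view, so it is fine for reading values, but writing to it is not guaranteed to change the DataFrame.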

  2. I was also reading the position columns from the whole DataFrame instead of the current row.

Once you have row, the same kind of for loop over the remaining column numbers works, as long as you index into row rather than into the DataFrame:

for col in range(3, len(x.columns)):
    position = row[col]

gives you the value in column col for the current row.
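Inside the row loop, the position columns can also be taken as a slice of the row with row.iloc[3:] and tested with pd.isna(), which avoids filling NaN with a placeholder value. A sketch on a single made-up row:

```python
import numpy as np
import pandas as pd

row = pd.Series(["UICEX_0002", "path/to/bam_T2.bam", "path/to/bam_N2.bam",
                 "chr54:4958392", np.nan])

positions = []
for position in row.iloc[3:]:      # every value after the first three columns
    if pd.isna(position):          # NaN marks an empty position slot
        continue
    positions.append(position)

print(positions)  # ['chr54:4958392']
```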

The final Python code is:

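import pandas as pd

def CreateIGVBatchScript(x):
    # iterrows() yields (index, row) pairs; row holds only the current row's values
    for index, row in x.iterrows():
        print("\nnew")
        # Columns 0-2: sample ID and the two BAM paths (positional access via iloc)
        sample = row.iloc[0]
        bamt = row.iloc[1]
        bamn = row.iloc[2]
        print("\nload ", bamt, "\nload ", bamn, sep="")
        # Remaining columns: positions to visit; missing ones are NaN
        for col in range(3, len(x.columns)):
            position = row.iloc[col]
            if pd.isna(position) or position == "":
                print("\n")
            else:
                print("\ngoto ", position, "\ncollapse\nsnapshot ", sample, "_", position, ".png\n", sep="")
    # The R version finishes the batch script with an exit command
    print("\nexit")

CreateIGVBatchScript(data)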
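Since the output is meant to be read by IGV, it may be more convenient to write it to a file than to copy it from the terminal. print() accepts a file= argument, so the loop barely changes. This is a minimal sketch assuming the same column layout; the function name write_igv_batch_script is hypothetical, and it emits one command per line without the extra blank lines the version above produces.

```python
import io
import numpy as np
import pandas as pd

def write_igv_batch_script(x: pd.DataFrame, out) -> None:
    """Hypothetical variant of CreateIGVBatchScript that writes to a file-like object."""
    for _, row in x.iterrows():
        sample, bamt, bamn = row.iloc[0], row.iloc[1], row.iloc[2]
        print("new", file=out)
        print("load", bamt, file=out)
        print("load", bamn, file=out)
        for position in row.iloc[3:]:
            if pd.isna(position):
                continue  # no position in this column for this sample
            print("goto", position, file=out)
            print("collapse", file=out)
            print(f"snapshot {sample}_{position}.png", file=out)
    print("exit", file=out)

data = pd.DataFrame([
    ["UICEX_0001", "path/to/bam_T.bam", "path/to/bam_N.bam", "chr1:10000", "chr4:4958392"],
    ["UICEX_0002", "path/to/bam_T2.bam", "path/to/bam_N2.bam", "chr54:4958392", np.nan],
])

buf = io.StringIO()   # in-memory stand-in for a real file
write_igv_batch_script(data, buf)
print(buf.getvalue())
```

To write a real file instead, pass an open file handle, e.g. with open("igv_batch.txt", "w") as fh: write_igv_batch_script(data, fh), where the file name is only a placeholder.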

Answers were guided by the following posts: