Bhishan Poudel - 5 months ago
Python Question

How to split a dataframe into multiple parts while copying the comment lines in Python pandas?

I have a datafile like this:

# coating file for detector A/R
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
14.2000 0.531000 0.0618000 0.938200
14.2000 0.532000 0.0790500 0.920950
14.2000 0.533000 0.0998900 0.900110
# it has lots of other lines
# datafile can be obtained from pastebin


The link to the input datafile is:
http://pastebin.com/NaNbEm3E

I would like to create 20 files from this input such that each file contains the comment lines.

That is:

#out1.txt
#comments
first part of one-twentieth data

# out2.txt
# given comments
second part of one-twentieth data

# and so on up to out20.txt


How can we do this in Python?

My initial attempt is like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author : Bhishan Poudel
# Date : May 23, 2016


# Imports
from __future__ import print_function
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# read in comments from the file
infile = 'filecopy_multiple.txt'
outfile = 'comments.txt'
comments = []
with open(infile, 'r') as fi, open(outfile, 'a') as fo:
    for line in fi.readlines():
        if line.startswith('#'):
            comments.append(line)
            print(line)
            fo.write(line)


#==============================================================================
# read in a file
#
infile = infile
colnames = ['angle', 'wave','trans','refl']
print('{} {} {} {}'.format('\nreading file : ', infile, '','' ))
df = pd.read_csv(infile, sep=r'\s+', header=None, skiprows=0,
                 comment='#', names=colnames, usecols=(0, 1, 2, 3))
print('{} {} {} {}'.format('length of df : ', len(df),'',''))


# write 20 files
df = df
nfiles = 20
nrows = int(len(df)/nfiles)
# floor division gives consecutive rows the same integer group label
groups = df.groupby(np.arange(len(df.index)) // nrows)
for (frameno, frame) in groups:
    frame.to_csv("output_%s.csv" % frameno, index=None, header=None, sep='\t')


So far I have twenty split files. I just want to copy the comment lines into each of them. How can I do that?


There should be an easier method than creating another 20 output files containing only the comments and then appending the twenty split files to them.

Some useful links:

How to split a dataframe column into multiple columns

How to split a DataFrame column in python

Split a large pandas dataframe

Answer

This ought to do it

# Store comments in this to use for all files
comments = []

# Create a new sub list for each of the 20 files
data = []
for _ in range(20):
    data.append([])

# Track line number
index = 0

# open input file
with open('input.txt', 'r') as fi:
    # fetch all lines at once so I can count them.
    lines = fi.readlines()

    # Loop to gather initial comments
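    # startswith also catches comments with no space after '#',
    # and the bounds check avoids an IndexError on a comments-only file
    while index < len(lines) and lines[index].startswith('#'):
        comments.append(lines[index])
        index += 1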

    # Calculate how many lines of data
    numdata = len(lines) - len(comments)

    for i in range(index, len(lines)):
        # Calculate which of the 20 files I'm working with
        # (floor division keeps filenum an int on Python 3 as well)
        filenum = (i - index) * 20 // numdata
        # Append line to appropriately tracked sub list
        data[filenum].append(lines[i])

for i in range(1, len(data) + 1):
    # Open output file
    with open('output{}.txt'.format(i), 'w') as fo:
        # Write comments
        for c in comments:
            fo.write(c)
        # Write data
        for line in data[i-1]:
            fo.write(line)
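
If you would rather stay in pandas, as in the question, something along these lines might work. It is only a sketch under a few assumptions: the function name split_with_comments and the prefix argument are made up for illustration, every line starting with '#' anywhere in the file is treated as a comment, and the columns are read as strings (dtype=str) so that values such as 14.2000 are written back unchanged. The trick is that to_csv accepts an already open file handle, so the comments can be written to the same file first.

```python
import pandas as pd


def split_with_comments(infile, nfiles=20, prefix='out'):
    """Split a whitespace-separated table into nfiles parts and repeat
    the comment lines at the top of every part (hypothetical helper)."""
    # Collect every comment line, wherever it appears in the input.
    with open(infile) as fi:
        comments = [line for line in fi if line.startswith('#')]

    # pandas skips the comment lines; dtype=str preserves the number formatting.
    df = pd.read_csv(infile, sep=r'\s+', header=None, comment='#', dtype=str)

    names = []
    n = len(df)
    for k in range(nfiles):
        # Integer arithmetic spreads any remainder rows over the parts.
        start = k * n // nfiles
        stop = (k + 1) * n // nfiles
        name = '{}{}.txt'.format(prefix, k + 1)
        with open(name, 'w', newline='') as fo:
            fo.writelines(comments)
            # Write this slice of rows below the comments, tab-separated.
            df.iloc[start:stop].to_csv(fo, sep='\t', header=False, index=False)
        names.append(name)
    return names
```

Calling split_with_comments('input.txt') should then produce out1.txt through out20.txt, each starting with the comment lines. This avoids the intermediate comments-only files mentioned in the question.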