Bhishan Poudel - 5 months ago
Python Question

How to split a data file into multiple parts, with the comment lines repeated in each split file?

I have a datafile like this:

# coating file for detector A/R
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
14.2000 0.531000 0.0618000 0.938200
14.2000 0.532000 0.0790500 0.920950
14.2000 0.533000 0.0998900 0.900110
# it has lots of other lines
# datafile can be obtained from pastebin


The link to input datafile is:
http://pastebin.com/NaNbEm3E

I would like to create 20 files from this input such that each file contains the comment lines.

That is :

#out1.txt
#comments
first part of one-twentieth data

# out2.txt
# given comments
second part of one-twentieth data

# and so on up to out20.txt


How can I do this in Python?

My initial attempt is:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author : Bhishan Poudel
# Date : May 23, 2016


# Imports
from __future__ import print_function
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# read in comments from the file
infile = 'filecopy_multiple.txt'
outfile = 'comments.txt'
comments = []
with open(infile, 'r') as fi, open(outfile, 'a') as fo:
    for line in fi.readlines():
        if line.startswith('#'):
            comments.append(line)
            print(line)
            fo.write(line)


#==============================================================================
# read in a file
#
infile = infile
colnames = ['angle', 'wave','trans','refl']
print('{} {} {} {}'.format('\nreading file : ', infile, '','' ))
df = pd.read_csv(infile,sep=r'\s+', header = None,skiprows = 0,
                 comment='#',names=colnames,usecols=(0,1,2,3))
print('{} {} {} {}'.format('length of df : ', len(df),'',''))


# write 20 files
df = df
nfiles = 20
nrows = int(len(df)/nfiles)
# integer division, so each block of nrows rows shares one group label
groups = df.groupby( np.arange(len(df.index)) // nrows )
for (frameno, frame) in groups:
    frame.to_csv("output_%s.csv" % frameno,index=None, header=None,sep='\t')


So far I have twenty split files. I just want to copy the comment lines into each of them. How can I do that?


There should be an easier way than creating another 20 output files containing only the comments and then appending the twenty split files to them.

Some useful links:

How to split a dataframe column into multiple columns

How to split a DataFrame column in python

Split a large pandas dataframe

Answer

UPDATE: optimized code

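with open('input.txt', 'r') as f:
    data = f.readlines()

# count the leading comment lines (the header)
comments_lines = 0
for line in data:
    if line.strip().startswith('#'):
        comments_lines += 1
    else:
        break

# rows per output file; if the row count is not a multiple of 20,
# the leftover rows spill into additional files
chunk = max(1, (len(data) - comments_lines) // 20)

i = 0
for x in range(0, len(data) - comments_lines, chunk):
    with open('output_{:02d}.txt'.format(i), 'w') as f:
        # header first, then this file's slice of the data rows
        f.write(''.join(data[:comments_lines] + data[comments_lines+x:comments_lines+x+chunk]))
        i += 1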
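
Because the loop above advances through the rows with a step of one-twentieth of the row count (rounded down), a row count that is not a multiple of 20 produces more than 20 output files. If exactly 20 files are wanted, one possible variant is to spread the leftover rows over the first few files. This is only a sketch: the names split_with_header and write_chunks are hypothetical helpers, and it assumes that all comment lines sit at the top of the file.

```python
def split_with_header(lines, nparts=20):
    """Split lines into at most nparts chunks, each prefixed with the leading '#' lines."""
    # Count the leading comment lines (assumption: the header is at the top).
    n_header = 0
    for line in lines:
        if not line.strip().startswith('#'):
            break
        n_header += 1
    header, rows = lines[:n_header], lines[n_header:]

    # Every chunk gets `size` rows; the first `extra` chunks get one more.
    size, extra = divmod(len(rows), nparts)
    chunks, start = [], 0
    for k in range(nparts):
        stop = start + size + (1 if k < extra else 0)
        if stop > start:  # skip empty chunks when there are fewer rows than parts
            chunks.append(header + rows[start:stop])
        start = stop
    return chunks


def write_chunks(chunks, pattern='output_{:02d}.txt'):
    """Write each chunk to its own file, numbered from 0."""
    for i, chunk in enumerate(chunks):
        with open(pattern.format(i), 'w') as f:
            f.writelines(chunk)
```

Usage would be along the lines of write_chunks(split_with_header(data)), where data is the list returned by f.readlines().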

Original answer:

comments = []
data = []

with open('input.txt', 'r') as f:
    data = f.readlines()

# collect the leading comment lines
i = 0
for line in data:
    if line.strip().startswith('#'):
        comments.append(line)
        i += 1
    else:
        break

# keep only the data rows
data[:] = data[i:]

# rows per output file; leftover rows spill into additional files
chunk = max(1, len(data) // 20)

i = 0
for x in range(0, len(data), chunk):
    with open('output_{:02d}.txt'.format(i), 'w') as f:
        f.write(''.join(comments + data[x:x+chunk]))
        i += 1
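
If you would rather keep the pandas approach from the question, one option is to open each output file yourself, write the comment lines first, and then pass the open file handle to to_csv instead of a file name. The following is a rough sketch under a few assumptions: the function name split_frame_with_comments and the pattern argument are made up for illustration, the column names are taken from the question, and every line starting with '#' is treated as a comment, as in the question's first loop.

```python
import pandas as pd


def split_frame_with_comments(infile, nfiles=20, pattern='output_{}.csv'):
    """Write up to nfiles tab-separated files, each starting with the '#' lines."""
    # Collect the comment lines exactly as they appear in the input.
    with open(infile) as fi:
        comments = [line for line in fi if line.startswith('#')]

    # Read the data rows; lines starting with '#' are skipped by the parser.
    df = pd.read_csv(infile, sep=r'\s+', header=None, comment='#',
                     names=['angle', 'wave', 'trans', 'refl'])

    written = []
    for k in range(nfiles):
        # Integer bounds that cover all rows without gaps or overlaps.
        start = k * len(df) // nfiles
        stop = (k + 1) * len(df) // nfiles
        if stop == start:
            continue  # fewer rows than files: skip empty parts
        name = pattern.format(k)
        with open(name, 'w') as fo:
            fo.writelines(comments)  # comments first
            df.iloc[start:stop].to_csv(fo, index=False, header=False, sep='\t')
        written.append(name)
    return written
```

This avoids the intermediate comments-only files: each output file is created once, and the comments and the data go into it through the same handle.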