erhan erhan - 1 month ago 13
Python Question

How to edit a specific column of a file using Python

I have multiple input files like the one below. I need to edit the "atom" column (5th column) of the [ atoms ] table by deleting the number at the end of each atom name. I don't know how to complete my code. How can I do it?

The code:

with open('input.txt', mode='r') as f:
    for lines in f:
        columns = lines.split()


Input file:

[ atomtypes ]
;name bond_type mass charge ptype sigma epsilon Amb
br br 0.00000 0.00000 A 3.59923e-01 1.75728e+00 ; 2.02 0.4200
cl cl 0.00000 0.00000 A 3.47094e-01 1.10876e+00 ; 1.95 0.2650
s s 0.00000 0.00000 A 3.56359e-01 1.04600e+00 ; 2.00 0.2500
p5 p5 0.00000 0.00000 A 3.74177e-01 8.36800e-01 ; 2.10 0.2000
os os 0.00000 0.00000 A 3.00001e-01 7.11280e-01 ; 1.68 0.1700
ca ca 0.00000 0.00000 A 3.39967e-01 3.59824e-01 ; 1.91 0.0860
c3 c3 0.00000 0.00000 A 3.39967e-01 4.57730e-01 ; 1.91 0.1094
ha ha 0.00000 0.00000 A 2.59964e-01 6.27600e-02 ; 1.46 0.0150
h1 h1 0.00000 0.00000 A 2.47135e-01 6.56888e-02 ; 1.39 0.0157

[ moleculetype ]
;name nrexcl
LIG 3

[ atoms ]
; nr type resi res atom cgnr charge mass ; qtot bond_type
1 br 1 LIG BR1 1 -0.040100 79.90000 ; qtot -0.040
2 cl 1 LIG CL1 2 -0.040400 35.45000 ; qtot -0.081
3 cl 1 LIG CL2 3 -0.046400 35.45000 ; qtot -0.127
4 s 1 LIG S1 4 -0.576001 32.06000 ; qtot -0.703
5 p5 1 LIG P1 5 1.207199 30.97000 ; qtot 0.504
6 os 1 LIG O1 6 -0.442500 16.00000 ; qtot 0.062
7 os 1 LIG O2 7 -0.517201 16.00000 ; qtot -0.455
8 os 1 LIG O3 8 -0.517201 16.00000 ; qtot -0.973
9 ca 1 LIG C1 9 0.143100 12.01000 ; qtot -0.830
10 ca 1 LIG C2 10 0.012400 12.01000 ; qtot -0.817
11 ca 1 LIG C3 11 -0.127000 12.01000 ; qtot -0.944
12 ca 1 LIG C4 12 0.045400 12.01000 ; qtot -0.899
13 ca 1 LIG C5 13 -0.082000 12.01000 ; qtot -0.981
14 ca 1 LIG C6 14 -0.019900 12.01000 ; qtot -1.001
15 c3 1 LIG C7 15 0.125200 12.01000 ; qtot -0.875
16 c3 1 LIG C8 16 0.125200 12.01000 ; qtot -0.750
17 ha 1 LIG H1 17 0.178000 1.00800 ; qtot -0.572
18 ha 1 LIG H2 18 0.174000 1.00800 ; qtot -0.398
19 h1 1 LIG H3 19 0.066367 1.00800 ; qtot -0.332
20 h1 1 LIG H4 20 0.066367 1.00800 ; qtot -0.265
21 h1 1 LIG H5 21 0.066367 1.00800 ; qtot -0.199
22 h1 1 LIG H6 22 0.066367 1.00800 ; qtot -0.133
23 h1 1 LIG H7 23 0.066367 1.00800 ; qtot -0.066
24 h1 1 LIG H8 24 0.066367 1.00800 ; qtot -0.000

[ bonds ]
; ai aj funct r k
1 14 1 1.8970e-01 2.2560e+05 ; BR1 - C6
2 10 1 1.7290e-01 2.7012e+05 ; CL1 - C2
3 12 1 1.7290e-01 2.7012e+05 ; CL2 - C4
4 5 1 1.9220e-01 2.0987e+05 ; S1 - P1
5 6 1 1.6020e-01 2.8660e+05 ; P1 - O1
5 7 1 1.6020e-01 2.8660e+05 ; P1 - O2
5 8 1 1.6020e-01 2.8660e+05 ; P1 - O3
6 9 1 1.3730e-01 3.1162e+05 ; O1 - C1
7 15 1 1.4390e-01 2.5230e+05 ; O2 - C7
8 16 1 1.4390e-01 2.5230e+05 ; O3 - C8
9 10 1 1.3870e-01 4.0033e+05 ; C1 - C2
9 11 1 1.3870e-01 4.0033e+05 ; C1 - C3
10 13 1 1.3870e-01 4.0033e+05 ; C2 - C5
11 12 1 1.3870e-01 4.0033e+05 ; C3 - C4
11 17 1 1.0870e-01 2.8811e+05 ; C3 - H1
12 14 1 1.3870e-01 4.0033e+05 ; C4 - C6
13 14 1 1.3870e-01 4.0033e+05 ; C5 - C6
13 18 1 1.0870e-01 2.8811e+05 ; C5 - H2
15 19 1 1.0930e-01 2.8108e+05 ; C7 - H3
15 20 1 1.0930e-01 2.8108e+05 ; C7 - H4
15 21 1 1.0930e-01 2.8108e+05 ; C7 - H5
16 22 1 1.0930e-01 2.8108e+05 ; C8 - H6
16 23 1 1.0930e-01 2.8108e+05 ; C8 - H7
16 24 1 1.0930e-01 2.8108e+05 ; C8 - H8


Desired output file:

[ atomtypes ]
;name bond_type mass charge ptype sigma epsilon Amb
br br 0.00000 0.00000 A 3.59923e-01 1.75728e+00 ; 2.02 0.4200
cl cl 0.00000 0.00000 A 3.47094e-01 1.10876e+00 ; 1.95 0.2650
s s 0.00000 0.00000 A 3.56359e-01 1.04600e+00 ; 2.00 0.2500
p5 p5 0.00000 0.00000 A 3.74177e-01 8.36800e-01 ; 2.10 0.2000
os os 0.00000 0.00000 A 3.00001e-01 7.11280e-01 ; 1.68 0.1700
ca ca 0.00000 0.00000 A 3.39967e-01 3.59824e-01 ; 1.91 0.0860
c3 c3 0.00000 0.00000 A 3.39967e-01 4.57730e-01 ; 1.91 0.1094
ha ha 0.00000 0.00000 A 2.59964e-01 6.27600e-02 ; 1.46 0.0150
h1 h1 0.00000 0.00000 A 2.47135e-01 6.56888e-02 ; 1.39 0.0157

[ moleculetype ]
;name nrexcl
LIG 3

[ atoms ]
; nr type resi res atom cgnr charge mass ; qtot bond_type
1 br 1 LIG BR 1 -0.040100 79.90000 ; qtot -0.040
2 cl 1 LIG CL 2 -0.040400 35.45000 ; qtot -0.081
3 cl 1 LIG CL 3 -0.046400 35.45000 ; qtot -0.127
4 s 1 LIG S 4 -0.576001 32.06000 ; qtot -0.703
5 p5 1 LIG P 5 1.207199 30.97000 ; qtot 0.504
6 os 1 LIG O 6 -0.442500 16.00000 ; qtot 0.062
7 os 1 LIG O 7 -0.517201 16.00000 ; qtot -0.455
8 os 1 LIG O 8 -0.517201 16.00000 ; qtot -0.973
9 ca 1 LIG C 9 0.143100 12.01000 ; qtot -0.830
10 ca 1 LIG C 10 0.012400 12.01000 ; qtot -0.817
11 ca 1 LIG C 11 -0.127000 12.01000 ; qtot -0.944
12 ca 1 LIG C 12 0.045400 12.01000 ; qtot -0.899
13 ca 1 LIG C 13 -0.082000 12.01000 ; qtot -0.981
14 ca 1 LIG C 14 -0.019900 12.01000 ; qtot -1.001
15 c3 1 LIG C 15 0.125200 12.01000 ; qtot -0.875
16 c3 1 LIG C 16 0.125200 12.01000 ; qtot -0.750
17 ha 1 LIG H 17 0.178000 1.00800 ; qtot -0.572
18 ha 1 LIG H 18 0.174000 1.00800 ; qtot -0.398
19 h1 1 LIG H 19 0.066367 1.00800 ; qtot -0.332
20 h1 1 LIG H 20 0.066367 1.00800 ; qtot -0.265
21 h1 1 LIG H 21 0.066367 1.00800 ; qtot -0.199
22 h1 1 LIG H 22 0.066367 1.00800 ; qtot -0.133
23 h1 1 LIG H 23 0.066367 1.00800 ; qtot -0.066
24 h1 1 LIG H 24 0.066367 1.00800 ; qtot -0.000

[ bonds ]
; ai aj funct r k
1 14 1 1.8970e-01 2.2560e+05 ; BR1 - C6
2 10 1 1.7290e-01 2.7012e+05 ; CL1 - C2
3 12 1 1.7290e-01 2.7012e+05 ; CL2 - C4
4 5 1 1.9220e-01 2.0987e+05 ; S1 - P1
5 6 1 1.6020e-01 2.8660e+05 ; P1 - O1
5 7 1 1.6020e-01 2.8660e+05 ; P1 - O2
5 8 1 1.6020e-01 2.8660e+05 ; P1 - O3
6 9 1 1.3730e-01 3.1162e+05 ; O1 - C1
7 15 1 1.4390e-01 2.5230e+05 ; O2 - C7
8 16 1 1.4390e-01 2.5230e+05 ; O3 - C8
9 10 1 1.3870e-01 4.0033e+05 ; C1 - C2
9 11 1 1.3870e-01 4.0033e+05 ; C1 - C3
10 13 1 1.3870e-01 4.0033e+05 ; C2 - C5
11 12 1 1.3870e-01 4.0033e+05 ; C3 - C4
11 17 1 1.0870e-01 2.8811e+05 ; C3 - H1
12 14 1 1.3870e-01 4.0033e+05 ; C4 - C6
13 14 1 1.3870e-01 4.0033e+05 ; C5 - C6
13 18 1 1.0870e-01 2.8811e+05 ; C5 - H2
15 19 1 1.0930e-01 2.8108e+05 ; C7 - H3
15 20 1 1.0930e-01 2.8108e+05 ; C7 - H4
15 21 1 1.0930e-01 2.8108e+05 ; C7 - H5
16 22 1 1.0930e-01 2.8108e+05 ; C8 - H6
16 23 1 1.0930e-01 2.8108e+05 ; C8 - H7
16 24 1 1.0930e-01 2.8108e+05 ; C8 - H8

Answer

You can use this regular expression to remove any run of digits at the end of a string:

re.sub("[0-9]+$","", string)

For example:

import re
x = 'atom653'
print(re.sub("[0-9]+$","", x))

Output:

atom
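
The `$` anchor is what restricts the match to the end of the string, so a digit in the middle of a name is left alone. A minimal sketch of that behaviour follows; the helper name `strip_trailing_digits` and the sample names are made up for illustration. It also shows `str.rstrip` as a regex-free alternative.

```python
import re

def strip_trailing_digits(name):
    """Remove any run of digits at the end of name."""
    return re.sub(r"[0-9]+$", "", name)

# Hypothetical sample names, chosen to show the edge cases
for name in ["BR1", "C12", "H1A", "LIG"]:
    print(name, "->", strip_trailing_digits(name))

# str.rstrip with a set of digit characters strips the same trailing run
assert "C12".rstrip("0123456789") == strip_trailing_digits("C12")
```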

You can slightly modify the code you posted to convert your input file to a nested list:

input_list = []
with open('input.txt', mode='r') as f:
    for lines in f:
        columns = lines.split()
        input_list.append(columns)

You can then apply the regular expression to the 5th element (index 4) of each list within your input_list:

for sublist in input_list:
    sublist[4] = re.sub("[0-9]+$","", sublist[4])
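
Note that a blank line splits to an empty list, so indexing `sublist[4]` on it raises an `IndexError`. A small sketch of a guarded version is below; the function name `strip_atom_digits` and its `col` parameter are assumptions for illustration, and the sample rows are copied from the posted input.

```python
import re

def strip_atom_digits(rows, col=4):
    """Strip trailing digits from column `col` of every row that has one."""
    for row in rows:
        if len(row) > col:  # skip blank lines and short header rows
            row[col] = re.sub(r"[0-9]+$", "", row[col])
    return rows

rows = [
    "1 br 1 LIG BR1 1 -0.040100 79.90000".split(),
    [],  # a blank line splits to an empty list
    "3 cl 1 LIG CL2 3 -0.046400 35.45000".split(),
]
strip_atom_digits(rows)
print(rows[0][4], rows[2][4])
```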

As a side note, consider using pandas to work with tabular input in Python.

EDIT: The code below should work with the new version of your input:

import re

# Read the input as a list of rows, each row a list of columns
input_list = []
with open('input.txt', mode='r') as f:
    for lines in f:
        columns = lines.split()
        input_list.append(columns)

# Extract the lines before and after the table with atoms
atoms_start = input_list.index(['[', 'atoms', ']'])
atoms_end = input_list.index(['[', 'bonds', ']'])-1

before_atoms = input_list[:atoms_start]
after_atoms = input_list[atoms_end:]
atoms = input_list[atoms_start:atoms_end]

# Modify the atoms table, skipping the section header,
# comment lines (starting with ';') and rows that are too short
for sublist in atoms[1:]:
    if len(sublist) > 4 and not sublist[0].startswith(';'):
        sublist[4] = re.sub("[0-9]+$","", sublist[4])

# Merge all parts back
final_list = before_atoms+atoms+after_atoms
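
An alternative sketch that does not depend on `[ bonds ]` being the section that follows the table: track the current section while reading line by line and edit only rows under `[ atoms ]`. The names `SECTION`, `FIFTH` and `strip_atom_names` are illustrative, and the code assumes section headers look like `[ name ]` and comment lines start with `;`, as in the posted input. Because it substitutes inside the original line instead of splitting and re-joining, the remaining spacing of each line is kept.

```python
import re

# A section header such as "[ atoms ]"
SECTION = re.compile(r"^\s*\[\s*(\w+)\s*\]")
# Four whitespace-separated fields, then the fifth field up to its trailing digits
FIFTH = re.compile(r"^(\s*(?:\S+\s+){4}\S*?)\d+(?=\s|$)")

def strip_atom_names(lines):
    """Return a copy of lines with trailing digits removed from the
    fifth column of data rows inside the [ atoms ] section."""
    section = None
    out = []
    for line in lines:
        header = SECTION.match(line)
        if header:
            section = header.group(1)
        elif section == "atoms" and not line.lstrip().startswith(";"):
            line = FIFTH.sub(r"\1", line, count=1)
        out.append(line)
    return out

sample = [
    "[ atoms ]\n",
    "; nr type resi res atom cgnr charge mass\n",
    "     1 br 1 LIG BR1 1 -0.040100 79.90000 ; qtot -0.040\n",
    "\n",
    "[ bonds ]\n",
    "     1 14 1 1.8970e-01 2.2560e+05 ; BR1 - C6\n",
]
print("".join(strip_atom_names(sample)), end="")
```

Rows in other sections, such as the `2.2560e+05` value under `[ bonds ]`, are left untouched even though they also end in digits.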

EDIT 2: To save the file:

with open('fname.txt', 'w') as file:
    file.writelines('\t'.join(i) + '\n' for i in final_list)
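
Since the question mentions multiple input files, here is a hedged sketch of applying the same edit to every file in a directory with `pathlib`. The function names `clean_text` and `clean_directory`, the `*.txt` pattern and the directory layout are all assumptions, not something given in the question. Edited rows are re-joined with tabs as above; all other lines are written back unchanged.

```python
import re
from pathlib import Path

def clean_text(text):
    """Strip trailing digits from the 5th column of rows in the [ atoms ] table."""
    out, in_atoms = [], False
    for line in text.splitlines():
        stripped = line.strip()
        cols = line.split()
        if stripped.startswith("["):
            # Section header: remember whether we are inside [ atoms ]
            in_atoms = stripped.strip("[] ") == "atoms"
        elif in_atoms and len(cols) > 4 and not cols[0].startswith(";"):
            cols[4] = re.sub(r"[0-9]+$", "", cols[4])
            line = "\t".join(cols)
        out.append(line)
    return "\n".join(out) + "\n"

def clean_directory(src_dir, dst_dir, pattern="*.txt"):
    """Write a cleaned copy of every matching file in src_dir to dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(src_dir).glob(pattern)):
        target = dst / path.name
        target.write_text(clean_text(path.read_text()))
        written.append(target)
    return written
```

A call such as `clean_directory('inputs', 'outputs')` (hypothetical directory names) would leave the originals in place and put the edited copies in a separate folder.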