user8358234 user8358234 - 3 years ago 120
Python Question

How can I create a python dictionary using values from a text file that I'm reading in

I have a file 'peaks_ee.xpk' and I'm trying to create a dictionary in my python code using the values in that file.

j = 0;
contents_atom = []
atom_lines=[]
with open ("peaks_ee.xpk","r") as atomName:
for name in atomName.readlines():
float_str = re.findall("\d\.H\d'?", name)
if (len(float_str)>1):
j = j+1
value1 = ('Atom ' + str(j) + ' ' + str(float_str[0]) + ' ' + str(float_str[1]) + '\n')
atom_lines.insert(-1,value1)
tclust_atom = open("tclust.txt","a")
for value1 in atom_lines:
tclust_atom.write(value1)
tclust_atom.close()


I'm reading in the file peaks_ee.xpk. This is what peaks_ee.xpk looks like:

peaks_ee

This is a sample snippet from peaks_ee.xpk:

label dataset sw sf
1H 1H_2
NOESY_F1eF2e.nv
4807.69238281 4803.07373047
600.402832031 600.402832031
1H.L 1H.P 1H.W 1H.B 1H.E 1H.J 1H.U 1H_2.L 1H_2.P 1H_2.W 1H_2.B 1H_2.E 1H_2.J 1H_2.U vol int stat comment flag0 flag8 flag9
0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
2 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
3 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
4 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
5 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
6 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
7 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
8 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
9 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
10 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
11 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
12 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
13 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
14 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
15 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
16 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
17 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
18 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
19 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
20 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
21 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
22 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
23 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
24 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0


I want to make a dictionary which takes in the atom name as the key. The atom name in peaks_ee.xpk are "1.H1'","2.H8", etc.. and I would like the value to be the chemical shifts which are for example "5.82020" and "7.61004" (this is coming from the 0 line in peaks_ee.xpk)
So for example, I would want the dictionary to look like:

dict = { "1.H1'":"5.82020", "2.H8":"7.61004"...}


But the next line repeats by having 2.H8 and 1.H1' again, so it doesn't need to be added to the dictionary. The line after that (line 2) should add to the dictionary because it has a new atom called 1.H8, so it should be

dict = {"1.H1'":"5.82020", "2.H8":"7.61004", "1.H8:8.13712", ...}


How can I do this?

Edit: If I have another file "ee_pinkH1.xpk" and I want to read it in and see if the chemical shift values from there are in a certain range, then print out those values, would this be the code?

This is my entire code:

import os
import sys
import re

i = 0;
contents_peak = []
peak_lines=[]
with open ("ee_pinkH1.xpk","r") as peakPPM:
for PPM in peakPPM.readlines():
float_num = re.findall("[\s][1-9]{1}\.[0-9]+",PPM)
if (len(float_num)>1):
i=i+1
value = ('Peak ' + str(i) + ' '+ str(float_num[0])+ ' 0.05 ' + str(float_num[1])+ ' 0.05 ' + '\n')
peak_lines.insert(-1,value)
tclust_peak = open("tclust.txt","w+")
tclust_peak.write('rbclust \n')
for value in peak_lines:
tclust_peak.write(value)
tclust_peak.close()

j = 0;
contents_atom = []
atom_lines=[]
result = {}
with open ("peaks_ee.xpk","r") as atomName:
for name in atomName.readlines():
for match in rex.finditer(line):
name,shift = match.groups()
if name not in result:
result[name] = float(shift)
float_str = re.findall("\d\.H\d'?", name)
if (len(float_str)>1):
j = j+1
if peakPPM = 'ee_pinkH1.xpk':
if 5<=float_num<=6.25:
value1 = ('Atom ' + str(j) + ' ' + str(float_str[0]) + ' ' + str(float_str[1]) + '\n')
atom_lines.insert(-1,value1)

tclust_atom = open("tclust.txt","a")
for value1 in atom_lines:
tclust_atom.write(value1)
tclust_atom.close()

Answer Source

You can extend your regex pattern to include the chemical shift and get what you need in each match. Put parenthesis around the parts of the pattern you want to keep so they will be captured.

pattern = '''{(\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)

Iterate over all the matches; the name and shift will be in the match.groups() tuple; if the name hasn't been seen yet add it to the dictionary.

with open(filepath) as atom_name:
    data = atom_name.read()
result = {}
for match in rex.finditer(data):
    name, shift = match.groups()
    #print(name,shift)
    if name not in result:
        result[name] = float(shift)

If the file is too big to read at once, extract the info one line at a time.

with open(filepath) as atom_name:
    for line in atom_name:
        for match in rex.finditer(line):
            name, shift = match.groups()
            #print(name,shift)
            if name not in result:
                result[name] = float(shift)
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download