Felipe Lira - 9 days ago
Python Question

Eliminate text after the last "|" in a Python script

I want to create a dictionary from a tab-separated table file in which some columns contain special characters such as "|", and I need to remove all the text after the last "|" in that column.

For example:

A this_is|my_A|best|result| 20
B this_is|my_B|best|result|mess 40
C this_is|my_C|best|result|me.. 32


I wrote this to create the dictionary:

for line in file:
    query = line.strip().split('\t')[0]
    data = line.strip().split('\t')[1:2]
    subject = line.strip().split('\t')[1]
    if query not in best_hit:
        best_hit[subject] = data


This results in a messy dictionary like this:

d = {'A': 'this_is|my_A|best|result|, 20' ,'B': 'this_is|my_B|best|result|mess 40', 'C':'this_is|my_C|best|result|me.. 32' }


I want to remove the "mess" and "me.." text before adding the value to the dictionary, because I need to compare this value against another list that does not contain this text. The desired result is:

A this_is|my_A|best|result| 20
B this_is|my_B|best|result| 40
C this_is|my_C|best|result| 32


My own solution:

old_result = line.strip().split('\t')[1]
new_result = old_result.split('|')
subject = new_result[0]+'|'+new_result[1]+'|'+new_result[2]+'|'+new_result[3]+'|'

Answer

Using split may not be the fastest approach, but it keeps the algorithm simple.

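# Sample input; columns are tab-separated (written as \t so the tabs survive copy/paste).
source = """

A\tthis_is|my_A|best|result|\t20
B\tthis_is|my_B|best|result|mess\t40
C\tthis_is|my_C|best|result|me..\t32

"""

source = source.strip()
source = source.split('\n')

result = {}

for line in source:
    asplit = line.split('\t')
    # Split the second column on '|' and blank out whatever follows the last '|'.
    bsplit = asplit[1].split('|')
    bsplit[-1] = ''
    asplit[1] = '|'.join(bsplit)

    # Keep only the first entry seen for each key.
    if asplit[0] not in result:
        result[asplit[0]] = asplit[1] + '\t' + asplit[2]

print(result)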
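
As an alternative to splitting on every "|" and re-joining, the same trimming can be sketched with str.rpartition, which cuts the string once at the last "|". This is only a sketch, not part of the original answer: the helper names trim_after_last_pipe and build_best_hits are hypothetical, and it assumes each line is tab-separated with at least three columns (query, subject, score).

```python
def trim_after_last_pipe(value):
    """Return value with everything after the last '|' removed.

    A value that contains no '|' is returned unchanged.
    """
    head, sep, _tail = value.rpartition('|')
    return head + sep if sep else value


def build_best_hits(lines):
    """Map each query (first column) to its trimmed subject plus score.

    Assumption: lines are tab-separated with at least three columns.
    Only the first line seen for each query is kept.
    """
    best_hit = {}
    for line in lines:
        query, subject, score = line.rstrip('\n').split('\t')[:3]
        if query not in best_hit:
            best_hit[query] = trim_after_last_pipe(subject) + '\t' + score
    return best_hit


sample = [
    "A\tthis_is|my_A|best|result|\t20",
    "B\tthis_is|my_B|best|result|mess\t40",
    "C\tthis_is|my_C|best|result|me..\t32",
]
best_hits = build_best_hits(sample)
print(best_hits)
```

When reading from a real file, the open file object can be passed to build_best_hits in place of the sample list, since it only needs an iterable of lines.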