Lelo Lelo - 6 months ago 27
Python Question

Reading from a file and manipulating it in Python

I have a text file.

Its contents and its number of lines can change each time. Each line has the following format:

string (one, two, or more words) ^ string of one word

level country ^ layla
hello sandra ^ organization
hello people ^ layla
hello samar ^ organization

I want to create a DataFrame using pandas such that:

item0 (country, people)
item1 (sandra, samar)

For each distinct value after the "^" (for example layla), I want to collect the rightmost word of every line that ends with that value. For layla this gives (country, people), which becomes the second column shown above; layla itself is labelled item0 and used as the index of the DataFrame. I can't work out the logic for finding the repeated values after the "^" and returning the list of rightmost words that belong to each one. My attempt so far, which doesn't really do it, is:

def text_file(file):

    file_of_text = "text.txt"
    with open(file_of_context) as f:
        for l in f:
            l_dict = l.split(" ")

def items(file_of_text):

    list_of_items= text_file(file_of_text)
    for a in list_of_items:
        for b in a:
            if a[-1]==

def main():

    file_of_text = "text.txt"

if __name__ == "__main__":


Start with pandas read_csv(), specifying '^' as the delimiter and using arbitrary column names:

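import pandas as pd
# raw string: '^' is a regex metacharacter, and regex delimiters need the Python engine
df = pd.read_csv('data.csv', delimiter=r'\^', engine='python', names=['A', 'B'])
print(df)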
                A              B
0  level country           layla
1  hello sandra     organization
2   hello people           layla
3   hello samar     organization
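
To try this step without a file on disk, here is a minimal sketch that reads the sample lines from the question out of an in-memory buffer. The helper name read_caret_file and the SAMPLE constant are made up for illustration; the separator pattern also swallows the spaces around the caret, which is an addition to the call above rather than part of it.

```python
import io

import pandas as pd

# Sample lines copied from the question; io.StringIO stands in for the real file.
SAMPLE = (
    "level country ^ layla\n"
    "hello sandra ^ organization\n"
    "hello people ^ layla\n"
    "hello samar ^ organization\n"
)


def read_caret_file(source):
    """Read 'text ^ label' lines into columns A and B (hypothetical helper)."""
    # \s*\^\s* matches the caret plus any whitespace around it, so neither
    # column keeps stray leading or trailing spaces.
    return pd.read_csv(source, sep=r"\s*\^\s*", engine="python", names=["A", "B"])


df = read_caret_file(io.StringIO(SAMPLE))
print(df)
```

Passing a path such as 'data.csv' instead of the StringIO object should work the same way, since read_csv accepts both.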

Then we split to get the values we want. The expand argument is new in pandas 0.16, I believe.

df['A'] = df['A'].str.split(' ', expand=True)[1]
         A              B
0  country          layla
1   sandra   organization
2   people          layla
3    samar   organization
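
Taking element [1] assumes every phrase has exactly two words. The question says a phrase can have one, two, or more words and asks for the rightmost one, so a variant that picks the last token may be closer to what is wanted. A small sketch, with sample phrases invented for illustration:

```python
import pandas as pd

# Hypothetical sample with one-, two-, and three-word phrases.
phrases = pd.Series(["country", "hello sandra", "well hello people"])

# split() with no arguments splits on any run of whitespace;
# .str[-1] then picks the last token of each resulting list.
last_word = phrases.str.split().str[-1]
print(last_word.tolist())
```

Applied to the frame above, this would read df['A'] = df['A'].str.split().str[-1].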

Then we group by column B and apply the tuple function. Note: we're resetting the index so we can use it later.

g = df.groupby('B')['A'].apply(tuple).reset_index()
              B                  A
0          layla  (country, people)
1   organization    (sandra, samar)
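
One thing to be aware of: groupby sorts the group keys by default, so the row order (and therefore any numbering derived from it) is alphabetical rather than the order in which the labels appear in the file. In the sample data the two happen to coincide. The sketch below uses a reordered, made-up frame to show the difference, with sort=False keeping first-appearance order.

```python
import pandas as pd

# Hypothetical frame where 'organization' appears before 'layla'.
df = pd.DataFrame({
    "A": ["sandra", "country", "samar", "people"],
    "B": ["organization", "layla", "organization", "layla"],
})

# Default behaviour: group keys come out sorted alphabetically.
sorted_groups = df.groupby("B")["A"].apply(tuple)

# sort=False keeps the keys in the order they first appear.
first_seen = df.groupby("B", sort=False)["A"].apply(tuple)

print(sorted_groups)
print(first_seen)
```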

Finally, create a new column from the string 'item' and the index:

g['item'] = 'item' + g.index.astype(str)
print(g[['item', 'A']])
    item                  A
0  item0  (country, people)
1  item1    (sandra, samar)
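
Putting the steps together, here is one possible end-to-end sketch that uses the item labels as the index, as the question asks. The function name build_items and the in-memory sample are assumptions for illustration; it also takes the last word of each phrase rather than the second, and strips whitespace around the caret while reading.

```python
import io

import pandas as pd


def build_items(source):
    """Group the last word of each phrase by the label after '^' (hypothetical helper)."""
    # Regex separator: the caret plus any surrounding whitespace.
    df = pd.read_csv(source, sep=r"\s*\^\s*", engine="python", names=["A", "B"])
    # Keep only the rightmost word of each phrase.
    df["A"] = df["A"].str.split().str[-1]
    # One tuple of words per label; labels are sorted alphabetically by groupby.
    grouped = df.groupby("B")["A"].apply(tuple).reset_index()
    # Replace the 0..n-1 index with item0, item1, ...
    grouped.index = [f"item{i}" for i in range(len(grouped))]
    return grouped[["A"]]


sample = io.StringIO(
    "level country ^ layla\n"
    "hello sandra ^ organization\n"
    "hello people ^ layla\n"
    "hello samar ^ organization\n"
)
result = build_items(sample)
print(result)
```

Returning grouped[["B", "A"]] instead would keep the original label next to each item name.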