user2406718 - 3 years ago
Python Question

How to use numpy where for finding multiple records

I am using numpy to read a CSV file, and I am trying to add a condition to select matching records from it.
My sample team_file looks like this:

43596,Team1,50,team1data,id1
43597,Team2,51,team2data,id2
43598,Team3,50,team2data,id2


Below is the code

import numpy as np
reader = np.genfromtxt(team_file, delimiter=',', usecols=np.arange(1, 3), dtype=None)


I want to fetch column 2 where column 3 is 50.
In this example, Team1 and Team3 should be the output.

I want to use np.where without writing a for loop. Is there a way to achieve this using numpy? I cannot use pandas.
reader[0][1] gives me the value 50, but how do I achieve this for all the records in the file?

Appreciate any help

Answer
In [90]: txt=b"""43596,Team1,50,team1data,id1 
    ...: 43597,Team2,51,team2data,id2
    ...: 43598,Team3,50,team2data,id2
    ...: """
In [92]: data=np.genfromtxt(txt.splitlines(),delimiter=',',usecols=[1,3],dtype=None)
In [93]: data
Out[93]: 
array([[b'Team1', b'team1data'],
       [b'Team2', b'team2data'],
       [b'Team3', b'team2data']],
      dtype='|S9')

Can't test for 50 here: usecols=[1,3] selects the 2nd and 4th columns, so none of the loaded columns contains that value.


If I load all columns, I can test the 3rd field for 50 and select the appropriate records:

In [94]: data=np.genfromtxt(txt.splitlines(),delimiter=',',dtype=None)
In [95]: data
Out[95]: 
array([(43596, b'Team1', 50, b'team1data', b'id1'),
       (43597, b'Team2', 51, b'team2data', b'id2'),
       (43598, b'Team3', 50, b'team2data', b'id2')],
      dtype=[('f0', '<i4'), ('f1', 'S5'), ('f2', '<i4'), ('f3', 'S9'), ('f4', 'S3')])
In [96]: data['f2']
Out[96]: array([50, 51, 50])
In [97]: idx = np.where(data['f2']==50)
In [98]: idx
Out[98]: (array([0, 2], dtype=int32),)
In [99]: data[idx]
Out[99]: 
array([(43596, b'Team1', 50, b'team1data', b'id1'),
       (43598, b'Team3', 50, b'team2data', b'id2')],
      dtype=[('f0', '<i4'), ('f1', 'S5'), ('f2', '<i4'), ('f3', 'S9'), ('f4', 'S3')])
In [100]: data['f1'][idx]
Out[100]: 
array([b'Team1', b'Team3'],
      dtype='|S5')
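
The same selection can be written as a standalone script. This is only a sketch: it assumes a NumPy version whose genfromtxt accepts the encoding argument (used here so the text fields come back as str rather than bytes), and it reads the sample rows from an in-memory string instead of team_file. It also shows that the boolean array from the comparison can index the records directly, so np.where is optional.

```python
import io

import numpy as np

# Sample rows copied from the question; a real run would pass the file path.
text = """43596,Team1,50,team1data,id1
43597,Team2,51,team2data,id2
43598,Team3,50,team2data,id2
"""

# encoding="utf-8" is an assumption, chosen to get str fields instead of bytes.
data = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=None, encoding="utf-8")

mask = data["f2"] == 50   # boolean array, one entry per record
idx = np.where(mask)      # tuple holding the indices of the matching records

teams_by_mask = data["f1"][mask]   # index with the boolean array directly
teams_by_where = data["f1"][idx]   # index with the positions from np.where

print(teams_by_mask)
print(teams_by_where)
```

Both forms pick the same records; np.where is mainly useful when the row positions themselves are needed.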

Correction: using np.arange(1,3) instead of [1,3] to select columns, as in the question. This loads the 2nd and 3rd columns, so the value can be tested:

In [102]: data=np.genfromtxt(txt.splitlines(),delimiter=',',usecols=np.arange(1,
     ...: 3),dtype=None)
In [103]: data
Out[103]: 
array([(b'Team1', 50), (b'Team2', 51), (b'Team3', 50)],
      dtype=[('f0', 'S5'), ('f1', '<i4')])
In [104]: idx = np.where(data['f1']==50)
In [105]: data[idx]
Out[105]: 
array([(b'Team1', 50), (b'Team3', 50)],
      dtype=[('f0', 'S5'), ('f1', '<i4')])
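
For reuse, the two-column version could be wrapped in a small helper. The function name teams_with_score is hypothetical, and the encoding argument and the np.atleast_1d call (which guards against a file containing a single row) are assumptions added for this sketch, not part of the session above.

```python
import io

import numpy as np

# np.arange(1, 3) stops before 3, so it selects columns 1 and 2, not 1 and 3.
print(np.arange(1, 3).tolist())


def teams_with_score(source, score):
    """Return the names from column 1 whose column 2 value equals `score`.

    `source` may be a file path or an open file-like object.
    """
    data = np.genfromtxt(
        source,
        delimiter=",",
        usecols=np.arange(1, 3),  # same column selection as in the question
        dtype=None,
        encoding="utf-8",         # assumption: return str fields, not bytes
    )
    data = np.atleast_1d(data)    # a one-row file would otherwise give a 0-d array
    return data["f0"][data["f1"] == score]


sample = """43596,Team1,50,team1data,id1
43597,Team2,51,team2data,id2
43598,Team3,50,team2data,id2
"""
result = teams_with_score(io.StringIO(sample), 50)
print(result)
```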