nieka nieka - 1 month ago 13
Python Question

Delete rows in pandas which match your header

I'm kind of new with pandas and now i have a question.

I read a table from a html site and set my header according to the table on the website.

df = pd.read_html('http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2', header = 1)


Now I have the my dataframe with a matching header BUT I have some rows that are the same as the header, like the example below.

RK PLAYER TEAM GP G A PTS +/- PIM PTS/G SOG
1 Jamie Benn, LW DAL 82 35 52 87 1 64 1.06 253
2 John Tavares, C NYI 82 38 48 86 5 46 1.05 278
...
10 Vladimir Tarasenko, RW STL 77 37 36 73 27 31 0.95 264
RK PLAYER TEAM GP G A PTS +/- PIM PTS/G SOG
14 Steven Stamkos, C TB 82 43 29 72 2 49 0.88 268


I know that it's possible to delete duplicate rows with panda but is it possible to delete rows that are duplicates of the header or a specific row?

Hope you can help me out !

Answer

You can use boolean indexing:

df = df[df.PLAYER != 'PLAYER']

If need also remove rows with PP in column PLAYER use isin:

Notice: I add [0] to the end of read_html, because it return list of dataframes an you need select first item of list:

df = pd.read_html('http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2', header = 1)[0]
print (df)
     RK                  PLAYER  TEAM   GP    G    A  PTS  +/-  PIM  PTS/G  \
0     1          Jamie Benn, LW   DAL   82   35   52   87    1   64   1.06   
1     2         John Tavares, C   NYI   82   38   48   86    5   46   1.05   
2     3        Sidney Crosby, C   PIT   77   28   56   84    5   47   1.09   
3     4       Alex Ovechkin, LW   WSH   81   53   28   81   10   58   1.00   
4   NaN       Jakub Voracek, RW   PHI   82   22   59   81    1   78   0.99   
5     6    Nicklas Backstrom, C   WSH   82   18   60   78    5   40   0.95   
6     7         Tyler Seguin, C   DAL   71   37   40   77   -1   20   1.08   
7     8         Jiri Hudler, LW   CGY   78   31   45   76   17   14   0.97   
8   NaN        Daniel Sedin, LW   VAN   82   20   56   76    5   18   0.93   
9    10  Vladimir Tarasenko, RW   STL   77   37   36   73   27   31   0.95   
10  NaN                      PP    SH  NaN  NaN  NaN  NaN  NaN  NaN    NaN   
11   RK                  PLAYER  TEAM   GP    G    A  PTS  +/-  PIM  PTS/G   
12  NaN        Nick Foligno, LW   CBJ   79   31   42   73   16   50   0.92   
13  NaN        Claude Giroux, C   PHI   81   25   48   73   -3   36   0.90   
14  NaN         Henrik Sedin, C   VAN   82   18   55   73   11   22   0.89   
15   14       Steven Stamkos, C    TB   82   43   29   72    2   49   0.88   
...
...

mask = df['PLAYER'].isin(['PLAYER', 'PP'])
print (df[~mask])
     RK                  PLAYER TEAM  GP   G   A PTS  +/- PIM PTS/G  SOG  \
0     1          Jamie Benn, LW  DAL  82  35  52  87    1  64  1.06  253   
1     2         John Tavares, C  NYI  82  38  48  86    5  46  1.05  278   
2     3        Sidney Crosby, C  PIT  77  28  56  84    5  47  1.09  237   
3     4       Alex Ovechkin, LW  WSH  81  53  28  81   10  58  1.00  395   
4   NaN       Jakub Voracek, RW  PHI  82  22  59  81    1  78  0.99  221   
5     6    Nicklas Backstrom, C  WSH  82  18  60  78    5  40  0.95  153   
6     7         Tyler Seguin, C  DAL  71  37  40  77   -1  20  1.08  280   
7     8         Jiri Hudler, LW  CGY  78  31  45  76   17  14  0.97  158   
8   NaN        Daniel Sedin, LW  VAN  82  20  56  76    5  18  0.93  226   
9    10  Vladimir Tarasenko, RW  STL  77  37  36  73   27  31  0.95  264   
12  NaN        Nick Foligno, LW  CBJ  79  31  42  73   16  50  0.92  182   
13  NaN        Claude Giroux, C  PHI  81  25  48  73   -3  36  0.90  279   
14  NaN         Henrik Sedin, C  VAN  82  18  55  73   11  22  0.89  101   
15   14       Steven Stamkos, C   TB  82  43  29  72    2  49  0.88  268   
16  NaN        Tyler Johnson, C   TB  77  29  43  72   33  24  0.94  203   
17   16        Ryan Johansen, C  CBJ  82  26  45  71   -6  40  0.87  202   
18   17         Joe Pavelski, C   SJ  82  37  33  70   12  29  0.85  261   
19  NaN        Evgeni Malkin, C  PIT  69  28  42  70   -2  60  1.01  212   
20  NaN         Ryan Getzlaf, C  ANA  77  25  45  70   15  62  0.91  191   
21   20           Rick Nash, LW  NYR  79  42  27  69   29  36  0.87  304   
...
...