renakre renakre - 4 months ago 13
Python Question

Counting number of links for each row and adding the counts as a new column

I am trying to count the 'href' occurrences in each row of the 'Body' column and store the count in a new column.

I can get the count of links using this:

dataframe1['Body'].str.contains('href').sum()


However, this returns a single total for the whole column (1770), not a count per row. I also tried the following, but it just assigned that same total (1770) to every row, so it did not work either:

dataframe1['LinkCount'] = dataframe1['Body'].str.contains('href').sum()


I thought apply() would work, but it returned NaN as the count value:

dataframe1['LinkCount'] = dataframe1[['Body']].apply(lambda x: x.str.contains('href').sum())


Can anyone help me? What am I doing wrong?

Answer

Try str.findall, which returns the list of matches for each row, and then take the length of each list:

In [134]: df
Out[134]:
                              Body
0                              aaa
1                      href...href
2                              bbb
3                             href
4  href aaa href bbb href ccc href

In [135]: df['count'] = df.Body.str.findall('href').apply(len)

In [136]: df
Out[136]:
                              Body  count
0                              aaa      0
1                      href...href      2
2                              bbb      0
3                             href      1
4  href aaa href bbb href ccc href      4
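
For reference, here is a minimal self-contained sketch of the same idea. It uses `Series.str.count`, which should give the same per-row numbers as `findall` plus `len` on this data. It also reproduces the two attempts from the question to show where they go wrong. The sample frame just mirrors the one above, and the names `via_findall`, `rows_with_href` and `per_column` are only illustrative.

```python
import pandas as pd

# Sample data mirroring the frame shown above
df = pd.DataFrame(
    {"Body": ["aaa", "href...href", "bbb", "href", "href aaa href bbb href ccc href"]}
)

# Per-row count of non-overlapping matches of the pattern
df["LinkCount"] = df["Body"].str.count("href")

# Equivalent to the findall + len approach from the answer
via_findall = df["Body"].str.findall("href").apply(len)

# str.contains gives one boolean per row; .sum() collapses that to a single
# scalar (the number of rows containing 'href'), which pandas then
# broadcasts to every row on assignment
rows_with_href = df["Body"].str.contains("href").sum()

# df[["Body"]] is a DataFrame, so apply() calls the lambda once per column.
# The result is a Series indexed by column name ('Body'), not by row label,
# so assigning it to a new column aligns on nothing and yields NaN
per_column = df[["Body"]].apply(lambda col: col.str.contains("href").sum())

print(df)
print(rows_with_href)
print(per_column)
```

One thing to watch, assuming the real 'Body' column may contain missing values: `str.count` returns NaN for those rows, so the result may need `.fillna(0).astype(int)` before it can be used as an integer column.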