Matt H Matt H - 3 months ago 9
Python Question

Changing an exact value in csv using Pandas

I have a CSV file that looks similar to this:

id | item_name1 | item_name2
--------------------------------------------------
1 | unclassified | text
2 | SantaCruz unclassified | text
3 | text | text
4 | texttext | text
5 | unclassified | text
6 | unclassified text | unclassified
7 | text | unclassified text
8 | text | text
9 | text | text
.. | .. | ..
1000 | unclassified text | text


I'm trying to erase all cells that only say "unclassified"; i.e., cells such as "SantaCruz unclassified" should remain untouched.

I have found a lot of examples that use the replace function to remove a specific word, but have not found any examples that only replace cells where there is an exact match.

I'm using Pandas, and am able to open the csv, print, etc., but am having trouble solving this specific problem. Any help would be greatly appreciated!

Thanks

Answer

pandas.Series.str.replace can take regex as an argument. Assuming that there are no preceding and trailing spaces in a string "unclassified" that you want to replace, that regex should be ^unclassified$:

df['item_name1'].str.replace('^unclassified$', 'replaced_string')

0           replaced_string
1    SantaCruz unclassified
2                      text
3                  texttext
4           replaced_string
5         unclassified text
6                      text
7                      text
8                      text
9         unclassified text
Name: item_name1, dtype: object