user970155 - 5 months ago 11
Python Question

PANDAS DROP ROWS based on filtered items, my solution - not satisfied

I am working on cleaning a list of domain names.

I want to drop certain rows that "fit" certain criteria. I have succeeded in identifying the rows for the first criterion; the second will be easy to do.

However, I cannot drop the rows. I have tried several solutions, but the best I have is the following.

from wordsegment import segment
import pandas as pd

def assignname():
    dfr = pd.read_csv('')

    for domainwtld in dfr.domain:
        dprice = dfr.price
        domainwotld = domainwtld.replace(".net", "")
        # segment is imported directly above, so it is called without the module prefix
        seperate = segment(domainwotld)
        dlnt = (min(seperate, key=len))
        slnt = len(dlnt)
        if slnt <= 1:
            baddomains = domainwtld
            a = dfr.loc[dfr['domain'] < (baddomains)]
            print(a)

When I run this code, I receive an output that, after dropping the first item in "baddomains", prints everything else in "dfr". It does this on each pass until the loop is complete.

How can I filter the "original" csv file based on baddomains?
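The loop in the question assigns a single name to baddomains on each pass and filters against only that one name, so most of the frame is printed every time. A minimal sketch of one way around that: collect the failing names in a list and filter once, after the loop. Note that `fake_segment`, its lookup table, and the sample rows (including the made-up "bluebird.net" entry and its price) are hypothetical stand-ins for `wordsegment.segment` and the CSV, neither of which is reproduced here.

```python
import pandas as pd

# Hypothetical stand-in for wordsegment.segment: a fixed lookup so the
# sketch runs without the wordsegment package or its corpus.
_SPLITS = {
    "keeng": ["keen", "g"],
    "ymall": ["y", "mall"],
    "bluebird": ["blue", "bird"],  # made-up "good" example
}


def fake_segment(name):
    return _SPLITS[name]


# Hypothetical sample in place of the CSV from the question.
dfr = pd.DataFrame({
    "domain": ["keeng.net", "ymall.net", "bluebird.net"],
    "price": [700, 777, 500],
})

bad_domains = []
for domain_with_tld in dfr["domain"]:
    name = domain_with_tld.replace(".net", "")
    shortest = min(fake_segment(name), key=len)
    if len(shortest) <= 1:
        # Accumulate instead of overwriting a single variable.
        bad_domains.append(domain_with_tld)

# Filter once, after the loop, against the whole list.
cleaned = dfr[~dfr["domain"].isin(bad_domains)]
print(cleaned)
```

Appending to a list keeps the loop structure from the question while moving the filtering step outside it, so the frame is only sliced once.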

from wordsegment import segment
import pandas as pd

url = ''
dfr = pd.read_csv(url)
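# regex=False so the dot in ".net" is matched literally, not as a wildcard
dfr['domain'] = dfr.domain.str.replace(".net", "", regex=False)
# split each domain name into its component words
dfr['words'] = dfr.domain.apply(segment)
# good: the shortest word is longer than one character; bad: everything else
good_domains = dfr[dfr.words.apply(lambda words: len(min(words, key=len))) > 1]
bad_domains = dfr[~dfr.domain.isin(good_domains.domain)]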

>>> bad_domains
        domain  price           words
2        keeng    700       [keen, g]
14       ymall    777       [y, mall]
22       idisc    850       [i, disc]
26      borsen    877      [borse, n]
38    cellacom    895  [cell, a, com]
51     iwealth    999     [i, wealth]
96     iplayer   1500     [i, player]
116  mcommerce   2000   [m, commerce]
118      apico   2052       [a, pico]
134     epharm   2500      [e, pharm]
139     ionica   2579      [ionic, a]
153    kasiino   2999   [kasi, in, o]
155    alpadia   3000   [al, padi, a]
158   similans   3152    [similan, s]
163    ifuture   3499     [i, future]

>>> bad_domains.domain.tolist()
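
If the goal is the cleaned frame rather than the list of rejects, the same test can be kept as a boolean mask and the flagged rows dropped by index label. This is only a sketch, assuming a `words` column already exists as in the output above. The first two rows are copied from that output; the third ("bluebird" and its price) is a hypothetical "good" row added so that something survives the filter.

```python
import pandas as pd

# Two rows taken from the output above, plus one hypothetical good row.
dfr = pd.DataFrame({
    "domain": ["keeng", "ymall", "bluebird"],
    "price": [700, 777, 500],
    "words": [["keen", "g"], ["y", "mall"], ["blue", "bird"]],
})

# True where the shortest segment is a single character.
is_bad = dfr["words"].apply(lambda words: min(len(w) for w in words)) <= 1

# Drop the flagged rows by index label; dfr itself is left unchanged.
cleaned = dfr.drop(index=dfr[is_bad].index)
print(cleaned["domain"].tolist())
```

Because `drop` returns a new frame here, the original `dfr` stays available if the rejected rows need to be inspected later.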