K.heer - 23 days ago
Python Question

Python format and pandas

I want to drop some columns using format (the columns to drop are new_0_cost, new_0_quantity, new_2_cost, and new_2_quantity), but not every column is dropped. Below are the data frame and the code.

dataFrame

|new_0_cost|new_0_quantity|new_2_cost|new_2_quantity|quality|weights|
0| 10 | 20 | 10 | 20 | good | 40 |


function

def drop_cost_and_quan(data):
    # data is the dataframe described above
    # try to drop new_0_cost, new_0_quantity, new_2_cost, and new_2_quantity
    data3 = data.copy()
    for i, item in enumerate(data3.columns):
        if item == 'new_{0}_cost'.format(i):
            data3 = data3.drop(item, axis=1)
        print('cost:', item == 'new_{0}_cost'.format(i))

    for i, item in enumerate(data3.columns):
        if item == 'new_{0}_quantity'.format(i):
            data3 = data3.drop(item, axis=1)
        print('quantity:', item == 'new_{0}_quantity'.format(i))

    return data3


Output:

data3 = drop_cost_and_quan(data)

cost: True
cost: False
cost: True
cost: False
cost: False
cost: False
quantity: True
quantity: False
quantity: False
quantity: False

data3
|new_2_quantity|quality| weights|
0| 20 |good |40

Answer

As an alternative to @vinod's method, you can also do it this way:

In [148]: df
Out[148]:
   new_0_cost  new_0_quantity  new_2_cost  new_2_quantity  new_0_total_cost  new_2_total_cost quality  weights
0          10              20          10              20              1111              2222    good       40

In [151]: df.drop(df.columns[df.columns.str.contains(r'^new_\d+_(?:quantity|cost)')], axis=1, inplace=True)

In [152]: df
Out[152]:
   new_0_total_cost  new_2_total_cost quality  weights
0              1111              2222    good       40
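The one-liner above modifies df in place. If you would rather get a new frame back, as the function in the question does, the same mask can be wrapped up roughly like this. This is only a sketch: the function name is borrowed from the question, and the sample frame is rebuilt by hand from the output shown above.

```python
import pandas as pd

def drop_cost_and_quan(data):
    # True for names such as new_<digits>_cost or new_<digits>_quantity.
    mask = data.columns.str.contains(r'^new_\d+_(?:quantity|cost)')
    # Keep only the non-matching columns; the input frame is left untouched.
    return data.loc[:, ~mask].copy()

# Sample frame rebuilt by hand from the session above (assumed values).
df = pd.DataFrame({
    'new_0_cost': [10], 'new_0_quantity': [20],
    'new_2_cost': [10], 'new_2_quantity': [20],
    'new_0_total_cost': [1111], 'new_2_total_cost': [2222],
    'quality': ['good'], 'weights': [40],
})

result = drop_cost_and_quan(df)
print(list(result.columns))
```

Selecting with the inverted mask avoids inplace=True, so the original frame stays available for comparison.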

Explanation:

In [148]: df
Out[148]:
   new_0_cost  new_0_quantity  new_2_cost  new_2_quantity  new_0_total_cost  new_2_total_cost quality  weights
0          10              20          10              20              1111              2222    good       40

In [149]: df.columns.str.contains(r'^new_\d+_(?:quantity|cost)')
Out[149]: array([ True,  True,  True,  True, False, False, False, False], dtype=bool)

In [150]: df.columns[df.columns.str.contains(r'^new_\d+_(?:quantity|cost)')]
Out[150]: Index(['new_0_cost', 'new_0_quantity', 'new_2_cost', 'new_2_quantity'], dtype='object')
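By contrast, the loops in the question seem to fail because enumerate yields the position of each column, not the number embedded in its name. The two cost columns match only because they happen to sit at positions 0 and 2. A plain-Python sketch of that comparison, using the column names from the question (the variable names are just for illustration):

```python
columns = ['new_0_cost', 'new_0_quantity', 'new_2_cost',
           'new_2_quantity', 'quality', 'weights']

# First loop: the name is compared with one built from its position i.
cost_hits = [name for i, name in enumerate(columns)
             if name == 'new_{0}_cost'.format(i)]

# Second loop runs over what is left, so the positions start again at 0.
remaining = [name for name in columns if name not in cost_hits]
quantity_hits = [name for i, name in enumerate(remaining)
                 if name == 'new_{0}_quantity'.format(i)]

left = [name for name in remaining if name not in quantity_hits]
print(cost_hits)      # ['new_0_cost', 'new_2_cost']
print(quantity_hits)  # ['new_0_quantity']
print(left)           # ['new_2_quantity', 'quality', 'weights']
```

new_2_quantity is at position 1 in the second loop, so it is compared with 'new_1_quantity' and survives. The regex approach does not depend on column order at all.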