A. Haidar :
I have a dataframe with multiple columns, and I want to select rows based on conditions on several of them. Suppose the dataframe has four columns:
import pandas as pd
di={"A":[1,2,3,4,5],
"B":['Tokyo','Madrid','Professor','helsinki','Tokyo Oliveira'],
"C":['250','200//250','250//250//200','12','200//300'],
"D":['Left','Right','Left','Right','Right']}
data=pd.DataFrame(di)
I want to select the rows with Tokyo in column B, a value greater than 200 in column C, and Left in column D, so that only the first row is selected. Column C needs a helper function, because when an entry holds a list separated by //, only its first value should be checked.
I assume this can be done as follows:
def check_200(thecolumn):
    thelist=[]
    for i in thecolumn:
        f=i
        if "//" in f:
            #split based on //
            z=f.split("//")
            f=z[0]
        f=float(f)
        if f > 200.00:
            thelist.append(True)
        else:
            thelist.append(False)
    return thelist
Then, I will create the multiple conditions:
selecteddata=data[(data.B.str.contains("Tokyo")) &
(data.D.str.contains("Left"))&(check_200(data.C))]
Is this the best way to do that, or is there an easier pandas function that can handle such requirements?
Bruno Mello :
I don't think there is a more Pythonic way to do this, but I think this is what you want:
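# .str[0] takes the first piece of every row after the split (a plain [0] would pick row 0).
# Entries without "//" split into a single piece, so no separate "//" check is needed.
bool_idx = ((data.B.str.contains("Tokyo")) &
            (data.D.str.contains("Left")) &
            (data.C.str.split("//").str[0].astype(float)>200.00))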
selecteddata=data[bool_idx]
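For anyone who wants to run the whole thing end to end, here is a minimal self-contained sketch of the same filter, assuming the sample data from the question. The names first_c and mask are only illustrative, not part of pandas. It splits column C on "//", keeps the first piece of each row with .str[0], converts it to float, and combines the three conditions with &.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": ["Tokyo", "Madrid", "Professor", "helsinki", "Tokyo Oliveira"],
    "C": ["250", "200//250", "250//250//200", "12", "200//300"],
    "D": ["Left", "Right", "Left", "Right", "Right"],
})

# First value of each "//"-separated entry, as a float.
# .str[0] indexes into every row's list; entries without "//"
# split into a one-element list, so they pass through unchanged.
first_c = data["C"].str.split("//").str[0].astype(float)

# Combine the three row-wise conditions into one boolean mask.
mask = (
    data["B"].str.contains("Tokyo")
    & data["D"].str.contains("Left")
    & (first_c > 200.0)
)

selecteddata = data[mask]
print(selecteddata)
```

With this data only the first row (A equal to 1) passes all three conditions. Keeping the mask as a boolean Series, rather than a Python list, also keeps it aligned with the dataframe's index.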
Origin http://10.200.1.11:23101/article/api/json?id=379464&siteId=1