How do I filter column values in a Pandas DataFrame under specific conditions?

Posted by Arefe in Dev

Arefe

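I created a Pandas DataFrame and want to filter some of its values. The DataFrame has 4 columns: currency, port, supplier_id, and value. I want to keep only the rows whose values satisfy the conditions below: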

* port – expressed as a portcode, a 5-letter string uniquely identifying a port. Portcodes consist of 2-letter country code and 3-letter city code.
* supplier_id - integer, uniquely identifying the provider of the information
* currency - 3-letter string identifying the currency
* value - a floating-point number

df =  df[ (len(df['port']) == 5 & isinstance(df['port'], basestring)) & \
  isinstance(df['supplier_id'], int) & \
  (len(df['currency']) == 3 & isinstance(df['currency'], basestring))\
  isinstance(df['value'], float) ]

The snippet should be self-explanatory: it tries to implement the conditions above, but it does not work. The printout of df is shown below:

     currency   port  supplier_id   value
0         CNY  CNAQG         35.0   820.0
1         CNY  CNAQG         19.0   835.0
2         CNY  CNAQG         49.0   600.0
3         CNY  CNAQG         54.0   775.0
4         CNY  CNAQG        113.0   785.0
5         CNY  CNAQG          5.0   790.0
6         CNY  CNAQG         55.0   770.0
7         CNY  CNAQG         81.0   810.0
8         CNY  CNAQG          2.0   770.0
9         CNY  CNAQG         10.0   825.0


print df[df.supplier_id.isnull()] # prints below 
Empty DataFrame
Columns: [currency, port, supplier_id, value]
Index: []



df.info() # prints below     
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6661 entries, 0 to 6660
Data columns (total 4 columns):
currency       6661 non-null object
port           6661 non-null object
supplier_id    6661 non-null float64
value          6661 non-null float64
dtypes: float64(2), object(2)
memory usage: 208.2+ KB
None

What is the correct way to write this?

jezrael

If a column holds mixed values (numbers alongside strings), you can use:

import pandas as pd

df = pd.DataFrame({'port':['aa789',2,3],
                   'supplier_id':[4,'s',6],
                   'currency':['USD',8,9],
                   'value':[1.7,3,5]})

print (df)
  currency   port supplier_id  value
0      USD  aa789           4    1.7
1        8      2           s    3.0
2        9      3           6    5.0

# For Python 2, change str to basestring
m1 = (df.port.astype(str).str.len() == 5) & (df.port.apply(lambda x: isinstance(x, str)))
m2 = df.supplier_id.apply(lambda x: isinstance(x, int))
m3 = (df.currency.astype(str).str.len() == 3) & (df.currency.apply(lambda x: isinstance(x, str)))
m4 = df.value.apply(lambda x: isinstance(x, float))
mask = m1 & m2 & m3 & m4
print (mask)
0     True
1    False
2    False
dtype: bool

print (df[mask])
  currency   port supplier_id  value
0      USD  aa789           4    1.7
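
A note on the data in the question: there, df.info() reports supplier_id and value as float64, so an element-wise isinstance(x, int) check would reject values such as 35.0. Also, len(df['port']) counts the rows of the column and isinstance(df['port'], ...) tests the Series object itself, so neither is evaluated per row. Below is a minimal sketch of a vectorized alternative, not a definitive solution. It assumes pandas 1.1 or later (for Series.str.fullmatch), treats a supplier_id as valid when it is a whole number, and reads "5-letter" and "3-letter" as letters only; the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample mirroring the dtypes in the question:
# strings stored as object, supplier_id and value stored as float64.
df = pd.DataFrame({
    'currency': ['CNY', 'CNY', 'US', 'CNY'],
    'port': ['CNAQG', 'CNAQ', 'CNAQG', 'CNAQG'],
    'supplier_id': [35.0, 19.0, 49.0, 54.5],
    'value': [820.0, 835.0, 600.0, 775.0],
})

# len() on a Series counts rows; .str.len() measures each string.
n_rows = len(df['port'])
lengths = df['port'].str.len()

# Per-row checks; na=False makes non-string entries fail the match.
port_ok = df['port'].str.fullmatch(r'[A-Za-z]{5}', na=False)
currency_ok = df['currency'].str.fullmatch(r'[A-Za-z]{3}', na=False)

# Assumption: a float such as 35.0 counts as an integer id, 54.5 does not.
supplier_ok = df['supplier_id'].notna() & (df['supplier_id'] % 1 == 0)

# Anything that cannot be parsed as a number becomes NaN and is rejected.
value_ok = pd.to_numeric(df['value'], errors='coerce').notna()

mask = port_ok & currency_ok & supplier_ok & value_ok
filtered = df[mask]
print(filtered)
```

In this sample only the first row passes: the second has a 4-letter port, the third a 2-letter currency, and the fourth a fractional supplier_id. If the ids really are whole numbers, filtered['supplier_id'].astype(int) could be applied afterwards to store them as integers.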
