I need to filter a data frame to cut down the time it takes to update user attributes.
+----------+------------+------------+
| userCol1 | dateCol1 | dateCol2 |
+----------+------------+------------+
| user1 | 2020-01-16 | 2019-12-30 |
| user2 | 2019-10-31 | 2020-01-12 |
| user3 | 2019-08-15 | 2019-09-30 |
| user4 | 2019-08-25 | NaN |
+----------+------------+------------+
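Above is a sample of the data frame. I need to filter it down to any user whose most recent date across dateCol1 and dateCol2 is <= today - 90 days. In the example above, that should leave user3 and user4 in the data frame for processing.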
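The code I wrote (not yet tested, so I don't know whether it works) doesn't filter the data frame; instead it tries to iterate over the whole object. Here is the code.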
import pandas as pd
# assumes both password columns already hold datetimes (missing values as NaN/NaT)
cutoff = pd.Timestamp.now() - pd.Timedelta(days=90)
for row in df3.itertuples():
    print(row.username)
    print(row.Password_Last_Set)
    print(row.Password_Last_Forgot)
    if pd.isna(row.Password_Last_Forgot) and row.Password_Last_Set <= cutoff:
        print('password expired based on last set, no forgot passwords')
    elif pd.notna(row.Password_Last_Forgot) and row.Password_Last_Forgot > row.Password_Last_Set and row.Password_Last_Forgot <= cutoff:
        print('password expired based on last forgot')
    elif pd.notna(row.Password_Last_Forgot) and row.Password_Last_Forgot < row.Password_Last_Set and row.Password_Last_Set <= cutoff:
        print('password expired based on last set')
How can I filter first, before iterating over the users, so that I only act on the ones that remain?
Use boolean indexing with max to get the most recent datetime per row:
cols = ['dateCol1','dateCol2']
# parse both date columns; missing values become NaT
df[cols] = df[cols].apply(pd.to_datetime)
# row-wise latest date (max skips NaT), compared against today - 90 days
cutoff = pd.Timestamp.now() - pd.Timedelta(90, unit='d')
df1 = df.loc[df[cols].max(axis=1) <= cutoff, 'userCol1']
print (df1)
2    user3
3    user4
Name: userCol1, dtype: object
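To try the approach above end to end, here is a minimal, self-contained sketch that rebuilds the sample frame from the question, filters it, and then loops only over the rows that remain. The fixed `today` value (chosen so the result is reproducible) and the names `latest`, `cutoff` and `expired` are assumptions for illustration, not part of the original answer; in real use you would likely swap in `pd.Timestamp.now()`.

```python
import numpy as np
import pandas as pd

# Sample data from the question; np.nan marks the missing date for user4.
df = pd.DataFrame({
    "userCol1": ["user1", "user2", "user3", "user4"],
    "dateCol1": ["2020-01-16", "2019-10-31", "2019-08-15", "2019-08-25"],
    "dateCol2": ["2019-12-30", "2020-01-12", "2019-09-30", np.nan],
})

cols = ["dateCol1", "dateCol2"]
df[cols] = df[cols].apply(pd.to_datetime)  # NaN becomes NaT

# Assumption: a fixed reference date instead of pd.Timestamp.now(),
# so the example gives the same result whenever it is run.
today = pd.Timestamp("2020-01-20")
cutoff = today - pd.Timedelta(days=90)

# Row-wise most recent date; max(axis=1) ignores NaT by default.
latest = df[cols].max(axis=1)

# Boolean indexing: keep whole rows whose latest date is on or before the cutoff.
expired = df.loc[latest <= cutoff]

# Iterate only over the filtered users.
for row in expired.itertuples(index=False):
    print(row.userCol1, "needs processing")
```

Keeping the full rows in `expired` (rather than only `userCol1`) means the loop still has both date columns available if the per-user action needs them.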