Python: conditional cumulative sum based on a date and a condition

yy zzz

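I have to work with DataFrames, and I'm using pandas. I want to compute a cumulative sum that starts from a variable date and uses the values in a column.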

I want to add another column to df2 showing the date on which the sum of the AVG column, counted from date2 onward, becomes greater than 100.
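Stated as a formula, for one row of df2 with place p and start date d_0 = date2. The notation p, d_0, T_p (the dates listed for place p in df1) and AVG_p(t) is introduced here for illustration. date2 itself is counted, since the expected sum of 123 for A is 62 + 14 + 47 and includes the value on 1/1/2014. If no date satisfies the condition, date100 is undefined (NaN in pandas terms).

```latex
S_p(d_0, d) = \sum_{\substack{t \in T_p \\ d_0 \le t \le d}} \mathrm{AVG}_p(t),
\qquad
\mathrm{date100} = \min\{\, d \in T_p : d \ge d_0,\ S_p(d_0, d) > 100 \,\},
\qquad
\mathrm{sum} = S_p(d_0, \mathrm{date100})
```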

For example, df1 and df2 are the DataFrames I start with, df3 is the data I want, and df3['date100'] is the day on which the AVG sum becomes greater than 100:

import pandas as pd

df1 = pd.DataFrame({'date1': ['1/1/2014', '2/1/2014', '3/1/2014','1/1/2014', '2/1/2014', '3/1/2014','1/1/2014', '2/1/2014', '3/1/2014'],
 'Place':['A','A','A','B','B','B','C','C','C'],'AVG': [62,14,47,25,74,60,78,27,41]})

df2 = pd.DataFrame({'date2': ['1/1/2014', '2/1/2014'], 'Place':['A','C']})

*Something*
df3 = pd.DataFrame({'date2': ['1/1/2014', '2/1/2014'], 'Place':['A','C'], 'date100': ['3/1/2014', '2/1/2014'], 'sum': [123, 105]})
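A minimal sketch of the computation for a single row of df2. It assumes the dates are in M/D/YYYY format, that date2 itself is counted, and that the threshold is strict (> 100) as the question states. `first_date_over` is a hypothetical helper name. Under this reading, C starting on 2/1/2014 only reaches 27 + 41 = 68, so no date qualifies. The 105 shown in df3 equals 78 + 27, which is the total counted from 1/1/2014.

```python
import pandas as pd

# Data from the question: daily AVG values per Place.
df1 = pd.DataFrame({
    'date1': ['1/1/2014', '2/1/2014', '3/1/2014'] * 3,
    'Place': ['A'] * 3 + ['B'] * 3 + ['C'] * 3,
    'AVG': [62, 14, 47, 25, 74, 60, 78, 27, 41],
})
# Parse the strings so that date comparisons are chronological.
df1['date1'] = pd.to_datetime(df1['date1'], format='%m/%d/%Y')


def first_date_over(data, place, start, limit=100):
    """Hypothetical helper: return (date, running_sum) for the first date on or
    after `start` at which the running AVG total for `place` exceeds `limit`,
    or (None, None) if it never does."""
    rows = data[(data['Place'] == place) & (data['date1'] >= start)].sort_values('date1')
    running = rows['AVG'].cumsum()      # running total from `start` onward
    over = running[running > limit]     # rows where the total exceeds the limit
    if over.empty:
        return None, None
    idx = over.index[0]                 # first such row
    return rows.loc[idx, 'date1'], int(over.iloc[0])


print(first_date_over(df1, 'A', pd.Timestamp('2014-01-01')))
print(first_date_over(df1, 'C', pd.Timestamp('2014-02-01')))
```

Applying this helper to every row of df2 (for example with a list comprehension over `df2.itertuples()`) produces the date100 and sum columns.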

I found some answers, but most of them use groupby, and df2 has no groups.
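One way to still use groupby is to treat each row of df2 as its own group. Merge df2 with df1 on Place, keep only the rows dated on or after date2, then group by the df2 row id. This is a sketch, assuming M/D/YYYY dates and the strict "> 100" rule. `row_id` is a column name introduced here for illustration.

```python
import pandas as pd

limit = 100

df1 = pd.DataFrame({
    'date1': ['1/1/2014', '2/1/2014', '3/1/2014'] * 3,
    'Place': ['A'] * 3 + ['B'] * 3 + ['C'] * 3,
    'AVG': [62, 14, 47, 25, 74, 60, 78, 27, 41],
})
df2 = pd.DataFrame({'date2': ['1/1/2014', '2/1/2014'], 'Place': ['A', 'C']})

# Parse the dates so that ">=" compares chronologically.
df1['date1'] = pd.to_datetime(df1['date1'], format='%m/%d/%Y')
df2['date2'] = pd.to_datetime(df2['date2'], format='%m/%d/%Y')

# Give every df2 row an id. This id is the "group" that df2 otherwise lacks.
left = df2.reset_index().rename(columns={'index': 'row_id'})

# Pair each df2 row with all df1 rows of the same Place on or after date2.
pairs = left.merge(df1, on='Place')
pairs = pairs[pairs['date1'] >= pairs['date2']].sort_values(['row_id', 'date1']).copy()

# Running total within each df2 row, then the first date above the limit.
pairs['sum'] = pairs.groupby('row_id')['AVG'].cumsum()
hits = pairs[pairs['sum'] > limit].groupby('row_id').first()

# Join back on the index. Rows with no hit get NaT/NaN.
df3 = df2.join(hits[['date1', 'sum']].rename(columns={'date1': 'date100'}))
print(df3)
```

The merge materializes every df2 row crossed with all df1 rows of the same Place. This is fine for small data, but the intermediate frame grows with that product.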

IMCoins

Since your example is very basic, let me know if there are edge cases you need me to handle. This solution implies:

Solution:

import pandas as pd

limit = 100
df = pd.DataFrame({
    'date1': ['1/1/2014', '2/1/2014', '3/1/2014','1/1/2014',
              '2/1/2014', '3/1/2014','1/1/2014', '2/1/2014', '3/1/2014'], 
    'Place':['A','A','A','B','B','B','C','C','C'],
    'AVG': [62,14,47,25,74,60,78,27,41]})

df2 = pd.DataFrame({'date2': ['1/1/2014', '2/1/2014'], 'Place':['A','C']})

#   Parse the 'M/D/YYYY' strings once so that date comparisons and sorting
#   are chronological rather than lexicographic. The original string column
#   is kept so that date100 is displayed in the same format.
df['date1_dt'] = pd.to_datetime(df['date1'], format='%m/%d/%Y')

result = []
for row in df2.to_dict('records'):
    #   For each row of df2, select the rows of the same Place dated on or
    #   AFTER date2, sorted by date (so df does not need to be pre-sorted).
    #   Then take the .cumsum(), because it's the aggregation you want.
    #   Filter by your limit and take the first occurrence.
    #   Converting this to a dict and appending it to a list makes it easy
    #   to rebuild a DataFrame later.
    start = pd.to_datetime(row['date2'], format='%m/%d/%Y')
    ndf = df.loc[ (df['date1_dt'] >= start) & (df['Place'] == row['Place']) ]\
            .sort_values(by='date1_dt').copy()   # .copy() avoids SettingWithCopyWarning
    ndf['avgsum'] = ndf['AVG'].cumsum()
    final_df = ndf.loc[ ndf['avgsum'] >= limit ]

    #   No running sum reaches the threshold: append an empty record instead
    #   of skipping the row, so that df3 stays row-aligned with df2 in concat.
    if final_df.empty:
        result.append({})
        continue

    final_df = final_df.iloc[0][['date1', 'avgsum']].rename({'date1' : 'date100'})
    result.append( final_df.to_dict() )

#   Fixing the column order keeps the output stable across pandas versions.
df3 = pd.DataFrame(result, columns=['avgsum', 'date100'])

final_df = pd.concat([df2, df3], axis=1, sort=False)
print(final_df)
#       date2 Place  avgsum   date100
# 0  1/1/2014     A   123.0  3/1/2014
# 1  2/1/2014     C     NaN       NaN
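The dates in this example are 'M/D/YYYY' strings. Comparing or sorting such strings is lexicographic. That happens to match chronological order for the example's January to March dates, but it breaks once a two-digit month appears. Parsing with `pd.to_datetime` before filtering avoids this. A small demonstration follows; the extra date values are hypothetical.

```python
import pandas as pd

# As strings, 'M/D/YYYY' values compare character by character.
print('10/1/2014' >= '2/1/2014')  # False: '1' sorts before '2'

# Parsed as timestamps, the comparison is chronological.
a = pd.to_datetime('10/1/2014', format='%m/%d/%Y')
b = pd.to_datetime('2/1/2014', format='%m/%d/%Y')
print(a >= b)  # True

# The same issue affects sorting a whole column of strings.
s = pd.Series(['2/1/2014', '10/1/2014', '1/1/2015'])
print(sorted(s))  # ['1/1/2015', '10/1/2014', '2/1/2014'], not chronological
print(pd.to_datetime(s, format='%m/%d/%Y').sort_values().tolist())
```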
