My example may be a bit long. Here is my code:
import pandas as pd
import numpy as np
import io
t = """
name date
a 2005-08-31
a 2005-09-20
a 2005-11-12
a 2005-12-31
a 2006-03-31
a 2006-06-25
a 2006-07-23
a 2006-09-28
a 2006-12-21
a 2006-12-27
a 2007-07-23
a 2007-09-21
a 2007-03-15
a 2008-04-12
a 2008-06-21
a 2008-06-11
b 2005-08-31
b 2005-09-23
b 2005-11-12
b 2005-12-31
b 2006-03-31
b 2006-06-25
b 2006-07-23
b 2006-09-28
b 2006-12-21
b 2006-12-27
b 2007-07-23
b 2007-09-21
b 2007-03-15
b 2008-04-12
b 2008-06-21
b 2008-06-11
"""
data = pd.read_csv(io.StringIO(t), sep=r"\s+")  # columns are separated by whitespace
data
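What I want is the last date in each year for every name, where a year starts on 2005-07-01 and ends on 2006-06-30, the next one starts on 2006-07-01 and ends on 2007-06-30, and so on. My expected output is: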
name date
a 2006-06-25 # the last date within 2005-07-01 to 2006-06-30
a 2007-03-15 # the last date within 2006-07-01 to 2007-06-30
a 2008-06-21 # the last date within 2007-07-01 to 2008-06-30
b 2006-06-25 # the last date within 2005-07-01 to 2006-06-30
b 2007-03-15 # the last date within 2006-07-01 to 2007-06-30
b 2008-06-21 # the last date within 2007-07-01 to 2008-06-30
How can I solve this? I think I should use custom
You can do this with a single groupby, without any rollback. The "AS-JUL" frequency creates yearly bins that start on 1 July:
In [11]: data["date"] = pd.to_datetime(data["date"], format="%Y-%m-%d")
In [12]: data.groupby(["name", pd.Grouper(key="date", freq="AS-JUL")])["date"].max()
Out[12]:
name  date
a     2005-07-01   2006-06-25
      2006-07-01   2007-03-15
      2007-07-01   2008-06-21
b     2005-07-01   2006-06-25
      2006-07-01   2007-03-15
      2007-07-01   2008-06-21
Name: date, dtype: datetime64[ns]
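If you want to check the result without relying on a frequency alias, here is a minimal sketch of the same idea with an explicit key. It is not the method from the answer above, just a cross-check: the shortened sample data, the `year_start` column, and the variable names are assumptions made for illustration. A date from July to December is assigned to its own calendar year, a date from January to June to the previous one, so each group covers 1 July to 30 June. Note also that recent pandas releases seem to prefer the spelling "YS-JUL" over "AS-JUL", so the alias may need adjusting depending on your version.

```python
import io

import pandas as pd

# Shortened, hypothetical sample in the same layout as the question's data.
raw = """name date
a 2005-08-31
a 2006-06-25
a 2006-07-23
a 2007-03-15
a 2007-07-23
a 2008-06-21
b 2005-09-23
b 2006-06-25
b 2007-03-15
b 2008-06-11
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+", parse_dates=["date"])

# Label each row with the calendar year in which its July-June year starts:
# July-December keeps the date's own year, January-June uses the previous year.
year = df["date"].dt.year
year_start = year.where(df["date"].dt.month >= 7, year - 1)

# Latest date per name and per July-June year.
last_dates = (
    df.assign(year_start=year_start)
      .groupby(["name", "year_start"])["date"]
      .max()
      .reset_index()
)
print(last_dates)
```

The `year_start` column plays the role of the bin labels (2005-07-01, 2006-07-01, ...) in the Grouper output, only as a plain integer year.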