How to get the last date in a custom interval? - pandas

ileadall42

My example may be a bit long. Here is my code:

import pandas as pd
import numpy as np
import io
t = """
name     date
a     2005-08-31
a     2005-09-20
a     2005-11-12
a     2005-12-31
a     2006-03-31
a     2006-06-25
a     2006-07-23
a     2006-09-28
a     2006-12-21
a     2006-12-27
a     2007-07-23
a     2007-09-21
a     2007-03-15
a     2008-04-12
a     2008-06-21
a     2008-06-11
b     2005-08-31
b     2005-09-23
b     2005-11-12
b     2005-12-31
b     2006-03-31
b     2006-06-25
b     2006-07-23
b     2006-09-28
b     2006-12-21
b     2006-12-27
b     2007-07-23
b     2007-09-21
b     2007-03-15
b     2008-04-12
b     2008-06-21
b     2008-06-11
"""
data = pd.read_csv(io.StringIO(t), sep=r"\s+")  # columns are separated by runs of spaces
data

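What I want is the last date within each year-long interval, where the intervals run from 2005-07-01 to 2006-06-30, from 2006-07-01 to 2007-06-30, and so on. My expected output is: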

name     date
a     2006-06-25  # the last date in 2005-07-01 to 2006-06-30
a     2007-03-15  # the last date in 2006-07-01 to 2007-06-30
a     2008-06-21  # the last date in 2007-07-01 to 2008-06-30
b     2006-06-25  # the last date in 2005-07-01 to 2006-06-30
b     2007-03-15  # the last date in 2006-07-01 to 2007-06-30
b     2008-06-21  # the last date in 2007-07-01 to 2008-06-30

How can I solve this? I think I should use something custom.

Andy Hayden

You can do this with a single groupby, without a rollback:

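In [11]: data.date = pd.to_datetime(data.date, format="%Y-%m-%d")

# "AS-JUL" bins the dates into years that start on July 1.
# pandas 2.2 deprecates the "AS" alias in favour of "YS", so use freq="YS-JUL" there.
In [12]: data.groupby(["name", pd.Grouper(key="date", freq="AS-JUL")])["date"].max()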
Out[12]:
name  date
a     2005-07-01   2006-06-25
      2006-07-01   2007-03-15
      2007-07-01   2008-06-21
b     2005-07-01   2006-06-25
      2006-07-01   2007-03-15
      2007-07-01   2008-06-21
Name: date, dtype: datetime64[ns]
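
If the frequency alias is a concern across pandas versions, the same July-to-June grouping can be sketched with plain date arithmetic. This is an alternative to the Grouper call above, not part of the original answer. The column name interval_start_year and the trimmed sample data are made up for illustration.

```python
import io

import pandas as pd

# Trimmed sample in the same shape as the question's data (illustrative subset).
raw = """name date
a 2005-08-31
a 2006-06-25
a 2006-07-23
a 2007-03-15
b 2005-09-23
b 2006-06-25
b 2007-07-23
b 2008-06-21
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+", parse_dates=["date"])

# Label each row with the calendar year in which its July-to-June interval
# starts: January-June dates belong to the interval that began the previous July.
# "interval_start_year" is a hypothetical helper column, not from the question.
df["interval_start_year"] = df["date"].dt.year - (df["date"].dt.month < 7).astype(int)

# Latest date per name and interval, flattened back into ordinary columns.
last_dates = (
    df.groupby(["name", "interval_start_year"])["date"]
    .max()
    .reset_index()
)
print(last_dates)
```

Calling reset_index() turns the result into a regular DataFrame, which is closer to the name/date layout asked for in the question. Drop the helper column afterwards if only name and date are needed.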
