This is my dataset:
Name date
A 15-01-01
A 15-01-31
A 16-02-02
A 16-03-04
B 17-04-05
B 17-05-08
B 17-07-09
C 18-01-02
C 18-02-03
I want to add each person's first and last date as new columns.
The result I want:
Name date startdate enddate
A 15-01-01 15-01-01 16-03-04
A 15-01-31 15-01-01 16-03-04
A 16-02-02 15-01-01 16-03-04
A 16-03-04 15-01-01 16-03-04
B 17-04-05 17-04-05 17-07-09
B 17-05-08 17-04-05 17-07-09
B 17-07-09 17-04-05 17-07-09
C 18-01-02 18-01-02 18-02-03
C 18-02-03 18-01-02 18-02-03
Is there a way to do this? I would appreciate it if you could let me know.
Thank you for reading.
Use GroupBy.transform with GroupBy.first and GroupBy.last, and add the results as new columns:
g = df.groupby('Name')['date']
df = df.assign(startdate=g.transform('first'), enddate=g.transform('last'))
print(df)
Name date startdate enddate
0 A 15-01-01 15-01-01 16-03-04
1 A 15-01-31 15-01-01 16-03-04
2 A 16-02-02 15-01-01 16-03-04
3 A 16-03-04 15-01-01 16-03-04
4 B 17-04-05 17-04-05 17-07-09
5 B 17-05-08 17-04-05 17-07-09
6 B 17-07-09 17-04-05 17-07-09
7 C 18-01-02 18-01-02 18-02-03
8 C 18-02-03 18-01-02 18-02-03
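For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. It assumes the question's data is loaded as plain strings into a DataFrame; the variable name `result` is only for illustration and is not part of the original answer.

```python
import pandas as pd

# Rebuild the sample data from the question (dates kept as strings).
df = pd.DataFrame({
    "Name": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
    "date": ["15-01-01", "15-01-31", "16-02-02", "16-03-04",
             "17-04-05", "17-05-08", "17-07-09",
             "18-01-02", "18-02-03"],
})

# transform() broadcasts each group's first/last value back to every row,
# so the result has the same length and index as df.
g = df.groupby("Name")["date"]
result = df.assign(startdate=g.transform("first"), enddate=g.transform("last"))
print(result)
```

Note that 'first' and 'last' are positional: they pick the first and last row of each group in its current order, so this only gives the earliest and latest dates if the rows are already sorted by date within each Name.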
Alternatively, if you need the minimum and maximum values, convert the column to datetimes and use GroupBy.min and GroupBy.max:
import pandas as pd

# Parse two-digit-year strings such as '15-01-01' into real datetimes
df['date'] = pd.to_datetime(df['date'], format='%y-%m-%d')
g = df.groupby('Name')['date']
df = df.assign(startdate=g.transform('min'), enddate=g.transform('max'))
print(df)
Name date startdate enddate
0 A 2015-01-01 2015-01-01 2016-03-04
1 A 2015-01-31 2015-01-01 2016-03-04
2 A 2016-02-02 2015-01-01 2016-03-04
3 A 2016-03-04 2015-01-01 2016-03-04
4 B 2017-04-05 2017-04-05 2017-07-09
5 B 2017-05-08 2017-04-05 2017-07-09
6 B 2017-07-09 2017-04-05 2017-07-09
7 C 2018-01-02 2018-01-02 2018-02-03
8 C 2018-02-03 2018-01-02 2018-02-03
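The two approaches give the same result on the question's data only because each person's rows are already in date order. The sketch below uses a small, hypothetical unsorted sample (not from the question) to show where 'first'/'last' and 'min'/'max' diverge; the names `by_position` and `by_value` are illustrative.

```python
import pandas as pd

# Hypothetical sample: rows for A are deliberately NOT in date order.
df = pd.DataFrame({
    "Name": ["A", "A", "A"],
    "date": ["16-03-04", "15-01-01", "15-01-31"],
})
df["date"] = pd.to_datetime(df["date"], format="%y-%m-%d")

g = df.groupby("Name")["date"]

# 'first'/'last' follow row order within each group.
by_position = df.assign(startdate=g.transform("first"),
                        enddate=g.transform("last"))

# 'min'/'max' compare the actual datetime values.
by_value = df.assign(startdate=g.transform("min"),
                     enddate=g.transform("max"))

print(by_position)
print(by_value)
```

If the data may be unsorted, either sort by Name and date first or prefer the 'min'/'max' variant.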
An alternative using GroupBy.agg with DataFrame.join:
# Run this on the original df; join fails if startdate/enddate columns already exist
df1 = df.groupby('Name').agg(startdate=('date', 'first'), enddate=('date', 'last'))
df = df.join(df1, on='Name')
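As a rough end-to-end sketch of the agg-plus-join route, assuming the question's data and a pandas version that supports named aggregation (0.25 or later); `bounds` and `out` are illustrative names, not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
    "date": ["15-01-01", "15-01-31", "16-02-02", "16-03-04",
             "17-04-05", "17-05-08", "17-07-09",
             "18-01-02", "18-02-03"],
})

# One row per Name, with the group's first and last date as columns.
bounds = df.groupby("Name").agg(startdate=("date", "first"),
                                enddate=("date", "last"))

# Join the per-group summary back onto every row via the Name column.
out = df.join(bounds, on="Name")
print(out)
```

Compared with transform, this builds an intermediate one-row-per-group table, which can be handy if that summary is also needed on its own.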