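I have a sample of the input data, which can be found here
I need to add 2 columns, "Begin_date" and "End_date", based on the data in each row:
Output example:
Any help would be appreciated :) Thanks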
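Use pd.melt() to reshape the month columns into rows
Sort the data by id and date, then take the first and last non-empty month for each id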
import pandas as pd
import numpy as np
from pandas.tseries.offsets import MonthEnd

df = pd.read_excel("input.xlsx")
# The last column is the most recent month in the sheet
max_date = df.columns[-1]

# Wide to long: one row per (id, month), keeping only months that have a value
res = pd.melt(df, id_vars=['id', 'region'], value_vars=df.columns[2:])
res = res.dropna(subset=['value'])
res = res.sort_values(by=['id', 'variable'], ascending=[True, True])

# After sorting, the first row per id is its earliest month and the last row its latest
minimum_date = res.drop_duplicates(subset=['id'], keep='first')
maximum_date = res.drop_duplicates(subset=['id'], keep='last')
minimum_date = minimum_date.rename(columns={'variable': 'start_date'})
maximum_date = maximum_date.rename(columns={'variable': 'end_date'})

df = pd.merge(df, minimum_date[['id', 'start_date']], on=['id'], how='left')
df = pd.merge(df, maximum_date[['id', 'end_date']], on=['id'], how='left')

# An id that still has data in the most recent month is treated as open-ended
df['end_date'] = np.where(df['end_date'] == max_date, "99991231", df['end_date'])

# Convert YYYYMM to the last day of that month
df['start_date'] = (pd.to_datetime(df['start_date'], format="%Y%m", errors='coerce') + MonthEnd(1)).astype(str)
df['end_date'] = (pd.to_datetime(df['end_date'], format="%Y%m", errors='coerce') + MonthEnd(1)).astype(str)

# The sentinel cannot be parsed as a date, so it became NaT above; restore it here
df['end_date'] = np.where(df['end_date'] == 'NaT', "99991231", df['end_date'])
print(df)
id region 201801 201802 ... 201905 201906 start_date end_date
0 100001 628 NaN NaN ... 26.0 23.0 2018-09-30 99991231
1 100002 1149 27.0 24.0 ... 26.0 24.0 2018-01-31 99991231
2 100003 1290 26.0 26.0 ... 27.0 25.0 2018-01-31 99991231
3 100004 955 25.0 26.0 ... NaN NaN 2018-01-31 2018-12-31
4 100005 1397 15.0 25.0 ... NaN NaN 2018-01-31 2018-11-30
5 100006 1397 15.0 25.0 ... NaN NaN 2018-01-31 2019-02-28
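For anyone who wants to try the idea without the Excel file, here is a minimal self-contained sketch of the same melt approach. The sample frame, the assumption that the month columns are "YYYYMM" strings, and the names month_cols, last_month, long, span and month_end are all made up for illustration; they are not part of the original data or answer. It uses groupby with min/max instead of drop_duplicates, which should give the same first and last month per id.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Hypothetical stand-in for input.xlsx; month columns are assumed to be "YYYYMM" strings
df = pd.DataFrame({
    "id": [100001, 100002, 100004],
    "region": [628, 1149, 955],
    "201801": [np.nan, 27.0, 25.0],
    "201802": [np.nan, 24.0, 26.0],
    "201803": [26.0, 26.0, np.nan],
    "201804": [23.0, 24.0, np.nan],
})

month_cols = [c for c in df.columns if c not in ("id", "region")]
last_month = month_cols[-1]  # most recent month present in the data

# Wide to long, dropping months without a value
long = (
    df.melt(id_vars=["id", "region"], value_vars=month_cols,
            var_name="month", value_name="value")
      .dropna(subset=["value"])
)

# "YYYYMM" strings sort chronologically, so min/max are the first and last month with data
span = (
    long.groupby("id")["month"]
        .agg(first_month="min", last_month="max")
        .reset_index()
)

def month_end(yyyymm):
    """Turn a Series of 'YYYYMM' strings into 'YYYY-MM-DD' month-end strings."""
    return (pd.to_datetime(yyyymm, format="%Y%m") + MonthEnd(1)).dt.strftime("%Y-%m-%d")

span["start_date"] = month_end(span["first_month"])
# Keep the real month end unless the id still has data in the most recent month
span["end_date"] = month_end(span["last_month"]).where(
    span["last_month"] != last_month, "99991231"
)

result = df.merge(span[["id", "start_date", "end_date"]], on="id", how="left")
print(result)
```

Here the sentinel "99991231" is written only once, after the dates have been formatted, so there is no need to round-trip it through NaT. An id with no values at all would end up with NaN in both new columns because of the left merge.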