How to split the business days between two dates across months

Ant0

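I have a pandas DataFrame with a "start date" column and an "end date" column. I want to count the business days between the start and end dates, split that count by month, and add the monthly counts to the DataFrame as extra columns.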

This is the code I have written so far. Is there any way to make it faster?

import pandas as pd

data = pd.DataFrame([
    {'start_date': '2020-03-03', 'end_date' : '2020-06-18'},
    {'start_date': '2020-06-03', 'end_date' : '2020-09-18'},
])


def days_month(df):
    days_month = pd.DatetimeIndex(pd.bdate_range(df['start_date'], df['end_date'])).month.value_counts()
    for i in range(1, 13):
        try:
            days_month[i]
        except KeyError:
            # month i has no business days in this range
            days_month[i] = 0

    return days_month

data[['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']] = data.apply(days_month, axis=1,  result_type="expand")

print(data)
    end_date  start_date  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  \
0  2020-06-18  2020-03-03    0    0   21   22   21   14    0    0    0    0   
1  2020-09-18  2020-06-03    0    0    0    0    0   20   23   21   14    0   

   Nov  Dec  
0    0    0  
1    0    0  
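For comparison, here is a sketch (not taken from the thread) of a vectorized variant that avoids building one date range per row with `apply`. It assumes, like the code above, that the end date is counted and that only Monday to Friday are business days (no holidays). For each calendar month it clips every row's interval to that month and calls `np.busday_count` once for all rows. The function name `add_business_days_per_month` is hypothetical. As in the original, counts from different years fall into the same month-of-year column.

```python
import numpy as np
import pandas as pd

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def add_business_days_per_month(df):
    # Day-resolution datetime64 arrays for both columns.
    start = pd.to_datetime(df['start_date']).to_numpy().astype('datetime64[D]')
    # pd.bdate_range includes the end date, so shift the end by one day
    # to get the half-open interval [start, end) that busday_count expects.
    end = (pd.to_datetime(df['end_date']).to_numpy().astype('datetime64[D]')
           + np.timedelta64(1, 'D'))

    counts = np.zeros((len(df), 12), dtype=np.int64)
    # Every calendar month touched by at least one row.
    first = start.min().astype('datetime64[M]')
    last = end.max().astype('datetime64[M]')
    for month in np.arange(first, last + 1, dtype='datetime64[M]'):
        # Clip each row's interval to this month.
        lo = np.maximum(start, month.astype('datetime64[D]'))
        hi = np.minimum(end, (month + 1).astype('datetime64[D]'))
        # Rows that do not overlap this month get an empty interval
        # (busday_count would otherwise return a negative number).
        hi = np.maximum(hi, lo)
        # Months since 1970-01, modulo 12, gives 0 for Jan ... 11 for Dec.
        counts[:, month.astype(np.int64) % 12] += np.busday_count(lo, hi)

    return df.join(pd.DataFrame(counts, columns=MONTHS, index=df.index))
```

Calling `add_business_days_per_month(data)` on the two-row frame above should reproduce the monthly counts in the printed output. The Python loop runs once per calendar month spanned by the data rather than once per row.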

Update: trying to implement Ethan's suggestion:


import numpy as np

def countWeekDays(df):

    fromDate=df['PO Creation Date']
    toDate=df['PO Expected Delivery Date']

    d = np.arange(fromDate, toDate, dtype=np.datetime64)

    weekdays = d[np.is_busday(d, busdaycal=calendar())] 

    workDays = {m: np.array([i for i in weekdays if i.item().month==m]).size for m in range(1,13)}

    return workDays

def calendar(): 
    #set work week mask and optional holidays array
    return np.busdaycalendar(weekmask='1111100', holidays=['2020-01-01','2020-01-20','2020-02-17','2020-05-25','2020-07-03','2020-09-07','2020-10-12','2020-11-11','2020-11-26','2020-12-25'])


data[['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']] = data.apply(countWeekDays, axis=1,  result_type="expand")

However, this gives me a MemoryError, and I don't know why.
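The thread does not pin down the cause. One plausible explanation, which is an assumption since the asker's data is not shown: if the `PO Creation Date` and `PO Expected Delivery Date` columns hold pandas datetimes rather than strings, `np.arange(fromDate, toDate, dtype=np.datetime64)` steps by one tick of the values' own unit (for example nanoseconds) instead of one day, and tries to allocate one element per tick. The sketch below compares the array lengths without allocating them, then truncates to day resolution first.

```python
import numpy as np

# String dates parse at day resolution.
day_start = np.datetime64('2020-03-03')
day_end = np.datetime64('2020-06-18')

# The same dates at nanosecond resolution, as stored in a pandas
# datetime64[ns] column.
ns_start = np.datetime64('2020-03-03', 'ns')
ns_end = np.datetime64('2020-06-18', 'ns')

# np.arange steps by one tick of its inputs' unit, so the difference
# expressed in that unit is the number of elements it would allocate.
day_len = int((day_end - day_start).astype(np.int64))
ns_len = int((ns_end - ns_start).astype(np.int64))
print(day_len)  # 107
print(ns_len)   # 9244800000000000

# Truncating to day resolution before calling np.arange keeps it small.
fixed = np.arange(ns_start.astype('datetime64[D]'),
                  ns_end.astype('datetime64[D]'),
                  dtype='datetime64[D]')
print(len(fixed))  # 107
```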

Ethan

You can use NumPy's datetime functions with a little wrapping:

import numpy as np

def countWeekDays(fromDate='2020-03-03', toDate='2020-06-03'):  
    d = np.arange(fromDate, toDate, dtype=np.datetime64)

    weekdays = d[np.is_busday(d, busdaycal=calendar())] 

    workDays = [(m, np.array([i for i in weekdays if i.item().month==m]).size) for m in range(1,13)]

    return workDays

def calendar(): 
    #set work week mask and optional holidays array
    return np.busdaycalendar(weekmask='1111100', holidays=['2020-01-01','2020-01-20','2020-02-17','2020-05-25','2020-07-03','2020-09-07','2020-10-12','2020-11-11','2020-11-26','2020-12-25'])

Result:

>>> countWeekDays()
[(1, 0), (2, 0), (3, 21), (4, 22), (5, 20), (6, 2), (7, 0), (8, 0), (9, 0), (10, 0), (11, 0), (12, 0)]
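The per-month list comprehension above walks the whole `weekdays` array twelve times in Python. A sketch of the same count with a single `np.bincount` call, keeping the weekmask and holiday list from `calendar()` (the function name `count_weekdays_by_month` is hypothetical):

```python
import numpy as np

# Holiday list copied from calendar() above.
HOLIDAYS_2020 = ['2020-01-01', '2020-01-20', '2020-02-17', '2020-05-25',
                 '2020-07-03', '2020-09-07', '2020-10-12', '2020-11-11',
                 '2020-11-26', '2020-12-25']


def count_weekdays_by_month(from_date='2020-03-03', to_date='2020-06-03'):
    # Every day in [from_date, to_date), at day resolution.
    d = np.arange(from_date, to_date, dtype='datetime64[D]')
    # Keep Monday-Friday days that are not holidays.
    weekdays = d[np.is_busday(d, weekmask='1111100', holidays=HOLIDAYS_2020)]
    # Month of year 1-12: months since 1970-01, modulo 12, plus 1.
    month_numbers = weekdays.astype('datetime64[M]').astype(np.int64) % 12 + 1
    # One histogram over month numbers; drop bin 0, which is always empty.
    counts = np.bincount(month_numbers, minlength=13)[1:]
    return list(zip(range(1, 13), counts.tolist()))
```

With its defaults it should return the same list as `countWeekDays()` above.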

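Here is a modification of your code that builds a DataFrame using my function to get the business days. I think the error you are getting comes from how the DataFrame is built and then modified. In my experience, modifying a DataFrame in place is cumbersome; it is better to collect all the data first and then create the DataFrame from the complete data set: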

import numpy as np
import pandas as pd

# Uses calendar() defined above.

def applyDays():
    data = [{'start_date': '2020-03-03', 'end_date': '2020-06-18'},
            {'start_date': '2020-06-03', 'end_date': '2020-09-18'}]
    return countWeekDays(data)


def countWeekDays(lst):
    months = ['start_date', 'end_date', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    data = []
    for row in lst:
        fromDate = row['start_date']  # df['PO Creation Date']
        toDate = row['end_date']  # df['PO Expected Delivery Date']
        # every day from fromDate up to (but not including) toDate
        d = np.arange(fromDate, toDate, dtype=np.datetime64)
        # keep only business days according to the custom calendar
        weekdays = d[np.is_busday(d, busdaycal=calendar())]

        # one row: the two dates followed by the Jan..Dec counts
        data.append([fromDate, toDate] + [np.array([i for i in weekdays if i.item().month == m]).size for m in range(1, 13)])
    # build the DataFrame once, from the complete list of rows
    return pd.DataFrame(data, columns=months)

DataFrame result:

applyDays()
Out[6]: 
   start_date    end_date  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  \
0  2020-03-03  2020-06-18    0    0   21   22   20   13    0    0    0    0   
1  2020-06-03  2020-09-18    0    0    0    0    0   20   22   21   12    0   

   Nov  Dec  
0    0    0  
1    0    0  
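This table differs from the one in the question in a few cells (first row: May 20 vs 21, June 13 vs 14; second row: July 22 vs 23, September 12 vs 14). The difference follows from the two approaches: `pd.bdate_range` includes the end date and knows no holidays, while `np.arange` stops before the end date and the `calendar()` holiday list removes days such as 2020-05-25. A quick check of both effects:

```python
import numpy as np
import pandas as pd

# Question's approach: the end date is included, no holidays.
june_question = len(pd.bdate_range('2020-06-01', '2020-06-18'))
may_question = len(pd.bdate_range('2020-05-01', '2020-05-31'))

# Answer's approach: np.arange excludes the end date ...
june_days = np.arange('2020-06-01', '2020-06-18', dtype='datetime64[D]')
june_answer = int(np.is_busday(june_days).sum())
# ... and the holiday list removes 2020-05-25.
may_days = np.arange('2020-05-01', '2020-06-01', dtype='datetime64[D]')
may_answer = int(np.is_busday(may_days, holidays=['2020-05-25']).sum())

print(june_question, june_answer)  # 14 13
print(may_question, may_answer)    # 21 20
```

To get the question's inclusive counts from the NumPy version, pass an end date one day later to `np.arange`.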
