What is the best Pandas apply/loop approach in this case?

Christopher

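I'm transforming some applicant transaction data and need to create a new flag column (labeled "DESIRED FLAG" in my example). However, I can't work out the right loop/apply approach, because the logic below can have many different variations.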

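In a perfect world, a consecutive applicant process history would look like this, with every "Event Status" set to "Completed":

  • On-Site Interview Kick Off -> Schedule Interviews -> Decision; or
  • Phone Interview Kick Off -> Schedule Interviews -> Decision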

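Of course, an applicant can go through many phone interviews and on-site interviews during the application process.

As the example below shows, a "Schedule Interviews" step is sometimes canceled. In those cases I need to remove that step together with the subsequent steps related to it. These include "Schedule Interviews", "Decision", and "On-Site Interview Kick Off" or "Phone Interview Kick Off". In addition, there can sometimes be other "events", as we see with the "Manually Skipped" event.

I have other types of scenarios to create flags for as well, so I need to keep the original dataframe and add the flag as a new column.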

import pandas as pd

data = {'Employee ID': ["100","100", "100", "100","100","100","100","100","100","100","200", "200", "200","200","200","200","200","300","300", "300", "300","300","300","300"],
        'Completed On Date': ["2009-01-01","2010-01-01","2011-06-05","2012-07-01","2013-01-01","2014-01-01","2015-01-01","2016-01-01","2017-01-01","2018-01-01","2010-01-01","2011-06-05","2012-07-01","2012-08-15","2013-01-01","2014-01-01","2015-01-01","2009-01-01","2010-01-01","2011-06-05","2012-07-01","2013-01-01","2014-01-01","2015-01-01"],
        'Event': ["Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","Phone Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","Decision","Phone Interview Kick Off","Schedule Interviews","Decision","Job Apply","Phone Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision"],
        'Event Status': ["Completed","Completed","CANCELED","Completed","Completed","Completed","Completed","Completed","Completed","Completed","Completed","CANCELED","Manually Skipped","Completed","Completed","Completed","Completed","Completed","Completed","CANCELED","Completed","Completed","Completed","Completed"],
        'DESIRED FLAG': ["Keep","Keep","Remove","Remove","Remove","Keep","Keep","Keep","Keep","Keep","Keep","Remove","Remove","Remove","Remove","Keep","Keep","Keep","Keep","Remove","Remove","Remove","Keep","Keep"]}
df = pd.DataFrame(data, columns=['Employee ID','Completed On Date','Event','Event Status','DESIRED FLAG'])
df = df.sort_values(by=(['Employee ID','Completed On Date']))

df

Costa Huang

I think the following code solves your problem:

import pandas as pd

data = {'Employee ID': ["100","100", "100", "100","100","100","100","100","100","100","200", "200", "200","200","200","200","200","300","300", "300", "300","300","300","300"],
        'Completed On Date': ["2009-01-01","2010-01-01","2011-06-05","2012-07-01","2013-01-01","2014-01-01","2015-01-01","2016-01-01","2017-01-01","2018-01-01","2010-01-01","2011-06-05","2012-07-01","2012-08-15","2013-01-01","2014-01-01","2015-01-01","2009-01-01","2010-01-01","2011-06-05","2012-07-01","2013-01-01","2014-01-01","2015-01-01"],
        'Event': ["Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","Phone Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","Decision","Phone Interview Kick Off","Schedule Interviews","Decision","Job Apply","Phone Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision"],
        'Event Status': ["Completed","Completed","CANCELED","Completed","Completed","Completed","Completed","Completed","Completed","Completed","Completed","CANCELED","Manually Skipped","Completed","Completed","Completed","Completed","Completed","Completed","CANCELED","Completed","Completed","Completed","Completed"],
        'DESIRED FLAG': ["Keep","Keep","Remove","Remove","Remove","Keep","Keep","Keep","Keep","Keep","Keep","Remove","Remove","Remove","Remove","Keep","Keep","Keep","Keep","Remove","Remove","Remove","Keep","Keep"]}
df = pd.DataFrame(data, columns=['Employee ID','Completed On Date','Event','Event Status','DESIRED FLAG'])
df = df.sort_values(by=(['Employee ID','Completed On Date']))


index_list_delete = []
start_deleting = False
for i in range(len(df)):
    if not start_deleting:
        # A "CANCELED" row is marked for deletion and starts a run of
        # following rows that also need to be deleted
        if df.iloc[i]['Event Status'] == 'CANCELED':
            index_list_delete.append(i)
            start_deleting = True
    else:
        # The next "Schedule Interviews" row ends the run and is kept;
        # any other row inside the run is marked for deletion
        if df.iloc[i]['Event'] == 'Schedule Interviews':
            start_deleting = False
        else:
            index_list_delete.append(i)

# Drop the marked rows (positions are translated to index labels)
df = df.drop(df.index[index_list_delete])
# Reset the index
df = df.reset_index(drop=True)

You will get the following result:

   Employee ID Completed On Date                       Event Event Status DESIRED FLAG
0          100        2009-01-01                    Decision    Completed         Keep
1          100        2010-01-01  On-Site Interview Kick Off    Completed         Keep
2          100        2014-01-01         Schedule Interviews    Completed         Keep
3          100        2015-01-01                    Decision    Completed         Keep
4          100        2016-01-01    Phone Interview Kick Off    Completed         Keep
5          100        2017-01-01         Schedule Interviews    Completed         Keep
6          100        2018-01-01                    Decision    Completed         Keep
7          200        2010-01-01  On-Site Interview Kick Off    Completed         Keep
8          200        2014-01-01         Schedule Interviews    Completed         Keep
9          200        2015-01-01                    Decision    Completed         Keep
10         300        2009-01-01                   Job Apply    Completed         Keep
11         300        2010-01-01    Phone Interview Kick Off    Completed         Keep
12         300        2014-01-01         Schedule Interviews    Completed         Keep
13         300        2015-01-01                    Decision    Completed         Keep
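
The loop above drops rows, while the question asks for the flag in a new column on the original dataframe. Below is a minimal sketch of one way to do that without an explicit loop. It is not the answer's method but a swapped-in technique: take the status of the most recent "Schedule Interviews" row within each employee and forward-fill it. The helper name `add_cancel_flag` and the column name `FLAG` are made up for illustration. It also assumes that only canceled "Schedule Interviews" rows start a removal run, and that a run never carries over from one employee to the next, which differs slightly from the loop.

```python
import pandas as pd

# Small sample in the same shape as the question's data (employee "300" only).
sample = pd.DataFrame({
    'Employee ID': ["300"] * 7,
    'Completed On Date': ["2009-01-01", "2010-01-01", "2011-06-05", "2012-07-01",
                          "2013-01-01", "2014-01-01", "2015-01-01"],
    'Event': ["Job Apply", "Phone Interview Kick Off", "Schedule Interviews", "Decision",
              "On-Site Interview Kick Off", "Schedule Interviews", "Decision"],
    'Event Status': ["Completed", "Completed", "CANCELED", "Completed",
                     "Completed", "Completed", "Completed"],
})

def add_cancel_flag(df):
    """Hypothetical helper: return a sorted copy of df with a 'FLAG' column."""
    out = df.sort_values(['Employee ID', 'Completed On Date']).copy()
    is_schedule = out['Event'].eq('Schedule Interviews')
    # Keep the status only on "Schedule Interviews" rows (NaN elsewhere),
    # then carry it forward within each employee.
    last_schedule_status = (
        out['Event Status']
        .where(is_schedule)
        .groupby(out['Employee ID'])
        .ffill()
    )
    # Rows before the first "Schedule Interviews" row stay NaN and are kept.
    out['FLAG'] = last_schedule_status.eq('CANCELED').map({True: 'Remove', False: 'Keep'})
    return out

flagged = add_cancel_flag(sample)
print(flagged[['Event', 'Event Status', 'FLAG']])
```

Applied to the full dataframe from the question, the new column can be compared with "DESIRED FLAG" to check that the assumptions hold for your data. Filtering on `FLAG == 'Keep'` then gives the reduced table while the original rows stay available for other flags.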
