How to use a condition that refers to other rows (earlier moments in time series data) in pandas, Python 3

Posted by AurA in Dev

AurA

I have a pandas.DataFrame df. It holds time series data with 1000 rows and 3 columns. What I want is given in the pseudocode below.

for each row
    if the value in column 'colA' at [this_row-1] exceeds
    the value in column 'colB' at [this_row-2] by more than 3%
        then set the value in 'colCheck' at [this_row] to True.

Finally, pick out all the rows in the df where 'colCheck' is True.

I will use the following example to illustrate what I mean.

 df =
               colA,   colB,   colCheck
 Dates
 2017-01-01,     20,     30,      NaN
 2017-01-02,     10,     40,      NaN
 2017-01-03,     50,     20,     False
 2017-01-04,     40,     10,      True

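First, when this_row = 2 (the third row, dated 2017-01-03), the value in colA at [this_row-1] is 10 and the value in colB at [this_row-2] is 30. So (10-30)/30 = -67% < 3%, and the value in colCheck at [this_row] is False.

Likewise, when this_row = 3, (50-40)/40 = 25% > 3%, so the value in colCheck at [this_row] is True.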

Last but not least, the first two rows of colCheck should be NaN, because the calculation needs to access [this_row-2] in colB and the first two rows have no [this_row-2].

Also, the 3% criterion, [row-1] in colA, and [row-2] in colB are only examples. In my real project they depend on the situation, e.g. 4% and [row-3].

I am looking for a concise and elegant approach. I am using Python 3.

Thanks.

Pirate

You can rearrange the math and use pd.Series.shift

df.colA.shift(1).div(df.colB.shift(2)).gt(1.03)

Dates
2017-01-01    False
2017-01-02    False
2017-01-03    False
2017-01-04     True
dtype: bool
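
The rearrangement works because (a - b) / b equals a / b - 1, so "more than 3% higher" is the same test as "ratio above 1.03" (setting aside floating-point rounding right at the threshold). A minimal sketch that rebuilds the example frame from the question and compares the two forms; the names a, b, direct, and rearranged are only for illustration:

```python
import pandas as pd

# Example frame from the question (colCheck left out; it is what we compute).
df = pd.DataFrame(
    {"colA": [20, 10, 50, 40], "colB": [30, 40, 20, 10]},
    index=pd.to_datetime(
        ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"]
    ),
)
df.index.name = "Dates"

a = df["colA"].shift(1)  # colA one row earlier
b = df["colB"].shift(2)  # colB two rows earlier

# Direct form of the condition: relative difference above 3%.
direct = (a - b) / b > 0.03
# Rearranged form: ratio above 1.03.
rearranged = a.div(b).gt(1.03)

# Comparisons against NaN evaluate to False, which is why the first
# two rows come out as False rather than NaN in both forms.
print(pd.DataFrame({"direct": direct, "rearranged": rearranged}))
```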

Using pd.DataFrame.assign we can create a copy with the new column

df.assign(colCheck=df.colA.shift(1).div(df.colB.shift(2)).gt(1.03))

            colA  colB  colCheck
Dates                           
2017-01-01    20    30     False
2017-01-02    10    40     False
2017-01-03    50    20     False
2017-01-04    40    10      True
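
The question also asks to pick out the rows where colCheck is True. With a plain boolean column that can be done with boolean indexing. A small sketch on the example data; checked and selected are names made up here:

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame(
    {"colA": [20, 10, 50, 40], "colB": [30, 40, 20, 10]},
    index=pd.to_datetime(
        ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"]
    ),
)
df.index.name = "Dates"

# Copy of df with the boolean colCheck column added.
checked = df.assign(colCheck=df.colA.shift(1).div(df.colB.shift(2)).gt(1.03))

# Boolean indexing keeps only the rows whose colCheck is True.
selected = checked[checked["colCheck"]]
print(selected)
```

If colCheck holds NaN in the leading rows instead of False, the column is no longer purely boolean and pandas may refuse it as a mask; comparing it first, for example with checked["colCheck"].eq(True), is one way around that.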

If you insist on keeping the first two as NaN, you can use iloc

df.assign(colCheck=df.colA.shift(1).div(df.colB.shift(2)).gt(1.03).iloc[2:])

            colA  colB colCheck
Dates                          
2017-01-01    20    30      NaN
2017-01-02    10    40      NaN
2017-01-03    50    20    False
2017-01-04    40    10     True

For maximum clarity:

# This creates a boolean array of when your conditions are met
colCheck = (df.colA.shift(1) / df.colB.shift(2)) > 1.03
# This chops off the first two `False` values, then creates a new
# column named `colCheck` and assigns to it the boolean values
# calculated just above. The chopped rows are realigned as NaN.
df.assign(colCheck=colCheck.iloc[2:])

            colA  colB colCheck
Dates                          
2017-01-01    20    30      NaN
2017-01-02    10    40      NaN
2017-01-03    50    20    False
2017-01-04    40    10     True
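
Since the question says the 3% threshold and the row offsets are only examples, the same idea can be wrapped in a function with those values as parameters. This is just a sketch: lagged_check and its parameter names are hypothetical, not part of pandas, and it assumes the only rows that should be NaN are the leading ones that lack history.

```python
import pandas as pd


def lagged_check(df, col_a="colA", col_b="colB", lag_a=1, lag_b=2, pct=0.03):
    """Hypothetical helper: flag rows where col_a (lag_a rows back) exceeds
    col_b (lag_b rows back) by more than pct."""
    ratio = df[col_a].shift(lag_a) / df[col_b].shift(lag_b)
    flags = ratio > 1 + pct
    # Drop the leading rows that have no full history. assign() realigns
    # on the index, so those rows come back as NaN in the result.
    return df.assign(colCheck=flags.iloc[max(lag_a, lag_b):])


# Example frame from the question.
df = pd.DataFrame(
    {"colA": [20, 10, 50, 40], "colB": [30, 40, 20, 10]},
    index=pd.to_datetime(
        ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"]
    ),
)
df.index.name = "Dates"

result = lagged_check(df)                                # 3%, [row-1] vs [row-2]
other = lagged_check(df, lag_a=3, lag_b=3, pct=0.04)     # 4%, [row-3] for both
print(result)
print(other)
```

Missing values inside the data itself (not just at the start) would still come out as False here, because a comparison against NaN is False; handling those differently would need an extra step.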
