Pandas: group consecutive rows by condition

Posted by Ryan in Dev

Ryan

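I'm trying to "combine" consecutive rows of similar data when a specific condition matches, and everything I've tried either throws an error or puts the data together in unexpected ways.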

Data:

       open    high     low   close     volume       datetime
0    257.31  259.04  255.63  257.86  335889185  1510552800000
1    258.14  260.48  257.86  260.36  190142219  1511157600000
2    260.41  266.05  260.00  264.46  521044032  1511762400000
3    266.31  266.80  262.71  265.51  401716112  1512367200000
4    265.58  267.56  265.39  266.51  516455674  1512972000000
..      ...     ...     ...     ...        ...            ...
151  336.06  347.35  334.38  346.85  297612670  1601874000000
152  349.59  354.02  343.13  347.29  361462322  1602478800000
153  348.65  349.33  340.65  345.78  296595696  1603083600000
154  342.13  342.98  322.60  326.54  495607791  1603688400000
155  330.20  352.19  327.24  350.16  463334913  1604296800000

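I want to merge consecutive rows where open > close, and likewise consecutive rows where close > open (this is stock data), so that each run of same-type candles becomes one big candle.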

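Initially, I create a column that records which type each row is (this may be unnecessary; could the comparison be done in a one-liner during the row merge?):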

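def green_or_red(row):
    # classify one candle: red if it closed below its open,
    # green if it closed above, neutral if they are equal
    if row['open'] > row['close']:
        val = 'R'
    elif row['open'] < row['close']:
        val = 'G'
    else:
        val = 'N'
    return val

df['candle_is'] = df.apply(green_or_red, axis=1)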

That assigns the labels correctly, but merging the consecutive rows is where I run into trouble:

# merge the consecutive same types of candles
g = df['candle_is'].ne(df['candle_is'].shift()).cumsum()
dfn = df.groupby(['candle_is', g], sort=False).agg({'open': max, 'close': min, 'high': max, 'low': min, 'volume': sum})

Which produces:

                       open   close      high     low      volume
candle_is candle_is
G         1          260.41  257.86  266.0500  255.63  1047075436
R         2          266.31  265.51  266.8000  262.71   401716112
G         3          265.58  266.51  267.5600  265.39   516455674
R         4          268.10  266.86  268.6000  266.64   632660142
G         5          280.17  273.42  286.6285  267.40  1655227273
...                     ...     ...       ...     ...         ...
          73         342.12  326.52  350.7200  319.64  1280999271
R         74         350.35  330.65  358.7500  327.97  1257122392
G         75         336.06  328.73  347.3500  319.80  1099865805
R         76         349.59  326.54  354.0200  322.60  1153665809
G         77         330.20  350.16  352.1900  327.24   463334913

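But I need to separate the logic for red (R) and green (G) candles so that agg() behaves differently for each, because the min/max used for open and close should be swapped between the two types: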

# green
df.groupby(['candle_is', g], sort=False).agg({'open': max, 'close': min, 'high': max, 'low': min, 'volume': sum})
# red
df.groupby(['candle_is', g], sort=False).agg({'open': min, 'close': max, 'high': max, 'low': min, 'volume': sum})

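However, I can't find a way to target these groups specifically, using g or df['candle_is'] == 'G', without running into a pile of errors, because once I filter the data the sizes no longer match. What is a sensible way to do this? Thanks!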
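
For readers unfamiliar with the key g used in the question: comparing a column with its shifted copy and taking the cumulative sum gives every run of equal consecutive values its own integer id. A minimal sketch on a made-up label series (the values and the names starts_new_run and run_id are illustrative, not taken from the data above):

```python
import pandas as pd

# Illustrative labels only (not the question's data)
candle_is = pd.Series(["G", "G", "R", "G", "G", "G", "R", "R"])

# True wherever the value differs from the previous row;
# the first row compares against a missing value, so it is True as well
starts_new_run = candle_is.ne(candle_is.shift())

# The running total of those flags gives each consecutive run its own id
run_id = starts_new_run.cumsum()

print(run_id.tolist())
```

Grouping by this id, rather than by the label alone, is what keeps separate runs of the same colour from being merged together.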

Quang Hoang

If you want to swap min/max, it helps to notice that max(-array) = -min(array). So we can multiply the data by -1, aggregate, and then multiply back by -1:

import numpy as np
import pandas as pd

# use this instead of `apply`, which is not vectorized
candles = np.select([df['open']>df['close'], df['open']<df['close']],
                    ['R','G'], 'N')

# turn candles into a Series aligned with df's index
candles = pd.Series(candles, index=df.index)

g = candles.ne(candles.shift()).cumsum()

# change sign of `red` candles so min becomes max and so on
multipliers = np.where(candles=='R', -1, 1)

# groupby as usual
# note that `'max'` is vectorized while `max` is not
ret = (df.mul(multipliers, axis='rows')
       .groupby([candles, g], sort=False)
       .agg({'open': 'max', 'close': 'min', 
             'high': 'max', 'low': 'min', 
             'volume': 'sum'})
)

# multiply the red groups by `-1` again to restore the original sign
# Since we are working with a MultiIndex, we slice by the level values
ret.loc[ret.index.get_level_values(0)=='R'] *= -1

Output for the sample data (note the values in the second R group):

               open   close    high     low      volume
  candle_is                                            
G 1          260.41  257.86  266.05  255.63  1047075436
R 2          266.31  265.51  266.80  262.71   401716112
G 3          336.06  266.51  347.35  265.39   814068344
R 4          342.13  347.29  342.98  343.13  1153665809
G 5          330.20  350.16  352.19  327.24   463334913
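
One thing to watch in the output above: because every column of the red rows is negated, high and low are swapped for red groups as well (the second R group ends up with low above high). If only open and close should swap between min and max, a possible variation is to flip the sign of just those two columns. The sketch below is an assumption about the intended result, not part of the original answer; the rows are made up, and the names sign, flipped, merged and the level name "run" are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration (not the question's data): G, G, R, R, G
df = pd.DataFrame({
    "open":   [10.0, 11.0, 13.0, 12.0, 11.0],
    "high":   [11.5, 13.5, 13.2, 12.4, 12.5],
    "low":    [9.5, 10.8, 11.9, 10.5, 10.9],
    "close":  [11.0, 13.0, 12.0, 11.0, 12.0],
    "volume": [100, 200, 300, 400, 500],
})

# Label each candle and give every consecutive run its own id
candles = pd.Series(
    np.select([df["open"] > df["close"], df["open"] < df["close"]],
              ["R", "G"], "N"),
    index=df.index, name="candle_is",
)
g = candles.ne(candles.shift()).cumsum().rename("run")

# Flip the sign of open/close only, and only on red rows
sign = pd.Series(np.where(candles == "R", -1, 1), index=df.index)
flipped = df.copy()
flipped[["open", "close"]] = flipped[["open", "close"]].mul(sign, axis=0)

# Same aggregation as in the question's "green" case
merged = flipped.groupby([candles, g], sort=False).agg(
    {"open": "max", "close": "min", "high": "max", "low": "min", "volume": "sum"}
)

# Undo the sign flip on the red groups
red = merged.index.get_level_values("candle_is") == "R"
merged.loc[red, ["open", "close"]] *= -1

print(merged)
```

With this variation, red groups take the minimum open and maximum close, while high, low and volume are aggregated the same way for both colours.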
