Pandas - Adding a column from a DataFrame column that contains repeated values or lists

Posted by Ira in Dev

Ira

Suppose a DataFrame looks like this:

   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0

Adding a new column named three is done like this:

df["three"] = df["one"] * df["two"]

Result:

   one  two  three
a  1.0  1.0    1.0
b  2.0  2.0    4.0
c  3.0  3.0    9.0
d  NaN  4.0    NaN
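For reference, the example above can be reproduced with the minimal sketch below. The DataFrame construction is an assumption based on the printed table, and the `three_filled` column is a hypothetical addition that only illustrates one way to avoid the NaN in row d; it is not part of the original question.

```python
import numpy as np
import pandas as pd

# Rebuild the frame shown above (values assumed from the printed table).
df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, np.nan], "two": [1.0, 2.0, 3.0, 4.0]},
    index=["a", "b", "c", "d"],
)

# Element-wise product aligned on the index; a NaN in either operand gives NaN.
df["three"] = df["one"] * df["two"]

# Hypothetical variant: treat a missing operand as 1 so the other value passes through.
df["three_filled"] = df["one"].mul(df["two"], fill_value=1)

print(df)
```

Row d stays NaN in `three` because `one` is missing there, while `three_filled` falls back to the value of `two`.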

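But what about a column whose values are lists, or repeated lists? I need to create a new column that holds the highest number from each value.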

Example:

    one  two 
 a  1.0  [12,1]
         [12,1]
 b  2.0  2.0    
 c  3.0  3.0    
 d  NaN  4.0

So I want this:

    one  two        flag
 a  1.0  [12,1]      12
         [12,1]
 b  2.0  [200,400]   400
 c  3.0  3.0         3.0
 d  NaN  4.0         4.0

Thanks

jezrael

If the column holds lists, nested lists, or floats, you can flatten each list and take its max:

import pandas as pd
from collections.abc import Iterable

df = pd.DataFrame({"two": [[[12, 1], [12, 1]], [200, 400], 3.0, 4.0]})

# https://stackoverflow.com/a/40857703/2901002
def flatten(items):
    """Yield items from any nested iterable; see Reference."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            # Recurse into nested iterables, but treat strings as single items.
            yield from flatten(x)
        else:
            yield x

# Lists are flattened and reduced to their maximum; scalars pass through unchanged.
df['new'] = [max(flatten(x)) if isinstance(x, list) else x for x in df['two']]
print(df)
                  two    new
0  [[12, 1], [12, 1]]   12.0
1          [200, 400]  400.0
2                 3.0    3.0
3                 4.0    4.0
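The same idea can also be written with `Series.apply`. The sketch below is only an illustration: `max_or_scalar` is a hypothetical helper name, and the handling of empty lists (returning NaN) and the extra empty-list row are assumptions that the original answer does not cover.

```python
from collections.abc import Iterable

import pandas as pd


def flatten(items):
    """Yield leaf items from arbitrarily nested iterables (strings are leaves)."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x


def max_or_scalar(value):
    """Hypothetical helper: max of a (nested) list or tuple, or the value itself."""
    if isinstance(value, (list, tuple)):
        # Assumption: an empty list has no maximum, so report NaN instead of raising.
        return max(flatten(value), default=float("nan"))
    return value


# Same data as above plus one empty list to exercise the assumed edge case.
df = pd.DataFrame({"two": [[[12, 1], [12, 1]], [200, 400], 3.0, 4.0, []]})
df["new"] = df["two"].apply(max_or_scalar)
print(df)
```

Keeping the logic in a named function makes the edge cases explicit and easier to test than an inline conditional expression.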

EDIT: To get the maximum for every column of the new DataFrame, use the aggregation function max:

df = df_orig.pivot_table(index=['keyword_name','volume'], 
                    columns='asin', 
                    values='rank', 
                    aggfunc=list)

df1 = df_orig.pivot_table(index=['keyword_name','volume'], 
                     columns='asin', 
                     values='rank', 
                     aggfunc='max')

out = pd.concat([df, df1.add_suffix('_max')], axis=1)
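Since `df_orig` is not shown in the thread, here is a small sketch with made-up data to show what the two pivots and the concat produce. The column names come from the snippet above; the rows themselves are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data; only the column names are taken from the answer.
df_orig = pd.DataFrame({
    "keyword_name": ["kw1", "kw1", "kw1", "kw2"],
    "volume": [100, 100, 100, 50],
    "asin": ["A1", "A1", "A2", "A1"],
    "rank": [12, 1, 7, 3],
})

# Collect every rank per (keyword_name, volume, asin) cell into a list.
df = df_orig.pivot_table(index=["keyword_name", "volume"],
                         columns="asin",
                         values="rank",
                         aggfunc=list)

# Same shape, but each cell holds only the maximum rank.
df1 = df_orig.pivot_table(index=["keyword_name", "volume"],
                          columns="asin",
                          values="rank",
                          aggfunc="max")

# Put the list columns and the "_max" columns side by side.
out = pd.concat([df, df1.add_suffix("_max")], axis=1)
print(out)
```

Combinations with no rows in the sample data (here kw2 with A2) end up as NaN in both the list column and the `_max` column.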
