Group and aggregate a pandas DataFrame to get a summary DataFrame

万能

I have the following detailed DataFrame:

Code:

import pandas as pd

df_detailed = pd.DataFrame([
    ["Fail", "P1", "3 Failed Partition", "X001, X002, X003"],
    ["Fail", "P1", "Late Backup", "Late Backup"],
    ["Fail", "P1", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "Late Backup", "Late Backup"],
    ["Warn", "P2", "Huge Size", "1GB"],
    ["Warn", "P2", "Huge Size", "2GB"]
], columns=["Severity", "Partition", "Status", "Comment"])

Output:

  Severity Partition              Status           Comment
0     Fail        P1  3 Failed Partition  X001, X002, X003
1     Fail        P1         Late Backup       Late Backup
2     Fail        P1  2 Failed Partition        X001, X002
3     Fail        P2  2 Failed Partition        X001, X002
4     Fail        P2         Late Backup       Late Backup
5     Warn        P2           Huge Size               1GB
6     Warn        P2           Huge Size               2GB

I want to group and aggregate it to get the following result:

Result:

  Partition                                     Status
0        P1          3 Failed Partition, 2 Late Backup
1        P2  2 Failed Partition, 1 Late Backup, 2 Warn

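Notes:

The keywords "Late Backup", "Failed Partition" and "Huge Size" are static and do not change.
All rows with severity "Fail" should keep their details in the summary DataFrame.
All other severities, such as "Warn", "Info", etc., should only show the count of that severity, as in the expected result above.
In the detailed DataFrame, "Failed Partition" is prefixed with the number of failures, but in the summary DataFrame it must show the count of unique partition values (the IDs listed in Comment) for each partition (i.e. P1, P2).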

Can anyone help? I haven't slept for two days :(

Artiom Kozyrev

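Thanks for the interesting task. The problem is solved: please find the solution below and follow the comments, and feel free to ask questions.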

import pandas as pd
from collections import Counter

df_detailed = pd.DataFrame([
    ["Fail", "P1", "3 Failed Partition", "X001, X002, X003"],
    ["Fail", "P1", "Late Backup", "Late Backup"],
    ["Fail", "P1", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "Late Backup", "Late Backup"],
    ["Warn", "P2", "Huge Size", "1GB"],
    ["Warn", "P2", "Huge Size", "2GB"]
], columns=["Severity", "Partition", "Status", "Comment"])


def change_warn(severity, status):
    """Keep the detailed Status only for "Fail" rows; any other severity
    ("Warn", "Info", ...) is reported by its severity label alone."""
    if severity != "Fail":
        return severity
    return status


df_detailed["Status"] = df_detailed.apply(lambda row: change_warn(row["Severity"], row["Status"]), axis=1)


def remove_leading_digits(x):
    """Drop a leading count, e.g. "3 Failed Partition" -> "Failed Partition"."""
    if x[0].isdigit():
        x = " ".join(x.split(" ")[1:])
    return x


df_detailed["Status"] = df_detailed["Status"].apply(remove_leading_digits)

df_detailed["Comment"] = df_detailed["Comment"] + ","  # trailing comma keeps values separable after sum() concatenates them

# need to combine to distinguish P1 from P2:
df_detailed["TempStatus"] = df_detailed["Partition"] + " " + df_detailed["Status"]

gr_b = df_detailed[["Partition", "TempStatus", "Comment"]].groupby("TempStatus").sum()


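def calculate_unique_comment(status, comment):
    """Return the number of distinct IDs in the concatenated Comment for
    "Failed Partition" rows (as a string), otherwise "0"."""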
    comments = []
    if status.endswith("Failed Partition"):
        for c in comment.split(","):
            if c != "":
                comments.append(c.strip())
        counter = Counter(comments)
        return str(len(counter.keys()))
    else:
        return str(0)


del gr_b["Partition"]  # do not need it

gr_b = gr_b.reset_index()  # move TempStatus from the index back into a column

gr_b["CountUnCom"] = gr_b.apply(lambda row: calculate_unique_comment(row["TempStatus"], row["Comment"]), axis=1)

# count unique comments (IDs) per partition for "Failed Partition" rows and store them in a dict
part_dict = {}
for i in range(len(gr_b)):
    if gr_b["TempStatus"][i].endswith("Failed Partition"):
        part_dict[gr_b["TempStatus"][i]] = gr_b["CountUnCom"][i]


# let's take only what we need to work with
df_small = df_detailed[["Partition", "Status"]].copy()  # explicit copy avoids SettingWithCopyWarning below

df_small["Status"] = df_small["Status"].apply(lambda x: x + ",")  # to sum and split later

gr_df_small = df_small.groupby("Partition").sum()

gr_df_small = gr_df_small.reset_index()


def convert_status_to_list(status):
    new_status = []
    for c in status.split(","):
        if c != "":
            new_status.append(c.strip())
    return new_status


gr_df_small["Status"] = gr_df_small["Status"].apply(lambda x: convert_status_to_list(x))


def calculate_status(partition, status, x):
    """Count each status; for "Failed Partition" use the unique-ID count from x."""
    result = []
    for k, v in Counter(status).items():
        if k == "Failed Partition":
            v = x[partition + " " + "Failed Partition"]
        result.append(f"{v} {k}")
    return ", ".join(result)  # comma-separated, as in the requested result


gr_df_small["Status"] = gr_df_small.apply(lambda row: calculate_status(row["Partition"], row["Status"], part_dict), axis=1)


print(gr_df_small)

Output:

  Partition                                     Status
0        P1          3 Failed Partition, 1 Late Backup
1        P2  2 Failed Partition, 1 Late Backup, 2 Warn
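For comparison, the same aggregation can be written more compactly without concatenating strings and splitting them again. This is a sketch under the question's stated rules, not the answer's method: `summarize` is a hypothetical helper, non-"Fail" rows are relabelled with `Series.where`, the leading count is removed with a regex, and `collections.Counter` keeps labels in first-seen order.

```python
import pandas as pd
from collections import Counter

df_detailed = pd.DataFrame([
    ["Fail", "P1", "3 Failed Partition", "X001, X002, X003"],
    ["Fail", "P1", "Late Backup", "Late Backup"],
    ["Fail", "P1", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "Late Backup", "Late Backup"],
    ["Warn", "P2", "Huge Size", "1GB"],
    ["Warn", "P2", "Huge Size", "2GB"],
], columns=["Severity", "Partition", "Status", "Comment"])


def summarize(df):
    """Hypothetical helper: one comma-separated status summary per partition."""
    df = df.copy()
    # Only "Fail" rows keep their detailed Status; other severities become their label.
    df["Label"] = df["Status"].where(df["Severity"] == "Fail", df["Severity"])
    # Strip the leading failure count, e.g. "3 Failed Partition" -> "Failed Partition".
    df["Label"] = df["Label"].str.replace(r"^\d+\s+", "", regex=True)

    rows = []
    for partition, group in df.groupby("Partition"):
        parts = []
        # Counter preserves first-seen order, so labels keep their original order.
        for label, count in Counter(group["Label"]).items():
            if label == "Failed Partition":
                # Replace the row count with the number of distinct IDs in Comment.
                comments = group.loc[group["Label"] == label, "Comment"]
                count = len({i.strip() for c in comments for i in c.split(",") if i.strip()})
            parts.append(f"{count} {label}")
        rows.append({"Partition": partition, "Status": ", ".join(parts)})
    return pd.DataFrame(rows)


result = summarize(df_detailed)
print(result)
```

Iterating over the groups in Python is fine for a summary of a few partitions; for very large frames a fully vectorized version (`groupby` plus `value_counts`) may be worth the extra complexity.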


Edited 2021-01-21

