如何在python中并排绘制堆叠的条形图？（最好是seaborn）

威利安班

我正在寻找一种方法来绘制并排堆积的条形图，以比较我的数据框中每个国家的阳性（条件== True）宿主总数和总病例数。

这是DataFrame的示例。

id  Location    Host    genus_name  #ofGenes    Condition
1   Netherlands Homo sapiens    Escherichia 4.0 True
2   Missing Missing Klebsiella  3.0 True
3   Missing Missing Aeromonas   2.0 True
4   Missing Missing Glaciecola  2.0 True
5   Antarctica  Missing Alteromonas 2.0 True
6   Indian Ocean    Missing Epibacterium    2.0 True
7   Missing Missing Klebsiella  2.0 True
8   China   Homo sapiens    Escherichia 0   False
9   Missing Missing Escherichia 2.0 True
10  China   Plantae kingdom Pantoea 0   False
11  China   Missing Escherichia 2.0 True
12  Pacific Ocean   Missing Halomonas   0   False

我需要类似于波纹管的东西，但我想按百分比绘制。

在此处输入图片说明

谁能帮我？

Cheng

我想您想要的是堆积的分类条形图，无法使用seaborn直接绘制。但是您可以通过自定义一个来实现。

导入一些必要的软件包。

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

读取数据集。考虑到您的样本数据太小，我随机生成了一些样本以使绘图看起来不错。

def gen_fake_data(data, size=400):
    unique_values = []
    for c in data.columns:
        unique_values.append(data[c].unique())
    new_data = pd.DataFrame({c: np.random.choice(unique_values[i], size=size)
                            for i, c in enumerate(data.columns)})
    new_data = pd.concat([data, new_data])
    new_data['id'] = new_data.index + 1
    return new_data

data = pd.read_csv('data.csv')
new_data = gen_fake_data(data)

定义堆叠的分类条形图

def stack_catplot(x, y, cat, stack, data, palette=sns.color_palette('Reds')):
    ax = plt.gca()
    # pivot the data based on categories and stacks
    df = data.pivot_table(values=y, index=[cat, x], columns=stack, 
                          dropna=False, aggfunc='sum').fillna(0)
    ncat = data[cat].nunique()
    nx = data[x].nunique()
    nstack = data[stack].nunique()
    range_x = np.arange(nx)
    width = 0.8 / ncat # width of each bar
    
    for i, c in enumerate(data[cat].unique()):
        # iterate over categories, i.e., Conditions
        # calculate the location of each bar
        loc_x = (0.5 + i - ncat / 2) * width + range_x
        bottom = 0
        for j, s in enumerate(data[stack].unique()):
            # iterate over stacks, i.e., Hosts
            # obtain the height of each stack of a bar
            height = df.loc[c][s].values
            # plot the bar, you can customize the color yourself
            ax.bar(x=loc_x, height=height, bottom=bottom, width=width, 
                   color=palette[j + i * nstack], zorder=10)
            # change the bottom attribute to achieve a stacked barplot
            bottom += height
    # make xlabel
    ax.set_xticks(range_x)
    ax.set_xticklabels(data[x].unique(), rotation=45)
    ax.set_ylabel(y)
    # make legend
    plt.legend([Patch(facecolor=palette[i]) for i in range(ncat * nstack)], 
               [f"{c}: {s}" for c in data[cat].unique() for s in data[stack].unique()],
               bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
    plt.grid()
    plt.show()

让我们情节吧！

plt.figure(figsize=(6, 3), dpi=300)
stack_catplot(x='Location', y='#ofGenes', cat='Condition', stack='Host', data=new_data)

如果要以百分比表示，请在原始数据集中进行计算。

total_genes = new_data.groupby(['Location', 'Condition'], as_index=False)['#ofGenes'].sum().rename(
    columns={'#ofGenes': 'TotalGenes'})
new_data = new_data.merge(total_genes, how='left')
new_data['%ofGenes'] = new_data['#ofGenes'] / new_data['TotalGenes'] * 100

plt.figure(figsize=(6, 3), dpi=300)
stack_catplot(x='Location', y='%ofGenes', cat='Condition', stack='Host', data=new_data)