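I want to plot customer details by sex, education level, and default payment status. But the "other" category plot shows bars that are larger than the other bars.
# Data link: "https://archive.ics.uci.edu/ml/machine-learning-databases/00350/"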
plot_data5 <- customer.data %>%
  group_by(EDUCATION, SEX) %>%
  mutate(group_size = n()) %>%
  group_by(EDUCATION, SEX, DEFAULT_PAYMENT) %>%
  summarise(perc = paste(round(n()*100/max(group_size), digits = 2),
                         "%", sep = ""))
ggplot(plot_data5, aes(x = plot_data5$EDUCATION, y = plot_data5$perc, fill = DEFAULT_PAYMENT)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = plot_data5$perc), vjust = -.3) +
  facet_wrap(DEFAULT_PAYMENT ~ SEX, scales = "free") +
  theme(plot.subtitle = element_text(vjust = 1),
        plot.caption = element_text(vjust = 1)) +
  labs(y = "% of Customer ") +
  labs(x = "Default_Payment")
The expected result should look just like this, but with the bars at their true sizes and a continuous y-axis scale.
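There is no need to re-specify the data frame inside the aes() call of your ggplot. Doing so hinders the correct assignment of the labels. Also, since you want a continuous y-axis, you need to keep perc as a continuous variable: paste() turns it into a character string, which ggplot treats as discrete.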
library(dplyr)    # %>%, group_by, mutate, summarise
library(ggplot2)

plot_data <- customer.data.small %>%
  group_by(EDUCATION, SEX) %>%
  mutate(group_size = n()) %>%
  group_by(EDUCATION, SEX, DEFAULT_PAYMENT) %>%
  summarise(perc = n()/max(group_size)) # Keep perc continuous
ggplot(plot_data, aes(x = EDUCATION, y = perc, fill = DEFAULT_PAYMENT)) +
  geom_bar(stat = "identity") +
  # Specify the labels with % and rounded in aes directly:
  geom_text(aes(label = paste0(round(100*perc, 2), "%")), vjust = -.3) +
  facet_wrap(DEFAULT_PAYMENT ~ SEX, scales = "free_y") +
  # Use scales::percent to have percentages on the y-axis.
  # Expand makes sure you can still read the labels
  scale_y_continuous(labels = scales::percent, expand = c(0.075, 0)) +
  theme(plot.subtitle = element_text(vjust = 1),
        plot.caption = element_text(vjust = 1)) +
  labs(y = "% of Customer ") +
  labs(x = "Default_Payment")
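To illustrate why perc has to stay numeric, here is a minimal base R sketch. The three values are made up for illustration and are not taken from the dataset. It shows that percentages stored as text are ordered character by character rather than by magnitude, which is why a character column cannot give a continuous axis.

```r
# Percentages stored as text, as produced by paste(round(...), "%").
# The values are hypothetical, chosen only to show the ordering problem.
perc_chr <- paste0(round(c(5.5, 12.25, 80), 2), "%")

# The same values kept as numeric proportions
perc_num <- c(0.055, 0.1225, 0.80)

# Text is compared character by character, so "12.25%" sorts before "5.5%"
sorted_chr <- sort(perc_chr, method = "radix")

# Numbers are ordered by magnitude
sorted_num <- sort(perc_num)

# Convert to text only at the point where the label is drawn
labels <- paste0(round(100 * perc_num, 2), "%")

print(sorted_chr)
print(sorted_num)
print(labels)
```

Keeping the proportion numeric and formatting it only inside geom_text() gives both a continuous y-axis and readable labels.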
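I find this representation of the data very misleading! You label the x-axis "Default_Payment" although it shows EDUCATION. The plot also does not make clear why the percentages in each panel do not add up to 100%, which confuses the reader. Here is a suggestion for how to improve the plot: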
plot_data2 <- customer.data.small %>%
  mutate_at(c("DEFAULT_PAYMENT", "EDUCATION", "SEX"), factor) %>%
  group_by(EDUCATION, SEX) %>%
  mutate(group_size = n()) %>%
  group_by(EDUCATION, SEX, DEFAULT_PAYMENT) %>%
  summarise(perc = n()/max(group_size))
ggplot(plot_data2, aes(x = EDUCATION, y = perc, fill = DEFAULT_PAYMENT)) +
  geom_bar(stat = "identity",
           position = position_dodge2(width = 0.9, preserve = "single")) +
  geom_text(aes(label = paste0(round(100 * perc, 2), "%")),
            vjust = -.3,
            position = position_dodge(0.9)) +
  facet_wrap( ~ SEX, labeller = label_both) +
  scale_y_continuous(labels = scales::percent) +
  theme(plot.subtitle = element_text(vjust = 1),
        plot.caption = element_text(vjust = 1)) +
  labs(y = "% of Customer ") +
  labs(x = "Education")
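As a quick sanity check on the "adds up to 100%" point, the sketch below computes the shares on a tiny made-up data frame (toy is hypothetical, not part of the dataset) and confirms that they sum to 1 within every EDUCATION and SEX combination. It uses count() and sum(n) instead of n()/max(group_size); that is just one alternative way to get the same proportions, assuming dplyr 1.0 or later for the .groups argument.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the real dataset
toy <- data.frame(
  EDUCATION       = c(1, 1, 1, 2, 2, 2, 2, 2),
  SEX             = c(1, 1, 2, 1, 1, 1, 2, 2),
  DEFAULT_PAYMENT = c(0, 1, 0, 0, 0, 1, 1, 1)
)

# Share of each DEFAULT_PAYMENT value within its EDUCATION x SEX group
shares <- toy %>%
  count(EDUCATION, SEX, DEFAULT_PAYMENT) %>%
  group_by(EDUCATION, SEX) %>%
  mutate(perc = n / sum(n)) %>%
  ungroup()

# Within each EDUCATION x SEX group the shares should sum to 1
check <- shares %>%
  group_by(EDUCATION, SEX) %>%
  summarise(total = sum(perc), .groups = "drop")

print(shares)
print(check)
```

Because the dodged bars in each EDUCATION group of a SEX panel now come from the same denominator, a reader can see directly that they are complementary shares.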
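Data
I use a small subset of your data, given here in a reproducible format that anyone can copy and paste into their own R session without having to download the dataset.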
customer.data.small <-
structure(list(ID = 1:100,
EDUCATION = c(2, 2, 2, 2, 2, 1, 1, 2, 3, 3, 3, 1, 2, 2, 1, 3, 1, 1, 1, 1, 3, 2, 2, 1, 1, 3, 1, 3, 3, 1, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1, 5, 2, 1, 3, 3, 2, 1, 1, 1, 3, 2, 1, 2, 3, 2, 1, 2, 2, 1, 2, 1, 3, 5, 1, 2, 2, 1, 1, 2, 3, 1, 2, 2, 3, 1, 3, 2, 3, 2, 1, 2, 1, 3, 1, 1, 1, 2, 2, 2, 1, 1, 3, 2),
SEX = c(2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 1, 1, 2, 1),
DEFAULT_PAYMENT = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1)),
row.names = c(NA, -100L), class = c("tbl_df", "tbl", "data.frame"))
This is how I created that data:
# Requires readxl for read_xls() and dplyr for %>%, select() and slice()
customer.data <- readxl::read_xls("default of credit card clients.xls", skip = 1)
customer.data.small <- customer.data %>%
  select(ID, EDUCATION, SEX, DEFAULT_PAYMENT = `default payment next month`) %>%
  slice(1:100)
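A structure(...) literal like the one above is the kind of text that base R's dput() prints for an object. Whether the original was generated exactly that way is my assumption. The sketch below shows the round trip on a tiny made-up data frame (toy is hypothetical): dump the object as R code, then rebuild it from that code.

```r
# Hypothetical toy data frame, standing in for customer.data.small
toy <- data.frame(ID = 1:3, SEX = c(2, 1, 2))

# dput() prints R code that recreates the object; capture it as text
txt <- paste(capture.output(dput(toy)), collapse = "\n")
cat(txt, "\n")

# Evaluating that text rebuilds the data frame
restored <- eval(parse(text = txt))

print(all.equal(toy, restored))
```

Pasting the dput() output into a question or answer lets others reproduce the data without downloading any file.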