Why is the geom_bar y-axis out of proportion to the actual numbers?

ChefKochNut

Sorry if this question already exists; I have been googling for a while and didn't find anything. I am relatively new to R and am learning as I go. I'm supposed to create a PDF via R Markdown that analyses patient data with specific main and secondary diagnoses. For this I need to plot some numbers with ggplot (geom_bar and geom_boxplot).

What I do so far: I retrieve data sets that include both codes via SQL and load them into data.table objects. Then I join them to get the data I need. After that I add columns that contain substrings of those codes, and others that contain the count of each substring (so I can plot the occurrences of every code). Now I want to put one of these data.tables into a geom_bar or geom_boxplot to visualise it. This works, but my y-axis has a weird scale that doesn't fit the numbers it should show. The proportions of the bars are also not accurate.

For example: one diagnosis appears 600 times and another one 1000 times. The y-axis shows steps of 0 - 500.000 - 1.000.000 - 1.500.000 - .... The bar that should show 600 is tiny, and the bar for 1000 goes up to 1.500.000.

If I first create a new variable that counts what I need via count() and plot that, it just works. The columns I use for the y-axis have the same data type (integer) in both variables.

Here is how I create the data.table that I use for plotting:

exazerbationsHdComorbiditiesNd <- allExazerbationsHd[allComorbiditiesNd, on="encounter_num", nomatch=0]
exazerbationsHdComorbiditiesNd <- exazerbationsHdComorbiditiesNd[, c("i.DurationGroup", "i.DurationInDays", "i.start_date", "i.end_date", "i.duration", "i.patient_num"):=NULL]
exazerbationsHdComorbiditiesNd[ , IcdHdCodeCount := .N, by = concept_cd]
exazerbationsHdComorbiditiesNd[ , IcdHdCodeClassCount := .N, by = IcdHdClass]

If I now want to bar-plot, for example, IcdHdClass by IcdHdCodeClassCount, I do the following:

ggplot(exazerbationsHdComorbiditiesNd, aes(IcdHdClass, IcdHdCodeClassCount, label = IcdHdCodeClassCount)) +
  geom_bar(stat = "identity") +
  geom_text(vjust = 0, size = 5)

This outputs the bar plot with the weird proportions. If I first do:

plotTest <- count(exazerbationsHdComorbiditiesNd, exazerbationsHdComorbiditiesNd$IcdHdClass)

And then bar-plot it:

ggplot(plotTest, aes(plotTest$`exazerbationsHdComorbiditiesNd$IcdHdClass`, plotTest$n, label=plotTest$n)) + geom_bar(stat = "identity") + geom_text(vjust = 0, size = 5)

It's all perfect and works. I also checked the data types of the columns I need:

sapply(exazerbationsHdComorbiditiesNd, class)
sapply(plotTest, class)

In both variables the columns I need are of type character and integer.

Edit: Unfortunately I can't post images, so here are the links instead. Screenshot of the plot with the wrong y-axis: https://ibb.co/CbxX1n7 Screenshot of the plot shown correctly: https://ibb.co/Xb8gyx1

Here is some example data that I copied out of the data.table object: Exampledata

Mikko Marttila

Since you added the class counts as an additional column, rather than aggregating, every row in your data contributes its class count to the bar, and those counts get stacked on top of each other. A class with n rows therefore ends up with a bar of height n × n:

library(tidyverse)

set.seed(42)

df <- tibble(class = sample(letters[1:3], 10, replace = TRUE)) %>% 
  add_count(class, name = "count")

df # this is essentially what your data looks like
#> # A tibble: 10 x 2
#>    class count
#>    <chr> <int>
#>  1 a         5
#>  2 a         5
#>  3 a         5
#>  4 a         5
#>  5 b         3
#>  6 b         3
#>  7 b         3
#>  8 a         5
#>  9 c         2
#> 10 c         2

ggplot(df, aes(class, count)) + geom_bar(stat = "identity")
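
To check the stacked heights numerically, one can sum the count column per class, which is what the stacked bars effectively display. A minimal base R sketch, with the example data rebuilt by hand from the printed tibble so it does not depend on the random seed (the names rows_per_class and heights are only illustrative):

```r
# Rebuild the example data by hand so the sketch does not depend on the RNG
df <- data.frame(
  class = c("a", "a", "a", "a", "b", "b", "b", "a", "c", "c"),
  stringsAsFactors = FALSE
)

# Per-row class count, like add_count(): every row carries its class total
df$count <- ave(seq_along(df$class), df$class, FUN = length)

# Rows per class: the number each bar is supposed to show
rows_per_class <- table(df$class)

# Sum of the count column per class: the height the stacked bar reaches
heights <- tapply(df$count, df$class, sum)

print(rows_per_class)
print(heights)
```

Here the stacked heights come out as 25, 9 and 4 for classes a, b and c, rather than the intended 5, 3 and 2.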

You could use position = "identity" so that the bars don’t get stacked:

ggplot(df, aes(class, count)) +
  geom_bar(stat = "identity", position = "identity")
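
One way to see what this version actually draws is to inspect the built layer data with ggplot2's layer_data(). A minimal sketch, again with the example data rebuilt by hand; p and built are illustrative names:

```r
library(ggplot2)

# Example data rebuilt by hand (same shape as the tibble above)
df <- data.frame(
  class = c("a", "a", "a", "a", "b", "b", "b", "a", "c", "c"),
  stringsAsFactors = FALSE
)
df$count <- ave(seq_along(df$class), df$class, FUN = length)

p <- ggplot(df, aes(class, count)) +
  geom_bar(stat = "identity", position = "identity")

# layer_data() returns one row per rectangle drawn by the first layer
built <- layer_data(p)

nrow(built)      # number of rectangles in the plot
max(built$ymax)  # height of the tallest rectangle
```

With position = "identity" the tallest rectangle matches the true count, but there is still one rectangle per input row.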

However, position = "identity" still draws one bar per row, all on top of each other, so the plot contains a whole bunch of unnecessary rectangles that you can’t see. A better approach would be to drop the extra rows from your data before plotting:

df %>%
  distinct(class, count)
#> # A tibble: 3 x 2
#>   class count
#>   <chr> <int>
#> 1 a         5
#> 2 b         3
#> 3 c         2

df %>% 
  distinct(class, count) %>%
  ggplot(aes(class, count)) +
  geom_bar(stat = "identity")

Created on 2019-09-05 by the reprex package (v0.3.0.9000)
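
Since the data in the question lives in a data.table, the same idea can be sketched there as well. This is only an illustration: dt is a made-up stand-in for the joined table with placeholder class values, and classCounts, classCountsAlt, pAggregated and pCounted are hypothetical names. It reduces the data to one row per class before plotting, and also shows that geom_bar() with its default stat counts the rows by itself.

```r
library(data.table)
library(ggplot2)

# Made-up stand-in for the joined table; the class values are placeholders
dt <- data.table(IcdHdClass = c("A", "A", "A", "B", "B", "C"))
dt[, IcdHdCodeClassCount := .N, by = IcdHdClass]  # per-row count, as in the question

# Option 1: aggregate to one row per class
classCounts <- dt[, .(n = .N), by = IcdHdClass]

# Option 2: keep the existing count column, but drop the duplicated rows
classCountsAlt <- unique(dt[, .(IcdHdClass, IcdHdCodeClassCount)])

# Plot the aggregated data; geom_col() is geom_bar(stat = "identity")
pAggregated <- ggplot(classCounts, aes(IcdHdClass, n, label = n)) +
  geom_col() +
  geom_text(vjust = 0, size = 5)

# Or let ggplot2 count the rows: the default stat of geom_bar() is "count"
pCounted <- ggplot(dt, aes(IcdHdClass)) +
  geom_bar()
```

Either plot should show bars of height 3, 2 and 1 for this toy data, because each class contributes exactly one value to its bar.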
