Plotting values over time by group in ggplot, with one group as a single mean line and the other group as individual lines, in R

Tom

I have a dataset with alcohol treatment rates for each state for each year from 2010 to 2015. Five of these states received an intervention and the rest did not. I would like to plot the treatment rates for each intervention state as a separate line and the non-intervention states (grouped as one line using the mean) on the same graph.

I would like to do this using ggplot in R. The code below graphs the treatment rates for each state independently. However, I am having trouble setting up the grouping variable (combining the intervention variable with the state variable) so that it meets the condition I described above. Any help would be appreciated. Thank you in advance!

I'm fairly new to R, so I hope I am explaining this correctly. The dataset is saved as a list, and below is some dummy data showing a snippet of the structure.

year  state    Intervention  rate
2010  Alabama  0             0.006575294
2011  Alabama  0             0.002244153
2012  Alabama  0             0.002519527
2013  Alabama  0             0.00333051
2014  Alabama  0             0.002385317
2015  Alabama  0             0.003080964
2010  Alaska   1             0.00338454
2011  Alaska   1             0.003457992
2012  Alaska   1             0.002784511
2013  Alaska   1             0.00356925
2014  Alaska   1             0.004599099
2015  Alaska   1             0.004204394
2010  Arizona  0             0.002336875
2011  Arizona  0             0.002808161
2012  Arizona  0             0.00299025
2013  Arizona  0             0.0022956

ggplot(data = data, aes(x = year, y = rate, group = state)) +
  geom_line()

teunbrand

Probably the easiest way is to separate the data based on the status of Intervention. I've generated a somewhat larger dummy dataset that should have a similar shape to the data you provided.

library(ggplot2)

set.seed(1234)

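# 50 state names from a built-in dataset; 10 are randomly marked as intervention states
states <- rownames(USArrests)
intervened <- sample(states, 10)

# One row per state-year combination
df <- expand.grid(year = 2010:2015, state = states)
df$Intervention <- as.numeric(df$state %in% intervened)
# Random-walk values stand in for the real treatment rates
df$rate <- cumsum(rnorm(nrow(df)))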
head(df)
#>   year   state Intervention      rate
#> 1 2010 Alabama            0 -0.574740
#> 2 2011 Alabama            0 -1.121372
#> 3 2012 Alabama            0 -1.685824
#> 4 2013 Alabama            0 -2.575862
#> 5 2014 Alabama            0 -3.053054
#> 6 2015 Alabama            0 -4.051441

Since the two sets of states need to be handled differently, it's easiest to give each layer its own subset of the data, which you can do through the data argument of a layer. As I understood it, you want to plot states with Intervention == 1 individually, so we do that with a regular geom_line(). Then we want to summarise all states with Intervention == 0, and for that we use the stat_summary() function. The summarised layer needs a common group (here aes(group = -1)), because otherwise it inherits group = state and would compute a separate mean per state instead of one mean across states.

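# Global aesthetics are shared by both layers; each layer gets its own data subset
ggplot(df, aes(x = year, y = rate, group = state)) +
  # Intervention states: one coloured line per state
  geom_line(
    data = ~ subset(., Intervention == 1),
    aes(colour = state)
  ) +
  # Non-intervention states: a single line of the yearly mean
  stat_summary(
    data = ~ subset(., Intervention == 0),
    aes(group = -1),  # override group = state so all states are pooled
    fun.data = mean_se,
    geom = "line", size = 2  # in ggplot2 >= 3.4.0, use linewidth = 2 for lines
  )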

Created on 2021-02-24 by the reprex package (v1.0.0)
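
If you'd rather see the numbers behind the thick line, a rough alternative is to compute the yearly mean of the non-intervention states yourself and plot that as an ordinary line. This is only a sketch under some assumptions: it rebuilds the same dummy data as above, and the names control, control_mean and treated (and the "Non-intervention (mean)" label) are made up for illustration. The plotting part is skipped if ggplot2 is not installed.

```r
# Rebuild the dummy data from the answer above
set.seed(1234)
states <- rownames(USArrests)
intervened <- sample(states, 10)
df <- expand.grid(year = 2010:2015, state = states)
df$Intervention <- as.numeric(df$state %in% intervened)
df$rate <- cumsum(rnorm(nrow(df)))

# Non-intervention states: one mean value per year (hypothetical helper objects)
control <- subset(df, Intervention == 0)
control_mean <- aggregate(rate ~ year, data = control, FUN = mean)
control_mean$state <- "Non-intervention (mean)"  # label doubles as the group

# Intervention states: keep every state-year row
treated <- subset(df, Intervention == 1)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mapping = aes(x = year, y = rate, group = state)) +
    geom_line(data = treated, aes(colour = state)) +  # one line per state
    geom_line(data = control_mean, colour = "black", linetype = "dashed")  # pooled mean
  print(p)
}
```

The stat_summary() version does the same averaging inside the plot, so this is mostly useful when you also want control_mean as a table.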

Follow up:

You'd need to repeat the stat_summary() layer for every geom. For example, to add a ribbon showing the mean +/- sd:

  stat_summary(
    data = ~ subset(., Intervention == 0),
    aes(group = -1),
    fun.data = function(x) {
      mx <- mean(x)
      sd <- sd(x)
      data.frame(
        ymin = mx - sd,
        ymax = mx + sd
      )
    },
    geom = "ribbon", alpha = 0.5
  )

You can replace "ribbon" with "errorbar" if you prefer that.
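
Since the summary function has to be repeated for each geom, it may be tidier to define it once and reuse it. The sketch below assumes a helper called mean_sd, which is a made-up name and not part of ggplot2. It returns y as well as ymin and ymax, so the same function can feed a line layer and a ribbon layer. The plotting part only runs if ggplot2 is installed.

```r
# Hypothetical helper (not a ggplot2 function): mean and mean +/- 1 SD
mean_sd <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

mean_sd(c(1, 2, 3))  # y = 2, ymin = 1, ymax = 3

# Same dummy data as above
set.seed(1234)
states <- rownames(USArrests)
intervened <- sample(states, 10)
df <- expand.grid(year = 2010:2015, state = states)
df$Intervention <- as.numeric(df$state %in% intervened)
df$rate <- cumsum(rnorm(nrow(df)))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  controls <- subset(df, Intervention == 0)
  # group = -1 pools all non-intervention states into one summary per year
  p <- ggplot(controls, aes(x = year, y = rate, group = -1)) +
    stat_summary(fun.data = mean_sd, geom = "ribbon", alpha = 0.5) +
    stat_summary(fun.data = mean_sd, geom = "line")
  print(p)
}
```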
