Here is the problem I want to solve: I want to get from Table 1 to Table 2.
Table 1:
df
# icustay_id starttime endtime vaso_rate vaso_amount
# 1 1 2019-09-10 13:20:00 2019-09-11 13:20:00 3 293.0896
# 2 1 2019-09-11 13:30:00 2019-09-12 01:20:00 9 602.9983
# 3 1 2019-09-14 16:40:00 2019-09-15 16:40:00 4 208.9360
# 4 2 2019-09-10 12:40:00 2019-09-13 13:20:00 2 864.1494
# 5 3 2019-09-10 01:20:00 2019-09-11 13:20:00 9 405.2939
Table 2:
df
# icustay_id starttime endtime vaso_rate vaso_amount
# 1 1 2019-09-10 13:20:00 2019-09-12 01:20:00 3 293.0896
# 2 1 2019-09-14 16:40:00 2019-09-15 16:40:00 4 208.9360
# 3 2 2019-09-10 12:40:00 2019-09-13 13:20:00 2 864.1494
# 4 3 2019-09-10 01:20:00 2019-09-11 13:20:00 9 405.2939
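As you can see, I am trying to build a function that merges consecutive rows with the same icustay_id when the gap between one row's endtime and the next row's starttime is less than one hour.
To do this, I decided to add another identifier column whose value is 1 when the condition is met, check every row, and then group_by(icustay_id and that new column).
However, the code I wrote does not assign the right ID for the condition.
Here is the code that creates the sample df: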
set.seed(1)
df <- data.frame(
  icustay_id = c(1, 1, 1, 2, 3),
  starttime = as.POSIXct(c("2019-09-10 13:20", "2019-09-11 13:30", "2019-09-14 16:40", "2019-09-10 12:40", "2019-09-10 01:20")),
  endtime = as.POSIXct(c("2019-09-11 13:20", "2019-09-11 01:20", "2019-09-15 16:40", "2019-09-13 13:20", "2019-09-11 13:20")),
  vaso_rate = sample(1:10, 5, replace = TRUE),
  vaso_amount = runif(5, 0, 1000)
)
Here is the function code I have so far:
merge_pressor_doses <- function(df){
  df %>% arrange(icustay_id,starttime)
  for (i in unique(df$icustay_id))
  {
    for (j in which(df$icustay_id==i))
    {
      start <- df$starttime[as.numeric(j)+1]
      end <- df$endtime[as.numeric(j)]
      stopduration <- as.numeric(difftime(start, end, units = 'mins'))
      bool <- stopduration < 60
      df <- df%>%mutate(
        group = case_when(
          bool = TRUE ~ 1,
          bool = FALSE ~ 0)
      )
    }
  }
  return(df)
}
This should result in:
df
# icustay_id starttime endtime vaso_rate vaso_amount group
# 1 1 2019-09-10 13:20:00 2019-09-11 13:20:00 3 293.0896 1
# 2 1 2019-09-11 13:30:00 2019-09-12 01:20:00 9 602.9983 1
# 3 1 2019-09-14 16:40:00 2019-09-15 16:40:00 4 208.9360 0
# 4 2 2019-09-10 12:40:00 2019-09-13 13:20:00 2 864.1494 1
# 5 3 2019-09-10 01:20:00 2019-09-11 13:20:00 9 405.2939 1
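But in my case, the value in the third row is 1...
If I can get this part of the code working, I can move on to the next part and reach my goal.
The second and final part of the code would be: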
group_by(group, icustay_id) %>%
  summarise(
    starttime = min(starttime),
    endtime = max(endtime),
    vaso_rate = mean(vaso_rate),
    sum_vaso_amount = sum(vaso_amount))
Thank you in advance!!
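I would create a new column, pause, that shows how much time has passed since the previous dose ended within the same icustay_id. Then use this column to assign a group ID to the doses: cumsum(pause >= 1) starts at 0 and moves to a new group every time the pause is >= 1 hour.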
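To make the cumsum(pause >= 1) idea concrete, here is a minimal base R sketch. The pause values are made up for illustration (they are not taken from the sample df) and stand for the gaps, in hours, within a single icustay_id.

```r
# Hypothetical pauses (in hours) between consecutive doses of one ICU stay.
# The first dose has no predecessor, so its pause is set to 0.
pause <- c(0, 0.17, 87.33, 0.5, 2)

# TRUE marks a dose that starts a new episode (gap of one hour or more).
new_episode <- pause >= 1

# The running count of TRUE values gives each dose its group ID.
vaso_id <- cumsum(new_episode)
print(vaso_id)
```

Every dose that follows a long pause bumps the counter, so doses separated only by short pauses share the same ID.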
set.seed(1)
df <- data.frame(
  icustay_id = c(1, 1, 1, 2, 3),
  starttime = as.POSIXct(c("2019-09-10 13:20", "2019-09-11 13:30", "2019-09-14 16:40", "2019-09-10 12:40", "2019-09-10 01:20")),
  endtime = as.POSIXct(c("2019-09-11 13:20", "2019-09-11 01:20", "2019-09-15 16:40", "2019-09-13 13:20", "2019-09-11 13:20")),
  vaso_rate = sample(1:10, 5, replace = TRUE),
  vaso_amount = runif(5, 0, 1000)
)
library(dplyr)
library(tidyr)
df <-
  df %>%
  group_by(icustay_id) %>%
  mutate(pause = difftime(starttime, lag(endtime), units = "hours")) %>%
  replace_na(list(pause = 0)) %>%
  mutate(vaso_id = cumsum(pause >= 1))
# A tibble: 5 x 7
# Groups: icustay_id [3]
# icustay_id starttime endtime vaso_rate vaso_amount pause vaso_id
# <dbl> <dttm> <dttm> <int> <dbl> <drtn> <int>
# 1 1 2019-09-10 13:20:00 2019-09-11 13:20:00 9 898. 0.0000000 hours 0
# 2 1 2019-09-11 13:30:00 2019-09-11 01:20:00 4 945. 0.1666667 hours 0
# 3 1 2019-09-14 16:40:00 2019-09-15 16:40:00 7 661. 87.3333333 hours 1
# 4 2 2019-09-10 12:40:00 2019-09-13 13:20:00 1 629. 0.0000000 hours 0
# 5 3 2019-09-10 01:20:00 2019-09-11 13:20:00 2 61.8 0.0000000 hours 0
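For readers who want to see what the lag(endtime) step computes, the following is a rough base R equivalent for a single ICU stay. It is only a sketch: the timestamps are the three rows of icustay_id 1 as printed in Table 1 (so the second endtime is 2019-09-12 01:20, not the value used in the data.frame() call), and tz = "UTC" is an assumption added to keep the arithmetic independent of the local time zone.

```r
# Start and end times of the three doses of icustay_id 1, as printed in Table 1.
starttime <- as.POSIXct(
  c("2019-09-10 13:20", "2019-09-11 13:30", "2019-09-14 16:40"), tz = "UTC")
endtime <- as.POSIXct(
  c("2019-09-11 13:20", "2019-09-12 01:20", "2019-09-15 16:40"), tz = "UTC")

n <- length(starttime)

# Gap between each dose's start and the previous dose's end, in hours.
gaps <- as.numeric(difftime(starttime[-1], endtime[-n], units = "hours"))

# The first dose has no previous dose, so it gets a pause of 0,
# which mirrors the replace_na() step in the pipeline.
pause <- c(0, gaps)
print(pause)
```

The second dose starts ten minutes after the first one ends, so it stays in the same group, while the third starts more than an hour later and opens a new one.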
Then we can use the code you provided, grouping by vaso_id instead of group.
df %>%
  group_by(icustay_id, vaso_id) %>%
  summarise(
    starttime = min(starttime),
    endtime = max(endtime),
    vaso_rate = mean(vaso_rate),
    sum_vaso_amount = sum(vaso_amount)
  )
# A tibble: 4 x 6
# Groups: icustay_id [3]
# icustay_id vaso_id starttime endtime vaso_rate sum_vaso_amount
# <dbl> <int> <dttm> <dttm> <dbl> <dbl>
# 1 1 0 2019-09-10 13:20:00 2019-09-11 13:20:00 6.5 1843.
# 2 1 1 2019-09-14 16:40:00 2019-09-15 16:40:00 7 661.
# 3 2 0 2019-09-10 12:40:00 2019-09-13 13:20:00 1 629.
# 4 3 0 2019-09-10 01:20:00 2019-09-11 13:20:00 2 61.8
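The two steps above could be wrapped into a single function along the following lines. This is a sketch rather than a definitive implementation: it reuses the name merge_pressor_doses from the question, the max_gap_hours parameter and the doses example data are hypothetical, and it assumes dplyr 1.0.0 or later for the .groups argument of summarise().

```r
library(dplyr)

# Merge doses of the same icustay_id that are separated by less than
# max_gap_hours (hypothetical parameter; the thread uses one hour).
merge_pressor_doses <- function(df, max_gap_hours = 1) {
  df %>%
    arrange(icustay_id, starttime) %>%
    group_by(icustay_id) %>%
    mutate(
      # Hours since the previous dose of the same stay ended (NA for the first dose).
      pause = as.numeric(difftime(starttime, lag(endtime), units = "hours")),
      # The first dose of each stay has no predecessor: treat its pause as 0.
      pause = coalesce(pause, 0),
      # A new group starts whenever the pause reaches the threshold.
      vaso_id = cumsum(pause >= max_gap_hours)
    ) %>%
    group_by(icustay_id, vaso_id) %>%
    summarise(
      starttime = min(starttime),
      endtime = max(endtime),
      vaso_rate = mean(vaso_rate),
      sum_vaso_amount = sum(vaso_amount),
      .groups = "drop"
    )
}

# Illustrative data with fixed values (not the random values from the thread).
doses <- data.frame(
  icustay_id = c(1, 1, 1, 2),
  starttime = as.POSIXct(c("2019-09-10 13:20", "2019-09-11 13:30",
                           "2019-09-14 16:40", "2019-09-10 12:40"), tz = "UTC"),
  endtime = as.POSIXct(c("2019-09-11 13:20", "2019-09-12 01:20",
                         "2019-09-15 16:40", "2019-09-13 13:20"), tz = "UTC"),
  vaso_rate = c(3, 9, 4, 2),
  vaso_amount = c(100, 200, 300, 400)
)

merged <- merge_pressor_doses(doses)
print(merged)
```

Sorting inside the function matters because lag() compares each row with the one directly above it, so the rows of a stay need to be in chronological order first.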