在 R 中重塑一组列的多个副本 Wide > Long 和 Long > Wide

mmarks 发表于 Dev

标记

我正在从 STATA 转换为 R。我正在努力复制的一件事是 STATA 中的重塑功能。在 STATA 中，大致可以通过以下方式完成：

reshape wide variable names, i(Unique person ID) j(ID identifying each entry per unique ID i)

我有一个包含患者使用的静脉内管线的数据集（现在附上样本）。数据目前为长（每行 1 行）。对于每一行，您会看到有几列；行类型、插入日期、移除日期等。

我想了解如何最好地将布局 1 重塑为宽，将布局 2 重新塑造为长。每个患者都有一个唯一的 ID。我可以用唯一的 ID 标记每个人的每一行（即 ID_Var 1:n 中的行数）。下面是宽/长所需布局的示例。

这感觉与我在 Stackoverflow 上看到的 Reshaping 示例不同（并且在 dplyr 备忘单中进行了描述）-因为通常它们会根据 line 列中的值进行重塑-并且您将创建一个名为peripheral 和从插入的值中取出值并将其放在外围列中，然后创建另一个名为 Hickman 的列，然后将插入的值放入该列中等等。典型的 DPLYR 示例（不是这里的目标）

我很高兴有一个 base 或 dplyr（或实际上替代）解决方案......我试过在 R::base 中使用 reshape 并查看了 dplyr 中的传播但无法正常工作？因为我我试图一步完成这一切（这就是我在 STATA 中所做的）。

例如我试过

lines_wide <- reshape(lines,idvar=lines$Episode,timevar=lines$n,direction="wide")

但我得到：[.data.frame(data, , idvar) 中的错误：选择了未定义的列

我还尝试使用 v.names 指定要重塑的特定元素：

lines_wide <- reshape(lines,idvar=lines$Episode,timevar=lines$n,direction="wide", v.names = list(lines$Site,lines$Line.Type,lines$Removal.Reason))

但我得到了同样的错误。

长数据集的示例如下：https : //www.dropbox.com/s/h0lo910ix304qj3/reshape_example.xlsx?dl=0

蒂诺

你真的应该至少提供你的数据......无论如何，这里有一个tidyverse-solution，使用tidyr和dplyr：

library(tidyverse)

df <- tribble(~patient, ~line, ~inserted, ~complications,
          1,"peripheral", "01/02/17", "none",
          1,"peripheral", "04/02/17", "blocked")

# this prefix preserves the order of your variables:
names(df)[-1] <- c("[1]line", "[2]inserted", "[3]complications")

df_wide <- 
  df %>% 
  group_by(patient) %>% 
  mutate(nr = seq_len(n())) %>% # number of observations for each patient
  gather(key = key, value = value, -patient, -nr) %>% # make Long
  arrange(nr, key) %>% # sort by nr and variable name to keep you order
  unite(key, nr, key, sep = ".") %>% # paste variable number and variable name
  mutate(key = factor(key, levels = unique(key))) %>% # tells spread to preserve order
  spread(key = key, value = value) # make wide again

# remove the prefix from above
names(df_wide) <- names(df_wide) %>% 
               gsub(pattern = "\\[\\d{1}\\]", 
               replacement = "")

df_wide

  patient   `1.line` `1.inserted` `1.complications`   `2.line` `2.inserted` `2.complications`
*   <dbl>      <chr>        <chr>             <chr>      <chr>        <chr>             <chr>
1       1 peripheral     01/02/17              none peripheral     04/02/17           blocked

反过来说：

df_long <- 
  df_wide %>%
  gather(key = key, value = value, -patient) %>% 
  separate(key, into = c("nr", "key")) %>% 
  spread(key = key, value = value) %>% 
  select(patient, line, inserted, complications)

df_long

  patient       line inserted complications
*   <dbl>      <chr>    <chr>         <chr>
1       1 peripheral 01/02/17          none
2       1 peripheral 04/02/17       blocked

本文收集自互联网，转载请注明来源。

如有侵权，请联系 [email protected] 删除。

编辑于 2021-06-15

我来说两句

0 条评论

登录后参与评论

TOP 榜单

文章

在 R 中重塑一组列的多个副本 Wide > Long 和 Long > Wide

在 R 中重塑一组列的多个副本 Wide > Long 和 Long > Wide

Linux的官方Adobe Flash存储库是否已过时？

用日期数据透视表和日期顺序查询

应用发明者仅从列表中选择一个随机项一次

Java Eclipse中的错误13，如何解决？

在Windows 7中无法删除文件（2）

在 Python 2.7 中。如何从文件中读取特定文本并分配给变量

套接字无法检测到断开连接

带有错误“ where”条件的查询如何返回结果？

有什么解决方案可以将android设备用作Cast Receiver？

Mac OS X更新后的GRUB 2问题

ggplot：对齐多个分面图-所有大小不同的分面

验证REST API参数

如何从视图一次更新多行（ASP.NET - Core）

尝试反复更改屏幕上按钮的位置 - kotlin android studio

计算数据帧中每行的NA

检索角度选择div的当前值

离子动态工具栏背景色

UITableView的项目向下滚动后更改颜色，然后快速备份

VB.net将2条特定行导出到DataGridView

蓝屏死机没有修复解决方案

通过 Git 在运行 Jenkins 作业时获取 ClassNotFoundException