Why is my table not sorted alphabetically in R? tidyverse only

I am trying to sort the "Smoking Status" categories alphabetically. The solution should use the tidyverse only.

Here is what I tried:

smoking_gender_disch_piv_count_ren <- smoking_gender_disch_piv_count %>%
  dplyr::rename('Smoking Status' = smoking_status) %>%
  dplyr::arrange('Smoking status')
smoking_gender_disch_piv_count_ren

As you can see, I do not get Current smoker first, then Ex smoker, and so on. I thought the arrange function in dplyr would solve this, but it does not.

Here is my data:

structure(list(smoking_status = structure(1:5, .Label = c("Ex smoker", 
"Current smoker", "Never smoked", "Unknown", "Non smoker - smoking history unknown"
), class = "factor"), Female = c(24.0601503759398, 9.02255639097744, 
35.3383458646617, 6.01503759398496, 25.5639097744361), Male = c(34.9753694581281, 
13.7931034482759, 23.6453201970443, 1.97044334975369, 25.615763546798
), NSTEMI = c(31.9078947368421, 12.5, 28.2894736842105, 3.28947368421053, 
24.0131578947368), STEMI = c(18.75, 6.25, 28.125, 6.25, 40.625
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
Greg

Aside from the typo ('Smoking Status' versus 'Smoking status'), you have run into two other problems.

Variable names versus strings

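We use single quotes (') or double quotes (") to specify strings: 'my string' or "my string". However, to specify an (unusual) variable name (symbol) that contains spaces, we use backticks (`): `my variable`. Since typing those backticks is tedious, we typically use underscores (_) rather than spaces in variable names.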
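As a minimal sketch of the distinction described above (the name `my variable` and the value 42 are purely illustrative):

```r
# A quoted value is just text; it does not refer to any object.
a_string <- "my variable"

# A name containing a space is non-syntactic, so it needs backticks.
`my variable` <- 42

# The string is character data, while the backticked name
# evaluates to whatever the variable holds.
is.character(a_string)
is.numeric(`my variable`)
```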

When (re)naming columns, a character string works just as well as a symbol. That is,

  # ... %>%
  dplyr::rename('Smoking Status' = smoking_status) # %>% ...
  #             |--------------|
  #             character string

is equivalent to

  # ... %>%
  dplyr::rename(`Smoking Status` = smoking_status) # %>% ...
  #             |--------------|
  #                  symbol
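The equivalence of the two rename() forms can be checked on a small, made-up tibble (the `toy` data here is an assumption for illustration, not the dataset from the question):

```r
library(dplyr)

# Hypothetical two-row stand-in for the real data.
toy <- tibble(smoking_status = c("Ex smoker", "Current smoker"))

# New column name given as a quoted string ...
with_string <- toy %>% rename('Smoking Status' = smoking_status)

# ... and as a backticked symbol.
with_symbol <- toy %>% rename(`Smoking Status` = smoking_status)

# Both spellings produce the same column name.
identical(names(with_string), names(with_symbol))
```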

However, when you use mutate(), filter(), or arrange() to perform vectorized operations, any string is treated as a simple scalar character value. That is,

  # ... %>%
  mutate(test = 'Smoking Status') # %>% ...
  #             |--------------|
  #             character string

will not copy the `Smoking Status` column (a factor):

# A tibble: 5 x 6
  ... test                                
  ... <fct>                               
1 ... Ex smoker                           
2 ... Current smoker                      
3 ... Never smoked                        
4 ... Unknown                             
5 ... Non smoker - smoking history unknown

but will instead give you a (character) column filled with the literal string 'Smoking Status':

# A tibble: 5 x 6
  ... test          
  ... <chr>         
1 ... Smoking Status
2 ... Smoking Status
3 ... Smoking Status
4 ... Smoking Status
5 ... Smoking Status

Likewise, your

  # ... %>%
  dplyr::arrange('Smoking Status')
  #                       |----|
  #      Corrected typo: 'status'.

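does not sort on `Smoking Status`, but rather on a (temporary) column filled with the string 'Smoking Status'. Since everything in that column is identical, no rearrangement happens at all, and the rows of the smoking_gender_disch_piv_count dataset stay in their original order.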
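A small sketch of this behavior, using a made-up character column (the `toy` tibble is an assumption for illustration; a character column is used so that only the quoting differs between the two calls):

```r
library(dplyr)

# Hypothetical data, deliberately not in alphabetical order.
toy <- tibble(
  `Smoking Status` = c("Ex smoker", "Current smoker", "Never smoked")
)

# Quoted: a constant string, so every row gets the same sort key
# and the row order is left as it was.
by_string <- toy %>% arrange("Smoking Status")

# Backticked: refers to the column, so the rows are actually sorted.
by_column <- toy %>% arrange(`Smoking Status`)

by_string$`Smoking Status`
by_column$`Smoking Status`
```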

Fix

To fix this particular problem, use:

  # ... %>%
  dplyr::arrange(`Smoking Status`)

Strings versus factors

Even with the problem above fixed, you will still run into an issue. Your `Smoking Status` column is a factor:

[1] Ex smoker                            Current smoker                       Never smoked                         Unknown                              Non smoker - smoking history unknown
Levels: Ex smoker Current smoker Never smoked Unknown Non smoker - smoking history unknown

So when you sort on this column, it follows the order of the factor levels, which are clearly not in alphabetical order.

Fix

To sort alphabetically, use the character form of `Smoking Status`:

  # ... %>%
  dplyr::arrange(as.character(`Smoking Status`))
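To see the difference between sorting by factor levels and sorting by the character form, here is a minimal sketch on made-up data (the `toy` tibble and its row order are assumptions for illustration; the level order mirrors the first three levels in the question):

```r
library(dplyr)

# Hypothetical factor whose levels are not in alphabetical order.
toy <- tibble(
  status = factor(
    c("Never smoked", "Current smoker", "Ex smoker"),
    levels = c("Ex smoker", "Current smoker", "Never smoked")
  )
)

# Sorting the factor directly follows the level order.
by_levels <- toy %>% arrange(status)

# Sorting its character form follows alphabetical order.
by_alphabet <- toy %>% arrange(as.character(status))

as.character(by_levels$status)
as.character(by_alphabet$status)
```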

Solution

Given the smoking_gender_disch_piv_count dataset that you reproduced

smoking_gender_disch_piv_count <-
  structure(list(smoking_status = structure(1:5, .Label = c("Ex smoker", "Current smoker", "Never smoked", "Unknown", "Non smoker - smoking history unknown"), class = "factor"),
                 Female = c(24.0601503759398, 9.02255639097744, 35.3383458646617, 6.01503759398496, 25.5639097744361),
                 Male = c(34.9753694581281, 13.7931034482759, 23.6453201970443, 1.97044334975369, 25.615763546798),
                 NSTEMI = c(31.9078947368421, 12.5, 28.2894736842105, 3.28947368421053, 24.0131578947368),
                 STEMI = c(18.75, 6.25, 28.125, 6.25, 40.625)),
            row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

the following dplyr workflow

smoking_gender_disch_piv_count_ren <- smoking_gender_disch_piv_count %>%
  dplyr::rename(`Smoking Status` = smoking_status) %>%
  dplyr::arrange(as.character(`Smoking Status`))

will give you the result you want for smoking_gender_disch_piv_count_ren

# A tibble: 5 x 5
  `Smoking Status`                     Female  Male NSTEMI STEMI
  <fct>                                 <dbl> <dbl>  <dbl> <dbl>
1 Current smoker                         9.02 13.8   12.5   6.25
2 Ex smoker                             24.1  35.0   31.9  18.8 
3 Never smoked                          35.3  23.6   28.3  28.1 
4 Non smoker - smoking history unknown  25.6  25.6   24.0  40.6 
5 Unknown                                6.02  1.97   3.29  6.25

while still preserving the factor information in `Smoking Status`.
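As a rough check of that last point, the sketch below uses a made-up tibble (`toy` is an assumption, not the real dataset) to confirm that the column stays a factor with its original level order. The second half is an optional alternative that is not part of the answer above: re-leveling the factor so that its level order is itself alphabetical, which may be preferable if later steps (plots, tables) should follow the same order.

```r
library(dplyr)

# Hypothetical stand-in for the real data.
toy <- tibble(
  status = factor(
    c("Ex smoker", "Current smoker", "Never smoked"),
    levels = c("Ex smoker", "Current smoker", "Never smoked")
  )
)

# Sorting on the character form leaves the column itself a factor,
# with its level order untouched.
sorted <- toy %>% arrange(as.character(status))
is.factor(sorted$status)
levels(sorted$status)

# Optional alternative (an assumption, not from the answer):
# rebuild the factor with alphabetically sorted levels, then
# sort on the factor directly.
releveled <- toy %>%
  mutate(status = factor(status, levels = sort(levels(status)))) %>%
  arrange(status)
levels(releveled$status)
```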
