I'm trying to sort the "Smoking Status" categories alphabetically, using only the tidyverse.
Here is what I tried:
smoking_gender_disch_piv_count_ren <- smoking_gender_disch_piv_count %>%
dplyr::rename('Smoking Status' = smoking_status) %>%
dplyr::arrange('Smoking status')
smoking_gender_disch_piv_count_ren
As you can see, I don't get Current smoker first, then Ex smoker, and so on. I thought the arrange function in dplyr would solve this, but it didn't.
Here is my data:
structure(list(smoking_status = structure(1:5, .Label = c("Ex smoker",
"Current smoker", "Never smoked", "Unknown", "Non smoker - smoking history unknown"
), class = "factor"), Female = c(24.0601503759398, 9.02255639097744,
35.3383458646617, 6.01503759398496, 25.5639097744361), Male = c(34.9753694581281,
13.7931034482759, 23.6453201970443, 1.97044334975369, 25.615763546798
), NSTEMI = c(31.9078947368421, 12.5, 28.2894736842105, 3.28947368421053,
24.0131578947368), STEMI = c(18.75, 6.25, 28.125, 6.25, 40.625
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
Apart from the typo ('Smoking status' instead of 'Smoking Status'), you have two other problems.
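We use single quotes (') or double quotes (") to specify character strings: 'my string' or "my string". However, to specify (unusual) variable names (symbols) that contain spaces, we use backticks (`): `my variable`. Because typing these backticks is cumbersome, we usually use underscores (_) instead of spaces in variable names.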
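A minimal base-R sketch of that difference (the column name `my variable` is only an illustration). A quoted name is an ordinary character value. The backticked name is a symbol, which R can use to refer to a variable or column.

```r
# A data frame with a non-syntactic column name containing a space
# (check.names = FALSE stops data.frame() from turning it into "my.variable")
df <- data.frame(`my variable` = c(3, 1, 2), check.names = FALSE)

# Quotes create a plain character value
class('my variable')          # "character"

# Backticks create a symbol (class "name")
class(quote(`my variable`))   # "name"

# Backticks let the spaced name be used where R expects a symbol
df$`my variable`              # 3 1 2

# With an underscore instead of a space, no backticks are needed
df2 <- data.frame(my_variable = c(3, 1, 2))
df2$my_variable               # 3 1 2
```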
When (re)naming columns, a character string works just as well as a symbol. That is,
# ... %>%
dplyr::rename('Smoking Status' = smoking_status) # %>% ...
# |--------------|
# character string
is equivalent to
# ... %>%
dplyr::rename(`Smoking Status` = smoking_status) # %>% ...
# |--------------|
# symbol
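The following sketch checks that equivalence. It assumes dplyr is installed and uses a small hypothetical two-row tibble in place of the question's data. Both forms of rename() should produce the same result.

```r
library(dplyr)

# Hypothetical stand-in for the question's data
df <- tibble(smoking_status = factor(c("Ex smoker", "Current smoker")))

# New name given as a character string
a <- df %>% rename('Smoking Status' = smoking_status)
# New name given as a backticked symbol
b <- df %>% rename(`Smoking Status` = smoking_status)

names(a)         # "Smoking Status"
identical(a, b)  # TRUE
```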
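However, when you use mutate(), filter(), arrange(), or any other vectorized operation, a character string is treated as a simple character scalar value, not as a column name. That is,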
# ... %>%
mutate(test = 'Smoking Status') # %>% ...
# |--------------|
# character string
will not copy the `Smoking Status` column (a factor)
# A tibble: 5 x 6
... test
... <fct>
1 ... Ex smoker
2 ... Current smoker
3 ... Never smoked
4 ... Unknown
5 ... Non smoker - smoking history unknown
but will instead give you a (character) column filled with the literal string 'Smoking Status':
# A tibble: 5 x 6
... test
... <chr>
1 ... Smoking Status
2 ... Smoking Status
3 ... Smoking Status
4 ... Smoking Status
5 ... Smoking Status
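A short sketch of the two behaviours side by side. It assumes dplyr is installed and uses a hypothetical two-row tibble. The quoted version recycles the string into every row. The backticked version copies the existing factor column.

```r
library(dplyr)

df <- tibble(`Smoking Status` = factor(c("Ex smoker", "Current smoker")))

# Quoted: the literal string is recycled into every row
quoted <- df %>% mutate(test = 'Smoking Status')
# Backticked: the existing factor column is copied
backticked <- df %>% mutate(test = `Smoking Status`)

quoted$test              # "Smoking Status" "Smoking Status"
class(backticked$test)   # "factor"
```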
Similarly, your
# ... %>%
dplyr::arrange('Smoking Status')
# |----|
# Corrected typo: 'status'.
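does not sort on the `Smoking Status` column, but on a (temporary) column filled with the string 'Smoking Status'. Because every value in that column is the same, no reordering happens at all, and the rows of smoking_gender_disch_piv_count stay in their original order.
To fix this particular problem, use: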
# ... %>%
dplyr::arrange(`Smoking Status`)
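A minimal sketch contrasting the two calls. It assumes dplyr is installed and uses a hypothetical three-row tibble. Here the column is character rather than factor, so only the quoting issue is in play. Note that since dplyr 1.1.0, arrange() sorts character vectors in the C locale by default, so capitalization can affect the order. That does not matter for these values.

```r
library(dplyr)

df <- tibble(`Smoking Status` = c("Never smoked", "Current smoker", "Ex smoker"))

# Quoted: sorts by a constant, so the row order does not change
unchanged <- df %>% arrange('Smoking Status') %>% pull(`Smoking Status`)
unchanged  # "Never smoked" "Current smoker" "Ex smoker"

# Backticked: sorts by the actual column
sorted <- df %>% arrange(`Smoking Status`) %>% pull(`Smoking Status`)
sorted     # "Current smoker" "Ex smoker" "Never smoked"
```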
Even with that fixed, you would still have a problem. Your `Smoking Status` column is a factor:
[1] Ex smoker Current smoker Never smoked Unknown Non smoker - smoking history unknown
Levels: Ex smoker Current smoker Never smoked Unknown Non smoker - smoking history unknown
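So when you sort on this column, the rows follow the order of the factor levels, and those levels are clearly not alphabetical.
To sort alphabetically, use the character form of the `Smoking Status` column: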
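A base-R sketch of why this happens. A factor is stored as integer codes that point into its levels, so sorting it follows the level order rather than the labels. The three values below are taken from the question, and the level order is deliberately non-alphabetical, as in the question.

```r
# Levels in the same non-alphabetical order as the question's data
f <- factor(c("Ex smoker", "Current smoker", "Never smoked"),
            levels = c("Ex smoker", "Current smoker", "Never smoked"))

as.integer(f)          # 1 2 3: the codes that factor sorting uses
sort(f)                # level order: Ex smoker, Current smoker, Never smoked
sort(as.character(f))  # alphabetical: Current smoker, Ex smoker, Never smoked
```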
# ... %>%
dplyr::arrange(as.character(`Smoking Status`))
Given the smoking_gender_disch_piv_count dataset you posted,
smoking_gender_disch_piv_count <-
structure(list(smoking_status = structure(1:5, .Label = c("Ex smoker", "Current smoker", "Never smoked", "Unknown", "Non smoker - smoking history unknown"), class = "factor"),
Female = c(24.0601503759398, 9.02255639097744, 35.3383458646617, 6.01503759398496, 25.5639097744361),
Male = c(34.9753694581281, 13.7931034482759, 23.6453201970443, 1.97044334975369, 25.615763546798),
NSTEMI = c(31.9078947368421, 12.5, 28.2894736842105, 3.28947368421053, 24.0131578947368),
STEMI = c(18.75, 6.25, 28.125, 6.25, 40.625)),
row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
the following dplyr workflow
smoking_gender_disch_piv_count_ren <- smoking_gender_disch_piv_count %>%
dplyr::rename(`Smoking Status` = smoking_status) %>%
dplyr::arrange(as.character(`Smoking Status`))
will give you the result you want in smoking_gender_disch_piv_count_ren:
# A tibble: 5 x 5
`Smoking Status` Female Male NSTEMI STEMI
<fct> <dbl> <dbl> <dbl> <dbl>
1 Current smoker 9.02 13.8 12.5 6.25
2 Ex smoker 24.1 35.0 31.9 18.8
3 Never smoked 35.3 23.6 28.3 28.1
4 Non smoker - smoking history unknown 25.6 25.6 24.0 40.6
5 Unknown 6.02 1.97 3.29 6.25
while still preserving the factor information in `Smoking Status`.
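Sorting by as.character() reorders the rows but leaves the factor's levels in their original order. The sketch below uses a hypothetical three-row tibble and assumes dplyr is installed. It also shows one optional way to make the levels themselves alphabetical, so that arranging the factor directly gives the same row order. The releveling uses base R's factor() inside mutate(). forcats::fct_relevel() would be a tidyverse alternative.

```r
library(dplyr)

# Hypothetical three-row version of the question's data
df <- tibble(
  `Smoking Status` = factor(c("Ex smoker", "Current smoker", "Unknown"),
                            levels = c("Ex smoker", "Current smoker", "Unknown")),
  Female = c(24.06, 9.02, 6.02)
)

# Rows sorted alphabetically; the factor's levels are untouched
res <- df %>% arrange(as.character(`Smoking Status`))
levels(res$`Smoking Status`)   # "Ex smoker" "Current smoker" "Unknown"

# Optional: relevel alphabetically, then arrange the factor itself
res2 <- df %>%
  mutate(`Smoking Status` = factor(`Smoking Status`,
                                   levels = sort(levels(`Smoking Status`)))) %>%
  arrange(`Smoking Status`)
levels(res2$`Smoking Status`)  # "Current smoker" "Ex smoker" "Unknown"
```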