在t.test
函数的非公式签名中t.test(x, y, paired=T)
,我假设数据在两个输入(文档中的x和y)中按有序配对。
但是,在公式签名中t.test(values ~ groups, df, paired=T)
,函数如何将两组中的观察结果成对关联?按命令?
在下面的reprex中,我创建了一个在数据前后配对的数据框。然后,我通过t.test
两种方式将其以长格式(适合该功能)放置:1)按观察顺序列出“之前”组,然后按观察顺序列出“之后”组。2)以无特定顺序列出所有数据。
我对两个数据集都进行了配对t检验。很明显,在情况2中,该函数绝对无法知道哪个“之后”观察与哪个“之前”观察相伴。我是否可以假定t.test
函数可以理解情况1中输入的数据,即“之前”和“之后”数据都是按观察顺序排列的?
我在文档或任何在线示例中都找不到关于此的任何信息。由于没有用于链接两组观察值的键的参数,因此该t.test
函数正在做出某种假设。
library(tidyverse)
df = data.frame(
observation = 1:20,
before = rnorm(20, 10, 2),
after = rnorm(20, 10.2, 2.3)
)
print.data.frame(df)
#> observation before after
#> 1 1 10.930157 11.818216
#> 2 2 10.870749 10.699232
#> 3 3 9.603120 14.384484
#> 4 4 9.615291 8.777045
#> 5 5 6.714043 9.506421
#> 6 6 9.063117 5.574887
#> 7 7 8.152260 10.357455
#> 8 8 8.256237 8.660646
#> 9 9 12.641977 7.511760
#> 10 10 11.010290 9.391047
#> 11 11 12.545197 9.072856
#> 12 12 12.606526 9.110687
#> 13 13 8.659088 12.445071
#> 14 14 8.958959 10.783168
#> 15 15 11.635443 6.926802
#> 16 16 6.922437 12.419453
#> 17 17 10.326176 10.416757
#> 18 18 7.680960 9.836573
#> 19 19 9.458365 8.083777
#> 20 20 7.235837 12.094290
df_long =
df %>%
pivot_longer(
cols = c("before", "after"),
names_to = "time",
values_to="fabulousness"
)
print.data.frame(df_long)
#> observation time fabulousness
#> 1 1 before 10.930157
#> 2 1 after 11.818216
#> 3 2 before 10.870749
#> 4 2 after 10.699232
#> 5 3 before 9.603120
#> 6 3 after 14.384484
#> 7 4 before 9.615291
#> 8 4 after 8.777045
#> 9 5 before 6.714043
#> 10 5 after 9.506421
#> 11 6 before 9.063117
#> 12 6 after 5.574887
#> 13 7 before 8.152260
#> 14 7 after 10.357455
#> 15 8 before 8.256237
#> 16 8 after 8.660646
#> 17 9 before 12.641977
#> 18 9 after 7.511760
#> 19 10 before 11.010290
#> 20 10 after 9.391047
#> 21 11 before 12.545197
#> 22 11 after 9.072856
#> 23 12 before 12.606526
#> 24 12 after 9.110687
#> 25 13 before 8.659088
#> 26 13 after 12.445071
#> 27 14 before 8.958959
#> 28 14 after 10.783168
#> 29 15 before 11.635443
#> 30 15 after 6.926802
#> 31 16 before 6.922437
#> 32 16 after 12.419453
#> 33 17 before 10.326176
#> 34 17 after 10.416757
#> 35 18 before 7.680960
#> 36 18 after 9.836573
#> 37 19 before 9.458365
#> 38 19 after 8.083777
#> 39 20 before 7.235837
#> 40 20 after 12.094290
df_long_not_paired =
df_long %>%
arrange(fabulousness)
print.data.frame(df_long_not_paired)
#> observation time fabulousness
#> 1 6 after 5.574887
#> 2 5 before 6.714043
#> 3 16 before 6.922437
#> 4 15 after 6.926802
#> 5 20 before 7.235837
#> 6 9 after 7.511760
#> 7 18 before 7.680960
#> 8 19 after 8.083777
#> 9 7 before 8.152260
#> 10 8 before 8.256237
#> 11 13 before 8.659088
#> 12 8 after 8.660646
#> 13 4 after 8.777045
#> 14 14 before 8.958959
#> 15 6 before 9.063117
#> 16 11 after 9.072856
#> 17 12 after 9.110687
#> 18 10 after 9.391047
#> 19 19 before 9.458365
#> 20 5 after 9.506421
#> 21 3 before 9.603120
#> 22 4 before 9.615291
#> 23 18 after 9.836573
#> 24 17 before 10.326176
#> 25 7 after 10.357455
#> 26 17 after 10.416757
#> 27 2 after 10.699232
#> 28 14 after 10.783168
#> 29 2 before 10.870749
#> 30 1 before 10.930157
#> 31 10 before 11.010290
#> 32 15 before 11.635443
#> 33 1 after 11.818216
#> 34 20 after 12.094290
#> 35 16 after 12.419453
#> 36 13 after 12.445071
#> 37 11 before 12.545197
#> 38 12 before 12.606526
#> 39 9 before 12.641977
#> 40 3 after 14.384484
df_long_paired =
df_long %>%
arrange(desc(time))
print.data.frame(df_long_paired)
#> observation time fabulousness
#> 1 1 before 10.930157
#> 2 2 before 10.870749
#> 3 3 before 9.603120
#> 4 4 before 9.615291
#> 5 5 before 6.714043
#> 6 6 before 9.063117
#> 7 7 before 8.152260
#> 8 8 before 8.256237
#> 9 9 before 12.641977
#> 10 10 before 11.010290
#> 11 11 before 12.545197
#> 12 12 before 12.606526
#> 13 13 before 8.659088
#> 14 14 before 8.958959
#> 15 15 before 11.635443
#> 16 16 before 6.922437
#> 17 17 before 10.326176
#> 18 18 before 7.680960
#> 19 19 before 9.458365
#> 20 20 before 7.235837
#> 21 1 after 11.818216
#> 22 2 after 10.699232
#> 23 3 after 14.384484
#> 24 4 after 8.777045
#> 25 5 after 9.506421
#> 26 6 after 5.574887
#> 27 7 after 10.357455
#> 28 8 after 8.660646
#> 29 9 after 7.511760
#> 30 10 after 9.391047
#> 31 11 after 9.072856
#> 32 12 after 9.110687
#> 33 13 after 12.445071
#> 34 14 after 10.783168
#> 35 15 after 6.926802
#> 36 16 after 12.419453
#> 37 17 after 10.416757
#> 38 18 after 9.836573
#> 39 19 after 8.083777
#> 40 20 after 12.094290
df_long_not_paired %>%
t.test(fabulousness ~ time, ., paired=T)
#>
#> Paired t-test
#>
#> data: fabulousness by time
#> t = 2.0289, df = 19, p-value = 0.05672
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -0.007878376 0.506318062
#> sample estimates:
#> mean of the differences
#> 0.2492198
df_long_paired %>%
t.test(fabulousness ~ time, ., paired=T)
#>
#> Paired t-test
#>
#> data: fabulousness by time
#> t = 0.3422, df = 19, p-value = 0.736
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -1.27509 1.77353
#> sample estimates:
#> mean of the differences
#> 0.2492198
由reprex软件包(v0.3.0)创建于2020-11-24
当我多次运行此命令时,在我扰乱了组顺序的情况下,经常会看到误报。
因此,要找出确切的完成方式,我们可以看一下源代码。
stats:::t.test.formula
给我们:
g <- factor(mf[[-response]])
哪里mf
是模型框架,response
是响应变量。g
然后是公式(LHS)中的分组变量。然后,稍后,我们将看到一个对象的创建,该对象DATA
是mf
根据分组变量进行拆分的g
。然后将这些数据传递到stats:::t.test.default
而不更改任何顺序。
DATA <- setNames(split(mf[[response]], g), c("x", "y"))
然后stats:::t.test.default
,我们可以研究,重点放在paired
提到数据的地方。
if (paired) { x <- x - y y <- NULL } nx <- length(x) mx <- mean(x) vx <- var(x)
从中我们可以看到,t.test.default
只需计算对之间的差异,然后对差异进行单样本t检验。
从所有这些一起我们可以理解,观察结果的顺序必须正确,以便具有正确的对。
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句