I have speech data in the column Orthographic of a data frame df:
df <- data.frame(
Orthographic = c("this is it at least probably",
"well not probably it's not intuitive",
"sure no it's I mean it's very intuitive",
"I don't mean to be rude but it's anything but you know",
"well okay maybe"),
Repeat = c(NA, "probably", "it's,intuitive", "I,mean,it's", NA),
Repeat_pattern = c(NA, "\\b(probably)\\b", "\\b(it's|intuitive)\\b", "\\b(I,mean|it's)\\b",
NA))
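I want to filter rows based on a dynamic pattern, namely the occurrence of no, never, or not as whole words, OR of n't, before any of the words listed in column Repeat. However, when I combine the pattern \\b(no|never|not)\\b|n't\\b\\s with the alternation patterns in column Repeat_pattern, I get this warning: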
df %>%
filter(grepl(paste0("\\b(no|never|not)\\b|n't\\b\\s", Repeat_pattern), Orthographic))
Orthographic Repeat Repeat_pattern
1 well not probably it's not intuitive probably \\b(probably)\\b
2 sure no it's I mean it's very intuitive it's,intuitive \\b(it's|intuitive)\\b
Warning message:
In grepl(paste0("\\b(no|never|not)\\b|n't\\b\\s", Repeat_pattern), :
argument 'pattern' has length > 1 and only the first element will be used
I don't know why "only the first element will be used", since the two pattern components seem to concatenate just fine:
paste0("\\b(no|never|not)\\b|n't\\b\\s", df$Repeat_pattern)
[1] "\\b(no|never|not)\\b|n't\\b\\sNA" "\\b(no|never|not)\\b|n't\\b\\s\\b(probably)\\b"
[3] "\\b(no|never|not)\\b|n't\\b\\s\\b(it's|intuitive)\\b" "\\b(no|never|not)\\b|n't\\b\\s\\b(I,mean|it's)\\b"
[5] "\\b(no|never|not)\\b|n't\\b\\sNA"
The expected output is this:
2 well not probably it's not intuitive probably \\b(probably)\\b
3 sure no it's I mean it's very intuitive it's,intuitive \\b(it's|intuitive)\\b
4 I don't mean to be rude but it's anything but you know I,mean,it's \\b(I,mean|it's)\\b
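This looks like a vectorization issue: grepl uses only a single pattern, so you need stringr::str_detect here instead, which is vectorized over both the strings and the patterns.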
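To illustrate the difference, here is a minimal sketch with two made-up strings and patterns (they are not taken from df). It assumes the stringr package is installed; the mapply line is just one possible base-R way to get the same pairwise behaviour.

```r
library(stringr)

# Toy data for illustration only
text     <- c("well not probably", "sure no it's intuitive")
patterns <- c("probably", "intuitive")

# grepl() accepts a single pattern: patterns[1] is applied to every string
# (the resulting warning is suppressed here)
first_only <- suppressWarnings(grepl(patterns, text))

# str_detect() pairs text[i] with patterns[i]
pairwise <- str_detect(text, patterns)

# One base-R way to get the same pairwise matching
pairwise_base <- unname(mapply(grepl, patterns, text))

print(first_only)
print(pairwise)
print(pairwise_base)
```

In this toy case, grepl misses the second string because it never looks at "intuitive".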
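Besides, you did not group the negation-word alternatives properly: all of them, n't included, must sit inside a single group, so that the part of the pattern that follows applies to every alternative.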
Also, the NA values are coerced to the text "NA" and appended to the regex pattern, whereas you seem to want to discard the rows where Repeat_pattern is NA.
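A small sketch of why the coerced NA matters, using base R only. The test string "not in NATO" is invented for the demonstration; perl = TRUE is used so that the (?: ) group is handled by PCRE.

```r
prefix         <- "(?:\\bno|\\bnever|\\bnot|n't)\\b.*"
repeat_pattern <- c(NA, "\\b(probably)\\b")

# paste0() turns NA into the literal letters "NA"
combined <- paste0(prefix, repeat_pattern)
print(combined[1])

# A string with "NA" after a negation word now matches by accident
accidental <- grepl(combined[1], "not in NATO", perl = TRUE)

# Guard: treat rows with a missing pattern as non-matches up front
keep <- !is.na(repeat_pattern)

print(accidental)
print(keep)
```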
You can fix your code like this:
library(dplyr)
library(stringr)
df %>%
filter(ifelse(is.na(Repeat_pattern), FALSE, str_detect(Orthographic, paste0("(?:\\bno|\\bnever|\\bnot|n't)\\b.*", Repeat_pattern))))
Output:
Orthographic Repeat Repeat_pattern
1 well not probably it's not intuitive probably \\b(probably)\\b
2 sure no it's I mean it's very intuitive it's,intuitive \\b(it's|intuitive)\\b
3 I don't mean to be rude but it's anything but you know I,mean,it's \\b(I|mean|it's)\\b
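I also think the last pattern must be \\b(I|mean|it's)\\b, not \\b(I,mean|it's)\\b: with the comma, the first alternative only matches the literal text "I,mean".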
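If the patterns are derived from the Repeat column, something along these lines would avoid the stray comma. This is only a sketch: make_pattern is a hypothetical helper, not something from the original post, and it assumes the words in Repeat contain no regex metacharacters.

```r
Repeat <- c(NA, "probably", "it's,intuitive", "I,mean,it's")

# Hypothetical helper: turn "a,b,c" into "\\b(a|b|c)\\b", keeping NA as NA
make_pattern <- function(words) {
  ifelse(is.na(words),
         NA_character_,
         paste0("\\b(", gsub(",", "|", words, fixed = TRUE), ")\\b"))
}

patterns <- make_pattern(Repeat)
print(patterns)

# The comma version looks for the literal "I,mean", so "I mean" is not found
comma_version <- grepl("\\b(I,mean)\\b", "I mean", perl = TRUE)
print(comma_version)
```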
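If there can only be whitespace between the "no" word and the word from the Repeat column, replace .* with \\s+ in my pattern. I used .* to make sure there is a match anywhere to the right of the "no" word.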
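The two variants can be compared on a few invented strings. Since a single pattern is applied to all strings here, base grepl with perl = TRUE is enough for this sketch.

```r
neg    <- "(?:\\bno|\\bnever|\\bnot|n't)\\b"
target <- "\\b(probably)\\b"

# Toy strings for illustration only
x <- c("well not probably", "not, I think, probably", "probably not")

# ".*" allows anything between the negation and the target word
anywhere <- grepl(paste0(neg, ".*", target), x, perl = TRUE)

# "\\s+" requires the target word to follow after whitespace only
adjacent <- grepl(paste0(neg, "\\s+", target), x, perl = TRUE)

print(anywhere)
print(adjacent)
```

The third string matches neither variant because the negation comes after the target word.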