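I have transcripts of storytelling that contain many instances of overlapping speech, with the overlapping speech wrapped in square brackets. I want to extract these instances of overlap. In the following mock example,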
ovl <- c("well [yes right]", "let's go", "oh [ we::ll] i do n't (0.5) know", "erm [°well right° ]", "(3.2)")
this code works fine:
pattern <- "\\[(.*\\w.+])*"
grep(pattern, ovl, value = TRUE)
matches <- gregexpr(pattern, ovl)
overlap <- regmatches(ovl, matches)
overlap_clean <- unlist(overlap); overlap_clean
[1] "[yes right]" "[ we::ll]" "[°well right° ]"
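But on a larger file, a data frame, it does not. Is this due to a mistake in the pattern, or to how the data frame is structured? The first six rows of df look like this: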
> head(df)
Story
1 "Kar:\tMind you our Colin's getting more like your dad every day
2 June:\tI know he is.
3 Kar:\tblack welding glasses on,
4 \tand he turned round and he made me jump
5 \t“O:h, Colin”,
6 \tand then ( )
Although it may work in some cases, your pattern looks off to me. I think it should be this:
pattern <- "(\\[.*?\\])"
matches <- gregexpr(pattern, ovl)
overlap <- regmatches(ovl, matches)
overlap_clean <- unlist(overlap)
overlap_clean
[1] "[yes right]" "[ we::ll]" "[°well right° ]"
This matches and captures a bracketed term, using the Perl-style lazy dot (.*?) to make sure we stop at the first closing bracket.
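To illustrate why the lazy quantifier matters, and how the same pattern might be applied to the data frame, here is a minimal sketch. The sample line x and the small data frame toy_df are made up for illustration; the column name Story is taken from the head(df) output in the question. Whether the original failure came from passing the whole data frame instead of the column, or from Story being stored as a factor, is an assumption and not something the question confirms.

```r
pattern <- "\\[.*?\\]"

# A made-up line with two bracketed spans, to compare greedy and lazy matching.
x <- "a [one] b [two] c"
greedy <- regmatches(x, gregexpr("\\[.*\\]", x, perl = TRUE))[[1]]
lazy   <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
greedy  # one match running from the first "[" to the last "]"
lazy    # one match per bracketed span

# Hypothetical stand-in for the real data frame; only the column name
# 'Story' comes from the question, the rows are invented.
toy_df <- data.frame(
  Story = c("Kar:\tblack welding glasses on,",
            "June:\t[I know] he [is].",
            "\tand then ( )"),
  stringsAsFactors = FALSE
)

# Work on the column, not the data frame itself, and coerce to character
# in case the column was read in as a factor.
story <- as.character(toy_df$Story)
hits <- unlist(regmatches(story, gregexpr(pattern, story, perl = TRUE)))
hits
```

Here perl = TRUE is set only to make the choice of regex engine explicit. Checking str(df) on the real data would show whether Story is a character or a factor column.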