Only keep words in a data frame that are found in a vector (R)

Cybernetic

I need to remove all non-English words from a data frame that looks like this:

ID     text
1      they all went to the store bonkobuns and bought chicken
2      if we believe no exomunch standards are in order then we're ok
3      living among the calipodians seems reasonable  
4      given the state of all relimited editions we should be fine

I want to end up with a data frame like this:

 ID     text
 1      they all went to the store and bought chicken
 2      if we believe no standards are in order then we're ok
 3      living among the seems reasonable  
 4      given the state of all editions we should be fine

I have a vector containing all English words: word_vec

I can use the tm package to remove every word in a vector from the data frame:

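library(tm)  # provides removeWords()

# Remove every word in word_vec from the text column, row by row
for (k in 1:nrow(frame)) {
    for (i in 1:length(word_vec)) {
        frame$text[k] <- removeWords(frame$text[k], word_vec[i])
    }
}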

But I want the opposite: I want to keep only the words that are found in the vector.

Dominic Comtois

Here is a simple way to do it:

txt <- "Hi this is an example"
words <- c("this", "is", "an", "example")
paste(intersect(strsplit(txt, "\\s")[[1]], words), collapse=" ")
[1] "this is an example"
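
One thing to watch for: intersect() treats its inputs as sets, so a word that occurs more than once in the text comes back only once. Below is a minimal sketch of a variant that keeps repeated words and applies the same idea to every row of the text column. The helper name keep_words and the short word_vec are made up for illustration, and frame is only a toy stand-in for the data frame in the question.

```r
# Hypothetical helper: keep only the tokens of `txt` that appear in `words`.
# Uses %in% rather than intersect() so repeated words are preserved.
keep_words <- function(txt, words) {
  tokens <- strsplit(txt, "\\s+")[[1]]
  paste(tokens[tokens %in% words], collapse = " ")
}

# Toy stand-ins for the question's data (assumed, not the real word list)
frame <- data.frame(
  ID = 1:2,
  text = c("they all went to the store bonkobuns and bought chicken",
           "living among the calipodians seems reasonable"),
  stringsAsFactors = FALSE
)
word_vec <- c("they", "all", "went", "to", "the", "store", "and", "bought",
              "chicken", "living", "among", "seems", "reasonable")

# Apply row by row; vapply guarantees one string back per input row
frame$text <- vapply(frame$text, keep_words, character(1),
                     words = word_vec, USE.NAMES = FALSE)
frame$text
```

With a large dictionary, %in% is still a single vectorised lookup per row, so this should scale reasonably, though that has not been measured here.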

Of course, the devil is in the details, so you may need to tweak this a little to account for apostrophes and other punctuation.
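
As one possible way to handle punctuation, the sketch below matches on a normalised copy of each token (lower-cased, with punctuation stripped from the edges only) while keeping the original token in the output. The function name keep_words_punct and the sample word list are hypothetical, and the sketch assumes contractions such as "we're" appear in the word list as written.

```r
# Hypothetical helper: compare a cleaned-up key for each token against
# the word list, but return the tokens exactly as they appeared.
keep_words_punct <- function(txt, words) {
  tokens <- strsplit(txt, "\\s+")[[1]]
  # Strip punctuation only at the start/end so "we're" keeps its apostrophe
  key <- tolower(gsub("^[[:punct:]]+|[[:punct:]]+$", "", tokens))
  paste(tokens[key %in% tolower(words)], collapse = " ")
}

# Assumed sample word list for illustration only
words <- c("if", "we", "believe", "no", "standards", "are", "in",
           "order", "then", "we're", "ok")
res <- keep_words_punct(
  "If we believe no exomunch standards are in order, then we're ok.",
  words
)
res
```

Whether to strip punctuation from the output as well, or to split contractions, depends on how the word list was built, so treat this as a starting point rather than a complete tokenizer.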

Edited on 2020-10-28
