Text mining in R: how do I combine a DocumentTermMatrix with the original data frame?

djacobs1216

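I want to write code that lets me classify tweets. In the example below, I take tweets about credit cards and want to determine whether each one relates to travel.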

Here is the initial dataset:

id<- c(123,124,125,126,127) 
text<- c("Since I love to travel, this is what I rely on every time.", 
        "I got this card for the no international transaction fee", 
        "I got this card mainly for the flight perks",
        "Very good card, easy application process",
        "The customer service is outstanding!") 
travel_cat<- c(1,0,1,0,0) 
df_all<- data.frame(id,text,travel_cat)

Output 1:

id  text                                                        travel_cat
123 Since I love to travel, this is what I rely on every time.  1
124 I got this card for the no international transaction fee    0
125 I got this card mainly for the flight perks                 1
126 Very good card, easy application process                    0
127 The customer service is outstanding!                        0

Next, I create a data frame containing only the text field and run the text analysis on it:

myvars<- c("text")
df<- df_all[myvars]

library(tm)
corpus<- Corpus(DataframeSource(df))
corpus<- tm_map(corpus, content_transformer(tolower))
corpus<- tm_map(corpus, removePunctuation)
corpus<- tm_map(corpus, removeWords, stopwords("english"))
corpus<- tm_map(corpus, stripWhitespace)
dtm<- as.matrix(DocumentTermMatrix(corpus))

Output 2 (dtm):

Docs    application card    customer    easy    every ... etc.
1       0           0       0           1       0
2       0           1       0           0       1
3       0           1       0           0       0
4       1           1       0           0       0
5       0           0       1           0       0

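How do I then bind this back to the original data, so that the result contains the fields from both the original dataset and the matrix (Output 1 + Output 2): id, text, travel_cat + application, card, customer, easy, every ...?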

Dhiraj

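Just try cbind(), for example cbind(df_all, dtm). The rows of dtm are in the same order as the rows of df_all, so the columns line up document by document.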
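
A minimal sketch of the cbind() approach, using the data from the question. Two things here are my assumptions rather than part of the original post: the corpus is built with VCorpus(VectorSource(...)) instead of Corpus(DataframeSource(df)), because recent versions of tm expect DataframeSource input to have doc_id and text columns, and the result name df_combined is made up for illustration.

```r
library(tm)

# Original data from the question
df_all <- data.frame(
  id = c(123, 124, 125, 126, 127),
  text = c("Since I love to travel, this is what I rely on every time.",
           "I got this card for the no international transaction fee",
           "I got this card mainly for the flight perks",
           "Very good card, easy application process",
           "The customer service is outstanding!"),
  travel_cat = c(1, 0, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Assumption: VectorSource replaces DataframeSource so the sketch does not
# depend on the doc_id/text column layout that newer tm versions require
corpus <- VCorpus(VectorSource(df_all$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# One row per document, one column per term
dtm <- as.matrix(DocumentTermMatrix(corpus))

# cbind() matches rows by position, so check the row counts agree first
stopifnot(nrow(dtm) == nrow(df_all))

# df_combined (hypothetical name) holds id, text, travel_cat plus one column per term
df_combined <- cbind(df_all, dtm)
```

Because cbind() pairs rows purely by position, this only works as long as no documents are dropped or reordered between df_all and dtm. If a term happens to share a name with an existing column (for instance a term "text" or "id"), the result will have duplicate column names, so prefixing the term columns before binding may be worth considering.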