Many thanks for the quanteda package, it is so powerful to use.
I have three questions:
First, using the example below (in French, sorry): can we plot the DFM (or FCM) categorized by dictionary separately for each document? The plot always seems to cover all documents together.
Second, is there a way to link the features (only the main ones) to their dictionary keys on the graph?
Third, and more theoretically: is it possible (and useful) to account for the number of features defined under a dictionary key relative to the number of times those features occur in a document? In other words, does it matter that one key contains only a few features while another contains many?
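To make the third question concrete, here is a minimal sketch of the kind of adjustment I have in mind: dividing each key's count by the number of entries defined under that key. The names `dict_list`, `key_sizes` and `counts_per_entry` are my own for illustration and are not part of quanteda, and I do not know whether this normalization is actually appropriate.

```r
library(quanteda)

s <- c("Je suis certain de trouve philosophie et de tristesse",
       "je suis blanc et je suis peureux")

# Plain named list, so the number of entries per key is easy to read off
dict_list <- list(emotion = c("philosophie", "tristesse", "belle", "blanc", "trouve"),
                  peu = "je",
                  tout = c("je", "suis", "et", "de"))
keys_dfm <- dfm_lookup(dfm(tokens(s)), dictionary(dict_list))

# Number of entries defined under each key
key_sizes <- lengths(dict_list)

# Divide each key's count by the number of entries under that key
# (illustrative assumption, not an established quanteda feature)
counts <- as.matrix(keys_dfm)[, names(key_sizes), drop = FALSE]
counts_per_entry <- sweep(counts, 2, key_sizes, "/")
counts_per_entry
```

With this, a key with a single frequent entry (such as "peu") is no longer directly comparable to its raw count, which is exactly the effect I am unsure about.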
Thank you R
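library(quanteda)
library(quanteda.textplots)

# Four short example documents (in French)
s <- c("Je suis certain de trouve philosophie et de tristesse",
       " je trouve la vie belle et tristesse",
       "je suis blanc et je suis peureux",
       "Je suis belle et pleine de tristesse")
toks <- tokens(s)
dfmat <- dfm(toks)  # renamed so the object does not mask the dfm() function

# Dictionary with three keys
dict1 <- dictionary(list(emotion = c("philosophie", "tristesse", "belle", "blanc", "trouve"),
                         peu = c("je"),
                         tout = c("Je", "suis", "et", "de")))

# Count dictionary keys per document; everything else goes to "_unmatched"
dict_dtm2 <- dfm_lookup(dfmat, dict1, nomatch = "_unmatched")
tail(dict_dtm2)

# Keep only the features that appear in the dictionary
dict_sel <- dfm_select(dfmat, pattern = dict1)
tail(dict_sel)

# Co-occurrence of the dictionary keys, computed over all documents
fcm_dtm2 <- fcm(dict_dtm2)
size <- log(colSums(fcm_dtm2))

# Direct call instead of %>%, so the code runs without magrittr attached
textplot_network(fcm_dtm2, min_freq = 1,
                 omit_isolated = TRUE,
                 vertex_size = size / max(size) * 5, edge_alpha = 0.5)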
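For the first question, this is roughly what I mean by "per document": a sketch that builds one FCM per document by subsetting the key-level DFM row by row. The object names (`keys_dfm`, `fcm_by_doc`) are mine, and I am only assuming that each per-document FCM could then be plotted on its own; I have not confirmed that this is the intended way.

```r
library(quanteda)

s <- c("Je suis certain de trouve philosophie et de tristesse",
       "je suis blanc et je suis peureux")
dict1 <- dictionary(list(emotion = c("philosophie", "tristesse", "belle", "blanc", "trouve"),
                         peu = c("je"),
                         tout = c("Je", "suis", "et", "de")))
keys_dfm <- dfm_lookup(dfm(tokens(s)), dict1)

# One FCM per document: take a single row of the key-level DFM, then call fcm()
fcm_by_doc <- lapply(seq_len(ndoc(keys_dfm)), function(i) fcm(keys_dfm[i, ]))
names(fcm_by_doc) <- docnames(keys_dfm)

# Each element could then be plotted separately (assumption, untested here), e.g.
# quanteda.textplots::textplot_network(fcm_by_doc[[1]], min_freq = 1)
```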