Code to compare Facebook and Youtube's comments
30 May 2017
Working with Facebook comments
Cleaning and tidying the data
Here I replicate the work done on Youtube comments.
fb_comments <- fb_comments %>%
  filter(com_text != "") %>%
  left_join(videos_fb, by = c("post_id_fb" = "id")) %>%
  group_by(short_title) %>%
  mutate(n = n(), com_created = as.Date(com_created)) %>%
  ungroup() %>%
  filter(n >= 100) %>%
  select(short_title, video_id = ids, post_id_fb, com_text, com_id, com_created)

tidy_fb_comments <- fb_comments %>%
  tidytext::unnest_tokens(word, com_text) %>%
  anti_join(stop_words, by = "word")
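To make the tidying step concrete, here is a minimal sketch on made-up data. The `toy_comments` table and its two comments are hypothetical and are not part of the real dataset; only `unnest_tokens()`, `anti_join()` and `stop_words` come from the code above.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy comments, invented for illustration only
toy_comments <- tibble(
  com_id   = c("c1", "c2"),
  com_text = c("I love this video", "This is a terrible idea")
)

toy_tidy <- toy_comments %>%
  unnest_tokens(word, com_text) %>%    # one lowercased word per row
  anti_join(stop_words, by = "word")   # drop common words such as "i", "this", "is"

toy_tidy
```

Each remaining row is a single word that still carries its `com_id`, which is what lets me aggregate sentiment back to the comment and the video later on.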
Plot the most positive and most negative words
Once I have a tidy data frame, I plot the most positive and most negative words on Facebook so I can compare them with the Youtube ones shown in the original article.
fb_pos_neg_words <- tidy_fb_comments %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup() %>%
  group_by(sentiment) %>%
  top_n(10) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  scale_fill_manual(values = c("red2", "green3")) +
  facet_wrap(~sentiment, scales = "free_y") +
  ylim(0, 2500) +
  xlab(NULL) +
  ylab(NULL) +
  coord_flip() +
  theme_minimal()
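The "top words per sentiment" selection in the chunk above can be illustrated on a tiny table. The words and counts in `toy_counts` are invented, so this is only a sketch of the grouping logic, not of the real results.

```r
library(dplyr)

# Hypothetical word counts, invented for illustration only
toy_counts <- tibble(
  word      = c("love", "great", "nice", "bad", "awful", "sad"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  n         = c(30, 20, 5, 25, 10, 2)
)

toy_top <- toy_counts %>%
  group_by(sentiment) %>%   # rank words within each sentiment separately
  top_n(2, n) %>%           # keep the 2 most frequent words per group
  ungroup()

toy_top
```

Note that `top_n()` keeps ties, so a group can return more rows than requested when several words share the same count.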
Sentiment by comment and by video
As I did for the Youtube videos, I calculate the sentiment for every comment and then for every video.
fb_comment_sent <- tidy_fb_comments %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(com_id, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative) %>%
  ungroup() %>%
  left_join(fb_comments, by = "com_id")

fb_title_sent <- fb_comment_sent %>%
  group_by(short_title) %>%
  summarise(pos = sum(positive),
            neg = sum(negative),
            sent_mean = mean(sentiment),
            sentiment = pos - neg) %>%
  ungroup() %>%
  arrange(-sentiment)
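As a small worked example of the per-comment score, the sketch below uses a hypothetical five-word lexicon (`toy_lexicon`) in place of the Bing lexicon, and invented tokens in `toy_words`. The score is simply the number of positive words minus the number of negative words in each comment.

```r
library(dplyr)
library(tidyr)

# Hypothetical tokens and lexicon, invented for illustration only
toy_words <- tibble(
  com_id = c("c1", "c1", "c1", "c2", "c2"),
  word   = c("love", "great", "awful", "terrible", "bad")
)
toy_lexicon <- tibble(
  word      = c("love", "great", "awful", "terrible", "bad"),
  sentiment = c("positive", "positive", "negative", "negative", "negative")
)

toy_comment_sent <- toy_words %>%
  inner_join(toy_lexicon, by = "word") %>%   # keep only words with a known polarity
  count(com_id, sentiment) %>%               # words per comment and polarity
  spread(sentiment, n, fill = 0) %>%         # one column per polarity, 0 when absent
  mutate(sentiment = positive - negative)    # net score per comment

toy_comment_sent
```

Here comment `c1` has two positive words and one negative word, so its score is 1, while `c2` has two negative words and scores -2. The `fill = 0` argument matters: without it, a comment with words of only one polarity would get `NA` in the other column and therefore an `NA` score.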
Joining Facebook and Youtube comments
I join the Youtube and Facebook sentiment-by-video tables to compare comments. To make the comparison fair, I keep only the videos that are present on both platforms.
comments_by_title <- yt_title_sent %>%
  inner_join(fb_title_sent, by = c("short_title" = "short_title")) %>%
  select(vid_created, short_title,
         mean_sent_yt = sent_mean.x,
         mean_sent_fb = sent_mean.y) %>%
  ungroup() %>%
  mutate(diff = mean_sent_fb - mean_sent_yt,
         short_title = reorder(short_title, -diff)) %>%
  arrange(desc(diff))
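The effect of the inner join can be seen on two small, made-up tables. The titles and scores in `toy_yt` and `toy_fb` are hypothetical; the point is only that videos missing from either platform drop out, and that `diff` is the Facebook mean minus the Youtube mean.

```r
library(dplyr)

# Hypothetical per-video mean sentiment, invented for illustration only
toy_yt <- tibble(short_title = c("A", "B", "C"), sent_mean = c(0.5, -0.2, 0.1))
toy_fb <- tibble(short_title = c("A", "B", "D"), sent_mean = c(0.1, 0.3, 0.9))

toy_compare <- toy_yt %>%
  inner_join(toy_fb, by = "short_title") %>%   # keeps only A and B
  select(short_title,
         mean_sent_yt = sent_mean.x,           # .x comes from the left (Youtube) table
         mean_sent_fb = sent_mean.y) %>%       # .y comes from the right (Facebook) table
  mutate(diff = mean_sent_fb - mean_sent_yt) %>%
  arrange(desc(diff))

toy_compare
```

Video "C" exists only on Youtube and "D" only on Facebook, so neither appears in the result. A positive `diff` means the Facebook comments were more positive on average than the Youtube ones for that video.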
And now I can plot the sentiment for every video on each platform, ordered by publication date.
library(plotly)

ggplotly(
  comments_by_title %>%
    ggplot(aes(x = reorder(short_title, vid_created),
               text = paste(short_title, "<br />", vid_created))) +
    geom_line(aes(y = mean_sent_fb, group = 1), color = "blue") +
    geom_line(aes(y = mean_sent_yt, group = 1), color = "red") +
    geom_hline(yintercept = 0) +
    xlab(NULL) +
    ylab(NULL) +
    theme_minimal() +
    theme(axis.text.x = element_blank()),
  tooltip = "text"
)