Code to compare Facebook and YouTube comments
May 30, 2017
Working with Facebook comments
Cleaning and tidying the data
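Here I repeat the cleaning steps I applied to the YouTube comments: drop empty comments, attach the video metadata, keep only videos with at least 100 comments, then split the text into words and remove stop words.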
library(dplyr)
library(tidyr)
library(tidytext)
library(ggplot2)

# Drop empty comments, attach video metadata, keep videos with >= 100 comments
fb_comments <- fb_comments %>%
  filter(com_text != "") %>%
  left_join(videos_fb, by = c("post_id_fb" = "id")) %>%
  group_by(short_title) %>%
  mutate(n = n(),
         com_created = as.Date(com_created)) %>%
  ungroup() %>%
  filter(n >= 100) %>%
  select(short_title, video_id = ids, post_id_fb, com_text, com_id, com_created)

# One row per word, with stop words removed
tidy_fb_comments <- fb_comments %>%
  tidytext::unnest_tokens(word, com_text) %>%
  anti_join(stop_words, by = "word")
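To see what the tokenizing step does, here is a minimal sketch on two made-up comments. The `toy_comments` table and its contents are invented for illustration; only `unnest_tokens()` and `stop_words` come from tidytext.

```r
library(dplyr)
library(tidytext)

# Hypothetical data: two invented comments, not taken from the real dataset
toy_comments <- tibble(
  com_id   = c("c1", "c2"),
  com_text = c("I love this video", "This is the worst episode")
)

# unnest_tokens() lowercases the text and returns one row per word;
# anti_join() then drops every word found in the stop_words table
toy_tidy <- toy_comments %>%
  unnest_tokens(word, com_text) %>%
  anti_join(stop_words, by = "word")

toy_tidy
```

The nine original words shrink to the few that carry meaning, and each remaining row still has its `com_id`, which is what makes the per-comment sentiment count possible later on.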
Plot the most positive and most negative words
Once I have a tidy data frame, I plot the most positive and most negative words on Facebook, so that in the original article I can compare them with the YouTube ones.
# Ten most frequent positive and negative words (Bing lexicon)
fb_pos_neg_words <- tidy_fb_comments %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup() %>%
  group_by(sentiment) %>%
  top_n(10, n) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  # Colours follow alphabetical order: negative = red2, positive = green3
  scale_fill_manual(values = c("red2", "green3")) +
  facet_wrap(~sentiment, scales = "free_y") +
  ylim(0, 2500) +
  xlab(NULL) +
  ylab(NULL) +
  coord_flip() +
  theme_minimal()
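The "top words per sentiment" step can be tried on its own. This is a small sketch with invented counts (`toy_counts` is hypothetical). It uses `slice_max()`, which newer dplyr versions (1.0.0 and later) offer as the successor to `top_n()`; it is an alternative, not what the code above uses.

```r
library(dplyr)

# Hypothetical word counts, invented for illustration
toy_counts <- tibble(
  word      = c("love", "great", "nice", "bad", "worst", "boring"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  n         = c(30, 20, 5, 25, 10, 2)
)

# Keep the two most frequent words within each sentiment group
toy_top <- toy_counts %>%
  group_by(sentiment) %>%
  slice_max(n, n = 2) %>%
  ungroup()

toy_top
```

Because the selection happens inside `group_by(sentiment)`, each facet of the plot gets its own top words instead of the overall most frequent ones.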
Sentiment by comment and by video
As I did for the YouTube videos, I calculate the sentiment of every comment (positive words minus negative words) and then aggregate it by video.
# Sentiment per comment: positive words minus negative words
fb_comment_sent <- tidy_fb_comments %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(com_id, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative) %>%
  ungroup() %>%
  left_join(fb_comments, by = "com_id")

# Sentiment per video: totals and the mean comment score
fb_title_sent <- fb_comment_sent %>%
  group_by(short_title) %>%
  summarise(pos = sum(positive),
            neg = sum(negative),
            sent_mean = mean(sentiment),
            sentiment = pos - neg) %>%
  ungroup() %>%
  arrange(-sentiment)
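The two aggregation steps are easier to follow on a tiny example. The sketch below uses an invented four-word lexicon and invented comments (`toy_lexicon` and `toy_words` are hypothetical) instead of the Bing lexicon, and it carries `short_title` through `count()` rather than joining it back afterwards.

```r
library(dplyr)
library(tidyr)

# Hypothetical lexicon standing in for get_sentiments("bing")
toy_lexicon <- tibble(
  word      = c("love", "great", "boring", "worst"),
  sentiment = c("positive", "positive", "negative", "negative")
)

# Hypothetical tidy comments: one row per word
toy_words <- tibble(
  com_id      = c("c1", "c1", "c2", "c3", "c3", "c3"),
  short_title = c("video A", "video A", "video A",
                  "video B", "video B", "video B"),
  word        = c("love", "great", "boring", "worst", "boring", "great")
)

# Per comment: count positive and negative words, then subtract
toy_comment_sent <- toy_words %>%
  inner_join(toy_lexicon, by = "word") %>%
  count(short_title, com_id, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)

# Per video: totals and the mean of the comment scores
toy_title_sent <- toy_comment_sent %>%
  group_by(short_title) %>%
  summarise(pos = sum(positive),
            neg = sum(negative),
            sent_mean = mean(sentiment),
            sentiment = pos - neg)

toy_title_sent
```

Note that `inner_join()` drops comments with no word in the lexicon, so `sent_mean` is the average over comments that contain at least one scored word, not over all comments.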
Joining Facebook and Youtube comments
I join the YouTube and Facebook sentiment-by-video tables to compare the comments. To make the comparison fair, I keep only the videos that are present on both platforms.
# inner_join keeps only videos present in both tables;
# the .x columns come from YouTube, the .y columns from Facebook
comments_by_title <- yt_title_sent %>%
  inner_join(fb_title_sent, by = "short_title") %>%
  select(vid_created,
         short_title,
         mean_sent_yt = sent_mean.x,
         mean_sent_fb = sent_mean.y) %>%
  ungroup() %>%
  mutate(diff = mean_sent_fb - mean_sent_yt,
         short_title = reorder(short_title, -diff)) %>%
  arrange(desc(diff))
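The `.x` and `.y` suffixes in the join can look odd at first. Here is a minimal sketch with invented tables (`toy_yt` and `toy_fb` are hypothetical) showing where they come from and how `inner_join()` restricts the result to shared videos.

```r
library(dplyr)

# Hypothetical per-video tables, invented for illustration
toy_yt <- tibble(
  short_title = c("video A", "video B", "video C"),
  vid_created = as.Date(c("2017-01-10", "2017-02-15", "2017-03-20")),
  sent_mean   = c(0.2, -0.1, 0.4)
)
toy_fb <- tibble(
  short_title = c("video A", "video B", "video D"),
  sent_mean   = c(0.5, -0.3, 0.1)
)

# Both tables have a sent_mean column, so dplyr renames them
# sent_mean.x (left table) and sent_mean.y (right table)
toy_joined <- toy_yt %>%
  inner_join(toy_fb, by = "short_title") %>%
  select(vid_created,
         short_title,
         mean_sent_yt = sent_mean.x,
         mean_sent_fb = sent_mean.y) %>%
  mutate(diff = mean_sent_fb - mean_sent_yt)

toy_joined
```

"video C" and "video D" each appear on only one platform, so they are dropped; a positive `diff` means the Facebook comments were more positive on average than the YouTube ones.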
Now I can plot the mean sentiment of every video on each platform, ordered by publication date.
library(plotly)

# Blue line: Facebook; red line: YouTube
ggplotly(comments_by_title %>%
           ggplot(aes(x = reorder(short_title, vid_created),
                      text = paste(short_title, "<br />", vid_created))) +
           geom_line(aes(y = mean_sent_fb, group = 1), color = "blue") +
           geom_line(aes(y = mean_sent_yt, group = 1), color = "red") +
           geom_hline(yintercept = 0) +
           xlab(NULL) +
           ylab(NULL) +
           theme_minimal() +
           theme(axis.text.x = element_blank()),
         tooltip = "text")