Sentiment analysis, while a powerful method for extracting insights from data, is far from perfect or straightforward. After all, it seeks to interpret natural language, which is constantly evolving. Speaking of evolution, did you know that the [Cambridge Dictionary added more than 6,000 words](https://www.npr.org/2025/08/19/nx-s1-5506163/cambridge-dictionary-adds-more-than-6-000-words-including-skibidi-and-delulu) this year alone, including "broligarchy" and "delulu", many of them popularized by Gen Alpha? This constant expansion of language highlights just how dynamic the texts we analyze can be.
Language is also inherently rich, ambiguous, and culturally nuanced. Lexicon-based approaches, for instance, rely on predefined word lists and often struggle to capture subtleties in human expression.
In practice, sentiment analysis encounters issues like *code-switching*, where people mix languages in a single post, or compound sentences with mixed sentiments, such as "The movie had great acting, but the ending was lame," which are difficult to score accurately. *Context dependence* further complicates interpretation: words can flip polarity depending on the domain, like "cheap," which is positive when describing flights or bargains in general, but negative when describing fabric or build quality.
Temporal dynamics also play a role, as *slang and cultural references* evolve rapidly, e.g., “bad” meaning “good” in some communities. Ambiguity adds another layer of difficulty: polysemous words like “sick” can mean either “ill” or “awesome”.
Another complication is dealing with *sarcasm* and irony, which can completely invert the intended sentiment, as in: "Oh great, another awesome Monday morning traffic jam!"
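Both of these pitfalls, mixed sentiment and sarcasm, are easy to reproduce with a toy word-list scorer (a hand-rolled sketch for illustration only, not any real lexicon):

```r
# Tiny hypothetical lexicon: each word carries a fixed score
lexicon <- c(great = 1, awesome = 1, lame = -1, bad = -1)

# Score a text as the sum of its words' lexicon values
score_text <- function(text) {
  words <- tolower(unlist(strsplit(text, "\\W+")))
  sum(lexicon[words], na.rm = TRUE)
}

score_text("The movie had great acting, but the ending was lame")
# great (+1) and lame (-1) cancel, so the mixed review nets to 0

score_text("Oh great, another awesome Monday morning traffic jam!")
# the sarcastic complaint scores +2, the opposite of its intent
```

The scorer sees only isolated words, so opposing clauses cancel out and sarcasm is taken at face value.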
*Implicit sentiment* may be present even when emotional words are absent, as in “The waiter ignored us for 30 minutes before taking our order.” These factors collectively make sentiment analysis a useful but inherently imperfect tool for understanding human language and emotion.
However, it is important to emphasize that, as described before, we have only explored sentiment analysis through a lexicon-based approach. As illustrated in Figure ? below, there are other methods, including machine learning, deep learning, and hybrid combinations of the two, that can be used to extract emotions from text, including user-generated content, each with its own limitations and challenges.
{fig-align="center" width="500"}
For example, Amazon relies on deep learning algorithms to determine the sentiment of customer reviews by identifying positive, negative, or neutral tones in the text. The models are trained on a vast dataset of Amazon’s product descriptions and reviews and are regularly updated with new information. This robust approach enables Amazon to efficiently analyze and interpret customer feedback on a large scale.
While there are more advanced approaches to sentiment analysis, including AI-assisted methods, these are discussion topics for future workshops!
The `syuzhet` package implements the National Research Council Canada (NRC) Emotion Lexicon.
This framework uses eight categories of emotions based on Robert Plutchik's theory of the emotional wheel, a foundational model that illustrates the relationships between human emotions from a psychological perspective. Plutchik’s wheel identifies eight primary emotions: anger, disgust, sadness, surprise, fear, trust, joy, and anticipation. As illustrated in Figure ? below, these emotions are organized into four pairs of opposites on the wheel. Emotions positioned diagonally across from each other represent opposites, while adjacent emotions share similarities, reflecting a positive correlation.
{fig-align="center" width="376"}
The NRC Emotion Lexicon was developed as part of research into affective computing and sentiment analysis using a combination of manual annotation and crowdsourcing. Human annotators evaluated thousands of words, indicating which emotions were commonly associated with each word. This method ensured that the lexicon captured human-perceived emotional associations, rather than relying solely on statistical co-occurrences in text.
You may use NRC's lexicon Tableau dashboard to explore the words associated with each emotion.
Now that we have a better understanding of this package, let's get back to business and perform emotion detection on our data.
#### Emotion Detection with Syuzhet's NRC Lexicon
##### Detecting Emotions per Comment/Sentence
```r
sentences <- get_sentences(comments$comments)
```
##### Compute Emotion Scores per Sentence
```r
emotion_score <- get_nrc_sentiment(sentences)
```
The `get_nrc_sentiment()` function assigns emotion and sentiment scores (based on the NRC lexicon) to each sentence. Each sentence receives a count for each of the eight emotions: 0 means the emotion is absent, while higher values reflect how many associated words were found. The output also includes positive and negative sentiment scores:
43
+
44
+

##### Review Summary of Emotion Scores
Let's now compute basic statistics (min, max, mean, etc.) for each emotion column to get an overview of how frequent or strong each emotion is in our example dataset.
```r
summary(emotion_score)
```
This step should generate the following output:

Based on the results, the overall emotion in these comments leans heavily toward **sadness**, which has the highest average score (1.236). **Sadness** and **trust** appear to be the most common emotions, since they are the only ones with a median of 1.000, meaning at least half the comments contained words associated with them.
On the flip side, **disgust** was the rarest emotion, with the lowest average (0.145). It is also worth noting that while sadness and trust are the most *common*, a few comments reached extreme scores for **trust (47.000), anger (44.000)**, and **fear (37.000)**.
##### Regroup with Comments and IDs
After computing the emotion scores, we want to link them back to their **original comments and IDs**.
```r
comments$comments <- sentences
emotion_data <- bind_cols(comments, emotion_score)
```
`bind_cols()` merges the original `comments` data frame with the new `emotion_score` table.
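As a minimal illustration of what `bind_cols()` does (a made-up two-row example, assuming dplyr is loaded):

```r
library(dplyr)

left  <- data.frame(id = c("s1_01", "s1_02"),
                    comments = c("Loved it", "So sad"))
right <- data.frame(joy = c(1, 0), sadness = c(0, 1))

# Columns are glued side by side; rows must line up one-to-one
bind_cols(left, right)
```

Because `bind_cols()` matches rows purely by position, the emotion scores must be in the same order as the comments they were computed from.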
##### Summarize Emotion Counts Across All Sentences
Now, let's count **how many times each emotion appears** overall.
```r
emotion_summary <- emotion_data %>%
  select(anger:trust) %>%                       # get only the emotion columns
  summarise(across(everything(), sum)) %>%      # sum counts
  pivot_longer(everything(),
               names_to = "emotion",
               values_to = "count")             # reshape to long format for plotting

ggplot(emotion_summary, aes(x = emotion, y = count, fill = emotion)) +
  geom_col(show.legend = FALSE) +               # bar of total counts per emotion
  scale_fill_manual(values = brewer.pal(10, "Paired")) +  # color palette
  theme_minimal(base_size = 12) +               # clean theme
  labs(title = "Overall Emotion Distribution",
       x = "Emotion", y = "Total Count") +      # titles and axis labels
  coord_flip()                                  # flip axes for readability
```
You might be wondering: if the **`syuzhet`** package also computes polarity, why did we choose **`sentimentr`** in our pipeline? The reason is that syuzhet does not inherently account for valence shifters. In the original syuzhet implementation, words are scored in isolation—so “good” = +1, “bad” = −1—regardless of nearby negations or intensifiers. For example, “not good” would still be counted as +1. Because **`sentimentr`** adjusts sentiment scores for negators and amplifiers, polarity results are more nuanced, robust, and reliable.

##### Add a “Season” Variable (Grouping) and Summarize
Let's now add a new column called `season` by looking at the ID pattern — for example, `s1_` means season 1 and `s2_` means season 2. This makes it easy to compare the emotional tone across seasons.
```r
emotion_seasons <- emotion_data %>%
  mutate(season = ifelse(grepl("^s1_", id), "s1",
                  ifelse(grepl("^s2_", id), "s2", NA)))
```
Time to aggregate the total count of each emotion within each season.
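One straightforward way to do this (a sketch assuming the `emotion_seasons` data frame from above and dplyr; the name `season_summary` is illustrative):

```r
library(dplyr)

season_summary <- emotion_seasons %>%
  filter(!is.na(season)) %>%            # drop IDs that matched neither season
  group_by(season) %>%                  # one group per season
  summarise(across(anger:trust, sum))   # total count of each emotion
```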
After running the script, we should get the following heat map:

Based on these results, the overall emotional picture is fairly interconnected. The **negative emotions, sadness, fear, anger, and disgust, are more tightly linked**, meaning that when people express one of them, they usually express the others too. In other words, they often show up together in the same comments.
While we've only scratched the surface of this particular dataset, the steps we've completed—from calculating basic sentiment scores to visualizing the co-occurrence of emotions—have demonstrated the **power of sentiment and emotion detection**. You now have the foundational skills to convert unstructured text into actionable data, allowing you to understand the **polarity (positive/negative)** and **specific emotional landscape** of any textual dataset.
##### Saving Our Work
After performing all the calculations and visualizations, it’s important to save the results so they can be reused or shared.
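For example (file names are illustrative; `write.csv()` is base R, and `saveRDS()` preserves R-specific column types):

```r
# Save the scored data frame as a CSV for sharing outside R
write.csv(emotion_data, "emotion_scores.csv", row.names = FALSE)

# Or save as an RDS file to reload later with readRDS()
saveRDS(emotion_data, "emotion_scores.rds")
```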