---
title: "Polarity Classification"
editor: visual
---

Polarity classification is a fundamental task in sentiment analysis: it measures the overall emotional tone expressed in text and categorizes it as positive, neutral, or negative.

Most models assign "sentiment scores" ranging from -1 to +1 to represent the intensity of the sentiment: scores closer to -1 are considered negative, scores around 0 neutral, and scores closer to +1 positive.

We will be using the `sentimentr` package ([more info](https://cran.r-project.org/web/packages/sentimentr/sentimentr.pdf)) to perform polarity classification and assign sentiment scores to the posts in our dataset.

Traditional sentiment analysis techniques assign polarity by matching words against dictionaries of terms labeled as "positive," "negative," or "neutral." While straightforward, this approach is overly simplistic: it ignores context and flattens the richness of our syntactically complex, lexically nuanced language, which transcends individual words. The `sentimentr` package extends lexicon-based methods by accounting for *valence shifters*: words that alter the sentiment of the words around them.

The package includes 130 valence shifters that can reverse or modulate the sentiment indicated by standard dictionaries. These fall into four main categories: negators (e.g., not, can't), amplifiers (e.g., very, really, absolutely, totally, certainly), de-amplifiers or down-toners (e.g., barely, hardly, rarely, almost), and adversative conjunctions (e.g., although, however, but, yet, that being said). This refinement matters because a simple dictionary lookup misses these nuances: "I do not like it" would be scored as positive simply because it contains "like."
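
To see the difference in practice, here is a small illustration (not part of the original analysis; the exact scores depend on the dictionary version) of how `sentimentr` reacts to valence shifters around the same polarized word:

``` r
# Each call scores one short sentence; compare how the shifter changes the result.
library(sentimentr)

sentiment(get_sentences("I like this show."))        # plain positive
sentiment(get_sentences("I do not like this show.")) # negator flips the polarity
sentiment(get_sentences("I really like this show.")) # amplifier strengthens it
sentiment(get_sentences("I hardly like this show.")) # de-amplifier weakens it
```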

In summary, each word in a sentence is checked against a dictionary of positive and negative words, such as the Jockers dictionary in the `lexicon` package. Positive words get a +1 and negative words a -1; these are the polarized words. Around each polarized word, the nearby words (by default four before and two after) are examined to see whether they change the strength or direction of the sentiment. This group of words is called a polarized context cluster. Words in the cluster can be neutral, negators (like "not"), amplifiers (like "very"), or de-amplifiers (like "slightly"). Neutral words don't affect the sentiment but still count toward the total word count.

The polarized word's sentiment is then adjusted by its surrounding cluster: amplifiers make it stronger, de-amplifiers make it weaker, and negators flip the sign, so "not unhappy" ends up reading as positive. An even number of negators cancels out, as in a double negative.

Adversative conjunctions like "but," "however," and "although" also influence the sentiment: words before them are weighted less and words after them are weighted more, because they signal a shift in meaning. Finally, the adjusted scores are summed and divided by the square root of the sentence's word count to give the final sentiment score for the sentence.
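
As a quick sanity check of the adversative rule, we can compare a sentence with and without a "but" clause (again only a sketch; the exact numbers will vary):

``` r
# The clause after "but" receives extra weight when the sentence is scored,
# so the contrastive sentence leans more negative than a plain word count suggests.
library(sentimentr)

sentiment(get_sentences("The premise is great."))
sentiment(get_sentences("The premise is great, but the pacing is boring."))
```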

With this approach, we can explore more confidently whether the show's viewers felt positive, neutral, or negative about it.

``` r
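# A minimal sketch of this step. It assumes the posts live in a data frame
# called `posts` with the text in a column named `text`; both names are
# placeholders, not taken from the original analysis.
library(sentimentr)

# Split each post into sentences, score them, and average the scores back to
# one row per post (element_id, word_count, sd, ave_sentiment).
sentiment_scores <- sentiment_by(get_sentences(posts$text))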
```

Let's now take a look at the `sentiment_scores` data frame:

<add output>

It's expected that the standard deviation is missing: each row/case is treated as a single sentence when computing the score, so there is only one value per post and no spread to summarize. Now, let's add these scores and the corresponding labels to our dataset:

``` r
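# Sketch only: attach the sentimentr output to the (assumed) `posts` data frame
# and derive a categorical polarity label from the averaged score. The cut-off
# at 0 for "positive"/"negative" is an illustrative choice.
library(dplyr)

posts <- posts %>%
  mutate(
    sentiment_score = sentiment_scores$ave_sentiment,
    sentiment_label = case_when(
      sentiment_score > 0 ~ "positive",
      sentiment_score < 0 ~ "negative",
      TRUE ~ "neutral"
    )
  )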
```

#### Plotting Things

Next, let's plot a few histograms to check the distribution of the scores and the label counts:

``` r
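# Rough plotting sketch with ggplot2; the column names follow the assumptions
# made above (`sentiment_score`, `sentiment_label`).
library(ggplot2)

# Distribution of the continuous sentiment scores.
ggplot(posts, aes(x = sentiment_score)) +
  geom_histogram(bins = 30) +
  labs(x = "Sentiment score", y = "Number of posts")

# How many posts fall into each polarity label.
ggplot(posts, aes(x = sentiment_label)) +
  geom_bar() +
  labs(x = "Polarity", y = "Number of posts")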
```

We could have spent more time refining these plots, but this is sufficient for our initial exploration. In pairs, review the plots and discuss what they reveal about viewers' perceptions of *Severance*.

Well, that's only part of the story. Next, we move on to emotion detection to discover what else we can learn from the data.