---
title: "Polarity Classification"
editor: visual
---

Polarity classification is a fundamental task in sentiment analysis: it measures the overall emotional tone expressed in text and categorizes it as positive, neutral, or negative.

Most models assign "sentiment scores" ranging from -1 to +1 to represent the intensity of the sentiment: scores closer to -1 are considered negative, scores around 0 neutral, and scores closer to +1 positive.

We will be using the `sentimentr` package ([more info](https://cran.r-project.org/web/packages/sentimentr/sentimentr.pdf)) to perform polarity classification and assign sentiment scores to the posts in our dataset.

Traditional sentiment analysis techniques assign polarity by matching words against dictionaries of terms labeled as "positive," "negative," or "neutral." While straightforward, this approach is overly simplistic: it ignores context and flattens the richness of our syntactically complex, lexically nuanced language, which transcends individual words. The `sentimentr` package extends lexicon-based methods by accounting for *valence shifters*: words that alter the sentiment of the words around them.

The package includes 130 valence shifters that can reverse or modulate the sentiment indicated by standard dictionaries. These fall into four main categories: negators (e.g., not, can't), amplifiers (e.g., very, really, absolutely, totally, certainly), de-amplifiers or down-toners (e.g., barely, hardly, rarely, almost), and adversative conjunctions (e.g., although, however, but, yet, that being said). This refinement matters because a simple dictionary lookup misses these nuances: "I do not like it" would be scored as positive simply because it contains "like."
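
To see the difference in practice, here is a small illustration (not part of the original analysis; the exact scores depend on the dictionary version) of how `sentimentr` reacts to valence shifters around the same polarized word:

``` r
# Each call scores one short sentence; compare how the shifter changes the result.
library(sentimentr)

sentiment(get_sentences("I like this show."))        # plain positive
sentiment(get_sentences("I do not like this show.")) # negator flips the polarity
sentiment(get_sentences("I really like this show.")) # amplifier strengthens it
sentiment(get_sentences("I hardly like this show.")) # de-amplifier weakens it
```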

In summary, each word in a sentence is checked against a dictionary of positive and negative words, such as the Jockers dictionary in the `lexicon` package. Positive words get a +1 and negative words a -1; these are the polarized words. Around each polarized word, the nearby words (by default four before and two after) are examined to see whether they change the strength or direction of the sentiment. This group of words is called a polarized context cluster. Words in the cluster can be neutral, negators (like "not"), amplifiers (like "very"), or de-amplifiers (like "slightly"). Neutral words don't affect the sentiment but still count toward the total word count.

The polarized word's sentiment is then adjusted by its surrounding cluster: amplifiers make it stronger, de-amplifiers make it weaker, and negators flip the sign, so "not unhappy" ends up reading as positive. An even number of negators cancels out, as in a double negative.

Adversative conjunctions like "but," "however," and "although" also influence the sentiment: words before them are weighted less and words after them are weighted more, because they signal a shift in meaning. Finally, the adjusted scores are summed and divided by the square root of the sentence's word count to give the final sentiment score for the sentence.
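
As a quick sanity check of the adversative rule, we can compare a sentence with and without a "but" clause (again only a sketch; the exact numbers will vary):

``` r
# The clause after "but" receives extra weight when the sentence is scored,
# so the contrastive sentence leans more negative than a plain word count suggests.
library(sentimentr)

sentiment(get_sentences("The premise is great."))
sentiment(get_sentences("The premise is great, but the pacing is boring."))
```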

With this approach, we can explore more confidently whether the show's viewers felt positive, neutral, or negative about it.

``` r
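# A minimal sketch of this step. It assumes the posts live in a data frame
# called `posts` with the text in a column named `text`; both names are
# placeholders, not taken from the original analysis.
library(sentimentr)

# Split each post into sentences, score them, and average the scores back to
# one row per post (element_id, word_count, sd, ave_sentiment).
sentiment_scores <- sentiment_by(get_sentences(posts$text))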
```

Let's now take a look at the `sentiment_scores` data frame:

<add output>

It's expected that the standard deviation is missing: each row/case is treated as a single sentence when computing the score, so there is only one value per post and no spread to summarize. Now, let's add these scores and the corresponding labels to our dataset:

``` r
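# Sketch only: attach the sentimentr output to the (assumed) `posts` data frame
# and derive a categorical polarity label from the averaged score. The cut-off
# at 0 for "positive"/"negative" is an illustrative choice.
library(dplyr)

posts <- posts %>%
  mutate(
    sentiment_score = sentiment_scores$ave_sentiment,
    sentiment_label = case_when(
      sentiment_score > 0 ~ "positive",
      sentiment_score < 0 ~ "negative",
      TRUE ~ "neutral"
    )
  )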
```

#### Plotting Things

Next, let's plot a few histograms to check the distribution of the scores and the label counts:

``` r
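# Rough plotting sketch with ggplot2; the column names follow the assumptions
# made above (`sentiment_score`, `sentiment_label`).
library(ggplot2)

# Distribution of the continuous sentiment scores.
ggplot(posts, aes(x = sentiment_score)) +
  geom_histogram(bins = 30) +
  labs(x = "Sentiment score", y = "Number of posts")

# How many posts fall into each polarity label.
ggplot(posts, aes(x = sentiment_label)) +
  geom_bar() +
  labs(x = "Polarity", y = "Number of posts")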
```

We could have spent more time refining these plots, but this is sufficient for our initial exploration. In pairs, review the plots and discuss what they reveal about viewers' perceptions of *Severance*.

Well, that's only part of the story. Next, we move on to emotion detection to discover what else we can learn from the data.