Commit c32fe99

committed
resolving #29
1 parent ac2649e commit c32fe99

File tree

1 file changed: +1 −1 lines changed


chapters/1.Preprocessing/04_stopwords.qmd

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ How many stop words can you spot in each of the following sentences:
 
 Now, let’s return to the worksheet and see how we can put that into practice.
 
-In R, two commonly used stopword lists (lexicons) are **SMART** and **Snowbal** available through packages like `stopwords`, `tm`, or `tidytext`. Both serve the same purpose, removing common, low-information words, but they differ in origin, size, and linguistic design. **SMART** contains approximately 570 English stopwords, making it more comprehensive and slightly more restrictive, while **Snowball** fewer(350–400), leaving more content words intact. For this workshop, we will adopt the Snowball list because its less restrictive nature helps preserve context, which is especially important for NLP tasks such as topic modeling, sentiment analysis, or classification.
+**SMART**, **Snowball**, and **Onix** are the three stop word lexicons available through the tidytext ecosystem. They serve the same purpose, removing common, low-information words, but they differ in origin, size, and linguistic design. For this workshop, we will adopt the **Snowball** list because of its less restrictive nature, which helps preserve context, especially important for NLP tasks such as topic modeling, sentiment analysis, or classification.
 
 We will start our stop word removal by calling `data("stop_words")` to load a built-in dataset from the tidytext package. This creates a data frame of 1,149 words drawn from the lexicons above.
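The step described in the diff above can be sketched in R. This is a minimal example, assuming the tidytext and dplyr packages are installed; the `tokens` tibble in the final line is hypothetical (e.g. produced by `unnest_tokens()`):

```r
library(dplyr)
library(tidytext)

# Load the built-in stop word dataset (1,149 words across three lexicons)
data("stop_words")

# Inspect how many words each lexicon contributes
count(stop_words, lexicon)

# Keep only the Snowball lexicon, the least restrictive of the three
snowball <- filter(stop_words, lexicon == "snowball")

# Remove stop words from a hypothetical `tokens` tibble with a `word` column
# tokens_clean <- anti_join(tokens, snowball, by = "word")
```

The `anti_join()` pattern keeps every token that does not appear in the stop word list, which is the standard tidytext idiom for stop word removal.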

0 commit comments