
Commit c56e4b1

Merge pull request #34 from UCSB-Library-Research-Data-Services/renata
Renata
2 parents aef35fb + c32fe99 commit c56e4b1

17 files changed

+208
-85
lines changed

chapters/1.Preprocessing/01_introduction.qmd

Lines changed: 43 additions & 5 deletions
@@ -40,11 +40,41 @@ Before we can apply any meaningful analysis or modeling, it’s crucial to visua
4040

4141
### Getting Files and Launching RStudio
4242

43-
Time to launch RStudio and our example! Click on this [link](https://ucsb.box.com/s/z6buv80wmgqm1wb389o1j6vl9k3ldapv) to download the `text-preprocessing` subfolder, from the folder `text-analysis-series`. Among other files, this subfolder contains the dataset we will be using `comments.csv`, a worksheet in qmd, a Quarto extension (learn more about [Quarto](https://quarto.org/)), named `preprocessing_worksheet` where we will be performing some coding, and an `renv.lock`(learn more about [Renv](https://rstudio.github.io/renv/articles/renv.html)) file listing all the R packages (and their versions) we’ll use during the workshop. This setup ensures a self-contained environment, so you can run everything needed for the session without installing or changing any packages that might affect your other R projects.
43+
Time to launch RStudio and our example! Click on this [link](https://ucsb.box.com/s/z6buv80wmgqm1wb389o1j6vl9k3ldapv) to download the `text-preprocessing` subfolder from the folder `text-analysis-series`. Among other files, this subfolder contains the dataset we will be using (`comments.csv`), a workbook named `preprocessing-workbook.qmd` in which we will do some coding (`.qmd` is the Quarto file extension; learn more about [Quarto](https://quarto.org/)), and an `renv.lock` file (learn more about [Renv](https://rstudio.github.io/renv/articles/renv.html)) listing all the R packages (and their versions) we’ll use during the workshop.
4444

45-
After downloading this subfolder, double click on the project file `text-preprocessing.Rproj` to launch Rstudio. Look for and open the file `preprocessing_worksheet` on your Rstudio environment.
45+
This setup ensures a self-contained environment, so you can run everything needed for the session without installing or changing any packages that might affect your other R projects.
4646

47-
In your R Console, type `renv::restore()` to read the renv.lock file and installs the specific package versions used in the project.
47+
After downloading this subfolder, double-click the project file `text-preprocessing.Rproj` to launch RStudio. Then find and open the file `preprocessing-workbook.qmd` in RStudio.
48+
49+
### Setting up the environment with renv
50+
51+
Next, we will need to install the `renv` package so you can set up the working environment correctly with all the packages and dependencies we will need. In the console, type:
52+
53+
``` r
54+
install.packages("renv")
55+
```
56+
57+
Then, still in the console, we will restore the environment, which installs the packages needed for this project at the versions recorded in the `renv.lock` file we have shared with you.
58+
59+
``` r
60+
renv::restore()
61+
```
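
Once the restore finishes, one way to confirm that the project library matches the lockfile is `renv::status()`, which reports differences between the two. The snippet below is only a sketch: it assumes you run it from the project root (where `renv.lock` lives), and the `lockfile_present` variable is introduced here for illustration.

``` r
# Sketch: check whether the project library is in sync with renv.lock.
# Assumes the working directory is the project root.
lockfile_present <- file.exists("renv.lock")

if (lockfile_present && requireNamespace("renv", quietly = TRUE)) {
  # Reports packages that differ between the library and the lockfile
  renv::status()
} else {
  message("renv or renv.lock not found; open the .Rproj file and install renv first.")
}
```

If differences are reported, running `renv::restore()` again is usually the first thing to try.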
62+
63+
::: callout-warning
64+
**Matrix Package Incompatible with R**
65+
66+
If you encounter incompatibility issues with the **Matrix** package (or any other) due to your R version, you can explicitly install the package by running the following in your console:
67+
68+
``` r
69+
renv::install("Matrix")
70+
```
71+
72+
Next, update your `renv.lock` file to reflect this version by running:
73+
74+
``` r
75+
renv::snapshot()
76+
```
77+
:::
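
If you are unsure whether the warning above applies to you, it can help to look at which R version you are running and which version of **Matrix** (if any) is currently installed. The following is a minimal sketch using base R only; the variable names `r_version` and `matrix_installed` are illustrative.

``` r
# Sketch: report the running R version and the installed Matrix version.
r_version <- getRversion()
print(r_version)

# TRUE if the Matrix package can be found in the current library
matrix_installed <- requireNamespace("Matrix", quietly = TRUE)

if (matrix_installed) {
  print(utils::packageVersion("Matrix"))
} else {
  message("Matrix is not installed in this library yet.")
}
```

Comparing these two versions against the error message you received should indicate whether the incompatibility comes from your R version.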
4878

4979
### Loading Packages & Inspecting the Data
5080

@@ -61,17 +91,25 @@ library(emo) # emoji dictionary
6191
library(textstem) # lemmatization
6292
```
6393

94+
After running it, you should get:
95+
96+
![](images/output-loaded-packages.png){width="757"}
97+
6498
Alright! With all the necessary packages loaded, let's take a look at the dataset we’ll be working with:
6599

66100
``` r
67101
# Inspecting the data
68102
comments <- readr::read_csv("./data/raw/comments.csv")
69103
```
70104

71-
You’ll notice that we’ve pre-populated a code chunk with Patterns to save you from the tedious task of typing out regular expressions (regex for short). Don’t worry about them for now, we’ll come back to it shortly.
105+
This should show that our dataset contains 5877 comments and two columns, and it adds the `comments` dataset to our environment:
106+
107+
![](images/output-readingdata.png){width="774"}
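
To confirm the dimensions from the console rather than from the screenshot, something like the following sketch could be used. It assumes the same relative path as the chunk above and guards against the file being missing; `csv_path` is a name introduced here for illustration, and the column names are printed rather than assumed.

``` r
# Sketch: check the size and column names of the comments dataset.
csv_path <- "./data/raw/comments.csv"

if (file.exists(csv_path)) {
  comments <- readr::read_csv(csv_path)
  print(dim(comments))     # number of rows and columns
  print(names(comments))   # column names
  print(head(comments, 5)) # first five rows
} else {
  message("File not found: check that the project was opened via the .Rproj file.")
}
```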
108+
109+
In the workbook, you’ll notice that we’ve pre-populated some of the chunks below to save you from tedious typing. Don’t worry about them for now; we’ll come back to them shortly.
72110

73111
::: {.callout-note icon="false"}
74112
# 💬 Discussion
75113

76-
Working in pairs or trios, look briefly at the data and discuss the challenges that may arise when attempting to analyze this dataset on its current form. What could be potential areas of friction that could compromise the results?
114+
Working in pairs or trios, take a brief look at the data by double-clicking the `comments` dataset in the Environment panel. Then, discuss the potential challenges of analyzing this text in its current form. What potential areas of friction could compromise the results?
77115
:::
