_episodes/01-create-new-environment.md
Lines changed: 13 additions & 0 deletions
@@ -29,6 +29,12 @@ Python can live in many different places on your computer, and each source may h
 By using an anaconda environment that we create, and by explicitly using only that environment, we can avoid conflicts...
 and know exactly what environment is being used to run our python code. And we avoid the mess indicated by the above comic!

+> ## Reflecting on environment mishaps
+> Have you ever been unable to install a package due to a conflict?
+>
+> How did you solve the problem?
+{: .discussion}
+
 ## Create an environment from the `environment.yml` file

 The necessary packages are specified in the `environment.yml` file.
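The claim that we can "know exactly what environment is being used to run our python code" can be checked from inside Python itself. The sketch below is an illustration, not part of the lesson: the helper name `describe_environment` is hypothetical, and it simply reports `sys.executable` (the interpreter running the code), `sys.prefix` (the environment directory that interpreter belongs to), and the `CONDA_DEFAULT_ENV` variable, which `conda activate` sets to the active environment's name.

```python
import os
import sys


def describe_environment() -> dict:
    """Report which Python interpreter and conda environment are in use.

    Hypothetical helper for illustration; not part of the lesson code.
    """
    return {
        # Full path of the interpreter that is running this code
        "executable": sys.executable,
        # Root directory of the environment that interpreter belongs to
        "prefix": sys.prefix,
        # Name set by `conda activate`; None when no conda env is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the printed `prefix` does not point inside the environment you created, the code is running in some other Python installation than the one you intended.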
@@ -116,6 +122,13 @@ jupyter lab
 ~~~
 {: .language-bash}

+> ## Alternatives to Anaconda
+> Have you ever used a different solution for creating/managing virtual python environments?
+> For example, `pipenv` or `virtualenv`?
+>
+> How does conda differ from these solutions?
+{: .discussion}
+
 ## Create the environment from scratch

 If for some reason you are unable to create the environment from the `environment.yml` file, or you simply wish to do the process for yourself, you can follow these steps. These steps replace the `conda env create --file environment.yml` step in the instructions above.
_episodes/02-data-wrangling.md
Lines changed: 36 additions & 18 deletions
@@ -10,7 +10,7 @@ questions:
 objectives:
 - "Learn useful pandas functions for wrangling data into a tidy format"
 keypoints:
-- "Import your CSV using `pd.read_csv('<FILEPATH>')"
+- "Import your CSV using `pd.read_csv('<FILEPATH>')`"
 - "Transform your dataframe from wide to long with `pd.melt()`"
 - "Split column values with `df['<COLUMN>'].str.split('<DELIM>')`"
 - "Sort rows using `df.sort_values()`"
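The keypoints name four pandas calls. One plausible way to chain them is sketched below; it is not necessarily the episode's exact code. It assumes a tiny invented wide table whose columns follow a `<metric>_<year>` pattern (an assumption about the layout of `gapminder_all.csv`; the numbers are made up) and reads it from an in-memory string instead of a file path.

```python
import io

import pandas as pd

# Hypothetical wide data: one row per country, one column per metric-year pair.
# The numbers are invented for illustration only.
CSV_TEXT = """continent,country,gdpPercap_1952,gdpPercap_1957,lifeExp_1952,lifeExp_1957
Asia,Afghanistan,779.4,820.9,28.8,30.3
Europe,Albania,1601.1,1942.3,55.2,59.3
"""

# 1. Import the CSV (pd.read_csv accepts a path or any file-like object)
df = pd.read_csv(io.StringIO(CSV_TEXT))

# 2. Wide -> long: keep the identifier columns, stack every other column
#    into "variable" (old column name) and "value" (cell contents)
df_melted = pd.melt(df, id_vars=["continent", "country"])

# 3. Split names like "gdpPercap_1952" into a metric and a year
df_melted[["metric", "year"]] = df_melted["variable"].str.split("_", expand=True)
df_melted["year"] = df_melted["year"].astype(int)
df_melted = df_melted.drop(columns="variable")

# 4. Sort rows so each country's observations sit together
df_tidy = df_melted.sort_values(["country", "metric", "year"]).reset_index(drop=True)

print(df_tidy)
```

`expand=True` makes `str.split` return one column per piece instead of a column of lists, which is what allows assigning the result straight into two new columns.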
@@ -25,6 +25,23 @@ You can click on the `Data` folder and double click on `gapminder_all.csv` to vi

 We are going to take this very wide dataset and make it very long, so the unit of observation will be each country + year + metric combination, rather than just the country. This process is made much simpler by a couple of functions in the `pandas` library.

+> ## Tidy Data
+> The term "tidy data" may be most popular in the R ecosystem (the "tidyverse" is a collection of R packages designed around the tidy data philosophy), but it is applicable to all tabular datasets, no matter what programming language you are using to wrangle your data.
+> You can read more about the tidy data philosophy in Hadley Wickham's 2014 paper, "Tidy Data", available [here](https://vita.had.co.nz/papers/tidy-data.pdf).
+>
+> Tidy data follows 3 rules:
+> 1. Each variable forms a column
+> 2. Each observation forms a row
+> 3. Each type of observational unit forms a table
+>
+> Wickham later refined and revised the tidy data philosophy, and published it in the 12th chapter of his open access textbook "R for Data Science" - available [here](https://r4ds.had.co.nz/tidy-data.html).
+>
+> The revised rules are:
+> 1. Each variable must have its own column
+> 2. Each observation must have its own row
+> 3. Each value must have its own cell
+{: .callout}
+
 ## Getting Started

 Let's go ahead and get started by opening a Jupyter Notebook with the `dataviz` kernel. If you navigated to the `Data` folder to look at the CSV file, navigate back to the root before opening the new notebook.
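After reshaping, the unit of observation is meant to be each country + year + metric combination. A quick sanity check on a long dataframe is to confirm that this combination never repeats, which is one concrete reading of "each observation forms a row". The sketch below uses a small invented long-format table; the column names `country`, `year`, `metric`, `value` and the helper `is_one_row_per_observation` are assumptions for illustration, not names taken from the lesson.

```python
import pandas as pd

# Invented long-format data: intended to hold one row per country + year + metric
df_long = pd.DataFrame(
    {
        "country": ["Albania", "Albania", "Albania", "Albania"],
        "year": [1952, 1952, 1957, 1957],
        "metric": ["gdpPercap", "lifeExp", "gdpPercap", "lifeExp"],
        "value": [1601.1, 55.2, 1942.3, 59.3],
    }
)


def is_one_row_per_observation(df: pd.DataFrame, keys: list) -> bool:
    """Return True if no combination of the key columns occurs more than once."""
    return not df.duplicated(subset=keys).any()


# country + year + metric identifies each row uniquely
print(is_one_row_per_observation(df_long, ["country", "year", "metric"]))  # True
# country + year alone does not: each pair has two metrics
print(is_one_row_per_observation(df_long, ["country", "year"]))  # False
```

The second call shows why the metric has to be part of the key once the data is long: without it, several rows share the same country and year.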
@@ -109,24 +126,11 @@ df_melted
 ~~~
 {: .language-python}

-Take a moment to compare this dataframe to the one we started with. What are some advantages to having the data in this format?
-
-> ## Tidy Data
-> The term "tidy data" may be most popular in the R ecosystem (the "tidyverse" is a collection of R packages designed around the tidy data philosophy), but it is applicable to all tabular datasets, no matter what programming language you are using to wrangle your data.
-> You can read more about the tidy data philosophy in Hadley Wickham's 2014 paper, "Tidy Data", available [here](https://vita.had.co.nz/papers/tidy-data.pdf).
+> ## Wide vs long data
+> Take a moment to compare this dataframe to the one we started with.
 >
-> Tidy data follows 3 rules:
-> 1. Each variable forms a column
-> 2. Each observation forms a row
-> 3. Each type of observational unit forms a table
->
-> Wickham later refined and revised the tidy data philosophy, and published it in the 12th chapter of his open access textbook "R for Data Science" - available [here](https://r4ds.had.co.nz/tidy-data.html).
->
-> The revised rules are:
-> 1. Each variable must have its own column
-> 2. Each observation must have its own row
-> 3. Each value must have its own cell
-{: .callout}
+> What are some advantages to having the data in this format?
> > fig = px.line(df_le_as, x = "year", y = "value", color = "country", title = "Life Expectancy in Asia", labels={"value": "Average Life Expectancy"})
> After the plot is displayed, add some text describing the plot.
+>
+> This time, add more to the description based on the information specified by the newly added widgets.
+> > ## Solution
+> > ~~~
+> > st.markdown(f"This plot shows the {metric_labels[metric]} from {years[0]} to {years[1]} for the following countries in {continent}: {', '.join(countries)}")
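The solution line builds its description with an f-string: each widget value is interpolated when the line runs, and `', '.join(countries)` turns the list of selected countries into a comma-separated phrase. The sketch below reproduces only that string outside Streamlit. The values assigned to `metric_labels`, `metric`, `years`, `continent`, and `countries` are invented stand-ins, since their real definitions are not shown in this diff.

```python
# Hypothetical stand-ins for values that, in the app, come from widgets
metric_labels = {"lifeExp": "Average Life Expectancy"}
metric = "lifeExp"
years = (1952, 2007)  # assumed: a (start, end) pair from a range widget
continent = "Asia"
countries = ["China", "India", "Japan"]

# Same template as the solution, split over two lines for readability
description = (
    f"This plot shows the {metric_labels[metric]} from {years[0]} to {years[1]} "
    f"for the following countries in {continent}: {', '.join(countries)}"
)
print(description)
```

In the app, the equivalent call would pass this string to `st.markdown`, exactly as the one-line solution does.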