Commit 658a734

add exercises
1 parent 6c86ed8 commit 658a734

7 files changed: +247 -26 lines changed

_episodes/01-create-new-environment.md

Lines changed: 13 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -29,6 +29,12 @@ Python can live in many different places on your computer, and each source may h
2929
By using an anaconda environment that we create, and by explicitly using only that environment, we can avoid conflicts...
3030
and know exactly what environment is being used to run our python code. And we avoid the mess indicated by the above comic!
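
One way to confirm which environment is actually running your code is to ask the interpreter itself. The sketch below uses only the standard library. The helper name `describe_interpreter` is made up for illustration, and `CONDA_DEFAULT_ENV` is an environment variable that conda sets on activation, so it may be missing if Python was started some other way.

```python
import os
import sys


def describe_interpreter():
    """Return basic facts about the Python interpreter running this code."""
    return {
        # Full path of the python binary in use
        "executable": sys.executable,
        # Root directory of the active environment
        "prefix": sys.prefix,
        # Name of the active conda environment, if conda set it
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(not set)"),
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

If the printed `prefix` does not point inside the environment you created, the code is probably running from a different Python installation.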
3131

32+
> ## Reflecting on environment mishaps
33+
> Have you ever been unable to install a package due to a conflict?
34+
>
35+
> How did you solve the problem?
36+
{: .discussion}
37+
3238
## Create an environment from the `environment.yml` file
3339

3440
The necessary packages are specified in the `environment.yml` file.
@@ -116,6 +122,13 @@ jupyter lab
116122
~~~
117123
{: .language-bash}
118124

125+
> ## Alternatives to Anaconda
126+
> Have you ever used a different solution for creating/managing virtual python environments?
127+
> For example, `pipenv` or `virtualenv`?
128+
>
129+
> How does conda differ from these solutions?
130+
{: .discussion}
131+
119132
## Create the environment from scratch
120133

121134
If for some reason you are unable to create the environment from the `environment.yml` file, or you simply wish to do the process for yourself, you can follow these steps. These steps replace the `conda env create --file environment.yml` step in the instructions above.

_episodes/02-data-wrangling.md

Lines changed: 36 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@ questions:
1010
objectives:
1111
- "Learn useful pandas functions for wrangling data into a tidy format"
1212
keypoints:
13-
- "Import your CSV using `pd.read_csv('<FILEPATH>')"
13+
- "Import your CSV using `pd.read_csv('<FILEPATH>')`"
1414
- "Transform your dataframe from wide to long with `pd.melt()`"
1515
- "Split column values with `df['<COLUMN>'].str.split('<DELIM>')`"
1616
- "Sort rows using `df.sort_values()`"
@@ -25,6 +25,23 @@ You can click on the `Data` folder and double click on `gapminder_all.csv` to vi
2525

2626
We are going to take this very wide dataset and make it very long, so the unit of observation will be each country + year + metric combination, rather than just the country. This process is made much simpler by a couple of functions in the `pandas` library.
2727

28+
> ## Tidy Data
29+
> The term "tidy data" may be most popular in the R ecosystem (the "tidyverse" is a collection of R packages designed around the tidy data philosophy), but it is applicable to all tabular datasets, no matter what programming language you are using to wrangle your data.
30+
> You can read more about the tidy data philosophy in Hadley Wickham's 2014 paper, "Tidy Data", available [here](https://vita.had.co.nz/papers/tidy-data.pdf).
31+
>
32+
> Tidy data follows 3 rules:
33+
> 1. Each variable forms a column
34+
> 2. Each observation forms a row
35+
> 3. Each type of observational unit forms a table
36+
>
37+
> Wickham later refined and revised the tidy data philosophy, and published it in the 12th chapter of his open access textbook "R for Data Science" - available [here](https://r4ds.had.co.nz/tidy-data.html).
38+
>
39+
> The revised rules are:
40+
> 1. Each variable must have its own column
41+
> 2. Each observation must have its own row
42+
> 3. Each value must have its own cell
43+
{: .callout}
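
As a rough illustration of moving from a wide table toward a tidy one, here is a minimal sketch on a made-up two-country table. The column names (`pop_1952`, `pop_1957`) and the values are invented for the example and are not taken from `gapminder_all.csv`; only `pd.melt()` and `str.split()` come from the lesson's key points.

```python
import pandas as pd

# Toy wide table: one row per country, one column per metric + year
wide = pd.DataFrame(
    {
        "country": ["A", "B"],
        "pop_1952": [10, 20],
        "pop_1957": [11, 22],
    }
)

# Wide -> long: every (country, metric_year) pair becomes its own row
tidy = pd.melt(wide, id_vars=["country"], var_name="metric_year", value_name="value")

# Split "pop_1952" into a metric column and a year column
tidy[["metric", "year"]] = tidy["metric_year"].str.split("_", expand=True)
tidy["year"] = tidy["year"].astype(int)

# Drop the combined column and order the rows for readability
tidy = (
    tidy.drop(columns="metric_year")
    .sort_values(["country", "year"])
    .reset_index(drop=True)
)
print(tidy)
```

After these steps each row holds a single observation (one country, one metric, one year), which is what the rules above ask for.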
44+
2845
## Getting Started
2946

3047
Let's go ahead and get started by opening a Jupyter Notebook with the `dataviz` kernel. If you navigated to the `Data` folder to look at the CSV file, navigate back to the root before opening the new notebook.
@@ -109,24 +126,11 @@ df_melted
109126
~~~
110127
{: .language-python}
111128

112-
Take a moment to compare this dataframe to the one we started with. What are some advantages to having the data in this format?
113-
114-
> ## Tidy Data
115-
> The term "tidy data" may be most popular in the R ecosystem (the "tidyverse" is a collection of R packages designed around the tidy data philosophy), but it is applicable to all tabular datasets, not matter what programming language you are using to wrangle your data.
116-
> You can ready more about the tidy data philosophy in Hadley Wickham's 2014 paper, "Tidy Data", available [here](https://vita.had.co.nz/papers/tidy-data.pdf).
129+
> ## Wide vs long data
130+
> Take a moment to compare this dataframe to the one we started with.
117131
>
118-
> Tidy data follows 3 rules:
119-
> 1. Each variable forms a column
120-
> 2. Each observation forms a row
121-
> 3. Each type of observational unit forms a table
122-
>
123-
> Wickham later refined and revised the tidy data philosophy, and published it in the 12th chapter of his open access textbook "R for Data Science" - available [here](https://r4ds.had.co.nz/tidy-data.html).
124-
>
125-
> The revised rules are:
126-
> 1. Each variable must have its own column
127-
> 2. Each observation must have its own row
128-
> 3. Each value must have its own cell
129-
{: .callout}
132+
> What are some advantages to having the data in this format?
133+
{: .discussion}
130134

131135
## Saving the final dataframe
132136

@@ -148,5 +152,19 @@ df_final.to_csv("Data/gapminder_tidy.csv", index=False)
148152

149153
We set the index to False so that the index column does not get saved to the CSV file.
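
To see what `index=False` changes, the sketch below writes a tiny made-up dataframe to CSV text both ways and compares the header lines. The dataframe `df_demo` and its values are invented for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({"country": ["A"], "value": [1]})

# With no file path, to_csv returns the CSV content as a string
with_index = df_demo.to_csv()
without_index = df_demo.to_csv(index=False)

# The first line is the header; the default output has an extra unnamed column
print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```

With the default setting, the row index is written as an extra unnamed first column, which then shows up as a stray column when the file is read back in.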
150154

155+
> ## Imagining other tidy ways to wrangle
156+
> We wrangled our data into a tidy form. However, there is no single "true tidy" form for any given dataset.
157+
>
158+
> What are some other ways you may wish to organize this dataset that are also tidy?
159+
> > ## For Example
160+
> > Instead of having a `metric` and `value` column, given that `metric` only has 3 values,
161+
> > you could have a column each for `gdpPercap`, `lifeExp`, and `pop`.
162+
> >
163+
> > The values in each of those three columns would reflect the value of that metric for a given country in a given year.
164+
> > The columns in this dataset would be: `country`, `continent`, `year`, `gdpPercap`, `lifeExp`, and `pop`.
165+
> {: .solution}
166+
> How would you wrangle the original dataset into this other tidy form using pandas?
167+
{: .discussion}
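
One possible approach to the question above, sketched on made-up rows rather than the real dataset: `DataFrame.pivot()` can spread the `metric` values back out into their own columns. The values here are invented, and passing a list to `index` assumes pandas 1.1 or later.

```python
import pandas as pd

# Toy long-format data with the same column names as the lesson's tidy file
tidy = pd.DataFrame(
    {
        "country": ["A", "A", "A", "B", "B", "B"],
        "continent": ["X", "X", "X", "Y", "Y", "Y"],
        "year": [1952] * 6,
        "metric": ["gdpPercap", "lifeExp", "pop"] * 2,
        "value": [100.0, 50.0, 1000.0, 200.0, 60.0, 2000.0],
    }
)

# Long -> wide: one column per metric, one row per country + year
wide = tidy.pivot(
    index=["country", "continent", "year"], columns="metric", values="value"
).reset_index()

# Remove the leftover "metric" label on the column axis
wide.columns.name = None
print(wide)
```

Other routes, such as `pivot_table()` or `unstack()`, would likely work too; this is only one way to get there.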
168+
151169
{% include links.md %}
152170

_episodes/03-create-visualizations.md

Lines changed: 26 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -115,13 +115,37 @@ fig.show()
115115

116116
![Plot of Oceania's GDP over time with correct labels](../fig/L3_thirdplot.png)
117117

118-
You can go ahead and experiment with creating different plots for the different continents and metrics.
119-
120118
> ## Interactivity is baked in to Plotly charts
121119
> When you have many more lines, the interactive features of Plotly become very useful.
122120
> Notice how hovering over a line will tell you more information about that point.
123121
> You will also see several options in the upper right corner to further interact with the plot - including saving it as a PNG file!
124122
{: .callout}
125123

124+
## Exercises
125+
126+
> ## Visualize Population in Europe
127+
> Create a plot that visualizes the population of countries in Europe over time.
128+
> > ## Solution
129+
> > ~~~
130+
> > df_pop_eu = df.query("continent=='Europe' & metric=='pop'")
131+
> > fig = px.line(df_pop_eu, x = "year", y = "value", color = "country", title = "Population in Europe", labels={"value": "Population"})
132+
> > fig.show()
133+
> > ~~~
134+
> > {: .language-python}
135+
> {: .solution}
136+
{: .challenge}
137+
138+
> ## Visualize Average Life Expectancy in Asia
139+
> Create a plot that visualizes the average life expectancy of countries in Asia over time.
140+
> > ## Solution
141+
> > ~~~
142+
> > df_le_as = df.query("continent=='Asia' & metric=='lifeExp'")
143+
> > fig = px.line(df_le_as, x = "year", y = "value", color = "country", title = "Life Expectancy in Asia", labels={"value": "Average Life Expectancy"})
144+
> > fig.show()
145+
> > ~~~
146+
> > {: .language-python}
147+
> {: .solution}
148+
{: .challenge}
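
Both solutions above rely on the same filtering step before plotting. The sketch below isolates that step on a small made-up dataframe so the `query()` expression can be tried without Plotly. The rows and values are invented; only the column names follow the lesson's tidy file.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {
        "country": ["France", "France", "Japan"],
        "continent": ["Europe", "Europe", "Asia"],
        "year": [1952, 1952, 1952],
        "metric": ["pop", "lifeExp", "pop"],
        "value": [1.0, 2.0, 3.0],  # placeholder numbers, not real data
    }
)

# Keep only the rows that match both conditions
df_pop_eu = df_demo.query("continent=='Europe' & metric=='pop'")
print(df_pop_eu)
```

Changing the continent or metric inside the query string is all that is needed to prepare the data for a different plot.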
149+
126150
{% include links.md %}
127151

_episodes/04-create-streamlit-app.md

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -165,6 +165,28 @@ We now have a web application that can allow you to share your interactive visua
165165
> Detailed instructions can be found in [Streamlit's Documentation](https://docs.streamlit.io/en/stable/deploy_streamlit_app.html)
166166
{: .callout}
167167

168+
## Exercises
169+
170+
> ## Add a description
171+
> After the plot is displayed, add some text describing the plot.
172+
> > ## Solution
173+
> > ~~~
174+
> > st.plotly_chart(fig, use_container_width=True) # this line is already in the app
175+
> > st.markdown("This plot shows the GDP Per Capita for countries in Oceania.")
176+
> > ~~~
177+
> > {: .language-python}
178+
> {: .solution}
179+
{: .challenge}
180+
181+
> ## Show me the data!
182+
> After the plot is displayed, also display the dataframe used to generate the plot.
183+
> > ## Solution
184+
> > ~~~
185+
> > st.dataframe(df_gdp_o) # df_gdp_o is defined in the code created in this lesson
186+
> > ~~~
187+
> > {: .language-python}
188+
> {: .solution}
189+
{: .challenge}
168190
169191
{% include links.md %}
170192

_episodes/05-refactoring-for-flexibility.md

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -240,5 +240,33 @@ st.plotly_chart(fig, use_container_width=True)
240240

241241
![Streamlit app after this lesson](../fig/streamlit_app_lesson5fin.png)
242242

243+
## Exercises
244+
245+
> ## Add a (flexible) description
246+
> After the plot is displayed, add some text describing the plot.
247+
>
248+
> This time, use f-strings so the description can change with the plot.
249+
> > ## Solution
250+
> > ~~~
251+
> > st.markdown(f"This plot shows the {metric_labels[metric]} for countries in {continent}.")
252+
> > ~~~
253+
> > {: .language-python}
254+
> {: .solution}
255+
{: .challenge}
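
The f-string in the solution above can be tried outside Streamlit. This sketch wraps it in a small helper, `describe`, which is a made-up name for illustration; `metric_labels` is the dictionary used in the app.

```python
# Map raw metric names to readable labels (same mapping as in the app)
metric_labels = {
    "gdpPercap": "GDP Per Capita",
    "lifeExp": "Average Life Expectancy",
    "pop": "Population",
}


def describe(metric, continent):
    """Build the description text for the chosen metric and continent."""
    return f"This plot shows the {metric_labels[metric]} for countries in {continent}."


print(describe("gdpPercap", "Oceania"))
print(describe("lifeExp", "Asia"))
```

Because the text is built from the variables, it stays in step with whatever the plot is showing.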
256+
257+
> ## Show me the data! (Maybe)
258+
> After the plot is displayed, also display the dataframe used to generate the plot.
259+
>
260+
> This time, make it optional - only display the dataframe if a variable is set to True.
261+
> > ## Solution
262+
> > ~~~
263+
> > show_data = True
264+
> > if show_data:
265+
> >     st.dataframe(df_filtered)
266+
> > ~~~
267+
> > {: .language-python}
268+
> {: .solution}
269+
{: .challenge}
270+
243271
{% include links.md %}
244272

_episodes/06-add-widgets.md

Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -171,5 +171,72 @@ Save, Rerun, and... Share!
171171

172172
![Final Streamlit app](../fig/streamlit_app_lesson6_3.png)
173173

174+
## Exercises
175+
176+
> ## Show me the data! (If the user wants it)
177+
> After the plot is displayed, also display the dataframe used to generate the plot.
178+
>
179+
> Use a widget so that the user can decide whether to display the data.
180+
>
181+
> (Hint: look at the checkbox!)
182+
> > ## Solution
183+
> > ~~~
184+
> > with st.sidebar:
185+
> >     show_data = st.checkbox(label = "Show the data used to generate this plot", value = False)
186+
> >
187+
> > if show_data:
188+
> >     st.dataframe(df_filtered)
189+
> > ~~~
190+
> > {: .language-python}
191+
> {: .solution}
192+
{: .challenge}
193+
194+
> ## Limit the countries displayed in the plot
195+
> Add a widget that allows users to limit the countries that will be displayed on the plot.
196+
>
197+
> (Hint: look at the multiselect!)
198+
> > ## Solution
199+
> > ~~~
200+
> > countries_list = list(df_filtered['country'].unique())
201+
> >
202+
> > with st.sidebar:
203+
> >     countries = st.multiselect(label = "Which countries should be plotted?", options = countries_list, default = countries_list)
204+
> >
205+
> > df_filtered = df_filtered[df_filtered.country.isin(countries)]
206+
> > ~~~
207+
> > {: .language-python}
208+
> {: .solution}
209+
{: .challenge}
210+
211+
> ## Limit the dates displayed in the plot
212+
> Add a widget that allows users to limit the range of years that will be displayed on the plot.
213+
>
214+
> (Hint: look at the slider!)
215+
> > ## Solution
216+
> > ~~~
217+
> > year_min = int(df_filtered['year'].min())
218+
> > year_max = int(df_filtered['year'].max())
219+
> >
220+
> > with st.sidebar:
221+
> >     years = st.slider(label = "What years should be plotted?", min_value = year_min, max_value = year_max, value = (year_min, year_max))
222+
> >
223+
> > df_filtered = df_filtered[(df_filtered.year >= years[0]) & (df_filtered.year <= years[1])]
224+
> > ~~~
225+
> > {: .language-python}
226+
> {: .solution}
227+
{: .challenge}
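
The country and year filters from the two solutions above can be tested without any widgets by hard-coding the values a user might pick. This is a sketch on made-up data. It uses `Series.between()`, which is inclusive at both ends by default, as an alternative to the two comparisons in the solution; `countries` and `years` stand in for the widget return values.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "year": [1952, 2007, 1952, 2007],
        "value": [1.0, 2.0, 3.0, 4.0],  # placeholder numbers
    }
)

# Stand-ins for what st.multiselect and st.slider would return
countries = ["A"]
years = (1950, 1960)

# Combine both conditions into one boolean mask
mask = df_demo["country"].isin(countries) & df_demo["year"].between(years[0], years[1])
df_small = df_demo[mask]
print(df_small)
```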
228+
229+
> ## Improve the description
230+
> After the plot is displayed, add some text describing the plot.
231+
>
232+
> This time, add more to the description based on the information specified by the newly added widgets.
233+
> > ## Solution
234+
> > ~~~
235+
> > st.markdown(f"This plot shows the {metric_labels[metric]} from {years[0]} to {years[1]} for the following countries in {continent}: {', '.join(countries)}")
236+
> > ~~~
237+
> > {: .language-python}
238+
> {: .solution}
239+
{: .challenge}
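
One edge case worth considering: if the user deselects every country, `', '.join(countries)` produces an empty string and the sentence ends abruptly. The sketch below shows one way to handle that. The helper `describe_plot` and the wording of the fallback message are made up for illustration.

```python
metric_labels = {
    "gdpPercap": "GDP Per Capita",
    "lifeExp": "Average Life Expectancy",
    "pop": "Population",
}


def describe_plot(metric, continent, years, countries):
    """Build a description, with a fallback when no countries are selected."""
    label = metric_labels[metric]
    if not countries:
        # Hypothetical fallback text for an empty selection
        return f"No countries are selected, so there is no {label} data to plot for {continent}."
    return (
        f"This plot shows the {label} from {years[0]} to {years[1]} "
        f"for the following countries in {continent}: {', '.join(countries)}"
    )


print(describe_plot("pop", "Oceania", (1952, 2007), ["Australia", "New Zealand"]))
print(describe_plot("pop", "Oceania", (1952, 2007), []))
```

In the app, the returned string could be passed to `st.markdown()` in place of the inline f-string.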
240+
174241
{% include links.md %}
175242

code/app.py

Lines changed: 55 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -10,31 +10,80 @@
1010
df = pd.read_csv("Data/gapminder_tidy.csv")
1111

1212
# get a list of all possible continents and metrics, for the widgets
13-
continent_list = list(df['continent'].unique())
14-
metric_list = list(df['metric'].unique())
13+
continent_list = list(df["continent"].unique())
14+
metric_list = list(df["metric"].unique())
1515

1616
# map the actual data values to more readable strings
17-
metric_labels = {"gdpPercap": "GDP Per Capita", "lifeExp": "Average Life Expectancy", "pop": "Population"}
17+
metric_labels = {
18+
    "gdpPercap": "GDP Per Capita",
19+
    "lifeExp": "Average Life Expectancy",
20+
    "pop": "Population",
21+
}
1822

1923
# function to be used in widget argument format_func that maps metric values to readable labels, using dict above
2024
def format_metric(metric_raw):
2125
    return metric_labels[metric_raw]
2226

27+
2328
# put all widgets in sidebar and have a subtitle
2429
with st.sidebar:
2530
    st.subheader("Configure the plot")
2631
    # widget to choose which continent to display
27-
    continent = st.selectbox(label = "Choose a continent", options = continent_list)
32+
    continent = st.selectbox(label="Choose a continent", options=continent_list)
2833
    # widget to choose which metric to display
29-
    metric = st.selectbox(label = "Choose a metric", options = metric_list, format_func=format_metric)
34+
    metric = st.selectbox(
35+
        label="Choose a metric", options=metric_list, format_func=format_metric
36+
    )
37+
    show_data = st.checkbox(
38+
        label="Show the data used to generate this plot", value=False
39+
    )
3040

3141
# use selected values from widgets to filter dataset down to only the rows we need
3242
query = f"continent=='{continent}' & metric=='{metric}'"
3343
df_filtered = df.query(query)
3444

45+
# for limiting countries and years (from the exercises)
46+
countries_list = list(df_filtered["country"].unique())
47+
48+
year_min = int(df_filtered["year"].min())
49+
year_max = int(df_filtered["year"].max())
50+
51+
with st.sidebar:
52+
    years = st.slider(
53+
        label="What years should be plotted?",
54+
        min_value=year_min,
55+
        max_value=year_max,
56+
        value=(year_min, year_max),
57+
    )
58+
    countries = st.multiselect(
59+
        label="Which countries should be plotted?",
60+
        options=countries_list,
61+
        default=countries_list,
62+
    )
63+
64+
df_filtered = df_filtered[df_filtered.country.isin(countries)]
65+
df_filtered = df_filtered[
66+
    (df_filtered.year >= years[0]) & (df_filtered.year <= years[1])
67+
]
68+
3569
# create the plot
3670
title = f"{metric_labels[metric]} for countries in {continent}"
37-
fig = px.line(df_filtered, x = "year", y = "value", color = "country", title = title, labels={"value": f"{metric_labels[metric]}"})
71+
fig = px.line(
72+
    df_filtered,
73+
    x="year",
74+
    y="value",
75+
    color="country",
76+
    title=title,
77+
    labels={"value": f"{metric_labels[metric]}"},
78+
)
3879

3980
# display the plot
4081
st.plotly_chart(fig, use_container_width=True)
82+
83+
# display other info (from the exercises)
84+
st.markdown(
85+
    f"This plot shows the {metric_labels[metric]} from {years[0]} to {years[1]} for the following countries in {continent}: {', '.join(countries)}"
86+
)
87+
88+
if show_data:
89+
    st.dataframe(df_filtered)
