add exercises

jenna-jordan · jenna-jordan · commit 658a7340096f · 2021-08-22T22:10:13.000-04:00
diff --git a/_episodes/01-create-new-environment.md b/_episodes/01-create-new-environment.md
@@ -29,6 +29,12 @@ Python can live in many different places on your computer, and each source may h
 By using an anaconda environment that we create, and by explicitly using only that environment, we can avoid conflicts...
 and know exactly what environment is being used to run our python code. And we avoid the mess indicated by the above comic!
 
+> ## Reflecting on environment mishaps
+> Have you ever been unable to install a package due to a conflict?
+>
+> How did you solve the problem?
+{: .discussion}
+
 ## Create an environment from the `environment.yml` file
 
 The necessary packages are specified in the `environment.yml` file. 
@@ -116,6 +122,13 @@ jupyter lab
 ~~~
 {: .language-bash}
 
+> ## Alternatives to Anaconda
+> Have you ever used a different solution for creating/managing virtual Python environments?
+> For example, `pipenv` or `virtualenv`?
+>
+> How does conda differ from these solutions?
+{: .discussion}
+
 ## Create the environment from scratch
 
 If for some reason you are unable to create the environment from the `environment.yml` file, or you simply wish to do the process for yourself, you can follow these steps. These steps replace the `conda env create --file environment.yml` step in the instructions above.
diff --git a/_episodes/02-data-wrangling.md b/_episodes/02-data-wrangling.md
@@ -10,7 +10,7 @@ questions:
 objectives:
 - "Learn useful pandas functions for wrangling data into a tidy format"
 keypoints:
-- "Import your CSV using `pd.read_csv('<FILEPATH>')"
+- "Import your CSV using `pd.read_csv('<FILEPATH>')`"
 - "Transform your dataframe from wide to long with `pd.melt()`"
 - "Split column values with `df['<COLUMN>'].str.split('<DELIM>')`"
 - "Sort rows using `df.sort_values()`"
@@ -25,6 +25,23 @@ You can click on the `Data` folder and double click on `gapminder_all.csv` to vi
 
 We are going take this very wide dataset and make it very long, so the unit of observation will be each country + year + metric combination, rather than just the country. This process is made much simpler by a couple of functions in the `pandas` library.
 
+> ## Tidy Data
+> The term "tidy data" may be most popular in the R ecosystem (the "tidyverse" is a collection of R packages designed around the tidy data philosophy), but it applies to all tabular datasets, no matter what programming language you are using to wrangle your data.
+> You can read more about the tidy data philosophy in Hadley Wickham's 2014 paper, "Tidy Data", available [here](https://vita.had.co.nz/papers/tidy-data.pdf).
+>
+> Tidy data follows 3 rules:
+> 1. Each variable forms a column
+> 2. Each observation forms a row
+> 3. Each type of observational unit forms a table
+>
+> Wickham later refined and revised the tidy data philosophy and published it in chapter 12 of his open-access textbook "R for Data Science", available [here](https://r4ds.had.co.nz/tidy-data.html).
+>
+> The revised rules are:
+> 1. Each variable must have its own column
+> 2. Each observation must have its own row
+> 3. Each value must have its own cell
+{: .callout}
+
 ## Getting Started
 
 Let's go ahead and get started by opening a Jupyter Notebook with the `dataviz` kernel. If you navigated to the `Data` folder to look at the CSV file, navigate back to the root before opening the new notebook. 
@@ -109,24 +126,11 @@ df_melted
 ~~~
 {: .language-python}
 
-Take a moment to compare this dataframe to the one we started with. What are some advantages to having the data in this format?
-
-> ## Tidy Data
-> The term "tidy data" may be most popular in the R ecosystem (the "tidyverse" is a collection of R packages designed around the tidy data philosophy), but it is applicable to all tabular datasets, not matter what programming language you are using to wrangle your data.
-> You can ready more about the tidy data philosophy in Hadley Wickham's 2014 paper, "Tidy Data", available [here](https://vita.had.co.nz/papers/tidy-data.pdf).
+> ## Wide vs long data
+> Take a moment to compare this dataframe to the one we started with. 
 >
-> Tidy data follows 3 rules:
-> 1. Each variable forms a column
-> 2. Each observation forms a row
-> 3. Each type of observational unit forms a table
->
-> Wickham later refined and revised the tidy data philosophy, and published it in the 12th chapter of his open access textbook "R for Data Science" - available [here](https://r4ds.had.co.nz/tidy-data.html). 
->
-> The revised rules are:
-> 1. Each variable must have its own column
-> 2. Each observation must have its own row
-> 3. Each value must have its own cell
-{: .callout}
+> What are some advantages to having the data in this format?
+{: .discussion}
 
 ## Saving the final dataframe
 
@@ -148,5 +152,19 @@ df_final.to_csv("Data/gapminder_tidy.csv", index=False)
 
 We set the index to False so that the index column does not get saved to the CSV file.
 
+> ## Imagining other tidy ways to wrangle
+> We wrangled our data into a tidy form. However, there is no single "true tidy" form for any given dataset.
+>
+> What are some other ways you may wish to organize this dataset that are also tidy?
+> > ## For Example
+> > Instead of having a `metric` and `value` column, given that `metric` only has 3 values, 
+> > you could have a column each for `gdpPercap`, `lifeExp`, and `pop`. 
+> > 
+> > The values in each of those three columns would reflect the value of that metric for a given country in a given year.
+> > The columns in this dataset would be: `country`, `continent`, `year`, `gdpPercap`, `lifeExp`, and `pop`.
+> {: .solution}
+> How would you wrangle the original dataset into this other tidy form using pandas?
+{: .discussion}
+
 {% include links.md %}
 
diff --git a/_episodes/03-create-visualizations.md b/_episodes/03-create-visualizations.md
@@ -115,13 +115,37 @@ fig.show()
 
 ![Plot of Oceania's GDP over time with correct labels](../fig/L3_thirdplot.png)
 
-You can go ahead and experiment with creating different plots for the different continents and metrics.
-
 > ## Interactivity is baked in to Plotly charts
 > When you have many more lines, the interactive features of Plotly become very useful. 
 > Notice how hovering over a line will tell you more information about that point. 
 > You will also see several options in the upper right corner to further interact with the plot - including saving it as a PNG file!
 {: .callout}
 
+## Exercises
+
+> ## Visualize Population in Europe
+> Create a plot that visualizes the population of countries in Europe over time.
+> > ## Solution
+> > ~~~
+> > df_pop_eu = df.query("continent=='Europe' & metric=='pop'")
+> > fig = px.line(df_pop_eu, x = "year", y = "value", color = "country", title = "Population in Europe", labels={"value": "Population"})
+> > fig.show()
+> > ~~~
+> > {: .language-python}
+> {: .solution}
+{: .challenge}
+
+> ## Visualize Average Life Expectancy in Asia
+> Create a plot that visualizes the average life expectancy of countries in Asia over time.
+> > ## Solution
+> > ~~~
+> > df_le_as = df.query("continent=='Asia' & metric=='lifeExp'")
+> > fig = px.line(df_le_as, x = "year", y = "value", color = "country", title = "Life Expectancy in Asia", labels={"value": "Average Life Expectancy"})
+> > fig.show()
+> > ~~~
+> > {: .language-python}
+> {: .solution}
+{: .challenge}
+
 {% include links.md %}
 
diff --git a/_episodes/04-create-streamlit-app.md b/_episodes/04-create-streamlit-app.md
@@ -165,6 +165,28 @@ We now have a web application that can allow you to share your interactive visua
 > Detailed instructions can be found in [Streamlit's Documentation](https://docs.streamlit.io/en/stable/deploy_streamlit_app.html)
 {: .callout}
 
+## Exercises
+
+> ## Add a description
+> After the plot is displayed, add some text describing the plot.
+> > ## Solution
+> > ~~~
+> > st.plotly_chart(fig, use_container_width=True) # this line is already in the app
+> > st.markdown("This plot shows the GDP Per Capita for countries in Oceania.")
+> > ~~~
+> > {: .language-python}
+> {: .solution}
+{: .challenge}
+
+> ## Show me the data!
+> After the plot is displayed, also display the dataframe used to generate the plot.
+> > ## Solution
+> > ~~~
+> > st.dataframe(df_gdp_o) # df_gdp_o is defined in the code created in this lesson
+> > ~~~
+> > {: .language-python}
+> {: .solution}
+{: .challenge}
 
 {% include links.md %}
 
diff --git a/_episodes/05-refactoring-for-flexibility.md b/_episodes/05-refactoring-for-flexibility.md
@@ -240,5 +240,33 @@ st.plotly_chart(fig, use_container_width=True)
 
 ![Streamlit app after this lesson](../fig/streamlit_app_lesson5fin.png)
 
+## Exercises
+
+> ## Add a (flexible) description
+> After the plot is displayed, add some text describing the plot. 
+>
+> This time, use f-strings so the description can change with the plot.
+> > ## Solution
+> > ~~~
+> > st.markdown(f"This plot shows the {metric_labels[metric]} for countries in {continent}.")
+> > ~~~
+> > {: .language-python}
+> {: .solution}
+{: .challenge}
+
+> ## Show me the data! (Maybe)
+> After the plot is displayed, also display the dataframe used to generate the plot.
+>
+> This time, make it optional: only display the dataframe if a variable is set to `True`.
+> > ## Solution
+> > ~~~
+> > show_data = True
+> > if show_data:
+> >     st.dataframe(df_filtered)
+> > ~~~
+> > {: .language-python}
+> {: .solution}
+{: .challenge}
+
 {% include links.md %}
 
diff --git a/_episodes/06-add-widgets.md b/_episodes/06-add-widgets.md
@@ -171,5 +171,72 @@ Save, Rerun, and... Share!
 
 ![Final Streamlit app](../fig/streamlit_app_lesson6_3.png)
 
+## Exercises
+
+> ## Show me the data! (If the user wants it)
+> After the plot is displayed, also display the dataframe used to generate the plot.
+>
+> Use a widget so that the user can decide whether to display the data. 
+> 
+> (Hint: look at the checkbox!)
+> > ## Solution
+> > ~~~
+> > with st.sidebar:
+> >     show_data = st.checkbox(label = "Show the data used to generate this plot", value = False)
+> > 
+> > if show_data:
+> >     st.dataframe(df_filtered)
+> > ~~~
+> > {: .language-python}
+> {: .solution}
+{: .challenge}
+
+> ## Limit the countries displayed in the plot
+> Add a widget that allows users to limit the countries that will be displayed on the plot.
+> 
+> (Hint: look at the multiselect!)
+> > ## Solution
+> > ~~~
+> > countries_list = list(df_filtered['country'].unique())
+> > 
+> > with st.sidebar:
+> >     countries = st.multiselect(label = "Which countries should be plotted?", options = countries_list, default = countries_list)
+> > 
+> > df_filtered = df_filtered[df_filtered.country.isin(countries)]
+> > ~~~
+> > {: .language-python}
+> {: .solution}
+{: .challenge}
+
+> ## Limit the dates displayed in the plot
+> Add a widget that allows users to limit the range of years that will be displayed on the plot.
+> 
+> (Hint: look at the slider!)
+> > ## Solution
+> > ~~~
+> > year_min = int(df_filtered['year'].min())
+> > year_max = int(df_filtered['year'].max())
+> >
+> > with st.sidebar:
+> >     years = st.slider(label = "What years should be plotted?", min_value = year_min, max_value = year_max, value = (year_min, year_max))
+> >
+> > df_filtered = df_filtered[(df_filtered.year >= years[0]) & (df_filtered.year <= years[1])]
+> > ~~~
+> > {: .language-python}
+> {: .solution}
+{: .challenge}
+
+> ## Improve the description
+> After the plot is displayed, add some text describing the plot. 
+>
+> This time, add more to the description based on the information specified by the newly added widgets.
+> > ## Solution
+> > ~~~
+> > st.markdown(f"This plot shows the {metric_labels[metric]} from {years[0]} to {years[1]} for the following countries in {continent}: {', '.join(countries)}")
+> > ~~~
+> > {: .language-python}
+> {: .solution}
+{: .challenge}
+
 {% include links.md %}
 
diff --git a/code/app.py b/code/app.py
@@ -10,31 +10,80 @@
 df = pd.read_csv("Data/gapminder_tidy.csv")
 
 # get a list of all possible continents and metrics, for the widgets
-continent_list = list(df['continent'].unique())
-metric_list = list(df['metric'].unique())
+continent_list = list(df["continent"].unique())
+metric_list = list(df["metric"].unique())
 
 # map the actual data values to more readable strings
-metric_labels = {"gdpPercap": "GDP Per Capita", "lifeExp": "Average Life Expectancy", "pop": "Population"}
+metric_labels = {
+    "gdpPercap": "GDP Per Capita",
+    "lifeExp": "Average Life Expectancy",
+    "pop": "Population",
+}
 
 # function to be used in widget argument format_func that maps metric values to readable labels, using dict above
 def format_metric(metric_raw):
     return metric_labels[metric_raw]
 
+
 # put all widgets in sidebar and have a subtitle
 with st.sidebar:
     st.subheader("Configure the plot")
     # widget to choose which continent to display
-    continent = st.selectbox(label = "Choose a continent", options = continent_list)
+    continent = st.selectbox(label="Choose a continent", options=continent_list)
     # widget to choose which metric to display
-    metric = st.selectbox(label = "Choose a metric", options = metric_list, format_func=format_metric)
+    metric = st.selectbox(
+        label="Choose a metric", options=metric_list, format_func=format_metric
+    )
+    show_data = st.checkbox(
+        label="Show the data used to generate this plot", value=False
+    )
 
 # use selected values from widgets to filter dataset down to only the rows we need
 query = f"continent=='{continent}' & metric=='{metric}'"
 df_filtered = df.query(query)
 
+# for limiting countries and years (from the exercises)
+countries_list = list(df_filtered["country"].unique())
+
+year_min = int(df_filtered["year"].min())
+year_max = int(df_filtered["year"].max())
+
+with st.sidebar:
+    years = st.slider(
+        label="What years should be plotted?",
+        min_value=year_min,
+        max_value=year_max,
+        value=(year_min, year_max),
+    )
+    countries = st.multiselect(
+        label="Which countries should be plotted?",
+        options=countries_list,
+        default=countries_list,
+    )
+
+df_filtered = df_filtered[df_filtered.country.isin(countries)]
+df_filtered = df_filtered[
+    (df_filtered.year >= years[0]) & (df_filtered.year <= years[1])
+]
+
 # create the plot
 title = f"{metric_labels[metric]} for countries in {continent}"
-fig = px.line(df_filtered, x = "year", y = "value", color = "country", title = title, labels={"value": f"{metric_labels[metric]}"})
+fig = px.line(
+    df_filtered,
+    x="year",
+    y="value",
+    color="country",
+    title=title,
+    labels={"value": f"{metric_labels[metric]}"},
+)
 
 # display the plot
 st.plotly_chart(fig, use_container_width=True)
+
+# display other info (from the exercises)
+st.markdown(
+    f"This plot shows the {metric_labels[metric]} from {years[0]} to {years[1]} for the following countries in {continent}: {', '.join(countries)}"
+)
+
+if show_data:
+    st.dataframe(df_filtered)
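
Review note (not part of the commit): the description added to `code/app.py` joins the selected countries with `', '.join(countries)`. When a user deselects every country in the multiselect, that list is empty and the sentence ends after the colon. The sketch below shows one way the caption could handle that case. `describe_plot` is a hypothetical helper name, and the wording of the empty-selection message is an assumption; the normal-case text is copied from the diff above.

```python
def describe_plot(metric_label, continent, years, countries):
    """Build the caption shown under the plot.

    Hypothetical helper, not part of the commit: it wraps the f-string
    passed to st.markdown so the empty-selection case reads cleanly.
    """
    start, end = years
    if not countries:
        # An empty list here means no country is selected in the widget.
        return f"No countries in {continent} are selected, so there is nothing to plot."
    return (
        f"This plot shows the {metric_label} from {start} to {end} "
        f"for the following countries in {continent}: {', '.join(countries)}"
    )


# Example values standing in for the widget outputs
print(describe_plot("Population", "Oceania", (1952, 2007), ["Australia", "New Zealand"]))
print(describe_plot("Population", "Oceania", (1952, 2007), []))
```

In the app, the result would be passed to `st.markdown(...)` in place of the inline f-string, with `metric_labels[metric]`, `continent`, `years`, and `countries` as the arguments.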