In-class exercise
The in-class exercise is distributed via a GitHub Classroom repository. To get access to your group’s git repository, you can follow this link.
The first person will create a group and set up the repository for the group; the others will land on the webpage of your group’s repository immediately. All group members can then clone the repository.
Introduction
In this exercise you will use pytask for the first time. You will take code from in-class exercise 6 (the one on functional data management and plotting) so you don’t have to write too much code and can focus on the mechanics of using pytask.
Task 1
Copy all function definitions, relevant imports, and pandas settings from the solution
to in-class exercise 6, task 1 into a file called task_clean_election_data.py. Note
that there are no I/O operations (loading and saving of data) in those functions.
Add a task function at the top of the file (after import statements but before function definitions) that does the following things:
load the original dataset
load the metadata
call the clean_data function you copied
save the result in a bld folder under a suitable name.
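The steps above can be sketched as follows. This is a minimal sketch, not the reference solution: the file names, the folder layout, and the signature of clean_data are assumptions, and the body of clean_data is only a stand-in for the function you copy from exercise 6. It relies on pytask treating pathlib.Path default arguments as dependencies and the produces argument as the product.

```python
from pathlib import Path

import pandas as pd

# Hypothetical layout: raw data next to this file, outputs in a "bld" subfolder.
SRC = Path(__file__).parent.resolve()
BLD = SRC / "bld"


def task_clean_election_data(
    raw_path=SRC / "election_data.csv",  # hypothetical file name
    metadata_path=SRC / "metadata.csv",  # hypothetical file name
    produces=BLD / "election_data_clean.pkl",  # hypothetical file name
):
    """Load the raw data and the metadata, clean the data, store the result."""
    raw = pd.read_csv(raw_path)
    metadata = pd.read_csv(metadata_path)
    clean = clean_data(raw, metadata)
    # Make sure the output folder exists before writing.
    produces.parent.mkdir(parents=True, exist_ok=True)
    clean.to_pickle(produces)


def clean_data(raw, metadata):
    """Stand-in for the function copied from exercise 6 (assumed signature).

    It works on DataFrames only and performs no I/O. This placeholder body
    ignores the metadata and just drops rows with missing values.
    """
    return raw.dropna()
```

All loading and saving happens in the task function, so clean_data can be called and tested on in-memory DataFrames.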
In the project directory, run pytask from a terminal and verify that the task is
executed correctly.
Task 2
The purpose of this task is to make sure that your dependencies are configured correctly.
Open a shell in the project directory and execute pytask.
Don’t make any changes and run pytask again. The result should be that all tasks are skipped because nothing has changed. If this is not the case, ask for help.
Now delete the created dataset and run pytask again. The result should be that your task runs again and the created dataset is stored on disk. If this is not the case, you have a problem in your product specification (in the produces argument). Debug it!
Now add a comment or docstring in task_clean_election_data.py and run pytask again. The result should be that the task is executed again. If this is not the case, you have a problem in your dependency specification (in the other argument(s) of the task function). Debug it!
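To reason about the three situations above, a simplified mental model can help. The sketch below is not pytask's actual implementation, and record_state and needs_rerun are hypothetical helpers; it only illustrates the idea that a task is re-run when any tracked file (the task module, a dependency, or a product) is missing or differs from the state recorded after the last successful run.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def _file_hash(path: Path) -> str | None:
    """Return a hash of the file content, or None if the file does not exist."""
    if not path.exists():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record_state(tracked: list[Path]) -> dict[Path, str | None]:
    """Remember the current state of all tracked files after a successful run."""
    return {path: _file_hash(path) for path in tracked}


def needs_rerun(tracked: list[Path], recorded: dict[Path, str | None]) -> bool:
    """Return True if any tracked file is missing or changed since it was recorded."""
    for path in tracked:
        current = _file_hash(path)
        if current is None or recorded.get(path) != current:
            return True
    return False
```

In this model, running twice without changes finds nothing to do, deleting the product makes a tracked file go missing, and adding a comment changes the hash of the task module.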
Task 3
In task 4 of in-class exercise 6 we use the gapminder data. Create the file
task_prepare_gapminder_data.py and add the following functions:
A function called _reduce_gapminder_data that takes the raw gapminder data as a DataFrame and returns a reduced DataFrame containing only the countries Algeria, Egypt, Sudan, and South Africa.
A function called task_prepare_gapminder_data that downloads the gapminder data, calls the _reduce_gapminder_data function, and saves the result under a suitable name in the bld folder.
No matter how little there is to do, you should implement your main logic in a function that works on Python objects (here DataFrames) and delegate all loading and saving to the task function.
Task 4
Add a file called task_create_simple_lineplot.py and create the simple lineplot from
in-class exercise 6, task 4.
Again, you can define all functions you need inside the task file, but you should define a plotting function that only works on Python objects (and returns Python objects).
Save the generated plot under a suitable name in the bld folder.
Note
Remember from these slides that you will need to install Chrome in order to export static figures from plotly. You can do this by running:
pixi run plotly_get_chrome
Alternatively, you can run the following in your Python code:
import plotly.io as pio
pio.get_chrome()
Use whichever option you prefer. If the shell command gives you trouble on Windows, use the second option (the Python code).
Task 5
Add a file called task_create_highlighted_lineplot.py and add a function that creates
the lineplot from tasks 7, 8, and 9 in in-class exercise 6. We just want the final plot
that you see at the end of task 9, not the intermediate plots.
Add a corresponding task function that loads the data, calls the
create_highlighted_lineplot function, and saves the generated plot under a suitable
name in the bld folder.
Task 6
Add a file called task_create_scatterplots.py that loops over a task function in order
to create the three scatterplots from in-class exercise 6, task 3.
Task 7 (Bonus)
Refactor your plotting code in task_create_highlighted_lineplot.py such that there is one
main function that calls several private functions that do the actual plotting.
Note that when writing code for complex plots, you are allowed to use functions that modify a figure in-place, i.e. have a side effect.