A collection of Jupyter notebooks for working with data from:
- Kingfisher Process
- Data Registry
- Field lists, such as those from a field-level mapping
To use a notebook:
- Click the Open In Colab button
- Click the File > Save a copy in Drive menu item
- Make your changes (e.g. `collection_ids`, `schema_name`, etc.)
If you encounter unfamiliar errors, try the Runtime > Disconnect and delete runtime menu item. If the error still occurs, please open an issue.
If you make any improvements or fixes, please follow the Contributing guide below to merge your changes back into this repository.
You can also use a notebook without creating a copy. However, if you re-open the notebook, any changes and outputs will be lost.
| Notebook | Open in Colab | Description |
|---|---|---|
| Publisher analysis template | | Analyze data from a specific publisher. |
| Meta analysis template | | Analyze data from multiple publishers, or perform other types of analysis on the Kingfisher Process database. |
| Basic criteria feedback template | | Provide feedback on the OCDS basic criteria. |
| Structure and format feedback template | | Provide feedback on structure and format errors reported by lib-cove-ocds. |
| Data quality feedback template | | Provide detailed feedback on structure, format, conformance and quality issues. |
| Usability checks template | | Provide feedback on data usability for OCDS datasets. |
| Red flags checks template | | Provide feedback on red flags for OCDS datasets. |
| Notebook | Open in Colab | Description |
|---|---|---|
| Usability checks using a field list | | Provide feedback on data usability for prospective OCDS publishers, using a field list, such as one from a field-level mapping. |
| Usability checks using the Data Registry | | Provide feedback on data usability using data from the Data Registry. |
| Relevant checks using a field list | | Provide feedback on data relevance for prospective publishers, using a field list, such as one from a field-level mapping. |
| Relevant checks using the Data Registry | | Provide feedback on data relevance using data from the Data Registry. |
| Relevant checks for all the Data Registry publications | | Provide feedback on data relevance by downloading all publications from the Data Registry. |
| Red flags checks using the Data Registry | | Provide feedback on coverage for red flags using data from the Data Registry. |
| Red flags checks using a field list | | Provide feedback on red flags for prospective OCDS publishers, using a field list, such as one from a field-level mapping. |
| Field list for all the Data Registry publications | | Extract the fields published by all publications in the Data Registry. |
To ease maintenance, the notebooks are made up of reusable components with clear scopes:
- Environment: Set up Google Colaboratory in general
  - `environment`: Install requirements, import packages, load extensions and configure the notebook.
- Setup: Set up Google Colaboratory for a data source
  - `setup_charts`: Install charts requirements, import charts packages and define plot functions.
  - `setup_kingfisher`: Connect to the Kingfisher Process database. Choose the collection(s) and schema to work with.
  - `setup_fieldlist`: Load the field list.
  - `setup_metadata_from_registry`: Define the functions to list publications and their metadata, including coverage, from the Registry.
  - `setup_usability`: Define the usability functions.
  - `setup_red_flags`: Define the red flags functions.
- Errors: Review any issues in loading the data
  - `errors_kingfisher`: Check for data collection (Kingfisher Collect) and processing (Kingfisher Process) errors.
- Scope: Understand the scope of the data
  - `scope_kingfisher`: Check how many releases and records your data contains. Check the date range and stages of the contracting process covered by your data.
  - `scope_usability`: Calculate general statistics.
- Check: Perform a category of checks
  - `check_structure`: Check for structure and format errors reported by lib-cove-ocds.
  - `check_conformance`: Check against the OCDS conformance criteria.
  - `check_quality`: Check for conformance and quality issues that require manual review.
  - `check_usability_kingfisher`: Usability checks using Kingfisher with coverage.
  - `check_usability_external`: Usability checks using a field list without coverage.
  - `check_relevant`: Given a field list, check whether the list passes the "relevant" criteria.
  - `check_relevant_all_registry`: Perform the "relevant" checks against the active publications from the Registry.
  - `check_red_flags_external`: Red flags checks using a field list without coverage.
- Other
  - `select_data_from_registry`: Define the form to select a publication from the Registry.
  - `get_field_list_all_registry`: Get the fields used by all OCDS publications in the Registry.
Follow the style guide for SQL statements.
- To see which components are used in each notebook, refer to the `NOTEBOOKS` variable in `manage.py`.
- To add new components to a notebook, add to the entry for the notebook in the `NOTEBOOKS` variable in `manage.py`.
- To add a new notebook:
  - Add an entry for the notebook and its components to the `NOTEBOOKS` variable in `manage.py`.
  - Update the Notebooks section of the `README.md`.
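As a rough illustration of a mapping from notebooks to components, the sketch below assumes `NOTEBOOKS` is a dictionary whose keys are notebook names and whose values are ordered lists of component names. This is an assumption: the actual structure in `manage.py` may differ. The notebook names and the two helper functions are hypothetical; the component names are taken from the list of components in this README.

```python
# Hypothetical sketch: the real NOTEBOOKS variable in manage.py may be structured differently.
# Notebook names are made up; component names come from this README's component list.
NOTEBOOKS = {
    "example_usability_fieldlist": [
        "environment",
        "setup_fieldlist",
        "setup_usability",
        "check_usability_external",
    ],
    "example_publisher_analysis": [
        "environment",
        "setup_kingfisher",
        "errors_kingfisher",
        "scope_kingfisher",
        "check_structure",
    ],
}


def components_for(notebook):
    """Return the components of a notebook, in the order they are listed."""
    return list(NOTEBOOKS[notebook])


def notebooks_using(component):
    """Return the sorted names of the notebooks that include a component."""
    return sorted(name for name, components in NOTEBOOKS.items() if component in components)


if __name__ == "__main__":
    print(components_for("example_usability_fieldlist"))
    print(notebooks_using("setup_kingfisher"))
```

Under this assumed layout, adding a component to a notebook means appending its name to the notebook's list, and adding a notebook means adding a new key.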
- Create a branch.
- Create a new notebook:
  - Set a title using H2 formatting, and add your cells.
- Commit your changes:
  - Click Edit -> Clear all outputs.
  - Click File -> Save.
  - Select the 'notebooks-ocds' repository.
  - Select your branch, enter a commit message and click OK.
  - Uncheck 'Include a link to Colab'.
- Request a review:
  - Create a pull request.
  - Request a review from a data support manager.
  - If the reviewer requests changes, make the changes then repeat this step.
- Once approved, you can merge your own changes.
Reminder: If you change headings or add sections, check whether any related Document template in this process note needs an update.
Jupytext is used to encode notebooks as Markdown files (if code cells are mostly SQL) or Python files.
Python files use the light format. For example:

```python
# ## A heading
#
# A paragraph

python_code = "cell"
second_line = "code"

another_code_cell = True
```

To merge code cells, use start-of-cell and end-of-cell delimiters:

```python
# +
code = "cell"
same = "cell"
# -
```

To hide code:

```python
# +
# @title My title { display-mode: "form" }
python = "code"
# -
```

The end-of-cell delimiter is optional if the next cell is also hidden, or at the end of the file.

To add SQL:

```python
# + language="sql"
# SELECT 1
# -
```

Or:

```python
# + magic_args="my_variable <<" language="sql"
# SELECT 1
# -
```
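When writing several SQL cells by hand, it can help to see the pattern spelled out. The sketch below is a hypothetical helper (`sql_cell` is not part of Jupytext or this repository) that wraps a SQL statement in the start-of-cell and end-of-cell delimiters shown above, commenting out each line of the statement.

```python
def sql_cell(statement, variable=None):
    """Wrap a SQL statement in light-format cell delimiters.

    Hypothetical helper for illustration only. If `variable` is given,
    the header uses the `magic_args="<variable> <<"` form shown above.
    """
    if variable:
        header = f'# + magic_args="{variable} <<" language="sql"'
    else:
        header = '# + language="sql"'
    # Each SQL line becomes a Python comment; empty lines become a bare "#".
    body = [f"# {line}" if line.strip() else "#" for line in statement.strip().splitlines()]
    return "\n".join([header, *body, "# -"])


if __name__ == "__main__":
    print(sql_cell("SELECT 1"))
    print(sql_cell("SELECT 1", variable="my_variable"))
```

The two calls in the `__main__` block reproduce the two SQL examples above.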