| 1 | +---
| 2 | +title: "First published release of `check-datapackage`!" |
| 3 | +description: "We've published our second Python package. :tada: :grin: This package checks that a Data Package is compliant with its specification." |
| 4 | +author: |
| 5 | +- Luke W. Johnston |
| 6 | +date: "2025-12-08" |
| 7 | +categories: |
| 8 | + - packaging |
| 9 | + - publishing |
| 10 | + - programming |
| 11 | +--- |
| 12 | + |
| 13 | +On November 27th, 2025, we published our second Python package to |
| 14 | +[PyPI](https://pypi.org/project/check-datapackage). This package forms |
| 15 | +the basis for ensuring that any metadata we create or edit for a [Data |
| 16 | +Package](https://decisions.seedcase-project.org/why-frictionless-data/) |
| 17 | +is correct and compliant with the [Data Package |
| 18 | +standard](https://datapackage.org). Since we will be working with and
| 19 | +managing many Data Packages over the coming years, this is an
| 20 | +important tool for us to have!
| 21 | + |
| 22 | +## What's `check-datapackage`? |
| 23 | + |
| 24 | +As with all our packages and software tools, we have a dedicated website |
| 25 | +for |
| 26 | +[`check-datapackage`](https://check-datapackage.seedcase-project.org). |
| 27 | +So, rather than repeat what is already on that website, this post
| 28 | +gives a very quick overview of what the package is and why you might
| 29 | +want to use it. The package can be summarised by its tagline:
| 30 | + |
| 31 | +> Ensure the compliance of your Data Package metadata |
| 32 | +
| 33 | +The "only" thing it does is check the content of a `datapackage.json`
| 34 | +file against the standard. Nothing fancy. But we designed it to be |
| 35 | +configurable, so that if you have specific needs for your Data Package, |
| 36 | +you can adjust the checks accordingly. For example, if you want to |
| 37 | +ensure that certain fields are always present in the metadata, you can |
| 38 | +set up the checks to enforce that. |
| 39 | + |
| 40 | +For now, `check-datapackage` is only a few Python functions and classes |
| 41 | +that you can use within your own Python scripts. But in the future, we |
| 42 | +plan to develop a command-line interface (CLI) so that you can use it |
| 43 | +directly from your terminal without needing to write any code. We also
| 44 | +plan to support a config file and hope to incorporate `check-datapackage`
| 45 | +into typical build tools or automated check workflows.
| 46 | + |
| 47 | +## Why use it? |
| 48 | + |
| 49 | +We wanted this package to be incredibly simple and focused in its scope. |
| 50 | +If you install or use it, you know exactly what it does. It also doesn't |
| 51 | +include extra dependencies or features that you might not need. We |
| 52 | +wanted it lightweight and easy to use. |
| 53 | + |
| 54 | +While a few tools already provide some kind of check for Data
| 55 | +Packages, such as the
| 56 | +[frictionless-py](https://pypi.org/project/frictionless/) package, we
| 57 | +didn't want all the extras that come with them. Nor are these tools
| 58 | +easy to configure. In short, no available tool fit our needs, so we
| 59 | +built our own package that does exactly what we need. And hopefully
| 60 | +it might be useful for you too!
| 61 | + |
| 62 | +Eventually, when we develop `check-datapackage` as a CLI, you could |
| 63 | +include it as a [pre-commit hook](https://pre-commit.com/) or part of |
| 64 | +your [continuous |
| 65 | +integration](https://docs.github.com/en/actions/automating-builds-and-tests/about-continuous-integration) |
| 66 | +workflow so that every time you make changes to your Data Package |
| 67 | +metadata, it is automatically checked for compliance. That way, you will |
| 68 | +always know that everything is good with your Data Package metadata. At |
| 69 | +least, good according to the standard and your specific needs! |
| 70 | + |
| 71 | +### Example use |
| 72 | + |
| 73 | +We have a detailed |
| 74 | +[guide](https://check-datapackage.seedcase-project.org/docs/guide/) on |
| 75 | +how to use `check-datapackage`, so I'll only briefly show the basics
| 76 | +here. The main function you would use is `check()`, which takes as
| 77 | +input the properties of a Data Package (i.e., the contents of the
| 78 | +`datapackage.json` file) as a Python dictionary.
| 79 | + |
| 80 | +``` python |
| 81 | +import check_datapackage as cdp |
| 82 | + |
| 83 | +# Normally you'd read in the `datapackage.json` file, but we'll |
| 84 | +# show the actual contents here as a Python dict. |
| 85 | +properties = { |
| 86 | +    "name": "woolly-dormice",
| 87 | +    "id": "123-abc-123",
| 88 | +    "resources": [{
| 89 | +        "name": "woolly-dormice-2015",
| 90 | +        "path": "data.csv",
| 91 | +        "schema": {"fields": [{
| 92 | +            "name": "eye-colour",
| 93 | +            "type": "string",
| 94 | +        }]},
| 95 | +    }],
| 96 | +} |
| 97 | + |
| 98 | +cdp.check(properties) |
| 99 | +``` |
| 100 | + |
| 101 | +At a minimum, a Data Package needs to have a `resources` property.
| 102 | +This one does, so the check finds no issues. But if you were to remove
| 103 | +the `resources` property and run the check again, there would be an
| 104 | +issue:
| 105 | + |
| 106 | +``` python |
| 107 | +del properties["resources"] |
| 108 | +cdp.check(properties) |
| 109 | +``` |
| 110 | + |
| 111 | +If you want any issues found to be treated as an error, set the
| 112 | +parameter `error` to `True`:
| 113 | + |
| 114 | +``` python |
| 115 | +cdp.check(properties, error=True) |
| 116 | +``` |
| 117 | + |
| 118 | +If you want to exclude certain checks, you can do that by using the
| 119 | +`Config` and `Exclusion` classes. For example, if you want to ignore
| 120 | +all checks of the `required` type, you could do:
| 121 | + |
| 122 | +``` python |
| 123 | +exclusion_required = cdp.Exclusion(type="required") |
| 124 | +config = cdp.Config(exclusions=[exclusion_required]) |
| 125 | +cdp.check(properties, config=config)
| 126 | +``` |
| 127 | + |
| 128 | +If you want the issues listed in a more human-friendly way, the
| 129 | +`explain()` function takes the list of issues returned by
| 130 | +`check()` and formats them nicely:
| 131 | + |
| 132 | +``` python |
| 133 | +issues = cdp.check(properties) |
| 134 | +cdp.explain(issues) |
| 135 | +``` |
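
The examples above write the properties out by hand as a Python dict. In practice you'd usually read them from the `datapackage.json` file first. Here's a minimal sketch of that, using only the standard library for the reading step. The helper names `read_properties()` and `check_file()` are hypothetical, made up for this post, and are not part of `check-datapackage`. `check_file()` also assumes the package is installed.

``` python
import json
from pathlib import Path


def read_properties(path):
    """Read a `datapackage.json` file and return its contents as a dict."""
    with Path(path).open(encoding="utf-8") as file:
        return json.load(file)


def check_file(path):
    """Run the checks on the file at `path` and return what `check()` returns."""
    # Hypothetical helper: imported here so that `read_properties()` still
    # works on its own when `check-datapackage` isn't installed.
    import check_datapackage as cdp

    return cdp.check(read_properties(path))
```

You could then call `check_file("datapackage.json")` from the folder that holds your Data Package and pass the result to `explain()`.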
| 136 | + |
| 137 | +There are many other things you can configure in `check-datapackage`, so
| 138 | +be sure to check out the |
| 139 | +[website](https://check-datapackage.seedcase-project.org) for more |
| 140 | +information! |