| 1 | +--- |
| 2 | +title: "First published release of `check-datapackage`!" |
| 3 | +description: "We've published our second Python package. :tada: :grin: This package checks that a Data Package is compliant with its specification." |
| 4 | +author: |
| 5 | +- Luke W. Johnston |
| 6 | +date: "2025-12-08" |
| 7 | +categories: |
| 8 | + - packaging |
| 9 | + - publishing |
| 10 | + - programming |
| 11 | +--- |
| 12 | + |
| 13 | +On November 27th, 2025, we published our second Python package to |
| 14 | +[PyPI](https://pypi.org/project/check-datapackage). This package forms |
| 15 | +the basis for ensuring that any metadata created or edited for a [Data |
| 16 | +Package](https://decisions.seedcase-project.org/why-frictionless-data/) |
| 17 | +is correct and compliant with the [Data Package |
| 18 | +standard](https://datapackage.org). Since we are and will be working |
| 19 | +with and managing many Data Packages over the coming years, this is an |
| 20 | +important tool for us to have! Generally, this will be a helpful tool |
| 21 | +for anyone working with and managing Data Packages. |
| 22 | + |
| 23 | +## What's `check-datapackage`? |
| 24 | + |
| 25 | +As with all our packages and software tools, we have a dedicated website |
| 26 | +for |
| 27 | +[`check-datapackage`](https://check-datapackage.seedcase-project.org). |
| 28 | +So, rather than repeat what is already on that website, this post gives |
| 29 | +a very quick overview of what this package does and why you might want |
| 30 | +to use it. It can be summarised by its tagline: |
| 31 | + |
| 32 | +> Ensure the compliance of your Data Package metadata |
| 33 | + |
| 34 | +The "only" thing `check-datapackage` does is to check the content of a |
| 35 | +`datapackage.json` file against the Data Package standard. Nothing |
| 36 | +fancy. But we designed it to be configurable, so that if you have |
| 37 | +specific needs for your Data Package, you can adjust the checks |
| 38 | +accordingly. It's possible to both add checks on top of the standard and |
| 39 | +ignore certain checks from the standard. For example, if you want to |
| 40 | +ensure that certain fields that aren't required by the standard are |
| 41 | +always present in the metadata, you can set up the checks to enforce |
| 42 | +that. |
| 43 | + |
| 44 | +For now, `check-datapackage` is only a few Python functions and classes |
| 45 | +that you can use within your own Python scripts. But in the future, we |
| 46 | +plan to develop a command-line interface (CLI) so that you can use it |
| 47 | +directly from your terminal without needing to write any code. We also |
| 48 | +plan to support a config file, and we hope to incorporate `check-datapackage` |
| 49 | +into typical build tools and automated check workflows. |
| 50 | + |
| 51 | +## Why use it? |
| 52 | + |
| 53 | +We wanted this package to be simple, focused, and lightweight. It |
| 54 | +doesn't include extra dependencies or features that you might not need, |
| 55 | +which keeps it easy to use. |
| 56 | + |
| 57 | +While there are a few tools that provide some type of checks of Data |
| 58 | +Packages, such as the |
| 59 | +[frictionless-py](https://pypi.org/project/frictionless/) package, we |
| 60 | +didn't want all the extras that came with these packages. Nor are these |
| 61 | +tools easy to configure for our needs. In this regard, there were no |
| 62 | +tools available that fit our needs. So, we built our own package that |
| 63 | +does exactly what we need. Hopefully, it will be useful for other people |
| 64 | +too! |
| 65 | + |
| 66 | +Eventually, when we develop `check-datapackage` as a CLI, you could |
| 67 | +include it as a [pre-commit hook](https://pre-commit.com/) or part of |
| 68 | +your [continuous |
| 69 | +integration](https://docs.github.com/en/actions/automating-builds-and-tests/about-continuous-integration) |
| 70 | +workflow so that every time you make changes to your Data Package |
| 71 | +metadata, it is automatically checked for compliance. That way, you will |
| 72 | +always know that your Data Package metadata lives up to the standard and |
| 73 | +your configuration. |
| 74 | + |
| 75 | +### Example use |
| 76 | + |
| 77 | +We have a detailed |
| 78 | +[guide](https://check-datapackage.seedcase-project.org/docs/guide/) on |
| 79 | +how to use `check-datapackage`, so here we'll only briefly show how you |
| 80 | +might use it. The main function of the package is `check()`, |
| 81 | +which takes as input the properties of a Data Package (i.e., the |
| 82 | +contents of the `datapackage.json` file) as a Python dictionary and |
| 83 | +checks it against the standard. |
| 84 | + |
| 85 | +``` python |
| 86 | +import check_datapackage as cdp |
| 87 | + |
| 88 | +# Normally you'd read in the `datapackage.json` file, but we'll |
| 89 | +# show the actual contents here as a Python dict. You can use |
| 90 | +# the `read_json()` helper function to read in `datapackage.json` |
| 91 | +properties = { |
| 92 | + "name": "woolly-dormice", |
| 93 | + "id": "123-abc-123", |
| 94 | + "resources": [{ |
| 95 | + "name": "woolly-dormice-2015", |
| 96 | + "path": "data.csv", |
| 97 | + "schema": {"fields": [{ |
| 98 | + "name": "eye-colour", |
| 99 | + "type": "string", |
| 100 | + }]}, |
| 101 | + }], |
| 102 | +} |
| 103 | + |
| 104 | +cdp.check(properties) |
| 105 | +``` |
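If your metadata lives in a file rather than in a Python dict, you first need to load it. The snippet below is a minimal sketch that uses only the standard library `json` module instead of the package's own helper. The function name `read_properties` is hypothetical and not part of `check-datapackage`, and the example file is written to a temporary folder only so the sketch runs on its own.

```python
import json
import tempfile
from pathlib import Path


def read_properties(path):
    """Load a `datapackage.json` file into a plain Python dict.

    Hypothetical helper for illustration, not part of `check-datapackage`.
    """
    with open(path, encoding="utf-8") as file:
        return json.load(file)


# Example metadata, written to disk only to make the sketch self-contained.
example = {
    "name": "woolly-dormice",
    "resources": [{"name": "woolly-dormice-2015", "path": "data.csv"}],
}

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "datapackage.json"
    path.write_text(json.dumps(example), encoding="utf-8")
    properties = read_properties(path)

print(properties["name"])
```

The resulting `properties` dict is the kind of input that `check()` expects.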
| 106 | + |
| 107 | +At a minimum, a Data Package needs to have a `resources` property. So in |
| 108 | +this case, there are no issues with the Data Package. But if you were to |
| 109 | +remove the `resources` property, which is required, and run the check |
| 110 | +again, there would be an issue: |
| 111 | + |
| 112 | +``` python |
| 113 | +del properties["resources"] |
| 114 | +cdp.check(properties) |
| 115 | +``` |
| 116 | + |
| 117 | +If you want any issues that are found to be raised as an error, set the |
| 118 | +parameter `error` to `True`: |
| 119 | + |
| 120 | +``` python |
| 121 | +cdp.check(properties, error=True) |
| 122 | +``` |
| 123 | + |
| 124 | +If you want to exclude certain checks, you can do that by using the |
| 125 | +`Config` and `Exclusion` classes. For example, if you want to exclude |
| 126 | +all required checks, you can define the exclusion, add it to the |
| 127 | +configuration, and pass it to the check function like so: |
| 128 | + |
| 129 | +``` python |
| 130 | +exclusion_required = cdp.Exclusion(type="required") |
| 131 | +config = cdp.Config(exclusions=[exclusion_required]) |
| 132 | +cdp.check(properties=properties, config=config) |
| 133 | +``` |
| 134 | + |
| 135 | +If you want the issues listed in a more human-friendly way, you can use |
| 136 | +the `explain()` function that takes the list of issues returned by |
| 137 | +`check()` and formats them nicely: |
| 138 | + |
| 139 | +``` python |
| 140 | +issues = cdp.check(properties) |
| 141 | +cdp.explain(issues) |
| 142 | +``` |
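Until a CLI exists, a small script could play a similar role in an automated workflow. The following is a rough sketch, not an official recipe: `run_checks` and `fake_check` are hypothetical names, and the sketch assumes only that `check()` returns a list of issues that is empty when nothing is wrong. The checker is passed in as an argument so the example runs without the package installed.

```python
import json
import sys
import tempfile
from pathlib import Path


def run_checks(path, check):
    """Return 0 if `check` reports no issues for the file at `path`, else 1."""
    properties = json.loads(Path(path).read_text(encoding="utf-8"))
    issues = check(properties)
    for issue in issues:
        # Print each issue so it shows up in the workflow log.
        print(issue, file=sys.stderr)
    return 1 if issues else 0


def fake_check(properties):
    # Stand-in for `cdp.check`, used only so this sketch is self-contained.
    return [] if "resources" in properties else ["'resources' is missing"]


# Demonstrate with metadata that lacks the `resources` property.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "datapackage.json"
    path.write_text(json.dumps({"name": "woolly-dormice"}), encoding="utf-8")
    exit_code = run_checks(path, fake_check)

print(exit_code)
```

In a real script you would pass `cdp.check` in place of `fake_check` and end with `sys.exit(run_checks("datapackage.json", cdp.check))`, so that a non-zero exit code fails the workflow step.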
| 143 | + |
| 144 | +There are many other checks you can configure with `check-datapackage`, so |
| 145 | +be sure to check out the |
| 146 | +[website](https://check-datapackage.seedcase-project.org) for more |
| 147 | +information! |