---
title: "First published release of `check-datapackage`!"
description: "We've published our second Python package. :tada: :grin: This package checks that a Data Package is compliant with its specification."
author:
- Luke W. Johnston
date: "2025-12-08"
categories:
  - packaging
  - publishing
  - programming
---

On November 27th, 2025, we published our second Python package to
[PyPI](https://pypi.org/project/check-datapackage). This package forms
the basis for ensuring that any metadata created or edited for a [Data
Package](https://decisions.seedcase-project.org/why-frictionless-data/)
is correct and compliant with the [Data Package
standard](https://datapackage.org). Since we work with and manage many
Data Packages, and will continue to over the coming years, this is an
important tool for us to have! More generally, it should be helpful for
anyone who works with and manages Data Packages.

## What's `check-datapackage`?

As with all our packages and software tools, we have a dedicated website
for
[`check-datapackage`](https://check-datapackage.seedcase-project.org).
So, rather than repeat what is already on that website, this post gives
a very quick overview of what the package does and why you might want
to use it. Its tagline sums it up:

> Ensure the compliance of your Data Package metadata

The "only" thing `check-datapackage` does is check the contents of a
`datapackage.json` file against the Data Package standard. Nothing
fancy. But we designed it to be configurable, so that if you have
specific needs for your Data Package, you can adjust the checks
accordingly. You can both add checks on top of the standard and exclude
certain checks from the standard.
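
The following is a toy sketch that makes "checking against the standard" a bit more concrete. It uses only Python's standard library and is *not* how `check-datapackage` is implemented. The names `Issue`, `check_required()`, `extra_required` and `excluded_types` are hypothetical. The sketch flags a missing or empty `resources` array. Following the Data Package v2 standard, it also flags resources that lack a `name`, or that have neither a `path` nor inline `data`. It then models the two kinds of configuration described above: adding required fields that the standard doesn't demand, and dropping issues of an excluded type.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """A toy record of one problem (hypothetical; not check-datapackage's type)."""

    location: str  # JSONPath-style pointer to the offending property
    type: str  # kind of check that failed, e.g. "required"
    message: str


def check_required(
    properties: dict,
    extra_required: tuple[str, ...] = (),
    excluded_types: tuple[str, ...] = (),
) -> list[Issue]:
    """Toy checker: a few required-property rules plus simple configuration."""
    issues = []

    # The standard requires a non-empty `resources` array.
    resources = properties.get("resources")
    if not isinstance(resources, list) or not resources:
        issues.append(Issue("$.resources", "required", "'resources' must be a non-empty list"))
        resources = []

    # Each resource needs a `name` and either a `path` or inline `data`.
    # (Assumes each resource is a dict, as in a well-formed datapackage.json.)
    for i, resource in enumerate(resources):
        where = f"$.resources[{i}]"
        if "name" not in resource:
            issues.append(Issue(f"{where}.name", "required", "resource needs a 'name'"))
        if "path" not in resource and "data" not in resource:
            issues.append(Issue(where, "required", "resource needs 'path' or 'data'"))

    # "Adding checks": top-level fields this configuration requires,
    # even though the standard itself doesn't.
    for field in extra_required:
        if field not in properties:
            issues.append(
                Issue(f"$.{field}", "required", f"'{field}' is required by this configuration")
            )

    # "Excluding checks": drop every issue whose type has been excluded.
    return [issue for issue in issues if issue.type not in excluded_types]


properties = {
    "name": "woolly-dormice",
    "resources": [{"name": "woolly-dormice-2015", "path": "data.csv"}],
}
print(check_required(properties))  # → []
print(check_required(properties, extra_required=("description",)))
print(check_required({}, excluded_types=("required",)))  # → []
```

In this toy, excluding the `"required"` type silences every issue it can raise. In `check-datapackage` itself, exclusions are set up with the `Config` and `Exclusion` classes, as in the example later in this post.
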

Adding checks is useful if, for example, you want to ensure that certain
fields that aren't required by the standard are always present in your
metadata: you can set up the checks to enforce that.

For now, `check-datapackage` is only a handful of Python functions and
classes that you can use within your own Python scripts. In the future,
we plan to develop a command-line interface (CLI) so that you can use it
directly from your terminal without writing any code. Along with a
config file, we hope this will let you incorporate `check-datapackage`
into typical build tools and automated check workflows.

## Why use it?

We wanted this package to be simple and focused. It doesn't include
extra dependencies or features that you might not need, which keeps it
lightweight and easy to use.

While there are a few tools that provide some type of checks of Data
Packages, such as the
[frictionless-py](https://pypi.org/project/frictionless/) package, we
didn't want all the extras that come with them. Nor are these tools easy
to configure for our needs. In short, no available tool fit our needs,
so we built our own package that does exactly what we need. Hopefully,
it will be useful for other people too!

Eventually, once `check-datapackage` has a CLI, you could include it as
a [pre-commit hook](https://pre-commit.com/) or as part of your
[continuous
integration](https://docs.github.com/en/actions/automating-builds-and-tests/about-continuous-integration)
workflow, so that every time you change your Data Package metadata, it
is automatically checked for compliance. That way, you will always know
that your metadata lives up to both the standard and your own
configuration.

### Example use

We have a detailed
[guide](https://check-datapackage.seedcase-project.org/docs/guide/) on
how to use `check-datapackage`, so here we'll only briefly show how you
might use it.
The main function of the package is `check()`, which takes the
properties of a Data Package (i.e., the contents of the
`datapackage.json` file) as a Python dictionary and checks them against
the standard.

``` python
import check_datapackage as cdp

# Normally you'd read in the `datapackage.json` file (e.g., with the
# `read_json()` helper function), but here we show its contents
# directly as a Python dict.
properties = {
    "name": "woolly-dormice",
    "id": "123-abc-123",
    "resources": [{
        "name": "woolly-dormice-2015",
        "path": "data.csv",
        "schema": {"fields": [{
            "name": "eye-colour",
            "type": "string",
        }]},
    }],
}

# Returns the list of issues found (none, for these properties).
cdp.check(properties)
```

At a minimum, a Data Package needs a `resources` property, which this
one has, so there are no issues. But if you remove the required
`resources` property and run the check again, `check()` reports an
issue:

``` python
# Remove the required `resources` property and check again.
del properties["resources"]
cdp.check(properties)
```

If you want any issues to be treated as an error, set the `error`
parameter to `True`:

``` python
# Treat any issues found as an error.
cdp.check(properties, error=True)
```

If you want to exclude certain checks, you can use the `Config` and
`Exclusion` classes.
For example, to exclude all required checks, define the exclusion, add
it to a configuration, and pass that configuration to `check()`:

``` python
# Exclude every check of type "required", then run the check
# with this configuration.
exclusion_required = cdp.Exclusion(type="required")
config = cdp.Config(exclusions=[exclusion_required])
cdp.check(properties=properties, config=config)
```

If you want the issues listed in a more human-friendly way, you can use
the `explain()` function, which takes the list of issues returned by
`check()` and formats them nicely:

``` python
issues = cdp.check(properties)
cdp.explain(issues)
```

There are many other ways to configure the checks in
`check-datapackage`, so be sure to check out the
[website](https://check-datapackage.seedcase-project.org) for more
information!