Commit 10afe52

feat: ✨ post on publishing check-datapackage (#184)
1 parent fa9cd3f commit 10afe52

1 file changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
---
title: "First published release of `check-datapackage`!"
description: "We've published our second Python package. :tada: :grin: This package checks that a Data Package is compliant with its specification."
author:
  - Luke W. Johnston
date: "2025-12-08"
categories:
  - packaging
  - publishing
  - programming
---

On November 27th, 2025, we published our second Python package to
[PyPI](https://pypi.org/project/check-datapackage). This package forms
the basis for ensuring that any metadata created or edited for a [Data
Package](https://decisions.seedcase-project.org/why-frictionless-data/)
is correct and compliant with the [Data Package
standard](https://datapackage.org). Since we already work with and will
keep managing many Data Packages over the coming years, this is an
important tool for us to have! More generally, it should be a helpful
tool for anyone working with and managing Data Packages.

## What's `check-datapackage`?

As with all our packages and software tools, we have a dedicated website
for
[`check-datapackage`](https://check-datapackage.seedcase-project.org).
So, rather than repeat what is already on that website, this post gives
a very quick overview of what this package does and why you might want
to use it. It can be summarised by its tagline:

> Ensure the compliance of your Data Package metadata

The "only" thing `check-datapackage` does is check the content of a
`datapackage.json` file against the Data Package standard. Nothing
fancy. But we designed it to be configurable, so that if you have
specific needs for your Data Package, you can adjust the checks
accordingly. You can both add checks on top of the standard and
ignore certain checks from the standard. For example, if you want to
ensure that certain fields that aren't required by the standard are
always present in the metadata, you can set up the checks to enforce
that.
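
To make the idea of an additional required-field rule concrete, here is
a minimal sketch in plain Python. It does not use `check-datapackage`
and does not reflect its API: the `find_missing_fields()` helper and the
field names are hypothetical and only illustrate the kind of rule you
might want to enforce.

``` python
def find_missing_fields(properties: dict, required: list[str]) -> list[str]:
    """Return the names in `required` that are absent from `properties`."""
    return [name for name in required if name not in properties]


# Hypothetical metadata and house rules, for illustration only.
properties = {"name": "woolly-dormice", "resources": []}
extra_required = ["description", "licenses"]

missing = find_missing_fields(properties, extra_required)
print(missing)
```

With `check-datapackage`, a rule like this would live in the
configuration passed to `check()` rather than in a hand-written helper;
the guide on the website describes how to set that up.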

For now, `check-datapackage` is only a few Python functions and classes
that you can use within your own Python scripts. But in the future, we
plan to develop a command-line interface (CLI) so that you can use it
directly from your terminal without needing to write any code. Together
with support for a config file, we hope this will let us incorporate
`check-datapackage` into typical build tools and automated check
workflows.

## Why use it?

We wanted this package to be incredibly simple and focused. It also
doesn't include extra dependencies or features that you might not need.
We wanted it lightweight and easy to use.

While there are a few tools that provide some type of checks of Data
Packages, such as the
[frictionless-py](https://pypi.org/project/frictionless/) package, we
didn't want all the extras that came with these packages. Nor are these
tools easy to configure for our needs. In this regard, there were no
tools available that fit our needs. So, we built our own package that
does exactly what we need. Hopefully, it will be useful for other people
too!

Eventually, when we develop `check-datapackage` as a CLI, you could
include it as a [pre-commit hook](https://pre-commit.com/) or part of
your [continuous
integration](https://docs.github.com/en/actions/automating-builds-and-tests/about-continuous-integration)
workflow so that every time you make changes to your Data Package
metadata, it is automatically checked for compliance. That way, you will
always know that your Data Package metadata lives up to the standard and
your configuration.
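
Until a CLI exists, a small wrapper script could play a similar role in
an automated workflow. The sketch below is only an illustration under
stated assumptions: `run_check()` and `demo_checker()` are hypothetical
names, and the checker is passed in as a callable so that the example
runs without `check-datapackage` installed. It assumes the checker
returns a list of issues, which is how `check()` is described in this
post.

``` python
import json
import sys
import tempfile
from pathlib import Path
from typing import Callable


def run_check(path: Path, checker: Callable[[dict], list]) -> int:
    """Load metadata from `path`, run `checker`, and return an exit code."""
    properties = json.loads(path.read_text(encoding="utf-8"))
    issues = checker(properties)
    for issue in issues:
        print(issue, file=sys.stderr)
    # A non-zero exit code makes a CI job or pre-commit hook fail.
    return 1 if issues else 0


def demo_checker(properties: dict) -> list:
    """Stand-in checker (hypothetical): only flags a missing `resources`."""
    return [] if "resources" in properties else ["`resources` is required"]


# Write a deliberately incomplete `datapackage.json` to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    metadata_path = Path(tmp) / "datapackage.json"
    metadata_path.write_text(
        json.dumps({"name": "woolly-dormice"}), encoding="utf-8"
    )
    exit_code = run_check(metadata_path, demo_checker)

print(exit_code)
```

In a real script, you would presumably pass `cdp.check` in place of
`demo_checker` and hand the returned code to `sys.exit()`.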

### Example use

We have a detailed
[guide](https://check-datapackage.seedcase-project.org/docs/guide/) on
how to use `check-datapackage`, but here is a brief example. The main
function of the package is `check()`, which takes as input the
properties of a Data Package (i.e., the contents of the
`datapackage.json` file) as a Python dictionary and checks them against
the standard.

``` python
import check_datapackage as cdp

# Normally you'd read in the `datapackage.json` file, but we'll
# show the actual contents here as a Python dict. You can use
# the `read_json()` helper function to read in `datapackage.json`.
properties = {
    "name": "woolly-dormice",
    "id": "123-abc-123",
    "resources": [{
        "name": "woolly-dormice-2015",
        "path": "data.csv",
        "schema": {"fields": [{
            "name": "eye-colour",
            "type": "string",
        }]},
    }],
}

cdp.check(properties)
```

At a minimum, a Data Package needs to have a `resources` property. This
one does, so in this case there are no issues with the Data Package. But
if you were to remove the required `resources` property and run the
check again, there would be an issue:

``` python
del properties["resources"]
cdp.check(properties)
```

If you want any issues that are found to be treated as an error, set the
parameter `error` to `True`:

``` python
cdp.check(properties, error=True)
```

If you want to exclude certain checks, you can do that by using the
`Config` and `Exclusion` classes. For example, if you want to exclude
all required checks, you can define the exclusion, add it to the
configuration, and pass it to the check function like so:

``` python
exclusion_required = cdp.Exclusion(type="required")
config = cdp.Config(exclusions=[exclusion_required])
cdp.check(properties=properties, config=config)
```

If you want the issues listed in a more human-friendly way, you can use
the `explain()` function, which takes the list of issues returned by
`check()` and formats them nicely:

``` python
issues = cdp.check(properties)
cdp.explain(issues)
```

There are many other checks you can configure with `check-datapackage`,
so be sure to check out the
[website](https://check-datapackage.seedcase-project.org) for more
information!
