Commit a204fcc

feat: ✨ post on publishing check-datapackage
1 parent fa9cd3f commit a204fcc

1 file changed

+140
-0
lines changed

@@ -0,0 +1,140 @@
---
title: "First published release of `check-datapackage`!"
description: "We've published our second Python package. :tada: :grin: This package checks that a Data Package is compliant with its specification."
author:
  - Luke W. Johnston
date: "2025-12-08"
categories:
  - packaging
  - publishing
  - programming
---

On November 27th, 2025, we published our second Python package to
[PyPI](https://pypi.org/project/check-datapackage). This package forms
the basis for ensuring that any metadata we create or edit for a [Data
Package](https://decisions.seedcase-project.org/why-frictionless-data/)
is correct and compliant with the [Data Package
standard](https://datapackage.org). And since we are and will be working
with and managing many Data Packages over the coming years, this is an
important tool for us to have!

## What's `check-datapackage`?

As with all our packages and software tools, we have a dedicated website
for
[`check-datapackage`](https://check-datapackage.seedcase-project.org).
So, rather than repeat what is already on that website, this post gives
a very quick overview of what it is and why you might want to use it. It
can be summarised by its tagline:

> Ensure the compliance of your Data Package metadata

The "only" thing it does is check the content of a `datapackage.json`
file against the standard. Nothing fancy. But we designed it to be
configurable, so that if you have specific needs for your Data Package,
you can adjust the checks accordingly. For example, if you want to
ensure that certain fields are always present in the metadata, you can
set up the checks to enforce that.

For now, `check-datapackage` is only a few Python functions and classes
that you can use within your own Python scripts. But in the future, we
plan to develop a command-line interface (CLI) so that you can use it
directly from your terminal without needing to write any code. Together
with support for a config file, we hope this will let you incorporate
`check-datapackage` into typical build tools or automated check
workflows.

## Why use it?

We wanted this package to be incredibly simple and focused in its scope.
If you install or use it, you know exactly what it does. It also doesn't
include extra dependencies or features that you might not need. We
wanted it lightweight and easy to use.

While there are a few tools that provide some type of check for Data
Packages, such as the
[frictionless-py](https://pypi.org/project/frictionless/) package, we
didn't want all the extras that came with these packages. Nor are these
tools easy to configure. In short, no available tool fit our needs. So
we built our own package that does exactly what we need. And hopefully
it might be useful for you too!

Eventually, when we develop `check-datapackage` as a CLI, you could
include it as a [pre-commit hook](https://pre-commit.com/) or part of
your [continuous
integration](https://docs.github.com/en/actions/automating-builds-and-tests/about-continuous-integration)
workflow so that every time you make changes to your Data Package
metadata, it is automatically checked for compliance. That way, you will
always know that everything is good with your Data Package metadata. At
least, good according to the standard and your specific needs!

### Example use

We have a detailed
[guide](https://check-datapackage.seedcase-project.org/docs/guide/) on
how to use `check-datapackage`, but I'll briefly show how you might use
it here. The main function you would use is `check()`, which takes as
input the properties of a Data Package (i.e., the contents of the
`datapackage.json` file) as a Python dictionary.

``` python
import check_datapackage as cdp

# Normally you'd read in the `datapackage.json` file, but we'll
# show the actual contents here as a Python dict.
properties = {
    "name": "woolly-dormice",
    "id": "123-abc-123",
    "resources": [{
        "name": "woolly-dormice-2015",
        "path": "data.csv",
        "schema": {"fields": [{
            "name": "eye-colour",
            "type": "string",
        }]},
    }],
}

cdp.check(properties)
```
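
The example above writes the properties inline. To check a real Data
Package, you would first load its `datapackage.json` into a dict. Here is
a minimal sketch using only the standard library. The helper name
`read_properties()` is hypothetical and not part of `check-datapackage`:

``` python
import json
from pathlib import Path


def read_properties(path: Path) -> dict:
    """Load the contents of a `datapackage.json` file as a Python dict."""
    # Hypothetical helper for illustration; not part of `check-datapackage`.
    text = Path(path).read_text(encoding="utf-8")
    properties = json.loads(text)
    # A Data Package descriptor is a JSON object, so anything else is a mistake.
    if not isinstance(properties, dict):
        raise TypeError("Expected a JSON object at the top level.")
    return properties
```

You could then pass the result straight to `check()`, for example
`cdp.check(read_properties("datapackage.json"))`.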

At a minimum, a Data Package needs to have a `resources` property. So in
this case, there are no issues with the Data Package. But if you were to
remove the `resources` property, which is required, and run the check
again, there would be an issue:

``` python
del properties["resources"]
cdp.check(properties)
```

If you want these checks to be treated as an error, you set the
parameter `error` to `True`:

``` python
cdp.check(properties, error=True)
```

If you want to exclude certain checks, you can do that by using the
`Config` and `Exclusion` classes. For example, if you wanted to ignore
all required checks, you could do:

``` python
exclusion_required = cdp.Exclusion(type="required")
config = cdp.Config(exclusions=[exclusion_required])
# Pass the same `properties` dict defined earlier.
cdp.check(properties=properties, config=config)
```

If you want the issues listed in a more human-friendly way, we have
the `explain()` function that takes the list of issues returned by
`check()` and formats them nicely:

``` python
issues = cdp.check(properties)
cdp.explain(issues)
```
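
Until the CLI exists, one way to get an automated check is a small
script that exits with a non-zero status when issues are found. The
sketch below is only an illustration and not part of
`check-datapackage`: the `run_check()` helper is hypothetical, it takes
the checker as an argument so it runs without the package installed, and
it assumes that `check()` returns an empty list when there are no
issues.

``` python
import json
from pathlib import Path
from typing import Callable


def run_check(path: Path, check: Callable[[dict], list]) -> int:
    """Return 0 if `check` finds no issues in the file at `path`, else 1."""
    # Hypothetical helper; in practice `check` would be `cdp.check`.
    properties = json.loads(Path(path).read_text(encoding="utf-8"))
    issues = check(properties)
    for issue in issues:
        # Print each issue so it shows up in the CI log.
        print(issue)
    # Assumption: an empty list means the metadata is compliant.
    return 1 if issues else 0
```

In a CI job or pre-commit script, this might be called as
`sys.exit(run_check(Path("datapackage.json"), cdp.check))`.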

There are many other things you can configure in `check-datapackage`, so
be sure to check out the
[website](https://check-datapackage.seedcase-project.org) for more
information!
