The data checker relies on the following libraries:
numpy, xarray, argparse, dateutil.relativedelta, datetime, json, sys, os, pathlib, re, stat, logging, typing
Install requirements with:
pip install numpy xarray python-dateutil
- Add ${checkerdir}/src to PYTHONPATH in ~/.bashrc, where ${checkerdir} is the full path to the checker directory:
export PYTHONPATH="${PYTHONPATH}:${checkerdir}/src"
- Configure the config file (config_lu.json for landuse or config_em.json for emissions), which contains the following settings:
  - directory: the directory with the files requiring checking;
  - log_path: where to save logs (relative path inside the checker directory);
  - base_path: full path to the checker directory;
  - required_file_types: for the landuse files there are "multiple-management", "multiple-states", "multiple-transitions";
  - required_variables: variables which are mandatory to be in the files (for each file type independently);
  - required_coords: coordinates which are mandatory to be in the files (for each file type independently);
  - required_attributes: general attributes which are mandatory for the files;
  - required_attributes_in_vars: variable-specific attributes which are mandatory for the files.
- Configure the file ${checkerdir}/src/variable-info_landuse.json or ${checkerdir}/src/variable-info_emissions.json, which contains the required variable ranges (for each file type independently).
- Run: python run_script.py config_lu.json or python run_script.py config_em.json.
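Before running, it can help to confirm that the config file contains every setting listed above. The sketch below is only an illustration: the setting names come from the list above, but the function load_config and its error handling are hypothetical and not part of the checker.

```python
import json
from pathlib import Path

# Setting names as listed above; the validation itself is an assumption.
REQUIRED_SETTINGS = (
    "directory",
    "log_path",
    "base_path",
    "required_file_types",
    "required_variables",
    "required_coords",
    "required_attributes",
    "required_attributes_in_vars",
)


def load_config(path):
    """Read a JSON config file and fail early if a setting is missing."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_SETTINGS if key not in config]
    if missing:
        raise KeyError(f"missing settings in {path}: {', '.join(missing)}")
    return config
```

Calling load_config("config_lu.json") would then either return the settings as a dictionary or name the settings that are absent.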
FileNameChecker: ${checkerdir}/src/checkers/checker_00_file_name.py
Check the file type ("multiple-management", "multiple-states", or "multiple-transitions") and the file name, which should match the pattern multiple-<...>_input4MIPs_landState_<...>_gn_YYYY-YYYY.nc.
It uses functions from ${checkerdir}/src/utils/misc_utils.py.
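The file name pattern above can be expressed as a regular expression. This is a minimal sketch, not the code in checker_00_file_name.py: the names FILENAME_RE and parse_filename are hypothetical, and the second <...> is assumed to be any non-empty text, which may be looser than the real check.

```python
import re

FILE_TYPES = ("multiple-management", "multiple-states", "multiple-transitions")

# Assumption: the middle <...> part may be any non-empty text.
FILENAME_RE = re.compile(
    r"^(?P<file_type>multiple-[a-z]+)_input4MIPs_landState_(?P<middle>.+)"
    r"_gn_(?P<start>\d{4})-(?P<end>\d{4})\.nc$"
)


def parse_filename(name):
    """Return (file_type, start_year, end_year), or None if the name is invalid."""
    match = FILENAME_RE.match(name)
    if match is None or match["file_type"] not in FILE_TYPES:
        return None
    return match["file_type"], int(match["start"]), int(match["end"])
```

For a made-up name such as multiple-states_input4MIPs_landState_EXAMPLE_gn_2020-2100.nc, the function returns the file type together with the two years; a name with an unknown file type or a different extension returns None.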
StandardComplianceChecker: ${checkerdir}/src/checkers/checker_01_standard_compliance.py
Check file permissions, dimension variables, compulsory attributes, _FillValue.
SpatialCompletenessChecker: ${checkerdir}/src/checkers/checker_02_spatial_completeness.py
Create the reference mask based on the reference file and check the presence of missing values.
It uses functions from ${checkerdir}/src/utils/misc_utils.py.
SpatialConsistencyChecker: ${checkerdir}/src/checkers/checker_03_spatial_consistency.py
Check that the lon/lat grid points correspond to the reference file.
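As a rough illustration of such a comparison, the sketch below checks one coordinate axis against the reference, point by point. It uses plain Python sequences as a stand-in for the lon/lat arrays; the function name same_grid and the tolerance value are assumptions, not taken from the checker.

```python
import math


def same_grid(coords, reference, abs_tol=1e-6):
    """True if both sequences have the same length and matching points."""
    return len(coords) == len(reference) and all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol)
        for a, b in zip(coords, reference)
    )
```

The check would be applied once to the longitudes and once to the latitudes; a file passes only if both axes match the reference.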
TemporalConsistencyChecker: ${checkerdir}/src/checkers/checker_04_temporal_consistency.py
Check timesteps for consistency.
It uses functions from ${checkerdir}/src/utils/path_utils.py.
i is the number of the timestep in the file:
time = "2020-01-01" [0], "2025-01-01" [1], "2030-01-01" [2], "2035-01-01" [3], "2040-01-01" [4],
"2045-01-01" [5], "2050-01-01" [6], "2055-01-01" [7], "2060-01-01" [8], "2070-01-01" [9],
"2080-01-01" [10], "2090-01-01" [11], "2100-01-01" [12]
During the check, the 'time' array is replaced by the 'timestep' array:
timesteps = [50, 55, 60, 65, 70, 75, 80, 85, 90, 100, 110, 120, 130]
timediff is the difference between two consecutive timesteps: timediff = timesteps[i] - timesteps[i-1].
Here timediff is either 5 or 10 years:
- for i <= 8 (up to 2060), timediff should be 5 years;
- for other i (after 2060), timediff should be 10 years.
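The rule above can be sketched as a short function. This is an illustration under the stated rule only: the function name check_timediffs and its parameters are hypothetical, and the actual checker may report problems differently.

```python
def check_timediffs(timesteps, switch_index=8, early_step=5, late_step=10):
    """Return (i, timediff, expected) for every step that breaks the rule."""
    problems = []
    for i in range(1, len(timesteps)):
        timediff = timesteps[i] - timesteps[i - 1]
        # 5-year steps for i <= 8, 10-year steps afterwards
        expected = early_step if i <= switch_index else late_step
        if timediff != expected:
            problems.append((i, timediff, expected))
    return problems


timesteps = [50, 55, 60, 65, 70, 75, 80, 85, 90, 100, 110, 120, 130]
problems = check_timediffs(timesteps)  # empty list: every step is as expected
```

A non-empty result lists the index of each offending timestep together with the difference found and the difference expected.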
ValidRangesChecker: ${checkerdir}/src/checkers/checker_05_valid_ranges.py
Check that data values are in the required range (defined in ${checkerdir}/src/variable-info.json).
It uses functions from ${checkerdir}/src/utils/misc_utils.py.
StatesTransitionsChecker: ${checkerdir}/src/checkers/checker_06_states_transitions.py
1. For each multiple-states_<XXX>: check that the sum of all variables is close to 1.
2. For each multiple-transitions_<XXX>: take the corresponding file multiple-states_<XXX> (with the same <XXX>) and check that the sum of the gross landuse transitions matches the difference in states between two consecutive years (except for the variables secdf, primf, secdn, primn).
Algorithm for (2):
- In multiple-states_<...>, we have variables 'c3ann' 'c3nfx' 'c3per' ..., so for each variable var we take its value for the year Y: var_states_Y, and its value for the year Y+1: var_states_(Y+1).
- In multiple-transitions_<...>, we have 'c3ann_to_c3nfx' 'c3ann_to_c3per' 'c3ann_to_c4ann' ..., i.e. X_to_var and var_to_X with var from multiple-states_<...>.
  We calculate (for every year Y):
  sum(X_to_var): the sum of all variables in multiple-transitions_<...> for the year Y with names ending in _to_{var}, and
  sum(var_to_X): the sum of all variables in multiple-transitions_<...> for the year Y with names starting with {var}_to_,
  e.g. for c3ann at the year Y:
  sum(X_to_var) = sum ['c3nfx_to_c3ann', 'c3per_to_c3ann', 'c4ann_to_c3ann', 'c4per_to_c3ann', 'primf_to_c3ann', 'primn_to_c3ann', 'secdf_to_c3ann', 'secdn_to_c3ann', 'urban_to_c3ann', 'pastr_to_c3ann', 'range_to_c3ann']
  sum(var_to_X) = sum ['c3ann_to_c3nfx', 'c3ann_to_c3per', 'c3ann_to_c4ann', 'c3ann_to_c4per', 'c3ann_to_secdf', 'c3ann_to_secdn', 'c3ann_to_urban', 'c3ann_to_pastr', 'c3ann_to_range']
- We want this equation to be true:
  sum(X_to_var) - sum(var_to_X) = var_states_(Y+1) - var_states_Y,
  so for each variable we calculate delta, which should be close to 0:
  delta = [ sum(X_to_var) - sum(var_to_X) ] - [ var_states_(Y+1) - var_states_Y ]
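The delta calculation can be sketched for a single grid cell as follows. This is a simplified illustration, not the code in checker_06_states_transitions.py: it uses plain dictionaries of numbers instead of gridded arrays, and the function names and the tolerance abs_tol are assumptions.

```python
import math

EXCLUDED = {"secdf", "primf", "secdn", "primn"}  # variables that are not checked


def transition_delta(var, states_y, states_y1, transitions_y):
    """delta = [sum(X_to_var) - sum(var_to_X)] - [var_states_(Y+1) - var_states_Y]."""
    gained = sum(v for name, v in transitions_y.items() if name.endswith(f"_to_{var}"))
    lost = sum(v for name, v in transitions_y.items() if name.startswith(f"{var}_to_"))
    return (gained - lost) - (states_y1[var] - states_y[var])


def check_transitions(states_y, states_y1, transitions_y, abs_tol=1e-6):
    """Return {var: delta} for every checked variable whose delta is not close to 0."""
    failures = {}
    for var in states_y:
        if var in EXCLUDED:
            continue
        delta = transition_delta(var, states_y, states_y1, transitions_y)
        if not math.isclose(delta, 0.0, abs_tol=abs_tol):
            failures[var] = delta
    return failures


# Made-up values for one grid cell, for illustration only
states_y = {"c3ann": 0.25, "c3nfx": 0.5}
states_y1 = {"c3ann": 0.375, "c3nfx": 0.375}
transitions_y = {"c3nfx_to_c3ann": 0.25, "c3ann_to_c3nfx": 0.125}
failures = check_transitions(states_y, states_y1, transitions_y)  # empty dict
```

In this example c3ann gains 0.25 and loses 0.125, which matches its change in state of 0.125, so delta is 0 for both variables.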
- ${checkerdir}/run_script.py: run the "main" function;
- ${checkerdir}/src/checkers/directory_checker.py and ${checkerdir}/scripts/check_file.py: configure the parameters and run all checkers;
- ${checkerdir}/src/utils: functions which are used by checkers.
For each run, the checker creates a new logging directory (its name includes the dataset name, current date and time) in ${checkerdir}/logs (the "logs" name can be modified in config_lu.json in "log_path").
The logging directory contains two files:
- <...>_errors.log: only errors;
- <...>_output.log: all information about the checking.
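One way to obtain such a pair of log files with the standard logging module is to attach two file handlers with different levels to one logger. The sketch below is an assumption about how this could be done, not the checker's own code: the function setup_logging, the timestamp format, and the use of the dataset name as the <...> prefix are all hypothetical.

```python
import logging
from datetime import datetime
from pathlib import Path


def setup_logging(log_root, dataset_name):
    """Create a per-run directory with an errors-only log and a full log."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # hypothetical naming scheme
    run_dir = Path(log_root) / f"{dataset_name}_{stamp}"
    run_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(f"checker.{dataset_name}.{stamp}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages out of the root logger

    output_handler = logging.FileHandler(run_dir / f"{dataset_name}_output.log")
    output_handler.setLevel(logging.INFO)  # everything
    errors_handler = logging.FileHandler(run_dir / f"{dataset_name}_errors.log")
    errors_handler.setLevel(logging.ERROR)  # errors only

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (output_handler, errors_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger, run_dir
```

With this setup, logger.info(...) writes only to the output log, while logger.error(...) writes to both files.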