This module installs and runs the end-to-end data pipeline on a schedule: collect logs, process them to CSV, apply time-window filtering, upload to S3 (or queue the files if offline), and export Kolibri summary snapshots to the `Kolibri/` S3 prefix.
## Why systemd?

It provides reliable scheduling, dependency handling, robust logging via `journalctl`, and runs without an interactive session.
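As a rough illustration, the installed units plausibly look like the sketch below. The actual files written by `install.sh` may differ; the unit and timer names here are assumptions that mirror the `v5-log-processor.service` name used elsewhere in this README.

```ini
# /etc/systemd/system/v5-log-processor.service (illustrative sketch)
[Unit]
Description=V5 log processor pipeline
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/run_v5_log_processor.sh

# /etc/systemd/system/v5-log-processor.timer (illustrative sketch)
[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

`Type=oneshot` fits a run-to-completion pipeline, and `Persistent=true` lets a missed window run at the next boot.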
## Key features
- Scheduled runs via systemd timer (daily, weekly, monthly, custom, and hourly for Castle logs)
- Offline-first uploads with queue flushing on the next successful run
- Shared S3 destination logic for both `RACHEL/` and `Kolibri/`
- Kolibri summary exports using the supported `kolibri manage exportlogs -l summary` CLI
- Built-in workaround for Kolibri 0.19.2, which crashes if `start_date` and `end_date` are omitted
- Guided configuration with AWS bucket discovery and a live test upload
- Status dashboard covering timer, queue, connectivity, and AWS identity
- Dual logging to `journalctl` and `/var/log/v5_log_processor/automation.log`
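The named schedule types map naturally onto systemd's `OnCalendar` shorthands. A minimal sketch of that mapping, assuming nothing about how `install.sh` actually implements it (the helper name is invented for illustration):

```shell
# Map a SCHEDULE_TYPE value onto a systemd OnCalendar shorthand.
# "custom" has no OnCalendar shorthand; an installer would instead
# use an interval directive (e.g. OnUnitActiveSec) for that case.
schedule_to_oncalendar() {
  case "$1" in
    hourly|daily|weekly|monthly) echo "$1" ;;
    *) return 1 ;;
  esac
}

schedule_to_oncalendar daily   # prints "daily"
```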
## Components

- `main.sh` — menu entrypoint for Install, Status, Configure
- `install.sh` — creates the service/timer and the wrapper at `/usr/local/bin/run_v5_log_processor.sh`
- `configure.sh` — writes `config/automation.conf`, discovers buckets/subfolders, validates with a live test upload, and sets the schedule
- `runner.sh` — orchestrates the pipeline, flushes queued uploads, and exports/uploads Kolibri summaries
- `status.sh` — health/status report: timer/service, queue contents, connectivity, AWS identity, last logs
- `flush_queue.sh` — uploads any queued CSVs for both `RACHEL/` and `Kolibri/`
- `filter_time_based.py` — builds final CSVs for hourly/daily/weekly; automation calls `process_csv.py` for monthly
- `scripts/data/lib/s3_helpers.sh` — shared bucket, upload, and queue helpers
- `scripts/data/lib/kolibri_helpers.sh` — shared Kolibri facility resolution and summary export helpers
## Configuration (`config/automation.conf`)

Written by `configure.sh` and kept inside the repo so the automation can run from the project directory.

- `SERVER_VERSION`: `v1` (Server v4/Apache), `v2` (Server v5/OC4D), `v3` (D-Hub), or `v6`
- `PYTHON_SCRIPT`: `oc4d` or `cape_coast_d` (only `v2`)
- `DEVICE_LOCATION`: short label used in folder names and output filenames
- `S3_BUCKET`: `s3://bucket-name`
- `S3_SUBFOLDER`: optional prefix under the bucket
- `KOLIBRI_FACILITY_ID`: optional override; if omitted, Kolibri's default facility is used
- `SCHEDULE_TYPE`: `hourly` (Castle only), `daily`, `weekly`, `monthly`, or `custom`
- `RUN_INTERVAL`: interval in seconds for custom schedules (`>= 300`)
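For concreteness, a hypothetical `config/automation.conf` might look like this; every value below is illustrative, not taken from a real deployment:

```shell
# Example config/automation.conf (illustrative values only)
SERVER_VERSION="v2"            # Server v5 / OC4D
PYTHON_SCRIPT="oc4d"           # only meaningful when SERVER_VERSION=v2
DEVICE_LOCATION="accra_lab"    # short label used in folder/file names
S3_BUCKET="s3://example-bucket"
S3_SUBFOLDER="ghana"           # optional prefix under the bucket
KOLIBRI_FACILITY_ID=""         # empty -> Kolibri's default facility
SCHEDULE_TYPE="daily"
RUN_INTERVAL=3600              # only used when SCHEDULE_TYPE=custom (>= 300)
```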
## Data flow

- Collect
  - `v1`/v4: copies `/var/log/apache2/access.log*` into `00_DATA/LOCATION_logs_YYYY_MM_DD`
  - `v2`/v5: copies `/var/log/oc4d/oc4d-*.log`, Castle logs, and `.gz` files (excluding exceptions)
  - `v3`/dhub: copies `/var/log/dhub/*.log`
  - `v6`: copies `/var/log/oc4d/oc4d-*.log` (excluding exceptions)
- Process
  - Chooses the matching processor and writes `00_DATA/00_PROCESSED/RUN_FOLDER/summary.csv`
- Filter + Upload `RACHEL/`
  - Hourly, daily, and weekly runs use `filter_time_based.py`
  - Monthly runs use `scripts/data/upload/process_csv.py`
  - If online, queued `RACHEL/` files are flushed before the new CSV uploads
  - If offline, the file is copied into `00_DATA/00_UPLOAD_QUEUE/RACHEL/`
- Export + Upload `Kolibri/`
  - Uses `kolibri manage exportlogs -l summary --start_date ... --end_date ...`
  - Exports land in `00_DATA/00_KOLIBRI_EXPORTS/`
  - If online, the summary CSV uploads to `S3_BUCKET/S3_SUBFOLDER/Kolibri/`
  - If offline or the upload fails, the file is copied into `00_DATA/00_UPLOAD_QUEUE/Kolibri/`
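The offline-first branches in the upload steps above boil down to "upload if you can, queue if you can't". A minimal sketch of that decision, assuming nothing about the real `s3_helpers.sh` API (the function and argument names here are invented):

```shell
# upload_or_queue FILE QUEUE_DIR UPLOAD_CMD...
# Runs UPLOAD_CMD against FILE; on failure, copies FILE into QUEUE_DIR
# so a later run (flush_queue.sh in the real pipeline) can retry it.
upload_or_queue() {
  file="$1"; queue_dir="$2"; shift 2
  if "$@" "$file"; then
    echo "uploaded"
  else
    mkdir -p "$queue_dir"
    cp "$file" "$queue_dir/"
    echo "queued"
  fi
}
```

In the real pipeline the upload command would be something along the lines of `aws s3 cp "$file" "$S3_BUCKET/$S3_SUBFOLDER/Kolibri/"`, with the queue directory set to the matching `00_DATA/00_UPLOAD_QUEUE/` subfolder.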
## Where things live

- Config: `config/automation.conf`
- Raw runs: `00_DATA/<DEVICE_LOCATION>_logs_YYYY_MM_DD/`
- Processed logs: `00_DATA/00_PROCESSED/<DEVICE_LOCATION>_logs_YYYY_MM_DD/`
- Kolibri exports: `00_DATA/00_KOLIBRI_EXPORTS/`
- Upload queue: `00_DATA/00_UPLOAD_QUEUE/`
- Logs: `/var/log/v5_log_processor/automation.log` and `journalctl -u v5-log-processor.service`
## Commands

- Install: `sudo ./scripts/data/automation/install.sh`
- Configure: `sudo ./scripts/data/automation/configure.sh`
- Status: `./scripts/data/automation/status.sh`
- Manual run (wrapper): `sudo /usr/local/bin/run_v5_log_processor.sh`
- Manual Kolibri export/upload: `./scripts/data/upload/kolibri.sh`
## Troubleshooting

- Use `./scripts/data/automation/status.sh` to see timer state, queue contents, connectivity, AWS identity, and recent logs
- If uploads fail, the automation still keeps the CSV in the matching queue folder for the next run
- If the Kolibri export fails on 0.19.2, confirm the command still receives both `--start_date` and `--end_date`
- If `KOLIBRI_FACILITY_ID` is not set, the scripts use Kolibri's default facility automatically
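The 0.19.2 workaround amounts to always supplying an explicit window. A sketch of how the command might be assembled; only the `exportlogs -l summary --start_date/--end_date` flags are confirmed by this module, and the window values below are illustrative:

```shell
# Always pass both dates so Kolibri 0.19.2 does not crash on a missing window.
# "Everything up to today" is an illustrative range; runner.sh may compute a
# narrower one.
START_DATE="2000-01-01"
END_DATE="$(date +%F)"   # today, YYYY-MM-DD
CMD="kolibri manage exportlogs -l summary --start_date $START_DATE --end_date $END_DATE"
echo "$CMD"              # printed here for inspection instead of executed
```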