30 changes: 28 additions & 2 deletions README.md
@@ -120,12 +120,39 @@ python -m dfode_kit.cli.main sample \
--include_mesh
```

### 5. Continue to data preparation and training

Typical next steps are:

```bash
python -m dfode_kit.cli.main augment \
--mech /path/to/mechanisms/CH4/gri30.yaml \
--h5_file /path/to/run/oneD_flame_CH4_phi1/ch4_phi1_sample.h5 \
--output_file /path/to/data/ch4_phi1_aug.npy \
--dataset_num 20000

python -m dfode_kit.cli.main label \
--mech /path/to/mechanisms/CH4/gri30.yaml \
--time 1e-6 \
--source /path/to/data/ch4_phi1_aug.npy \
--save /path/to/data/ch4_phi1_labeled.npy

python -m dfode_kit.cli.main train \
--mech /path/to/mechanisms/CH4/gri30.yaml \
--source_file /path/to/data/ch4_phi1_labeled.npy \
--output_path /path/to/models/ch4_phi1_model.pt
```

See the published data workflow guide for the expected artifacts and stage boundaries:
- https://deepflame-ai.github.io/DFODE-kit/data-workflow/

## Recommended documentation entry points

If you are using the CLI, start with:
- https://deepflame-ai.github.io/DFODE-kit/cli/
- https://deepflame-ai.github.io/DFODE-kit/init/
- https://deepflame-ai.github.io/DFODE-kit/run-case/
- https://deepflame-ai.github.io/DFODE-kit/data-workflow/

If you are working on the repository itself, see:
- `AGENTS.md`
@@ -135,8 +162,7 @@ If you are working on the repository itself, see:

- `dfode_kit/cli/` — CLI entrypoints and subcommands
- `dfode_kit/cases/` — case init, presets, sampling, and DeepFlame/OpenFOAM-facing helpers
- `dfode_kit/data/` — data contracts, HDF5 I/O, and integration helpers
- `dfode_kit/data_operations/` — augmentation and labeling workflows
- `dfode_kit/data/` — data contracts, HDF5 I/O, integration, augmentation, and labeling helpers
- `dfode_kit/models/` — model architectures and registries
- `dfode_kit/training/` — training configuration, registries, training loops, and preprocessing
- `canonical_cases/` — canonical flame case templates
26 changes: 22 additions & 4 deletions docs/architecture.md
@@ -4,8 +4,7 @@

- `dfode_kit/cli/`: CLI entrypoints and subcommands
- `dfode_kit/cases/`: explicit case init, presets, sampling, and DeepFlame-facing helpers
- `dfode_kit/data/`: contracts, HDF5 I/O, and integration utilities
- `dfode_kit/data_operations/`: augmentation and labeling workflows
- `dfode_kit/data/`: contracts, HDF5 I/O, integration, augmentation, and labeling utilities
- `dfode_kit/models/`: model architectures and registries
- `dfode_kit/training/`: training configuration, training loops, registries, and preprocessing
- `docs/agents/`: agent-facing operational and planning docs
@@ -21,11 +20,30 @@ The repository now includes:
- lightweight CI
- documentation topology for agents and maintainers

### 2. Data contracts
A new contracts layer is being used to make HDF5 dataset assumptions explicit and testable.
### 2. Data contracts and workflow boundaries
A contracts layer is used to make HDF5 dataset assumptions explicit and testable.
The canonical `dfode_kit.data` package now also owns the main data-preparation boundary:

- HDF5 sampling outputs
- HDF5-to-NumPy conversion
- perturbation-based augmentation
- CVODE/Cantera labeling
- integration utilities used by downstream workflows

### 3. Config-driven training
The training stack is moving toward explicit config objects and registries so new model architectures and trainer types can be added without editing a monolithic training loop.

### 4. Agent-friendly CLI
The CLI now uses lighter command discovery and deferred heavy imports for improved usability in minimal environments.

## Architectural end state of the recent refactor

The repository has now completed the transition away from the older compatibility layout. In particular, these legacy layers are removed from `main`:

- `dfode_kit/cli_tools/`
- `dfode_kit/df_interface/`
- `dfode_kit/data_operations/`
- `dfode_kit/runtime_config.py`
- legacy `dfode_core` model/train compatibility packages

The current published docs should therefore treat `cli`, `cases`, `data`, `models`, `runtime`, and `training` as the only canonical implementation homes.
41 changes: 40 additions & 1 deletion docs/cli.md
@@ -74,15 +74,52 @@ dfode-kit sample \
### `augment`
Apply perturbation-based dataset augmentation to sampled states.

Example:

```bash
dfode-kit augment \
--mech /path/to/gri30.yaml \
--h5_file /path/to/sample.h5 \
--output_file /path/to/augmented.npy \
--dataset_num 20000
```

### `label`
Generate supervised learning targets using Cantera/CVODE time advancement.

Example:

```bash
dfode-kit label \
--mech /path/to/gri30.yaml \
--time 1e-6 \
--source /path/to/augmented.npy \
--save /path/to/labeled.npy
```

### `train`
Train a neural-network surrogate for chemistry integration.

Example:

```bash
dfode-kit train \
--mech /path/to/gri30.yaml \
--source_file /path/to/labeled.npy \
--output_path /path/to/model.pt
```

### `h52npy`
Convert HDF5 scalar-field datasets into a stacked NumPy array.

Example:

```bash
dfode-kit h52npy \
--source /path/to/sample.h5 \
--save_to /path/to/sample.npy
```

## Current design notes

Recent CLI refactors improved:
@@ -92,7 +129,9 @@ Recent CLI refactors improved:
- lazy command loading for lighter help paths,
- more predictable command dispatch behavior.

The new `init` command already supports machine-readable JSON output for planning/provenance.
The new `init` command already supports machine-readable JSON output for planning/provenance, and `run-case` supports JSON output for preview/apply results.

For the end-to-end artifact flow between `sample`, `augment`, `label`, `h52npy`, and `train`, see [Data Preparation and Training Workflow](data-workflow.md).

Future work should still add:

179 changes: 179 additions & 0 deletions docs/data-workflow.md
@@ -0,0 +1,179 @@
# Data Preparation and Training Workflow

This page documents the currently exposed CLI stages after a case has been initialized and run successfully.

It focuses on the data pipeline from:

1. finished DeepFlame/OpenFOAM case outputs
2. sampled HDF5 state data
3. optional HDF5-to-NumPy conversion
4. augmented state datasets
5. labeled supervised-learning datasets
6. trained surrogate model artifacts

## Stage boundaries

The current CLI presents the data workflow as a sequence of artifact transformations.

### 1. `sample`
Input:
- a finished case directory
- a mechanism file

Output:
- an HDF5 file containing sampled scalar fields
- optionally mesh datasets

Example:

```bash
dfode-kit sample \
--mech /path/to/gri30.yaml \
--case /path/to/run/oneD_flame_CH4_phi1 \
--save /path/to/run/oneD_flame_CH4_phi1/ch4_phi1_sample.h5 \
--include_mesh
```

Typical contents include:
- root metadata such as `mechanism`
- `scalar_fields/` datasets keyed by output time
- optional mesh datasets
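
The layout above can be explored with `h5py`. The snippet below builds a tiny file that mimics the described structure (the exact group and attribute names here are illustrative assumptions, not the tool's contract) and then inspects it the way a downstream consumer might:

```python
import h5py
import numpy as np

# Build a small HDF5 file mimicking the assumed sample layout:
# root "mechanism" metadata plus time-keyed scalar_fields datasets.
with h5py.File("sample_demo.h5", "w") as f:
    f.attrs["mechanism"] = "gri30.yaml"
    grp = f.create_group("scalar_fields")
    grp.create_dataset("0.001", data=np.random.rand(64, 5))
    grp.create_dataset("0.002", data=np.random.rand(64, 5))

# Inspect root metadata and the time-keyed datasets.
with h5py.File("sample_demo.h5", "r") as f:
    print(f.attrs["mechanism"])        # root metadata
    print(sorted(f["scalar_fields"]))  # datasets keyed by output time
```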

### 2. `h52npy`
Input:
- sampled HDF5 file

Output:
- stacked NumPy array of scalar fields

Example:

```bash
dfode-kit h52npy \
--source /path/to/run/oneD_flame_CH4_phi1/ch4_phi1_sample.h5 \
--save_to /path/to/data/ch4_phi1_sample.npy
```

Use this when downstream workflows need a single NumPy array rather than time-indexed HDF5 datasets.
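
The stacking idea can be sketched in plain `h5py`/NumPy. The group and key names below are illustrative assumptions; the real converter's dataset selection and ordering may differ:

```python
import h5py
import numpy as np

# Toy input mimicking the assumed sampler output.
with h5py.File("stack_demo.h5", "w") as f:
    g = f.create_group("scalar_fields")
    g.create_dataset("0.001", data=np.full((4, 3), 1.0))
    g.create_dataset("0.002", data=np.full((4, 3), 2.0))

# Stack the time-keyed datasets into one array, sorted by time key,
# roughly what a single-NumPy-array consumer would expect.
with h5py.File("stack_demo.h5", "r") as f:
    keys = sorted(f["scalar_fields"], key=float)
    stacked = np.concatenate([f["scalar_fields"][k][...] for k in keys], axis=0)

np.save("stack_demo.npy", stacked)
print(stacked.shape)  # two 4x3 snapshots stacked row-wise: (8, 3)
```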

### 3. `augment`
Input:
- sampled HDF5 file
- mechanism file

Output:
- augmented NumPy dataset

Example:

```bash
dfode-kit augment \
--mech /path/to/gri30.yaml \
--h5_file /path/to/run/oneD_flame_CH4_phi1/ch4_phi1_sample.h5 \
--output_file /path/to/data/ch4_phi1_aug.npy \
--dataset_num 20000
```

Current optional controls:
- `--heat_limit`
- `--element_limit`
- `--perturb_factor`

## Current note on `augment`

The current CLI surface exposes `--perturb_factor`, but the present command implementation does not yet thread that value through to the underlying augmentation routine. Treat the command as functional, but the public option surface here is not yet fully normalized.
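
Conceptually, perturbation-based augmentation resamples states and applies small multiplicative noise. The NumPy sketch below is a guess at that general idea, not the tool's exact algorithm; in particular, the real routine's constraints (such as the heat and element limits above) are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_states(states: np.ndarray, dataset_num: int,
                   perturb_factor: float = 0.05) -> np.ndarray:
    """Resample rows with replacement and apply small multiplicative noise.

    Illustrative stand-in for perturbation-based augmentation; the actual
    sampling scheme and physical constraints are not reproduced here.
    """
    idx = rng.integers(0, len(states), size=dataset_num)
    noise = 1.0 + perturb_factor * rng.uniform(-1.0, 1.0, size=(dataset_num, states.shape[1]))
    return states[idx] * noise

base = rng.random((100, 7))  # e.g. T plus species mass fractions
augmented = perturb_states(base, dataset_num=20000)
print(augmented.shape)  # (20000, 7)
```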

### 4. `label`
Input:
- mechanism file
- NumPy state dataset
- reactor advancement time step

Output:
- labeled NumPy dataset suitable for supervised learning

Example:

```bash
dfode-kit label \
--mech /path/to/gri30.yaml \
--time 1e-6 \
--source /path/to/data/ch4_phi1_aug.npy \
--save /path/to/data/ch4_phi1_labeled.npy
```

Conceptually, this stage advances each sampled state with Cantera/CVODE and writes paired source/target state data.
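
The pairing structure can be sketched without Cantera. In the toy below, a linear relaxation stands in for the CVODE reactor advancement so the source/target layout is runnable anywhere; the relaxation rate and the `[source | target]` column layout are illustrative assumptions:

```python
import numpy as np

def label_states(states: np.ndarray, dt: float) -> np.ndarray:
    """Pair each source state with its time-advanced target state.

    A real implementation would advance each state with a Cantera
    reactor integrated by CVODE; a toy exponential relaxation stands
    in here so the pairing is runnable without Cantera.
    """
    rate = 1.0e5  # fake relaxation rate, 1/s (illustrative only)
    targets = states * np.exp(-rate * dt)
    # Each labeled row carries [source | target].
    return np.hstack([states, targets])

src = np.random.rand(10, 5)
labeled = label_states(src, dt=1e-6)
print(labeled.shape)  # (10, 10)
```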

### 5. `train`
Input:
- mechanism file
- labeled NumPy dataset

Output:
- trained model artifact written to the requested output path

Example:

```bash
dfode-kit train \
--mech /path/to/gri30.yaml \
--source_file /path/to/data/ch4_phi1_labeled.npy \
--output_path /path/to/models/ch4_phi1_model.pt
```
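
The artifact flow of this stage (labeled source/target pairs in, serialized model out) can be sketched with a plain least-squares stand-in. The real trainer uses neural-network architectures and config-driven registries; only the dataset-to-artifact shape is shown, and all names here are illustrative:

```python
import numpy as np

# Synthetic "labeled" data: targets are a linear map of sources.
rng = np.random.default_rng(1)
X = rng.random((256, 5))      # source states
true_W = rng.random((5, 5))
Y = X @ true_W                # labeled targets

# Fit targets = X @ W by gradient descent on the mean-squared error.
W = np.zeros((5, 5))
for _ in range(2000):
    grad = X.T @ (X @ W - Y) / len(X)
    W -= 0.5 * grad

np.save("surrogate_demo.npy", W)  # model artifact written to disk
print(float(np.abs(X @ W - Y).max()))
```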

## Recommended artifact layout

A practical directory layout is:

```text
<project-root>/
runs/
oneD_flame_CH4_phi1/
ch4_phi1_sample.h5
data/
ch4_phi1_sample.npy
ch4_phi1_aug.npy
ch4_phi1_labeled.npy
models/
ch4_phi1_model.pt
```

This keeps:
- case-run artifacts near the case directory
- derived training datasets under a separate `data/` area
- trained models under a separate `models/` area

## Current limitations and documentation gaps

The CLI surface for the data pipeline is usable, but not yet as normalized as `init` and `run-case`.

Current gaps include:
- limited machine-readable JSON output for `sample`, `augment`, `label`, and `train`
- older option naming conventions such as `--h5_file` and `--source_file`
- thinner published documentation for training outputs and configuration details than for case init/run

These are good future cleanup targets, but the commands above describe the current behavior on `main`.

## Validated minimal sequence

For a validated 1D flame workflow, the current practical sequence is:

```bash
dfode-kit init oneD-flame ... --apply
dfode-kit run-case --case /path/to/case --apply --json
dfode-kit sample --mech /path/to/gri30.yaml --case /path/to/case --save /path/to/sample.h5 --include_mesh
```

After sampling, continue with either:

```bash
dfode-kit h52npy --source /path/to/sample.h5 --save_to /path/to/sample.npy
```

or directly with augmentation/labeling:

```bash
dfode-kit augment ...
dfode-kit label ...
dfode-kit train ...
```
30 changes: 30 additions & 0 deletions docs/getting-started.md
@@ -39,6 +39,36 @@ uv venv .venv
uv pip install --python .venv/bin/python -e '.[dev]'
```

## CLI entrypoint

If the console script is installed, use:

```bash
dfode-kit --help
```

A reliable fallback inside the repository is:

```bash
.venv/bin/python -m dfode_kit.cli.main --help
```

## Runtime environment split

Different stages of the workflow may require different dependencies:

- lightweight repository verification: local `.venv`
- canonical case initialization: Python environment with `cantera`
- case execution: configured OpenFOAM + Conda + DeepFlame runtime via `dfode-kit config` and `dfode-kit run-case`
- sampling / labeling: Python environment with `cantera`, `numpy`, and `h5py`

If you are starting with the case workflow, continue to:

1. [CLI](cli.md)
2. [Canonical Case Initialization](init.md)
3. [Runtime Configuration and Case Execution](run-case.md)
4. [Data Preparation and Training Workflow](data-workflow.md)

## Current focus

The project is being refactored toward:
1 change: 1 addition & 0 deletions docs/index.md
@@ -13,6 +13,7 @@ DFODE-kit is a Python toolkit for accelerating combustion chemistry integration
- **CLI**: current `dfode-kit` commands and their purpose
- **Canonical Case Initialization**: preset-based case setup with preview/apply/config workflows
- **Runtime Configuration and Case Execution**: persistent machine-local environment config plus reproducible case launching
- **Data Preparation and Training Workflow**: the current artifact flow from sampled HDF5 to labeled datasets and models
- **Architecture**: repo layout and current refactor direction
- **Tutorials and Workflow**: how to think about the DFODE pipeline
- **Agent Docs**: operational guidance for coding agents and maintainers
12 changes: 12 additions & 0 deletions docs/tutorials.md
@@ -24,3 +24,15 @@ A future docs iteration can bring notebook tutorials into the published site, bu
- repository architecture,
- CLI guidance,
- agent and maintainer workflow documentation.

## Practical workflow entry points

For reproducible command-line usage, use the published Markdown docs in this order:

1. [Getting Started](getting-started.md)
2. [CLI](cli.md)
3. [Canonical Case Initialization](init.md)
4. [Runtime Configuration and Case Execution](run-case.md)
5. [Data Preparation and Training Workflow](data-workflow.md)

That sequence reflects the currently validated path from case creation to sampled/training-ready datasets.