docs: production-deployment guide for trained FLAML models#1562

Open
immu4989 wants to merge 1 commit into microsoft:main from immu4989:flaml-docs-production-deployment

@immu4989

Contributor

Why are these changes needed?

Adds a new Use-Cases/Production-Deployment.md page covering the train → save → reload → predict on new data lifecycle, with a focus on the gotchas that actually surface in production but aren't in the quick-start tutorials.

Each section is grounded in a user-reported pain point. The page exists because the same handful of issues keep getting filed against FLAML at the inference boundary:

| Issue | Pain | Page section |
| --- | --- | --- |
| #1101 | Categorical encoding silently drifts between fit and predict | §3: Categorical features at inference time |
| #1136 | automl.model.estimators_[i].predict(raw_X) fails | §4: Ensemble component access |
| #1115 | MultiOutputRegressor(AutoML()) ignores X_val | §6: Multi-output regression |
| #1181 | How to persist a StackingRegressor ensemble | §1: Save and reload |
| #887 | sample_weight + split_type="time" AttributeError | §5: Sample weights |
| #1200 | SMOTE with cross-validation | §5: note on imbalanced classification |
| #1540 | Reproducibility audit follow-up | §7: Versioning and reproducibility |
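To make the first row concrete, here is a minimal, standard-library-only sketch of one generic guard against categorical drift: record the category levels seen at fit time and flag any unseen level before calling predict. This PR description does not spell out the mechanism behind #1101, so this is an illustration rather than the page's actual recommendation. The helper names `record_levels` and `find_unseen_levels` and the toy rows are hypothetical and not part of FLAML.

```python
from typing import Dict, Iterable, List, Mapping, Set

Row = Mapping[str, object]


def record_levels(rows: Iterable[Row], categorical_cols: List[str]) -> Dict[str, Set[object]]:
    """Collect the set of values seen in each categorical column at fit time."""
    levels: Dict[str, Set[object]] = {col: set() for col in categorical_cols}
    for row in rows:
        for col in categorical_cols:
            levels[col].add(row[col])
    return levels


def find_unseen_levels(rows: Iterable[Row], levels: Mapping[str, Set[object]]) -> Dict[str, Set[object]]:
    """Return, per column, the values in `rows` that were never seen at fit time."""
    unseen: Dict[str, Set[object]] = {}
    for row in rows:
        for col, known in levels.items():
            value = row[col]
            if value not in known:
                unseen.setdefault(col, set()).add(value)
    return unseen


# Hypothetical toy data: "L" and "green" never appear in the training rows.
train_rows = [{"color": "red", "size": "S"}, {"color": "blue", "size": "M"}]
new_rows = [{"color": "red", "size": "L"}, {"color": "green", "size": "M"}]

levels = record_levels(train_rows, ["color", "size"])
unseen = find_unseen_levels(new_rows, levels)
print(unseen)
```

A non-empty result is a signal to stop and inspect the inference batch rather than let the mismatch pass silently.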

The page follows the existing Use-Cases/ style (it sits next to Task-Oriented-AutoML.md, Zero-Shot-AutoML.md, and Tune-User-Defined-Function.md) and is picked up automatically by the sidebar ({type: 'autogenerated', dirName: 'Use-Cases'} in website/sidebars.js).

Pre-flight verification

Every runnable snippet on the page was exercised against current main before writing. One discovery from that pre-flight is worth flagging in review: the MLflow autolog example in Best-Practices.md, as written, reloads as an unfitted Pipeline on recent MLflow versions (verified on mlflow==2.22.1). The new page recommends the explicit mlflow.sklearn.log_model(automl, artifact_path="...") pattern instead, which round-trips correctly. Happy to file a follow-up bug for the autolog reload path if useful.

What the page does not cover

  • Training-time configuration (covered in Task-Oriented-AutoML.md).
  • Zero-shot estimators (covered in Zero-Shot-AutoML.md).
  • Distributed / Spark / Microsoft Fabric deployment.

Related issue / PR list

This page references but does not duplicate:

Checks
