Layer 2: Training
Model Registry
The Model Registry is the lifecycle system of record for trained models. It converts checkpoints into governed model objects with explicit states, attached evidence, operational metadata, and downstream deployment meaning. A checkpoint is a training artifact. A model in the registry is a decision-bearing platform entity.
What This Surface Owns
The Model Registry owns the transition from training output to deployment-ready lifecycle object.
- Store model candidates with full provenance and attached evidence.
- Manage lifecycle states such as candidate, evaluated, approved, deployed, deprecated, and archived.
- Preserve comparison context between model versions, rollback targets, and release lines.
- Provide the source of truth for Artifact Builder and deployment decisions.
It sits between training and deployment, but it is also where evaluation and operational evidence accumulate over time.
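The distinction above — checkpoint as file versus model as governed object — can be sketched as a minimal data model. All field names here are illustrative assumptions, not a fixed schema:

```python
# A checkpoint is a training artifact; a registry entry is a structured
# object carrying state, provenance, and evidence. Field names are assumed.
from dataclasses import dataclass, field

@dataclass
class RegistryEntry:
    model_id: str                 # stable platform identity, not a filename
    run_id: str                   # training run that produced the checkpoint
    checkpoint_uri: str           # raw training artifact it was promoted from
    state: str = "candidate"      # explicit lifecycle state
    provenance: dict = field(default_factory=dict)  # dataset, code, config lineage
    evidence: list = field(default_factory=list)    # evaluation results added over time
```

Everything the later sections describe — states, evidence, comparison, consumption — hangs off an object shaped roughly like this.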
Core Capabilities
Candidate registration
- Promote one or more checkpoints from a run into registry candidates.
- Attach experiment identity, dataset lineage, code version, dependency surface, and compute profile.
- Record which training configuration and evaluation context produced the candidate.
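A registration helper might enforce that a checkpoint only becomes a candidate when its provenance is complete. This is a sketch; the required field names are assumptions drawn from the bullets above:

```python
# Hypothetical registration gate: refuse partial provenance instead of
# storing an orphaned checkpoint. Required field names are illustrative.
REQUIRED_PROVENANCE = {
    "experiment_id",
    "dataset_version",
    "code_version",
    "dependency_lockfile",
    "compute_profile",
}

def register_candidate(registry, model_id, checkpoint_uri, provenance):
    missing = REQUIRED_PROVENANCE - provenance.keys()
    if missing:
        raise ValueError(f"incomplete provenance: {sorted(missing)}")
    entry = {
        "model_id": model_id,
        "checkpoint_uri": checkpoint_uri,
        "state": "candidate",
        "provenance": dict(provenance),
        "evidence": [],
    }
    registry[model_id] = entry
    return entry
```

The design choice is that incompleteness fails loudly at registration time, rather than surfacing later as an unanswerable lineage question.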
Lifecycle state management
- Candidate → evaluated → approved → deployed → deprecated → archived.
- Explicit transitions with policy and approval rather than informal naming conventions.
- Support parallel candidates for different robot classes, hardware targets, or customer environments.
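Explicit transitions can be enforced as a state graph rather than a naming convention. A minimal sketch, using the states listed above (the allowed edges shown are one plausible policy, not a prescription):

```python
# Legal lifecycle transitions as an explicit graph. Any edge not listed
# is rejected, and every transition is recorded with its actor.
ALLOWED = {
    "candidate":  {"evaluated"},
    "evaluated":  {"approved", "archived"},
    "approved":   {"deployed", "deprecated"},
    "deployed":   {"deprecated"},
    "deprecated": {"archived"},
    "archived":   set(),
}

def transition(entry, target, actor):
    current = entry["state"]
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    entry["state"] = target
    # Auditability: who moved the model, from where, to where.
    entry.setdefault("audit_log", []).append((current, target, actor))
    return entry
```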
Evidence attachment
- Store replay results, benchmark pack outcomes, latency budgets, intervention rates, and rollout notes alongside the model.
- Preserve not only metric values, but also which evaluation pack and release policy were used.
- Carry deployment-relevant metadata forward instead of forcing downstream systems to reconstruct it.
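Carrying the judging context forward might look like the following sketch, where results are stored together with the evaluation pack and release policy that produced them (field names are assumptions):

```python
# Evidence is attached with its context, so downstream systems never have
# to reconstruct which pack or policy a metric value was judged under.
def attach_evidence(entry, kind, values, eval_pack, release_policy):
    entry.setdefault("evidence", []).append({
        "kind": kind,                     # e.g. "replay", "benchmark", "latency"
        "values": values,                 # the metric values themselves
        "eval_pack": eval_pack,           # which evaluation pack was run
        "release_policy": release_policy, # which policy the values were judged under
    })
    return entry
```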
Version comparison
- Compare candidate models against prior approved or deployed versions.
- Show performance deltas, latency changes, cohort-specific improvements, and regression risks.
- Support attribution back to dataset, curation, architecture, or evaluation changes.
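Performance deltas against a prior version reduce to a per-metric diff once both entries carry structured metrics. A sketch, assuming a flat `metrics` mapping on each entry:

```python
# Per-metric deltas between a candidate and a prior approved/deployed
# baseline. Only metrics present on both sides are compared.
def compare(candidate, baseline):
    deltas = {}
    for metric, value in candidate["metrics"].items():
        if metric in baseline["metrics"]:
            deltas[metric] = round(value - baseline["metrics"][metric], 6)
    return deltas
```

Cohort-specific comparison and attribution would layer on top of this, keyed by cohort and by the provenance fields each entry already carries.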
Consumption contracts
- Artifact Builder consumes approved models from the registry.
- Deployment and Update surfaces target explicit model or artifact lineage rather than loose filenames.
- Telemetry can be tied back to exact deployed model identities and their prior evidence.
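The consumption contract can be made concrete as a resolver that refuses anything not in the approved state, so Artifact Builder can never package by loose filename. An illustrative sketch:

```python
# Downstream surfaces resolve models by registry identity and state,
# never by checkpoint path. Names and error shapes are assumptions.
def resolve_for_packaging(registry, model_id):
    entry = registry.get(model_id)
    if entry is None:
        raise LookupError(f"unknown model: {model_id}")
    if entry["state"] != "approved":
        raise LookupError(f"{model_id} is {entry['state']}, not approved")
    return entry
```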
What A Registry Entry Should Answer
A serious model registry should let teams answer the following immediately:
- Which dataset version and curation ruleset produced this model?
- Which experiment justified its promotion?
- Which evaluation pack approved it and under what thresholds?
- Which artifact variants were built from it for which hardware targets?
- Which fleets, sites, or customers are currently running it?
- Which regressions, rollbacks, or maintenance events occurred after deployment?
If those answers require joining ad hoc notes across multiple systems, the registry is not doing its job.
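If the entry carries lineage as structured fields, the questions above become direct lookups on one record rather than cross-system joins. A sketch, with field names that are assumptions:

```python
# One entry, six answers: each question from the checklist maps to a
# structured field instead of an ad hoc note. Field names are illustrative.
def lineage_report(entry):
    return {
        "dataset": entry["provenance"]["dataset_version"],
        "experiment": entry["provenance"]["experiment_id"],
        "approval": entry.get("approval"),            # eval pack + thresholds
        "artifacts": entry.get("artifacts", []),      # built variants per hardware target
        "deployments": entry.get("deployments", []),  # fleets/sites/customers running it
        "incidents": entry.get("incidents", []),      # post-deploy regressions, rollbacks
    }
```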
Relationship To Neighboring Surfaces
Upstream
- **Training Orchestrator** provides checkpoints and runtime metadata.
- **Experiment Tracker** provides hypothesis, comparison context, and decision rationale.
- **Evaluation & Release** provides benchmark evidence, release packs, and approval outcomes.
Downstream
- **Artifact Builder** packages registered models into deployable artifacts.
- **Deployment Manager** references approved models and their rollout eligibility.
- **Telemetry & Monitoring** and **Maintenance System** can attribute incidents and regressions to deployed model versions.
Registry States And Governance
Candidate
A checkpoint with enough context to be evaluated, but not yet approved for rollout.
Evaluated
A candidate that has been run through the required replay packs, benchmarks, or release checks.
Approved
A candidate that has met the defined promotion policy and is authorized for packaging and rollout.
Deployed
A model whose artifact is active in one or more fleets, cohorts, or sites.
Deprecated / Archived
Models retained for history, rollback context, or auditability, but not eligible for normal promotion or rollout.

State transitions should be auditable, attributable, and policy-aware. High-risk environments can require human approval. Lower-risk environments can automate promotion once evidence thresholds are satisfied.
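A policy-aware promotion gate along those lines might look like this sketch, where high-risk environments require a named human approver and lower-risk ones auto-promote once all attached evidence passes (risk tiers and field names are illustrative):

```python
# Promotion policy: evidence thresholds gate everyone; high-risk targets
# additionally require explicit human sign-off. Names are assumptions.
def may_promote(entry, risk_tier, approver=None):
    evidence = entry.get("evidence", [])
    thresholds_met = bool(evidence) and all(e["passed"] for e in evidence)
    if not thresholds_met:
        return False            # no evidence, or failing evidence: never promote
    if risk_tier == "high":
        return approver is not None  # human approval required
    return True                 # lower risk: evidence alone suffices
```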
Why This Matters Architecturally
The Model Registry is the glue between data-centric ML work and production robotics operations.
- It carries provenance forward from training.
- It gives deployment and update systems a governed source of truth.
- It preserves the evidence that made a rollout acceptable.
- It preserves lifecycle continuity for rollback and regression analysis.
Without it, the platform cannot reliably answer what is running, why it was approved, or how to revert safely.
Why Teams Care
Release clarity
Teams know which candidate is actually approved and why.
Operational safety
Rollout and rollback decisions reference governed models, not raw checkpoints.
Comparability
Regressions can be analyzed against model lineage, not just deployment timing.
Institutional memory
The platform retains the reasons behind promotion and deprecation decisions over time.