Data Ingestion & Preprocessing

Data Ingestion & Preprocessing moves robot data from edge devices, cloud storage, developer tooling, and external systems into rFabric with full provenance, transport resilience, normalization, and validation. Its job is to establish the first trustworthy record of what was collected and convert it into structured platform entities.

What This Surface Owns

Reliable intake

Robot data has to enter the platform without losing provenance or operational context.

  • Robot-side upload, cloud sync, API, SDK, and CLI entry paths
  • Resumability and unreliable-network tolerance
  • Immediate source registration at the moment data crosses into the platform
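As an illustration of how resumability and immediate source registration can fit together, here is a minimal in-memory sketch. It is not rFabric's API: SourceRecord, UploadSession, FlakyLink, and upload are hypothetical names, and the "network" is simulated. The sketch registers a provenance record before any bytes move, then resumes each retry from the offset the receiving side has acknowledged.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    """Hypothetical provenance record, created before any bytes are sent."""
    device_id: str
    entry_path: str   # e.g. "robot-upload", "cloud-sync", "api", "sdk", "cli"
    sha256: str
    size_bytes: int


@dataclass
class UploadSession:
    """In-memory stand-in for a server-side resumable upload session."""
    source: SourceRecord
    received: bytearray = field(default_factory=bytearray)

    def offset(self) -> int:
        # Number of bytes the receiving side has durably accepted.
        return len(self.received)

    def append(self, offset: int, chunk: bytes) -> int:
        # Reject chunks that do not start at the acknowledged offset.
        if offset != len(self.received):
            raise ValueError("offset mismatch")
        self.received.extend(chunk)
        return len(self.received)

    def complete(self) -> bool:
        # Size and checksum must both match the registered source record.
        digest = hashlib.sha256(bytes(self.received)).hexdigest()
        return (len(self.received) == self.source.size_bytes
                and digest == self.source.sha256)


class FlakyLink:
    """Simulated unreliable network: drops the listed send attempts."""

    def __init__(self, session: UploadSession, fail_on=()):
        self.session = session
        self.fail_on = set(fail_on)
        self.attempts = 0

    def send(self, offset: int, chunk: bytes) -> int:
        self.attempts += 1
        if self.attempts in self.fail_on:
            raise ConnectionError("link dropped")
        return self.session.append(offset, chunk)


def upload(payload: bytes, device_id: str, entry_path: str,
           fail_on=(), chunk_size: int = 4, max_retries: int = 5):
    # Register the source first, so provenance exists even if transfer stalls.
    source = SourceRecord(device_id, entry_path,
                          hashlib.sha256(payload).hexdigest(), len(payload))
    session = UploadSession(source)
    link = FlakyLink(session, fail_on)
    retries = 0
    while session.offset() < len(payload):
        start = session.offset()  # resume from the acknowledged offset
        try:
            link.send(start, payload[start:start + chunk_size])
            retries = 0
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
    return session, link.attempts
```

A dropped chunk costs one extra send rather than a restart, and the checksum in the source record lets the receiving side confirm that what arrived is what was registered.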

Preprocessing and normalization

This surface does not stop at transport. It converts accepted data into structured, trusted platform state.

  • Validation, timing alignment, and schema checks
  • Creation of datasets, episodes, sensor streams, and derived assets
  • LeRobot-native handling with room for additional robotics formats
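The validation and timing-alignment steps listed above can be sketched as follows. This is an assumed, simplified approach rather than the platform's actual pipeline: the two-field schema, the function names validate_stream and align_nearest, and the nearest-timestamp matching rule with a fixed tolerance are all illustrative choices.

```python
from bisect import bisect_left

# Hypothetical minimal schema for one sample of a sensor stream.
REQUIRED_KEYS = {"timestamp", "value"}


def validate_stream(samples):
    """Return a list of human-readable problems; empty means the stream passed."""
    errors = []
    prev = None
    for i, sample in enumerate(samples):
        missing = REQUIRED_KEYS - set(sample)
        if missing:
            errors.append(f"sample {i}: missing {sorted(missing)}")
            continue
        if prev is not None and sample["timestamp"] <= prev:
            errors.append(f"sample {i}: non-increasing timestamp")
        prev = sample["timestamp"]
    return errors


def align_nearest(frame_times, samples, tolerance):
    """For each video frame time, pick the nearest sample within tolerance.

    Assumes samples are sorted by timestamp (validate_stream checks this).
    Frames with no sample close enough are paired with None instead of
    being silently filled.
    """
    times = [s["timestamp"] for s in samples]
    aligned = []
    for t in frame_times:
        i = bisect_left(times, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        best = min(candidates, key=lambda j: abs(times[j] - t), default=None)
        if best is not None and abs(times[best] - t) <= tolerance:
            aligned.append(samples[best]["value"])
        else:
            aligned.append(None)
    return aligned
```

Returning None for unmatched frames keeps gaps visible to downstream consumers, which matters when a later stage has to decide whether an episode is usable.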

Why Robotics Needs This As One Surface

Transport and structure are inseparable

In robotics, a trustworthy upload path has to preserve collection context and produce structured entities, not just move files into a bucket.

Multimodal alignment matters immediately

Video, proprioception, force signals, and operator context have to stay aligned from the first moment they enter the platform.

Formats evolve

Sensor suites, hardware revisions, and open data formats change. Intake and normalization have to absorb that change without breaking historical continuity.

The downstream stack depends on this boundary

Annotation, curation, model development, and operations only stay coherent if the first trustworthy representation of the data is already inside the shared entity graph.

Why Teams Care

Less operational friction

Data arrives without brittle manual transfer steps and without later repair work to reconstruct provenance.

Trust

Teams know what was collected, where it came from, and which workflow version converted it into platform state.

Format continuity

The platform can absorb heterogeneous collection setups without losing shared lifecycle identity.

A stronger wedge

Everything downstream is stronger because the system of record begins at the first trustworthy boundary rather than after ad hoc preprocessing.