Data Foundation
Data Ingestion & Preprocessing
Data Ingestion & Preprocessing moves robot data from edge devices, cloud storage, developer tooling, and external systems into rFabric with full provenance, transport resilience, normalization, and validation. Its job is to establish the first trustworthy record of what was collected and convert it into structured platform entities.
What This Surface Owns
Reliable intake
Robot data has to enter the platform without losing provenance or operational context.
- Robot-side upload, cloud sync, API, SDK, and CLI entry paths
- Resumable transfers that tolerate unreliable networks
- Immediate source registration at the moment data crosses into the platform
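The intake behavior listed above can be illustrated with a minimal sketch. It assumes a chunked transfer in which the source is registered before any bytes move, and only the chunks still missing are resent after a network drop. The names UploadSession, register_source, upload, and make_flaky_send are hypothetical and are not rFabric APIs; the flaky sender only simulates an unreliable link.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class UploadSession:
    """Hypothetical record of one upload; not an rFabric API."""
    source_id: str          # assigned at registration, before any bytes arrive
    expected_sha256: str    # digest of the full payload, used for final verification
    chunks: dict = field(default_factory=dict)  # chunk index -> received bytes

    def missing(self, total_chunks: int) -> list:
        """Indices of chunks that have not been received yet."""
        return [i for i in range(total_chunks) if i not in self.chunks]


def split_chunks(payload: bytes, chunk_size: int) -> list:
    """Cut the payload into fixed-size chunks (the last one may be shorter)."""
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


def register_source(robot_id: str, payload: bytes) -> UploadSession:
    """Register the source first so provenance exists even if transfer stalls."""
    digest = hashlib.sha256(payload).hexdigest()
    return UploadSession(source_id=f"{robot_id}:{digest[:12]}", expected_sha256=digest)


def upload(session: UploadSession, chunks: list, send, max_attempts: int = 10) -> bool:
    """Send missing chunks, resuming after each drop; verify the digest at the end."""
    for _ in range(max_attempts):
        for index in session.missing(len(chunks)):
            try:
                send(session, index, chunks[index])
            except ConnectionError:
                break  # link dropped; the next attempt resends only missing chunks
        if not session.missing(len(chunks)):
            break
    if session.missing(len(chunks)):
        return False
    assembled = b"".join(session.chunks[i] for i in range(len(chunks)))
    return hashlib.sha256(assembled).hexdigest() == session.expected_sha256


def make_flaky_send(fail_every: int = 3):
    """Simulated transport that raises ConnectionError on every Nth call."""
    calls = {"n": 0}

    def send(session: UploadSession, index: int, data: bytes) -> None:
        calls["n"] += 1
        if calls["n"] % fail_every == 0:
            raise ConnectionError("simulated network drop")
        session.chunks[index] = data

    return send, calls
```

Calling register_source before upload mirrors the idea that provenance is recorded at the moment data crosses into the platform, so an interrupted transfer still leaves a traceable source record.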
Preprocessing and normalization
The component does not stop at transport. It converts accepted data into structured, trusted platform state.
- Validation, timing alignment, and schema checks
- Creation of datasets, episodes, sensor streams, and derived assets
- LeRobot-native handling with room for additional robotics formats
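As a rough illustration of the validation and schema checks listed above, the sketch below inspects one episode's sensor streams before they would be accepted. The SensorStream type, the REQUIRED_STREAMS set, and validate_episode are assumptions made for this example; the platform's actual schema and checks are not described here.

```python
from dataclasses import dataclass


@dataclass
class SensorStream:
    """Hypothetical in-memory view of one stream within an episode."""
    name: str
    timestamps: list  # seconds; expected to be strictly increasing
    values: list      # one value per timestamp


# Assumed schema for illustration only: streams every episode must contain.
REQUIRED_STREAMS = {"camera_front", "joint_positions"}


def validate_episode(streams: list) -> list:
    """Return a list of problems; an empty list means the episode passes."""
    problems = []
    names = {s.name for s in streams}

    # Schema check: all required streams are present.
    for missing in sorted(REQUIRED_STREAMS - names):
        problems.append(f"missing required stream: {missing}")

    for s in streams:
        # Structural check: every sample has exactly one timestamp.
        if len(s.timestamps) != len(s.values):
            problems.append(
                f"{s.name}: {len(s.timestamps)} timestamps vs {len(s.values)} values"
            )
        # Timing check: timestamps never repeat or go backwards.
        if any(b <= a for a, b in zip(s.timestamps, s.timestamps[1:])):
            problems.append(f"{s.name}: timestamps are not strictly increasing")

    return problems
```

Returning every problem rather than failing on the first one lets a rejected episode be repaired in a single pass.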
Why Robotics Needs This As One Surface
Transport and structure are inseparable
In robotics, a trustworthy upload path has to preserve collection context and produce structured entities, not just move files into a bucket.
Multimodal alignment matters immediately
Video, proprioception, force signals, and operator context have to stay aligned from the first moment they enter the platform.
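One common way to keep streams aligned is nearest-timestamp matching against a reference stream, with a tolerance beyond which no match is accepted. The sketch below assumes sorted integer timestamps in milliseconds and is only one possible alignment strategy, not a description of how rFabric aligns data; nearest_index and align are hypothetical helper names.

```python
from bisect import bisect_left


def nearest_index(timestamps: list, t: int) -> int:
    """Index of the sample in sorted `timestamps` closest to time `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[i - 1], timestamps[i]
    # On a tie, prefer the earlier sample.
    return i if (after - t) < (t - before) else i - 1


def align(reference_ms: list, other_ms: list, tolerance_ms: int) -> list:
    """For each reference timestamp, the index of the nearest sample in
    `other_ms`, or None when the nearest sample is further than the tolerance."""
    if not other_ms:
        return [None] * len(reference_ms)
    matches = []
    for t in reference_ms:
        j = nearest_index(other_ms, t)
        matches.append(j if abs(other_ms[j] - t) <= tolerance_ms else None)
    return matches
```

For example, with video frames at 0, 100, 200, and 500 ms and joint readings at 0, 40, 80, 130, 160, and 210 ms, a 25 ms tolerance pairs the first three frames with readings and leaves the last frame unmatched, which makes the gap explicit instead of silently pairing stale data.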
Formats evolve
Sensor suites, hardware revisions, and open data formats change. Intake and normalization have to absorb that change without breaking historical continuity.
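One way to absorb format change without rewriting history is to keep a normalizer per format version and have every version emit the same canonical record. The sketch below is an assumption-laden illustration: the format name "example_format", its two versions, and the field names are invented for this example and do not correspond to any real robotics format.

```python
# Registry of normalizers keyed by (format name, version). Hypothetical.
NORMALIZERS = {}


def normalizer(fmt: str, version: int):
    """Decorator that registers a function as the normalizer for one version."""
    def decorate(fn):
        NORMALIZERS[(fmt, version)] = fn
        return fn
    return decorate


@normalizer("example_format", 1)
def _normalize_v1(raw: dict) -> dict:
    # Assumed v1 layout: timestamps stored in milliseconds under "t_ms".
    return {"timestamps_us": [t * 1000 for t in raw["t_ms"]]}


@normalizer("example_format", 2)
def _normalize_v2(raw: dict) -> dict:
    # Assumed v2 layout: timestamps already in microseconds under "t_us".
    return {"timestamps_us": list(raw["t_us"])}


def normalize(raw: dict) -> dict:
    """Convert any registered format version into the canonical record."""
    key = (raw["format"], raw["version"])
    if key not in NORMALIZERS:
        raise ValueError(f"no normalizer registered for {key}")
    record = NORMALIZERS[key](raw)
    # Keep the original format and version so provenance survives normalization.
    record["source_format"] = key
    return record
```

Under this design, supporting a new hardware revision means registering one more normalizer, while data ingested under older versions continues to resolve to the same canonical shape.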
The downstream stack depends on this boundary
Annotation, curation, model development, and operations only stay coherent if the first trustworthy representation of the data is already inside the shared entity graph.
Why Teams Care
Less operational friction
Data arrives without brittle manual transfer steps and without later repair work to reconstruct provenance.
Trust
Teams know what was collected, where it came from, and which workflow version converted it into platform state.
Format continuity
The platform can absorb heterogeneous collection setups without losing shared lifecycle identity.
A stronger wedge
Annotation, curation, model development, and operations all build on a system of record that begins at the first trustworthy boundary rather than after ad hoc preprocessing.