## 1. Pre-training
Collect diverse data so a single model learns broad physical skills before it ever touches a target task; a data-mixing sketch follows the table.

| What happens | Typical ingredients | Example models / projects |
|--------------|--------------------|---------------------------|
| Large-scale vision-language-action pre-training | Web images, captions, robot camera streams, joint logs | **RT-2** (Google DeepMind) |
| Foundation pre-training on warehouse data | Millions of pick-and-place episodes, depth maps | **RFM-1** (Covariant) |
| Cross-lab data pooling | 1 M+ trajectories across 22 robot types | **Open X-Embodiment / RT-X** consortium |
| Video-only motor learning | Self-supervised video prediction | **Tesla Optimus** humanoid stack |
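
To make the first row concrete, here is a minimal sketch of how heterogeneous sources (web captions with no actions, robot episodes with discretised action tokens) might be mixed into a single training stream. The source generators, mixture weights, and token format are illustrative assumptions, not the actual RT-2 or RFM-1 recipe.

```python
"""Minimal sketch: mixing heterogeneous sources for VLA pre-training.
All source contents, weights, and the action-token format below are
illustrative assumptions, not any lab's published recipe."""
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Example:
    image: bytes                                  # encoded camera frame or web image
    text: str                                     # caption or language instruction
    action_tokens: List[int] = field(default_factory=list)  # empty for web data


def web_caption_source():
    """Stand-in for a web image-caption stream (no actions)."""
    while True:
        yield Example(image=b"<jpeg>", text="a cat on a sofa")


def robot_episode_source():
    """Stand-in for robot camera frames paired with joint-log actions."""
    while True:
        yield Example(
            image=b"<jpeg>",
            text="pick up the red block",
            action_tokens=[12, 87, 3, 255],       # e.g. binned end-effector deltas
        )


def mixed_stream(sources, weights, seed=0):
    """Sample from each source in proportion to its mixture weight."""
    rng = random.Random(seed)
    gens = [source() for source in sources]
    while True:
        idx = rng.choices(range(len(gens)), weights=weights)[0]
        yield next(gens[idx])


if __name__ == "__main__":
    stream = mixed_stream(
        sources=[web_caption_source, robot_episode_source],
        weights=[0.7, 0.3],                       # illustrative mixture ratio
    )
    for _ in range(3):
        print(next(stream))
```
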
## 2. Post-training
Adapt the foundation model to a new robot, task, or workflow with as little fresh data as possible; a LoRA-style adapter sketch follows the table.

| Technique | Why it matters | Example uses |
|-----------|---------------|--------------|
| Few-shot fine-tuning / LoRA | Keep the core frozen and add a small adapter | **Gemini Robotics** one-day task onboarding |
| RL from human feedback | Align behaviour with operator preferences | Visual-motor RLHF research lines |
| Self-improvement cycles | Model collects its own roll-outs, then re-trains | **RoboCat** (DeepMind) |
| Rapid domain adaptation | Close sim-to-real gaps, update SKU lists overnight | **Covariant Brain** in warehouses |
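
The few-shot fine-tuning / LoRA row can be made concrete with a small PyTorch sketch: the base weight stays frozen and only a low-rank adapter trains. Layer size, rank, and scaling are illustrative choices, not the settings of any named product.

```python
"""Minimal sketch of a LoRA-style adapter: the pre-trained weight is frozen
and only the low-rank A/B matrices train. Dimensions, rank, and alpha are
illustrative, not tied to a specific robot foundation model."""
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # keep the core model frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)        # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus a trainable low-rank correction.
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(512, 512), rank=8)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable: {trainable} / {total} parameters")  # a few percent
```

Because only the adapter trains, onboarding a new task needs far less data and compute than re-training the core, which is what makes rapid, per-site fine-tuning practical.
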
## 3. Runtime
Run the model on the robot safely and with low latency, wrapped in sensing and control loops; a safety-gate sketch follows the table.

| Layer | Key considerations | Field examples |
|-------|-------------------|----------------|
| On-device inference | Low latency, no network dependence | **Gemini Robotics** factory cells |
| Edge + cloud split | Heavy model in cloud, light skills locally | **Covariant Brain** edge boxes |
| Safety checks | Validate plans in sim or controller before actuation | MIT CSAIL LLM-plan-then-simulate system |
| Full-body humanoid runtime | Torque control, perception, speech | **Sanctuary AI Phoenix** generation 8 |
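
Below is a minimal sketch of the safety-check row: every plan the model proposes is validated (for example by a fast simulated rollout or joint-limit check) before it reaches the actuators, with a bounded number of re-plans and a safe fallback. The `Policy`, `Simulator`, and `Robot` interfaces are hypothetical placeholders, not a specific vendor API.

```python
"""Minimal sketch of a plan-then-check runtime loop. The interfaces are
hypothetical placeholders; a real deployment would bind them to the actual
policy server, simulator, and robot driver."""
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass
class Plan:
    joint_targets: Sequence[Sequence[float]]      # waypoints in joint space


class Policy(Protocol):
    def propose(self, observation: dict) -> Plan: ...


class Simulator(Protocol):
    def rollout_is_safe(self, plan: Plan) -> bool: ...


class Robot(Protocol):
    def observe(self) -> dict: ...
    def execute(self, plan: Plan) -> None: ...
    def hold_position(self) -> None: ...


def control_step(policy: Policy, sim: Simulator, robot: Robot, retries: int = 2) -> None:
    """One sense-plan-check-act cycle with a bounded number of re-plans."""
    for _ in range(retries + 1):
        obs = robot.observe()                     # fresh observation each attempt
        plan = policy.propose(obs)
        if sim.rollout_is_safe(plan):             # gate: never actuate an unchecked plan
            robot.execute(plan)
            return
    robot.hold_position()                         # fall back to a safe default
```
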
## 4. Prompt Engineering for Robots
Craft language or multimodal prompts that steer the pre-trained model at runtime; a SayCan-style skill-scoring sketch follows the table.

| Pattern | How it works | Illustrative projects |
|---------|-------------|-----------------------|
| Skill-scoring prompts | Break a request into candidate skills, pick the best | **PaLM-SayCan** household tasks |
| Structure-augmented prompts | Embed observations, allowed actions, task schema | **Prompt2Walk** locomotion |
| Iterative re-planning prompts | Prompt → plan → sim check → re-prompt if unsafe | MIT CSAIL open-ended planner |
| Few-shot action templates | Show (command, code) pairs once, reuse at inference | Community “Awesome-LLM-Robotics” repos |
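
The skill-scoring row follows the SayCan pattern: multiply a language score (how relevant a skill is to the instruction) by an affordance score (whether the robot can do it in the current scene) and pick the argmax. The two scoring functions below are crude stubs standing in for the LLM likelihood and learned value function used in PaLM-SayCan; the skill list and scene format are invented for illustration.

```python
"""Minimal sketch of SayCan-style skill scoring. The word-overlap "language
model" and visibility-based "affordance" below are toy stubs for what would
really be an LLM likelihood and a learned value function."""

SKILLS = ["pick up the sponge", "go to the sink", "open the drawer"]
STOPWORDS = {"the", "a", "to", "up", "with"}


def language_score(instruction: str, skill: str) -> float:
    """Stub for an LLM score: crude content-word overlap."""
    inst = set(instruction.lower().split()) - STOPWORDS
    skill_words = set(skill.lower().split()) - STOPWORDS
    return len(inst & skill_words) + 0.1          # small floor so nothing is zero


def affordance_score(skill: str, observation: dict) -> float:
    """Stub for a value function: is the skill's object visible in the scene?"""
    return 1.0 if skill.split()[-1] in observation.get("visible", []) else 0.05


def choose_skill(instruction: str, observation: dict) -> str:
    """Rank each candidate skill by language score x affordance score."""
    scores = {
        s: language_score(instruction, s) * affordance_score(s, observation)
        for s in SKILLS
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    scene = {"visible": ["sponge", "sink"]}
    print(choose_skill("clean up the spill with the sponge", scene))
    # -> "pick up the sponge"
```
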
### How the phases link
1. **Pre-training** builds a broad prior on physics and perception.
2. **Post-training** personalises that prior with minimal fresh data.
3. **Runtime** wraps the model with real-time guarantees and safety gates.
4. **Prompt engineering** provides the lightweight interface that turns human intent into concrete behaviour.