## 1. Pre-training

Collect diverse data so a single model learns broad physical skills before touching a target task.

| What happens | Typical ingredients | Example models / projects |
|--------------|---------------------|----------------------------|
| Large-scale vision-language-action pre-training | Web images, captions, robot camera streams, joint logs | **RT-2** (Google DeepMind) |
| Foundation pre-training on warehouse data | Millions of pick-and-place episodes, depth maps | **RFM-1** (Covariant) |
| Cross-lab data pooling | 1M+ trajectories across 22 robot types | **Open X-Embodiment / RT-X** consortium |
| Video-only motor learning | Self-supervised video prediction | **Tesla Optimus** humanoid stack |

## 2. Post-training

Adapt the foundation model to a new robot, task, or workflow.

| Technique | Why it matters | Example uses |
|-----------|----------------|--------------|
| Few-shot fine-tuning / LoRA | Keep the core frozen and add a small adapter | **Gemini Robotics** one-day task onboarding |
| RL from human feedback | Align behaviour with operator preferences | Visual-motor RLHF research lines |
| Self-improvement cycles | Model collects its own roll-outs, then re-trains | **RoboCat** (DeepMind) |
| Rapid domain adaptation | Close sim-to-real gaps, update SKU lists overnight | **Covariant Brain** in warehouses |

## 3. Runtime

Run the model safely and fast on the robot, with sensing and control loops.

| Layer | Key considerations | Field examples |
|-------|--------------------|----------------|
| On-device inference | Low latency, no network dependence | **Gemini Robotics** factory cells |
| Edge + cloud split | Heavy model in cloud, light skills locally | **Covariant Brain** edge boxes |
| Safety checks | Validate plans in sim or controller before actuation | MIT CSAIL LLM-plan-then-simulate system |
| Full-body humanoid runtime | Torque control, perception, speech | **Sanctuary AI Phoenix** generation 8 |

## 4. Prompt Engineering for Robots

Craft language or multimodal prompts that steer the pre-trained model at run-time.

| Pattern | How it works | Illustrative projects |
|---------|--------------|-----------------------|
| Skill-scoring prompts | Break a request into candidate skills, pick the best | **PaLM-SayCan** household tasks |
| Structure-augmented prompts | Embed observations, allowed actions, task schema | **Prompt2Walk** locomotion |
| Iterative re-planning prompts | Prompt → plan → sim check → re-prompt if unsafe | MIT CSAIL open-ended planner |
| Few-shot action templates | Show (command, code) pairs once, reuse at inference | Community “Awesome-LLM-Robotics” repos |

### How the phases link

1. **Pre-training** builds a broad prior on physics and perception.
2. **Post-training** personalises that prior with minimal fresh data.
3. **Runtime** wraps the model with real-time guarantees and safety gates.
4. **Prompt engineering** provides the lightweight interface that turns human intent into concrete behaviour.
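
To make the LoRA row in the post-training table concrete, here is a minimal PyTorch sketch of the "keep the core frozen, add a small adapter" idea. The layer names, dimensions, and the 7-DoF action head are illustrative assumptions, not details of any named product's stack.

```python
# Minimal LoRA-style post-training sketch: the pretrained projection is frozen
# and only a low-rank adapter (A, B) is trained on a few new demonstrations.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # keep the foundation weights fixed
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


# Hypothetical policy head: 512-d encoder features -> 7-DoF action.
policy_head = LoRALinear(nn.Linear(512, 7))
optimizer = torch.optim.AdamW(
    [p for p in policy_head.parameters() if p.requires_grad], lr=1e-4
)

features = torch.randn(32, 512)               # stand-in for encoder outputs
target_actions = torch.randn(32, 7)           # stand-in for demonstrated actions
loss = nn.functional.mse_loss(policy_head(features), target_actions)
loss.backward()
optimizer.step()
```

Because only the adapter receives gradients, a handful of fresh demonstrations can specialise the model without disturbing the broad prior learned in pre-training.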
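
The runtime "safety checks" row and the "iterative re-planning" prompt pattern share one loop: plan, check the plan in simulation, and re-prompt with the critique if it is unsafe. The sketch below shows that control flow only; `query_planner` and `rollout_in_sim` are placeholder functions standing in for a language-model planner and a physics simulator, not real APIs.

```python
# Sketch of a plan -> sim check -> re-prompt loop with a safety gate.
from dataclasses import dataclass


@dataclass
class Plan:
    steps: list[str]


def query_planner(task: str, feedback: str | None = None) -> Plan:
    """Stand-in for prompting an LLM planner; a real system calls a model here."""
    steps = ["move_to(shelf)", "grasp(box)", "place(table)"]
    if feedback:                       # a real planner would condition on the critique
        steps.insert(0, "clear_obstacle()")
    return Plan(steps)


def rollout_in_sim(plan: Plan) -> tuple[bool, str]:
    """Stand-in for a simulator rollout; returns (is_safe, critique)."""
    if "clear_obstacle()" not in plan.steps:
        return False, "collision predicted near the shelf"
    return True, ""


def plan_with_safety_gate(task: str, max_attempts: int = 3) -> Plan | None:
    feedback = None
    for _ in range(max_attempts):
        plan = query_planner(task, feedback)
        safe, critique = rollout_in_sim(plan)
        if safe:
            return plan                # only validated plans reach the controller
        feedback = critique            # re-prompt with the simulator's critique
    return None                        # fall back to a human operator or a no-op


print(plan_with_safety_gate("put the box on the table"))
```

The key design choice is that actuation only ever sees a plan that has already passed the simulated check, which is what turns the prompt loop into a runtime safety gate.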
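
Finally, a minimal sketch of the skill-scoring prompt pattern in the SayCan spirit: each candidate skill gets a language-model "usefulness" score and an affordance "feasibility" score, and their product selects the next skill. Both scoring functions here are toy keyword/constant stand-ins for the real models, and the skill list is invented for illustration.

```python
# Sketch of skill scoring: usefulness (language model) x feasibility (affordance).

SKILLS = ["pick up the sponge", "go to the sink", "wipe the counter"]


def llm_usefulness(instruction: str, skill: str) -> float:
    """Stand-in for log-likelihood scoring of the skill by a language model."""
    overlap = set(instruction.lower().split()) & set(skill.split())
    return 0.1 + 0.3 * len(overlap)


def affordance(skill: str, state: dict) -> float:
    """Stand-in for a learned value function: can the robot do this right now?"""
    if "pick up" in skill and not state["gripper_free"]:
        return 0.05
    return 0.9


def choose_skill(instruction: str, state: dict) -> str:
    scores = {
        skill: llm_usefulness(instruction, skill) * affordance(skill, state)
        for skill in SKILLS
    }
    return max(scores, key=scores.get)


print(choose_skill("wipe the counter with the sponge", {"gripper_free": True}))
```

In a real system the usefulness score comes from prompting the pre-trained model with the request and candidate skills, and the affordance score from a value function trained during post-training, which is how the prompt-engineering layer ties back to the earlier phases.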