By Jordan Kretchmer, Senior Partner at Outlander VC
Posted on 05/18/2026
Robotics has an integration problem far more than it has a model problem.
Yes, models are getting better. Vision-language models are improving. Teleoperation is improving. Haptics are improving. Dexterous manipulation is improving. Simulation is improving. Training pipelines are improving. Foundation models for robotics are improving. But almost every major piece of the stack is still being built as its own island.
That is why the industry keeps producing impressive technical breakthroughs without corresponding deployment scale. The components are getting better individually, but the system as a whole is still fragmented.
Robotics needs its missing protocol layer.
It needs an MCP for the physical world, an analogue of the Model Context Protocol that gives AI models a common way to connect to tools and data: a shared standard that lets perception, control, teleoperation, haptics, training, data, simulation, foundation models, and real machines all interoperate through a common interface. Until that exists, robotics will continue to advance in silos, and the cost of stitching everything together will keep slowing the entire industry down.
Right now, most robotics systems are assembled like custom projects, not interoperable products.
One company builds a tele-op layer. Another builds dexterous hands. Another builds foundation models. Another builds control software. Another builds haptic interfaces. Another builds simulation tooling. Another builds training infrastructure. Another builds robot arms, mobile bases, humanoids, drones, or underwater vehicles.
Each layer may be world-class. But the interfaces between them are usually bespoke.
A tele-op system has its own control API.
A hardware platform has its own command structure.
A manipulation stack has its own task representation.
A VLM has its own wrapper around perception and planning.
A dataset pipeline has its own logging schema.
A simulator has its own object definitions.
A human operator console has its own notion of intervention and override.
That means almost every serious deployment still requires custom glue code across the entire stack. This is a tax on the industry. And it is getting larger as the stack gets more powerful. The problem is no longer that we do not have enough interesting robotics technologies. The problem is that they do not compose.
The industry does not need one winning robot architecture. It needs a shared language. An MCP for the physical world would be a protocol layer that standardizes how robotic systems describe and exchange:
This is not glamorous, which is exactly why it matters.
Infrastructure layers are often less visible than the products they enable, but they are what allow ecosystems to form. In software, standards and protocols turned isolated systems into platforms. Robotics is approaching the point where it needs the same thing.
Without that layer, every robotics company is forced to behave like a vertically integrated island, even when the best future for the industry is modular.
This issue becomes far more urgent as robotics moves from tightly controlled industrial workflows into semi-structured and unstructured environments.
A robot in a fixed production cell can get away with a lot of custom engineering. The environment is constrained. Variability is low. The edge cases are limited. That is not the future most people are trying to build toward.
The real frontier is in environments like:
These environments require multiple modes of control and intelligence to work together in the same system:
The more open-ended the environment, the more important the integration layer becomes. A robot that cannot fluidly move between those modes is not really robust. It is just a demo that works when conditions are favorable.
The most obvious place to see the need for a shared protocol is manipulation. Manipulation is where the hardest problems pile on top of each other: perception, contact, uncertainty, dexterity, force control, recovery, human intervention, and training.
Take a warehouse picking robot.
Today, a real-world deployment might include:
Every integration point is fragile. Every handoff is custom. Every logging pipeline is inconsistent. Every training dataset requires cleanup and translation before it can be reused.
Now imagine a shared protocol layer in between.
The perception system publishes objects, poses, affordances, and uncertainty.
The robot publishes kinematics, end-effector state, control modes, force limits, fault states, and reachable workspace.
The task planner publishes goals, subtasks, constraints, and escalation logic.
The tele-op system can subscribe to the same state and take over only the part of the task that needs human help.
The training pipeline logs all of it in a standard format.
The simulator replays the same episode using the same definitions.
The foundation model reasons over the same task graph and state representation.
That is what a real interoperability layer would unlock: not just compatibility, but compounding.
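To make the publish-and-subscribe picture above concrete, here is a minimal sketch of several consumers reading the same state. Everything in it is an assumption for illustration: the `Bus` class, the topic names such as `perception/objects`, and the message fields are hypothetical, and a real protocol layer would run over a network transport rather than in one process.

```python
from collections import defaultdict
from typing import Any, Callable

Message = dict[str, Any]
Handler = Callable[[Message], None]


class Bus:
    """Tiny in-process publish/subscribe bus, for illustration only."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> None:
        # Every subscriber to the topic receives the same message object.
        for handler in self._handlers[topic]:
            handler(message)


bus = Bus()
training_log: list[Message] = []  # stands in for the training pipeline
teleop_view: list[Message] = []   # stands in for the operator console

# The training pipeline logs everything; tele-op only watches perception here.
bus.subscribe("perception/objects", training_log.append)
bus.subscribe("robot/state", training_log.append)
bus.subscribe("perception/objects", teleop_view.append)

bus.publish("perception/objects", {"class": "polybag", "pose_confidence": 0.74})
bus.publish("robot/state", {"control_mode": "impedance", "fault": None})
```

The point of the sketch is that the tele-op console and the training log never talk to the perception system directly. They only agree on topic names and message shapes, which is the part a shared standard would pin down.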
A robot should be able to describe itself in a standardized way:
{
  "robot_id": "mm_warehouse_12",
  "platform_type": "mobile_manipulator",
  "locomotion": ["wheeled"],
  "manipulators": [
    {
      "arm_id": "right_arm",
      "dof": 7,
      "payload_kg": 10,
      "reach_m": 1.2
    }
  ],
  "end_effectors": [
    {
      "type": "parallel_gripper",
      "max_force_n": 35,
      "tactile_sensing": true
    }
  ],
  "sensors": [
    "rgb_camera",
    "depth_camera",
    "wrist_force_torque",
    "joint_encoders"
  ],
  "control_modes": [
    "joint_space",
    "cartesian_pose",
    "impedance",
    "shared_teleop"
  ]
}
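A self-description like this only pays off if other software can query it. As a rough sketch, a planner might check whether a robot can take a job before dispatching it. The `supports` helper and its matching rule (any single arm must meet both payload and reach) are assumptions made for this example, not part of any existing standard, and the descriptor is trimmed to the fields the check uses.

```python
import json

# Trimmed copy of the descriptor above, parsed from JSON text.
DESCRIPTOR = json.loads("""
{
  "robot_id": "mm_warehouse_12",
  "manipulators": [
    {"arm_id": "right_arm", "dof": 7, "payload_kg": 10, "reach_m": 1.2}
  ],
  "control_modes": ["joint_space", "cartesian_pose", "impedance", "shared_teleop"]
}
""")


def supports(descriptor: dict, *, control_mode: str, payload_kg: float, reach_m: float) -> bool:
    """Return True if the mode is offered and some arm meets payload and reach."""
    if control_mode not in descriptor["control_modes"]:
        return False
    return any(
        arm["payload_kg"] >= payload_kg and arm["reach_m"] >= reach_m
        for arm in descriptor["manipulators"]
    )


# A 2 kg pick at 0.9 m fits this robot; a 25 kg lift does not.
can_do_light_pick = supports(DESCRIPTOR, control_mode="impedance", payload_kg=2.0, reach_m=0.9)
can_do_heavy_lift = supports(DESCRIPTOR, control_mode="impedance", payload_kg=25.0, reach_m=0.9)
```

Because the check reads only the descriptor, the same function would work unchanged for any platform that publishes these fields.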
A task should be able to arrive in a standard structure:
{
  "task_id": "pick_task_10027",
  "type": "pick_and_place",
  "object": {
    "class": "polybag",
    "sku": "SKU-8821",
    "pose_estimate": [0.31, 0.22, 0.14, 0.0, 1.57, 0.0],
    "pose_confidence": 0.74
  },
  "source": "bin_B4",
  "target": "tote_Z9",
  "constraints": {
    "max_grip_force_n": 12,
    "avoid_crushing": true,
    "time_limit_s": 18
  },
  "fallback": {
    "teleop_allowed": true,
    "escalate_after_failures": 2
  }
}
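With the task in a standard shape, the executor's policy decisions can be written once instead of per deployment. The sketch below shows two of them. It assumes that the grip limit actually applied is the smaller of the task's limit and the gripper's own maximum (35 N in the robot description), and that `escalate_after_failures` means "escalate once this many attempts have failed." Both interpretations, and both function names, are assumptions for illustration.

```python
# Trimmed copy of the task above, keeping only the fields used here.
TASK = {
    "task_id": "pick_task_10027",
    "constraints": {"max_grip_force_n": 12, "avoid_crushing": True, "time_limit_s": 18},
    "fallback": {"teleop_allowed": True, "escalate_after_failures": 2},
}


def effective_grip_limit_n(task: dict, gripper_max_force_n: float) -> float:
    """Force ceiling to command: the stricter of the task and hardware limits."""
    return min(task["constraints"]["max_grip_force_n"], gripper_max_force_n)


def should_escalate(task: dict, failed_attempts: int) -> bool:
    """Hand off to a human only if the task allows it and enough attempts failed."""
    fallback = task["fallback"]
    return fallback["teleop_allowed"] and failed_attempts >= fallback["escalate_after_failures"]


grip_limit = effective_grip_limit_n(TASK, gripper_max_force_n=35)  # task limit wins
escalate_after_one = should_escalate(TASK, failed_attempts=1)
escalate_after_two = should_escalate(TASK, failed_attempts=2)
```

Here the task's 12 N ceiling overrides the gripper's 35 N capability, and the robot retries once on its own before asking for help on the second failure.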
A human handoff should be structured, not improvised:
{
  "handoff": {
    "from": "autonomy",
    "to": "remote_operator",
    "scope": ["wrist_rotation", "grip_force"],
    "keep_autonomous": ["base_stability", "collision_avoidance"],
    "reason": "low_confidence_final_grasp_alignment",
    "max_duration_s": 10
  }
}
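A structured handoff can be turned mechanically into a map of who controls what. The sketch below is one possible reading: channels in `scope` go to the receiving party, channels in `keep_autonomous` stay with the sender, a channel listed in both is rejected, and control is assumed to revert once `max_duration_s` has elapsed. The helper names `authority_map` and `handoff_expired` are hypothetical.

```python
HANDOFF = {
    "from": "autonomy",
    "to": "remote_operator",
    "scope": ["wrist_rotation", "grip_force"],
    "keep_autonomous": ["base_stability", "collision_avoidance"],
    "reason": "low_confidence_final_grasp_alignment",
    "max_duration_s": 10,
}


def authority_map(handoff: dict) -> dict[str, str]:
    """Map each control channel to the party that owns it during the handoff."""
    scope = set(handoff["scope"])
    kept = set(handoff["keep_autonomous"])
    overlap = scope & kept
    if overlap:
        # A channel cannot be both handed over and kept; refuse ambiguous requests.
        raise ValueError(f"channels listed in both scope and keep_autonomous: {sorted(overlap)}")
    owners = {channel: handoff["to"] for channel in scope}
    owners.update({channel: handoff["from"] for channel in kept})
    return owners


def handoff_expired(handoff: dict, elapsed_s: float) -> bool:
    """True once the operator's window is over (assumed semantics)."""
    return elapsed_s >= handoff["max_duration_s"]


owners = authority_map(HANDOFF)
```

With this handoff, the operator owns `wrist_rotation` and `grip_force` while autonomy keeps `base_stability` and `collision_avoidance`, which is the partial takeover the message describes.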
A training episode should be logged in a reusable way:
{
  "episode_id": "ep_544002",
  "task_type": "bin_pick",
  "observations": {
    "vision": "uri://frames/ep_544002",
    "robot_state": "uri://state/ep_544002",
    "force_torque": "uri://ft/ep_544002",
    "tactile": "uri://tactile/ep_544002"
  },
  "actions": {
    "policy_commands": "uri://actions/ep_544002",
    "human_override_segments": [
      {
        "start_ms": 8210,
        "end_ms": 11520,
        "operator_id": "operator_4"
      }
    ]
  },
  "outcome": {
    "success": true,
    "completion_time_ms": 16310,
    "recovery_used": true
  }
}
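Once episodes share a schema, fleet-level metrics become a few lines of code rather than a data-cleaning project. As a small sketch, the function below computes how much of an episode ran under human control. It assumes the override timestamps are measured from the start of the episode, that segments do not overlap, and that `completion_time_ms` is the episode's total duration. The function name is hypothetical.

```python
# Trimmed copy of the episode above, keeping only the fields used here.
EPISODE = {
    "episode_id": "ep_544002",
    "actions": {
        "human_override_segments": [
            {"start_ms": 8210, "end_ms": 11520, "operator_id": "operator_4"}
        ]
    },
    "outcome": {"success": True, "completion_time_ms": 16310, "recovery_used": True},
}


def human_override_fraction(episode: dict) -> float:
    """Share of the episode's duration spent under human override."""
    total_ms = episode["outcome"]["completion_time_ms"]
    override_ms = sum(
        segment["end_ms"] - segment["start_ms"]
        for segment in episode["actions"]["human_override_segments"]
    )
    return override_ms / total_ms


fraction = human_override_fraction(EPISODE)  # 3310 ms of 16310 ms
```

For this episode the operator was in control for 3,310 of 16,310 milliseconds, or about 20% of the run. Tracked across sites and over time, that one number shows whether tele-op is actually shrinking as autonomy improves.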
None of this is exotic. That is the point. It is basic infrastructure that should already exist in a standardized form.
Haptics is one of the clearest examples of a capability that should be part of the shared protocol layer.
Today, haptics is often treated as an accessory to teleoperation. But in a mature robotics stack, haptics should be much more than that.
It should be:
Imagine a remote operator guiding a robot through a delicate cable insertion task, a valve turn, a surgical motion, or a damaged-parts extraction. The operator’s force adjustments, hesitation, compensation, and contact patterns should not disappear into a closed loop. They should become structured data the whole system can use.
That is how tele-op becomes a path to autonomy rather than a parallel dead-end stack.
The industry often frames teleoperation as something autonomy will eventually replace. That is too simplistic.
Tele-op plays at least four durable roles in serious robotic deployments:
The strongest robotic systems will not choose between tele-op and autonomy. They will integrate both cleanly.
A shared protocol would let a tele-op system work across different robot platforms without requiring a rebuild every time. It would let remote operators use consistent abstractions for state, control authority, intervention, and safety constraints. It would let operator actions flow into standardized datasets that improve autonomy over time.
That is a much more powerful model than treating tele-op as just a labor layer attached to a brittle robot.
A lot of robotics commentary assumes that better models will naturally absorb the integration problem. They will not.
Foundation models and VLMs are useful only to the extent that they can interact with the physical world through reliable abstractions.
A model may understand that a tool is partially occluded, that a handle is probably graspable, or that a failed insertion likely requires a rotation before retrying. But unless the robot stack exposes standardized primitives around world state, action space, control modes, and uncertainty, the model is still trapped inside a custom wrapper for each deployment. That prevents real portability.
A shared protocol would let models do things like:
That is how model progress actually becomes deployment progress.
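The portability argument can be sketched in code. Below, a retry strategy of the kind described earlier (rotate, then retry a failed insertion) is written once against a shared action surface and runs unchanged on two stand-in robots. The `InsertionRobot` interface, both vendor classes, their command names, and the simulated success thresholds are all invented for this example.

```python
from typing import Protocol


class InsertionRobot(Protocol):
    """Hypothetical shared action surface a model or planner could target."""

    def rotate_wrist(self, degrees: int) -> None: ...
    def attempt_insert(self) -> bool: ...


class VendorAArm:
    """Stand-in for one vendor's arm; the command format is made up."""

    def __init__(self) -> None:
        self.wrist_deg = 0
        self.sent: list[tuple[str, int]] = []

    def rotate_wrist(self, degrees: int) -> None:
        self.wrist_deg += degrees
        self.sent.append(("J7_DELTA", degrees))

    def attempt_insert(self) -> bool:
        return self.wrist_deg >= 10  # simulated: succeeds after 10 degrees


class VendorBHand:
    """Stand-in for a different vendor with a different command format."""

    def __init__(self) -> None:
        self.wrist_deg = 0
        self.sent: list[dict[str, int]] = []

    def rotate_wrist(self, degrees: int) -> None:
        self.wrist_deg += degrees
        self.sent.append({"cmd_wrist_roll": degrees})

    def attempt_insert(self) -> bool:
        return self.wrist_deg >= 15  # simulated: succeeds after 15 degrees


def insert_with_retries(robot: InsertionRobot, step_deg: int = 5, max_attempts: int = 5) -> int:
    """Try to insert; after each failure rotate the wrist and retry.

    Returns the number of attempts used, or -1 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if robot.attempt_insert():
            return attempt
        robot.rotate_wrist(step_deg)
    return -1


attempts_a = insert_with_retries(VendorAArm())   # succeeds on attempt 3
attempts_b = insert_with_retries(VendorBHand())  # succeeds on attempt 4
```

The strategy never sees `J7_DELTA` or `cmd_wrist_roll`. Each vendor translates the shared primitive into its own commands, so an improvement to the retry logic reaches both platforms at once.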
Although manipulation is the clearest starting point, this protocol layer matters across the full spectrum of robotic systems.
In drones, it would unify mission planning, autonomy, payload control, operator intervention, and perception outputs.
In autonomous ground systems, it would unify navigation, sensor fusion, tele-op fallback, and mission-level commands.
In maritime robotics, it would connect autonomy, sparse human supervision, degraded communications, and platform-agnostic mission control.
In industrial robotics, it would make higher-mix manufacturing more modular and less dependent on one-off integration.
In agriculture, it would connect mobility, sensing, actuation, and human supervision across many crop and task types.
In surgical or assistive robotics, it would unify haptics, supervision, safety logic, and precision manipulation interfaces.
The embodiments differ. The integration problem is the same.
A serious MCP layer for robotics should standardize the core primitives of interaction:
Capability discovery: What can this machine sense, reach, manipulate, carry, or tolerate safely?
World representation: What objects, humans, obstacles, surfaces, and affordances exist in the scene, and with what uncertainty?
Task specification: What is the goal, what are the subtasks, what constraints matter, and what counts as success?
Action abstraction: What actions can be requested at a high level, and how do they map to control modes?
Human intervention: How does the system request help, hand off control partially or fully, log intervention, and resume autonomy?
Feedback streams: How are vision, tactile, force, audio, and telemetry represented so they can be consumed across tools?
Safety semantics: What does degraded mode mean, when is intervention required, and how are confidence thresholds expressed?
Training and replay: How are trajectories, demonstrations, corrections, outcomes, and contexts logged for reuse?
Simulation portability: How do real and simulated episodes share common task and state representations?
That is the level where the industry needs convergence.
The best version of this protocol does not merely connect software modules. It becomes the common operating grammar of embodied intelligence. It allows a robot to learn from a human correction in one environment and reuse that lesson elsewhere.
It allows a foundation model to reason over a task in a platform-agnostic way.
It allows tele-op to become a scalable bridge to autonomy rather than a dead-end service layer.
It allows tactile, visual, and semantic information to be fused in reusable formats.
It allows heterogeneous machines (arms, drones, humanoids, AMRs, underwater vehicles, surgical systems, field robots) to participate in a broader ecosystem instead of living inside closed vertical stacks.
That is what turns robotics from a collection of bespoke systems into a true platform economy.
This is not just a technical standards conversation. It is an economic one.
A real interoperability layer would reduce the cost of deploying robotics by lowering the amount of custom integration required at every site and every workflow. It would reduce vendor lock-in. It would make it easier for customers to adopt best-of-breed systems instead of betting everything on one vertically integrated provider.
It would also make data more valuable. Right now, a huge amount of robotic training data is trapped inside proprietary schemas and deployment-specific stacks. Standardization would make more of that data reusable across tasks, sites, and even embodiments.
It would also accelerate the rate at which model improvements can propagate into the field. A better policy, planner, or reasoning system becomes more valuable when it can plug into many robotic environments instead of one.
That is how ecosystems scale.
The biggest long-term winners in robotics may not just be the companies with the best arm, the best hand, the best model, or the best tele-op interface. They may be the companies that help define the grammar that lets the rest of the industry interoperate.
Because robotics is reaching the stage where the bottleneck is no longer just “can this one thing work?” It is increasingly “can many things work together reliably enough to deploy at scale?”
That is an infrastructure question. And infrastructure questions tend to determine who compounds.
The robotics industry still tends to treat integration as downstream plumbing, something to solve after the exciting breakthroughs are built. That is backwards. Integration is not the boring part. It is the part that determines whether breakthroughs remain isolated or become systemic.
The next major leap in robotics may not come from a new model architecture or a novel end effector alone. It may come from the shared protocol layer that allows all of those advances to connect. That is the missing standard.
An MCP for the physical world.
Not because robotics needs less innovation, but because it needs a way for innovation to compound across the stack.