Robotics Needs Its Missing Standard: An MCP for the Physical World

By Jordan Kretchmer, Senior Partner at Outlander VC

Robotics does not have a model problem nearly as much as it has an integration problem.

Yes, models are getting better. Vision-language models are improving. Teleoperation is improving. Haptics are improving. Dexterous manipulation is improving. Simulation is improving. Training pipelines are improving. Foundation models for robotics are improving. But almost every major piece of the stack is still being built as its own island.

That is why the industry keeps producing impressive technical breakthroughs without corresponding deployment scale. The components are getting better individually, but the system as a whole is still fragmented.

Robotics needs its missing protocol layer.

It needs an MCP for the physical world. The Model Context Protocol (MCP) gives AI models a common interface to tools and data sources; robotics needs an equivalent shared standard that lets perception, control, teleoperation, haptics, training, data, simulation, foundation models, and real machines all interoperate through a common interface. Until that exists, robotics will continue to advance in silos, and the cost of stitching everything together will keep slowing the entire industry down.

The current stack is fragmented by design

Right now, most robotics systems are assembled like custom projects, not interoperable products.

One company builds a tele-op layer. Another builds dexterous hands. Another builds foundation models. Another builds control software. Another builds haptic interfaces. Another builds simulation tooling. Another builds training infrastructure. Another builds robot arms, mobile bases, humanoids, drones, or underwater vehicles.

Each layer may be world-class. But the interfaces between them are usually bespoke.

A tele-op system has its own control API.
A hardware platform has its own command structure.
A manipulation stack has its own task representation.
A VLM has its own wrapper around perception and planning.
A dataset pipeline has its own logging schema.
A simulator has its own object definitions.
A human operator console has its own notion of intervention and override.

That means almost every serious deployment still requires custom glue code across the entire stack. This is a tax on the industry. And it is getting larger as the stack gets more powerful. The problem is no longer that we do not have enough interesting robotics technologies. The problem is that they do not compose.

What the missing layer actually is

The industry does not need one winning robot architecture. It needs a shared language. An MCP for the physical world would be a protocol layer that standardizes how robotic systems describe and exchange:

capabilities
world state
object representations
task definitions
action requests
control modes
safety boundaries
operator intervention
confidence and uncertainty
feedback signals
execution traces
training data

This is not glamorous, which is exactly why it matters.

Infrastructure layers are often less visible than the products they enable, but they are what allow ecosystems to form. In software, standards and protocols such as TCP/IP and HTTP turned isolated systems into platforms. Robotics is approaching the point where it needs the same thing.

Without that layer, every robotics company is forced to behave like a vertically integrated island, even when the best future for the industry is modular.

Why this matters now

This issue becomes far more urgent as robotics moves from tightly controlled industrial workflows into semi-structured and unstructured environments.

A robot in a fixed production cell can get away with a lot of custom engineering. The environment is constrained. Variability is low. The edge cases are limited. That is not the future most people are trying to build toward.

The real frontier is in environments like:

mixed-SKU warehouses
higher-mix manufacturing
logistics yards
construction sites
agriculture
hospitals
homes
field maintenance
defense environments
remote infrastructure inspection and repair

These environments require multiple modes of control and intelligence to work together in the same system:

autonomy
remote operation
shared control
haptic feedback
semantic reasoning
motion planning
force control
policy learning
safety supervision
human escalation

The more open-ended the environment, the more important the integration layer becomes. A robot that cannot fluidly move between those modes is not really robust. It is just a demo that works when conditions are favorable.

The clearest example: manipulation

The most obvious place to see the need for a shared protocol is manipulation. Manipulation is where the hardest problems pile on top of each other: perception, contact, uncertainty, dexterity, force control, recovery, human intervention, and training.

Take a warehouse picking robot.

Today, a real-world deployment might include:

cameras and depth sensors from one stack
object detection from another
a VLM for semantic understanding
a task planner from another vendor
robot arm controls from the OEM
a custom gripper
a teleoperation interface for fallback
a separate training pipeline
a separate simulator
a separate dashboard for monitoring and QA

Every integration point is fragile. Every handoff is custom. Every logging pipeline is inconsistent. Every training dataset requires cleanup and translation before it can be reused.

Now imagine a shared protocol layer in between.

The perception system publishes objects, poses, affordances, and uncertainty.

The robot publishes kinematics, end-effector state, control modes, force limits, fault states, and reachable workspace.

The task planner publishes goals, subtasks, constraints, and escalation logic.

The tele-op system can subscribe to the same state and take over only the part of the task that needs human help.

The training pipeline logs all of it in a standard format.

The simulator replays the same episode using the same definitions.

The foundation model reasons over the same task graph and state representation.

That is what a real interoperability layer would unlock: not just compatibility, but compounding.
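The publish and subscribe pattern described above can be sketched in a few lines. This is a toy, in-process illustration rather than a proposed design: the `StateBus` class, the `perception/objects` topic name, and the 0.8 confidence threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class StateBus:
    """Toy in-process bus: every subscriber to a topic sees the same message."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Dict[str, Any]) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


bus = StateBus()
episode_log: List[Dict[str, Any]] = []  # stands in for the training pipeline
teleop_alerts: List[str] = []           # stands in for the operator console

# The training pipeline records every perception message unchanged.
bus.subscribe("perception/objects", episode_log.append)


def teleop_watch(message: Dict[str, Any]) -> None:
    # The tele-op console reads the same message and flags low-confidence poses.
    if message["pose_confidence"] < 0.8:  # illustrative threshold, not a standard
        teleop_alerts.append(message["sku"])


bus.subscribe("perception/objects", teleop_watch)

# One perception message, consumed by both subscribers with no translation step.
bus.publish(
    "perception/objects",
    {"class": "polybag", "sku": "SKU-8821", "pose_confidence": 0.74},
)
```

The point of the sketch is that the logger and the tele-op console consume the identical message. Neither needs a vendor-specific adapter, which is where most of today's glue code lives.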

Literal (overly simplified) examples of what this could look like

A robot should be able to describe itself in a standardized way:

{
  "robot_id": "mm_warehouse_12",
  "platform_type": "mobile_manipulator",
  "locomotion": ["wheeled"],
  "manipulators": [
    {
      "arm_id": "right_arm",
      "dof": 7,
      "payload_kg": 10,
      "reach_m": 1.2
    }
  ],
  "end_effectors": [
    {
      "type": "parallel_gripper",
      "max_force_n": 35,
      "tactile_sensing": true
    }
  ],
  "sensors": [
    "rgb_camera",
    "depth_camera",
    "wrist_force_torque",
    "joint_encoders"
  ],
  "control_modes": [
    "joint_space",
    "cartesian_pose",
    "impedance",
    "shared_teleop"
  ]
}

A task should be able to arrive in a standard structure:

{
  "task_id": "pick_task_10027",
  "type": "pick_and_place",
  "object": {
    "class": "polybag",
    "sku": "SKU-8821",
    "pose_estimate": [0.31, 0.22, 0.14, 0.0, 1.57, 0.0],
    "pose_confidence": 0.74
  },
  "source": "bin_B4",
  "target": "tote_Z9",
  "constraints": {
    "max_grip_force_n": 12,
    "avoid_crushing": true,
    "time_limit_s": 18
  },
  "fallback": {
    "teleop_allowed": true,
    "escalate_after_failures": 2
  }
}
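Once a robot and a task are both described in a shared structure, checking one against the other becomes a small, generic function instead of site-specific code. The sketch below uses trimmed copies of the robot and task descriptions above. The two rules it applies (tele-op fallback requires a `shared_teleop` control mode, and the commanded grip force is capped at the lower of the gripper maximum and the task limit) are assumptions made for illustration, not part of any existing standard.

```python
from typing import Any, Dict, List

# Trimmed copies of the robot and task descriptions from the examples above.
robot: Dict[str, Any] = {
    "end_effectors": [{"type": "parallel_gripper", "max_force_n": 35}],
    "control_modes": ["joint_space", "cartesian_pose", "impedance", "shared_teleop"],
}
task: Dict[str, Any] = {
    "type": "pick_and_place",
    "constraints": {"max_grip_force_n": 12, "time_limit_s": 18},
    "fallback": {"teleop_allowed": True, "escalate_after_failures": 2},
}


def compatibility_issues(robot: Dict[str, Any], task: Dict[str, Any]) -> List[str]:
    """Return human-readable reasons the robot cannot accept the task."""
    issues: List[str] = []
    if not robot["end_effectors"]:
        issues.append("robot declares no end effector")
    # Hypothetical rule: tele-op fallback needs a shared tele-op control mode.
    if task["fallback"]["teleop_allowed"] and "shared_teleop" not in robot["control_modes"]:
        issues.append("task allows teleop fallback but robot lacks shared_teleop")
    return issues


def effective_grip_limit_n(robot: Dict[str, Any], task: Dict[str, Any]) -> float:
    """Grip force cap: the lower of the gripper maximum and the task limit.

    Assumes at least one end effector; call compatibility_issues() first.
    """
    gripper_max = max(e["max_force_n"] for e in robot["end_effectors"])
    return min(gripper_max, task["constraints"]["max_grip_force_n"])


issues = compatibility_issues(robot, task)
grip_limit = effective_grip_limit_n(robot, task)
```

For the example robot and task, `issues` is empty and `grip_limit` is 12, because the task's 12 N limit is stricter than the gripper's 35 N maximum.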

A human handoff should be structured, not improvised:

{
  "handoff": {
    "from": "autonomy",
    "to": "remote_operator",
    "scope": ["wrist_rotation", "grip_force"],
    "keep_autonomous": ["base_stability", "collision_avoidance"],
    "reason": "low_confidence_final_grasp_alignment",
    "max_duration_s": 10
  }
}
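A structured handoff makes control authority something software can answer directly. The sketch below is one possible reading of the handoff message above. The `authority_for` function is hypothetical, and the choice to return control to the initiating party when `max_duration_s` expires is an assumption; a real system might instead enter a safe stop.

```python
from typing import Any, Dict

# Body of the "handoff" object from the example above.
handoff: Dict[str, Any] = {
    "from": "autonomy",
    "to": "remote_operator",
    "scope": ["wrist_rotation", "grip_force"],
    "keep_autonomous": ["base_stability", "collision_avoidance"],
    "reason": "low_confidence_final_grasp_alignment",
    "max_duration_s": 10,
}


def authority_for(channel: str, handoff: Dict[str, Any], elapsed_s: float) -> str:
    """Return who commands `channel` at `elapsed_s` seconds into the handoff."""
    expired = elapsed_s > handoff["max_duration_s"]
    if channel in handoff["scope"] and not expired:
        return handoff["to"]
    # Channels outside the scope, and all channels after expiry, stay with
    # (or return to) the party that initiated the handoff.
    return handoff["from"]


during = authority_for("grip_force", handoff, elapsed_s=3.0)
untouched = authority_for("collision_avoidance", handoff, elapsed_s=3.0)
after_expiry = authority_for("grip_force", handoff, elapsed_s=12.0)
```

Three seconds in, the operator holds `grip_force` while `collision_avoidance` stays autonomous. At twelve seconds the handoff has expired and `grip_force` is back with autonomy.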

A training episode should be logged in a reusable way:

{
  "episode_id": "ep_544002",
  "task_type": "bin_pick",
  "observations": {
    "vision": "uri://frames/ep_544002",
    "robot_state": "uri://state/ep_544002",
    "force_torque": "uri://ft/ep_544002",
    "tactile": "uri://tactile/ep_544002"
  },
  "actions": {
    "policy_commands": "uri://actions/ep_544002",
    "human_override_segments": [
      {
        "start_ms": 8210,
        "end_ms": 11520,
        "operator_id": "operator_4"
      }
    ]
  },
  "outcome": {
    "success": true,
    "completion_time_ms": 16310,
    "recovery_used": true
  }
}
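A log in this shape can be consumed by any training tool without a per-site parser. As a small illustration, the sketch below computes how much of the episode ran under human control and labels a given timestamp by control source. The function names are hypothetical, and the code assumes override segments do not overlap.

```python
from typing import Any, Dict

# Trimmed copy of the episode record from the example above.
episode: Dict[str, Any] = {
    "episode_id": "ep_544002",
    "actions": {
        "human_override_segments": [
            {"start_ms": 8210, "end_ms": 11520, "operator_id": "operator_4"}
        ]
    },
    "outcome": {"success": True, "completion_time_ms": 16310, "recovery_used": True},
}


def override_fraction(episode: Dict[str, Any]) -> float:
    """Share of the episode duration spent under human override."""
    segments = episode["actions"]["human_override_segments"]
    overridden_ms = sum(s["end_ms"] - s["start_ms"] for s in segments)
    return overridden_ms / episode["outcome"]["completion_time_ms"]


def control_source_at(episode: Dict[str, Any], t_ms: int) -> str:
    """Return the operator id in control at `t_ms`, or "policy" otherwise."""
    for s in episode["actions"]["human_override_segments"]:
        if s["start_ms"] <= t_ms < s["end_ms"]:
            return s["operator_id"]
    return "policy"


fraction = override_fraction(episode)
source_mid_override = control_source_at(episode, 9000)
source_early = control_source_at(episode, 500)
```

For this episode the operator was in control for 3,310 of 16,310 ms, roughly 20 percent of the run. Labels like these are what let a training pipeline separate human corrections from policy actions.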

None of this is exotic. That is the point. It is basic infrastructure that should already exist in a standardized form.

Haptics should not be trapped in proprietary systems

Haptics is one of the clearest examples of a capability that should be part of the shared protocol layer.

Today, haptics is often treated as an accessory to teleoperation. But in a mature robotics stack, haptics should be much more than that.

It should be:

a live feedback channel for operators
a training signal for policy learning
a replayable data stream in logged episodes
a simulation input and output
a safety signal for fragile tasks
a bridge between human dexterity and machine autonomy

Imagine a remote operator guiding a robot through a delicate cable insertion task, a valve turn, a surgical motion, or a damaged-parts extraction. The operator’s force adjustments, hesitation, compensation, and contact patterns should not disappear into a closed loop. They should become structured data the whole system can use.

That is how tele-op becomes a path to autonomy rather than a parallel dead-end stack.
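To make the idea concrete, here is a minimal sketch of force feedback treated as structured data and used as a safety signal. The `ForceSample` record and its field names are hypothetical, and the sample values are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ForceSample:
    t_ms: int            # time since episode start, in milliseconds
    grip_force_n: float  # measured grip force, in newtons
    source: str          # "policy" or an operator id


def force_violations(samples: Iterable[ForceSample], max_grip_force_n: float) -> List[ForceSample]:
    """Return the samples whose grip force exceeds the task's limit."""
    return [s for s in samples if s.grip_force_n > max_grip_force_n]


# Invented samples around the start of a human override.
samples = [
    ForceSample(8200, 6.5, "policy"),
    ForceSample(8300, 9.8, "operator_4"),
    ForceSample(8400, 12.6, "operator_4"),
    ForceSample(8500, 10.1, "operator_4"),
]

# 12 N matches the max_grip_force_n constraint in the task example.
violations = force_violations(samples, max_grip_force_n=12)
```

The same records could feed an operator display, a policy-training dataset, or a simulator replay. Here only the 8400 ms sample exceeds the 12 N limit.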

Tele-op is not a crutch. It is part of the architecture.

The industry often frames teleoperation as something autonomy will eventually replace. That is too simplistic.

Tele-op plays at least four durable roles in serious robotic deployments:

fallback for long-tail failures
supervision in safety-critical tasks
shared control for fine manipulation
data generation for future policy training

The strongest robotic systems will not choose between tele-op and autonomy. They will integrate both cleanly.

A shared protocol would let a tele-op system work across different robot platforms without requiring a rebuild every time. It would let remote operators use consistent abstractions for state, control authority, intervention, and safety constraints. It would let operator actions flow into standardized datasets that improve autonomy over time.

That is a much more powerful model than treating tele-op as just a labor layer attached to a brittle robot.

Foundation models need structure too

A lot of robotics commentary assumes that better models will naturally absorb the integration problem. They will not.

Foundation models and VLMs are useful only to the extent that they can interact with the physical world through reliable abstractions.

A model may understand that a tool is partially occluded, that a handle is probably graspable, or that a failed insertion likely requires a rotation before retrying. But unless the robot stack exposes standardized primitives around world state, action space, control modes, and uncertainty, the model is still trapped inside a custom wrapper for each deployment. That prevents real portability.

A shared protocol would let models do things like:

query robot capabilities
inspect scene state
propose subgoals
estimate confidence
request operator assistance
recommend recovery strategies
annotate failures
hand off cleanly to a lower-level controller

That is how model progress actually becomes deployment progress.
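One way to picture this is as a registry of named tools that a model can call, loosely in the spirit of MCP tool calls. The sketch below does not use the actual MCP SDK or wire format. The `tool` decorator, the `call_tool` dispatcher, and both tool names are hypothetical.

```python
from typing import Any, Callable, Dict

ToolHandler = Callable[[Dict[str, Any]], Dict[str, Any]]
TOOLS: Dict[str, ToolHandler] = {}


def tool(name: str) -> Callable[[ToolHandler], ToolHandler]:
    """Register a handler under a tool name the model can call."""
    def register(handler: ToolHandler) -> ToolHandler:
        TOOLS[name] = handler
        return handler
    return register


@tool("query_capabilities")
def query_capabilities(args: Dict[str, Any]) -> Dict[str, Any]:
    # A real robot would report its own descriptor; this one is hard-coded.
    return {"control_modes": ["joint_space", "cartesian_pose", "impedance", "shared_teleop"]}


@tool("request_operator_assistance")
def request_operator_assistance(args: Dict[str, Any]) -> Dict[str, Any]:
    # A real system would open a handoff; this just acknowledges the request.
    return {"status": "queued", "reason": args["reason"]}


def call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a model's tool call by name, rejecting unknown tools."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](args)


capabilities = call_tool("query_capabilities", {})
help_request = call_tool(
    "request_operator_assistance",
    {"reason": "low_confidence_final_grasp_alignment"},
)
```

Because the model only sees tool names and structured results, the same reasoning loop could in principle run against any robot that registers the same tools.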

This applies across all robotics categories

Although manipulation is the clearest starting point, this protocol layer matters across the full spectrum of robotic systems.

In drones, it would unify mission planning, autonomy, payload control, operator intervention, and perception outputs.

In autonomous ground systems, it would unify navigation, sensor fusion, tele-op fallback, and mission-level commands.

In maritime robotics, it would connect autonomy, sparse human supervision, degraded communications, and platform-agnostic mission control.

In industrial robotics, it would make higher-mix manufacturing more modular and less dependent on one-off integration.

In agriculture, it would connect mobility, sensing, actuation, and human supervision across many crop and task types.

In surgical or assistive robotics, it would unify haptics, supervision, safety logic, and precision manipulation interfaces.

The embodiments differ. The integration problem is the same.

What the standard should cover

A serious MCP layer for robotics should standardize the core primitives of interaction:

Capability discovery: What can this machine sense, reach, manipulate, carry, or tolerate safely?

World representation: What objects, humans, obstacles, surfaces, and affordances exist in the scene, and with what uncertainty?

Task specification: What is the goal, what are the subtasks, what constraints matter, and what counts as success?

Action abstraction: What actions can be requested at a high level, and how do they map to control modes?

Human intervention: How does the system request help, hand off control partially or fully, log intervention, and resume autonomy?

Feedback streams: How are vision, tactile, force, audio, and telemetry represented so they can be consumed across tools?

Safety semantics: What does degraded mode mean, when is intervention required, and how are confidence thresholds expressed?

Training and replay: How are trajectories, demonstrations, corrections, outcomes, and contexts logged for reuse?

Simulation portability: How do real and simulated episodes share common task and state representations?

That is the level where the industry needs convergence.
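As one small example of what convergence on safety semantics and human intervention could enable, an escalation rule can be written once against shared fields rather than per deployment. In the sketch below, `escalate_after_failures` and `teleop_allowed` come from the task example earlier in this piece, while `min_confidence`, the `Decision` values, and the rule itself are assumptions made for illustration.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    RETRY = "retry"
    ESCALATE = "escalate_to_operator"
    ABORT = "abort"


def next_step(
    pose_confidence: float,
    failures: int,
    *,
    min_confidence: float,
    escalate_after_failures: int,
    teleop_allowed: bool,
) -> Decision:
    """Choose the next step from confidence, failure count, and fallback rules."""
    needs_help = failures >= escalate_after_failures or pose_confidence < min_confidence
    if needs_help:
        # Ask for a human only if the task permits tele-op; otherwise give up.
        return Decision.ESCALATE if teleop_allowed else Decision.ABORT
    return Decision.PROCEED if failures == 0 else Decision.RETRY


# Values from the task example, plus a hypothetical 0.6 confidence floor.
rules = dict(min_confidence=0.6, escalate_after_failures=2, teleop_allowed=True)
first_attempt = next_step(0.74, 0, **rules)
after_one_failure = next_step(0.74, 1, **rules)
after_two_failures = next_step(0.74, 2, **rules)
```

With these inputs the system proceeds on the first attempt, retries after one failure, and escalates to an operator after two.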

What the best version of this becomes

The best version of this protocol does not merely connect software modules. It becomes the common operating grammar of embodied intelligence.

It allows a robot to learn from a human correction in one environment and reuse that lesson elsewhere.

It allows a foundation model to reason over a task in a platform-agnostic way.

It allows tele-op to become a scalable bridge to autonomy rather than a dead-end service layer.

It allows tactile, visual, and semantic information to be fused in reusable formats.

It allows heterogeneous machines (arms, drones, humanoids, AMRs, underwater vehicles, surgical systems, field robots) to participate in a broader ecosystem instead of living inside closed vertical stacks.

That is what turns robotics from a collection of bespoke systems into a true platform economy.

The economic impact is bigger than the technical impact

This is not just a technical standards conversation. It is an economic one.

A real interoperability layer would reduce the cost of deploying robotics by lowering the amount of custom integration required at every site and every workflow. It would reduce vendor lock-in. It would make it easier for customers to adopt best-of-breed systems instead of betting everything on one vertically integrated provider.

It would also make data more valuable. Right now, a huge amount of robotic training data is trapped inside proprietary schemas and deployment-specific stacks. Standardization would make more of that data reusable across tasks, sites, and even embodiments.

It would also accelerate the rate at which model improvements can propagate into the field. A better policy, planner, or reasoning system becomes more valuable when it can plug into many robotic environments instead of one.

That is how ecosystems scale.

The winners may not be who the industry expects

The biggest long-term winners in robotics may not just be the companies with the best arm, the best hand, the best model, or the best tele-op interface. They may be the companies that help define the grammar that lets the rest of the industry interoperate.

Because robotics is reaching the stage where the bottleneck is no longer just “can this one thing work?” It is increasingly “can many things work together reliably enough to deploy at scale?”

That is an infrastructure question. And infrastructure questions tend to determine who compounds.

The robotics industry should stop treating integration as glue work

The robotics industry still tends to treat integration as downstream plumbing, something to solve after the exciting breakthroughs are built. That is backwards. Integration is not the boring part. It is the part that determines whether breakthroughs remain isolated or become systemic.

The next major leap in robotics may not come from a new model architecture or a novel end effector alone. It may come from the shared protocol layer that allows all of those advances to connect. That is the missing standard.

An MCP for the physical world.

Not because robotics needs less innovation, but because it needs a way for innovation to compound across the stack.

Are you building this? Apply now.
