The missing execution layer

The dream of mass automation has existed for decades. The technology to achieve it finally exists. What has been missing is not smarter models or better robots. It is the execution layer that connects them to reality.

Foundation Models Are Here. Now What?

The last two years have produced a genuine leap in AI capabilities for robotics. Large pre-trained models that generalize across tasks and environments have transformed what is possible in perception, planning, and control. Machines can now recognize and locate objects in three dimensions with remarkable accuracy. Modern vision systems produce rich, detailed spatial data that would have been impossible with traditional methods a few years ago.

Equally important is the hardware that makes these advances deployable at the edge. NVIDIA’s compute stack, from embedded modules to GPU-accelerated inference pipelines, has made it possible to run advanced AI perception in real time, inside an industrial cell, without a data center. This is not a minor detail. For years, the limiting factor was not model quality but latency: systems that could not produce results fast enough to participate in a real-time control loop were, for practical purposes, useless in manufacturing. That constraint has now been lifted.

This is real progress. We build on these advances at Inbolt. The question is not whether AI matters for robotics. It does. The question is what problem you apply it to.

"The dream of mass automation has never been about robot capability. It has always been about integration cost. The robot that can deploy itself is worth more than the most sophisticated robot that needs six months of engineering to install."

Albane Dersy, Co-founder & COO, Inbolt

The Real Blocker: Integration Cost, Not Robot Capability

Ask any manufacturing engineer why automation has remained out of reach for most of the industry and you will not hear “the robots are not capable enough.” Robots have been capable for decades. The answer is always the same: integration is too expensive, too slow, and too fragile.

A traditional robot cell is not just a robot. It is a system:

  • Proximity sensors to detect part presence
  • Conveyor encoders feeding position data into a PLC
  • Fixture-mounted switches confirming part loading
  • A logic controller orchestrating all of it, wired to the robot controller
  • An integrator who spends weeks wiring, programming, tuning, and debugging

The cost of the robot itself is often one of the smallest line items. The integration is where the budget goes. And because every cell is a custom engineering project, none of that work transfers. Each deployment starts from scratch.

This is why automation has historically been the domain of large OEMs with the capital and engineering resources to absorb these costs. For smaller manufacturers, for short production runs, for any operation that requires genuine flexibility, traditional robot automation has simply not been economically viable. The technology was capable. The integration cost made it inaccessible.

The path to mass automation is not a better robot. It is the elimination of integration costs.

Vision as the Universal Sensor

Here is the insight that changes the economics entirely: a robot that can see does not need most of what makes integration expensive.

Part presence detection? Vision handles it. Fixture alignment confirmation? Vision measures it directly from the part geometry. Position of the part on the conveyor? Vision tracks it continuously. No encoder, no switch, no calibration routine.

Each sensor that vision replaces is not just a hardware saving. It is a reduction in integration complexity that compounds. Fewer sensors means fewer PLC I/O points. Fewer I/O points means simpler logic. Simpler logic means less programming, less debugging, less documentation, less validation. The integration cost does not decrease linearly. It collapses.

The autonomous robot station, a cell that requires only power, a network connection, and a CAD file to deploy, becomes achievable. Not because the robot got smarter in some abstract sense, but because vision has replaced the entire sensory infrastructure that previously required custom engineering for every deployment.

Why the Execution Layer Is the Missing Piece

Every manufacturing engineer knows this pain. You spend weeks in an offline programming environment building precise, CAD-grounded trajectories, validating every waypoint, tuning every approach vector. The digital model is perfect. Then the program goes live and the robot flies blind: executing its path rigidly, with no awareness of where the part actually is.

A 2 mm shift in fixture position. A part tolerance at the edge of spec. A gripper that wears slightly over ten thousand cycles. The robot does not know. It keeps executing the nominal path until it jams, misses, or produces scrap. The integrator is called back. The cell stops. The program gets manually retuned, again.

The link between the digital program and physical reality has always been broken. Not because the programming tools were bad, and not because the robots were dumb. Because there was no execution layer: no system to take the intent encoded in the offline program and continuously reconcile it with what is actually happening in the cell, in real time, at servo frequency.

A robot cell that requires manual retuning every time a fixture wears or a supplier delivers parts at the edge of tolerance is not truly autonomous. Vision without a real-time execution layer gives you better diagnosis of the problem. It does not give you a system that fixes itself. The autonomous station requires both: the vision to see what is happening, and the execution layer to act on it continuously.

Why VLAs Miss the Point

The dominant narrative in physical AI right now is that Vision-Language-Action models will eventually replace offline programming entirely. The robot will look at the scene, understand the task, and figure out the motion on its own. No CAD, no OLP, no integration work.

For manufacturing, this solves a problem that does not exist while ignoring the one that does.

What VLAs are optimized for:

  • Novel scenes, unknown objects, ambiguous tasks
  • General reasoning from language or demonstration
  • Flexibility across diverse environments
  • Research cobots
  • Probabilistic outputs

What manufacturing actually needs:

  • Known parts, known tasks, executed with micron precision
  • Faithful and validated trajectory execution
  • Robustness to real-world variance
  • FANUC, KUKA, ABB, Yaskawa
  • Deterministic and certifiable behavior

In a factory, the scene is known. The part is known. The task is known. What changes is the small, relentless gap between the nominal world encoded in the program and the physical world the robot actually operates in. A VLA that can handle arbitrary scene novelty is engineering overkill for this problem, and brings none of the precision, determinism, or hardware compatibility that production environments demand. More importantly, it does not solve the integration cost problem. It replaces one form of custom engineering with another.

The Inbolt Approach: Frontier AI, Zero Integration

Inbolt is built on the conviction that the right answer is not to abandon the structure that manufacturing provides, but to use the best available AI to exploit it fully, and in doing so, collapse the integration cost that has kept automation inaccessible.

We integrate frontier models at every stage of our pipeline, running on NVIDIA’s edge compute infrastructure, chosen precisely because they unlock capabilities that classical methods cannot match. But we apply them within the constraints of industrial reality: known geometry, validated programs, production hardware, deterministic execution.

Frontier models for perception, grounded in CAD. We apply frontier models to both pose estimation and depth perception, trained on synthetic data rendered from CAD, with no physical samples required. A new part can be onboarded in minutes, not weeks, with no data collection and no labeling pipeline. This is how vision becomes a universal sensor: one system, one CAD file, and the robot knows where every part is.

Offline programming, preserved and extended. Engineers continue to program in the digital twin, the validated, CAD-grounded workflow that production environments require. The difference is what happens when that program runs. Instead of executing blindly, the robot operates within our execution layer, which continuously corrects its trajectory against the actual observed pose of the part. No PLC logic for part presence. No fixture sensors. No wiring.
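
To make the idea of correcting a trajectory against the observed pose concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not a description of Inbolt's pipeline: the problem is reduced to a planar pose (x, y, heading), and the names `Pose2D`, `compose`, `inverse`, and `correct_waypoint` are hypothetical. The principle is that a waypoint programmed against the nominal part pose is re-expressed relative to the part, then re-anchored on the pose that vision actually observes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose2D:
    """Planar rigid pose: position in metres, heading in radians (illustrative)."""
    x: float
    y: float
    theta: float


def compose(a: Pose2D, b: Pose2D) -> Pose2D:
    """Express pose b (given in frame a) in the parent frame of a."""
    c, s = math.cos(a.theta), math.sin(a.theta)
    return Pose2D(a.x + c * b.x - s * b.y,
                  a.y + s * b.x + c * b.y,
                  a.theta + b.theta)


def inverse(p: Pose2D) -> Pose2D:
    """Return the pose q such that compose(p, q) is the identity."""
    c, s = math.cos(p.theta), math.sin(p.theta)
    return Pose2D(-(c * p.x + s * p.y), s * p.x - c * p.y, -p.theta)


def correct_waypoint(waypoint: Pose2D, part_nominal: Pose2D,
                     part_observed: Pose2D) -> Pose2D:
    """Re-anchor a nominally programmed waypoint on the observed part pose."""
    # Waypoint relative to the part, as the offline program intended it.
    relative = compose(inverse(part_nominal), waypoint)
    # Same relative waypoint, attached to where the part really is.
    return compose(part_observed, relative)


if __name__ == "__main__":
    nominal_part = Pose2D(0.500, 0.200, 0.0)
    observed_part = Pose2D(0.502, 0.200, 0.0)  # fixture shifted by 2 mm in x
    grasp = Pose2D(0.600, 0.250, 0.0)          # waypoint from the offline program
    print(correct_waypoint(grasp, nominal_part, observed_part))
```

In this example the corrected waypoint moves by the same 2 mm as the part, so the offline program keeps its intent while the executed path follows the physical part. A production system would work with full 6-DoF poses and repeat the correction every control cycle.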

Real-time servo control, robot-agnostic. Our control pipeline streams corrected joint commands into the servo loop of industrial robots at native control frequency, on FANUC, Universal Robots, KUKA, ABB, Yaskawa and others. This is the piece the industry has been missing: a high-frequency, low-latency bridge between AI perception and industrial motion control that handles the jerk limits, fault recovery, and protocol differences of each platform.
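
One reason a bridge between perception and motion control needs care is that a correction cannot be applied to the robot in a single jump. The sketch below shows the simplest version of that constraint: a per-tick step clamp that moves a commanded offset toward the vision target by a bounded amount each control cycle. This is a deliberately simplified stand-in, assuming a single scalar offset in millimetres. It bounds the step size only, it is not a jerk limiter, and it does not represent any vendor's streaming interface. The names `limit_step` and `stream_correction` are hypothetical.

```python
from typing import List


def limit_step(previous: float, target: float, max_step: float) -> float:
    """Move from previous toward target by at most max_step in one control tick."""
    delta = target - previous
    if delta > max_step:
        return previous + max_step
    if delta < -max_step:
        return previous - max_step
    return target


def stream_correction(start: float, target: float,
                      max_step: float, ticks: int) -> List[float]:
    """Return the offset commanded at each tick while converging on target."""
    commands = []
    current = start
    for _ in range(ticks):
        current = limit_step(current, target, max_step)
        commands.append(current)
    return commands


if __name__ == "__main__":
    # A 2 mm correction applied at no more than 0.5 mm per tick (assumed values).
    print(stream_correction(start=0.0, target=2.0, max_step=0.5, ticks=6))
    # → [0.5, 1.0, 1.5, 2.0, 2.0, 2.0]
```

A real controller would bound velocity, acceleration, and jerk together and would run at the native control frequency of each robot platform. The clamp here only illustrates why corrections are streamed over several cycles rather than applied at once.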

What This Unlocks: Mass Automation

When the execution layer works, the economics of robot deployment change fundamentally. A new part variant no longer requires weeks of reprogramming and sensor reconfiguration. Update the CAD, and the vision model, the trajectory generator, and the part detection logic all follow automatically. What used to take a skilled integrator a month takes hours.

The robot station is no longer brittle to real-world variance. Fixture wear, part tolerances, thermal drift: all of the small imperfections that silently degrade production quality over time are continuously compensated. The robot always knows where the part actually is, and always executes against that ground truth.

And because vision has replaced the peripheral sensing infrastructure, the cell itself is simpler. Fewer components. Less wiring. Less PLC logic. Less to go wrong, less to maintain, less to redeploy when the product changes. The autonomous station that requires only power, a network connection, and a CAD file becomes achievable for manufacturers who could never have justified traditional automation.

This is what mass automation actually means. Not robots that are more impressive. Robots that are cheap enough, fast enough to deploy, and robust enough in operation to be accessible to the full breadth of manufacturing, not just the largest OEMs.

The technology has been ready for years. The execution layer is what was missing.

The race in physical AI will not be won by the team with the most general model. It will be won by the team that makes automation accessible: deployable in days rather than months, on the hardware the world already owns, with a cost structure that works for every manufacturer, not just the largest ones.
