
Thinking Machines wants to build an AI that actually listens while it talks

A startup called Thinking Machines is rethinking the fundamental rhythm of AI conversation — building a model that can process your input and generate a response at the exact same time.

ottown · 3 min read

Every AI assistant you've ever used follows the same conversational rhythm: you speak, it listens, it responds, you listen. It's less like talking to someone and more like exchanging voice memos. A startup called Thinking Machines wants to blow up that model entirely.

The company is developing an AI that can process your input and generate a response simultaneously — turning what currently feels like a structured exchange into something closer to a real phone call.

Why the current approach feels off

Today's large language models are designed to be sequential. You finish your sentence, the model processes it as a complete prompt, and then — and only then — it begins crafting a reply. That architecture is baked deep into how these systems are trained and deployed.

The result is a subtle but persistent sense of artificiality. Even the best voice AI assistants have a beat of silence before they respond, and they can't adjust mid-sentence if you change what you're saying. Once you've stopped talking, you're locked into whatever half-formed thought you managed to get out.

Thinking Machines is betting that this gap is the single biggest barrier between AI assistants and genuinely natural conversation.

What simultaneous processing would actually change

If an AI model can listen and respond at the same time, a few things become possible that aren't possible today.

First, the model could interrupt itself — or you — when it detects that you're mid-correction. If you start saying "actually, no, I meant—" the model doesn't have to wait for you to finish before it starts adjusting course. That's how humans talk.

Second, latency drops substantially. Right now, even on fast hardware, there's a gap between when you finish speaking and when the AI starts replying. Parallel processing would compress that gap to near-zero.

Third, the conversation feels less like a form you're filling out and more like a dialogue. That shift matters more than it sounds — people communicate differently when they know the other party is tracking them in real time.
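The contrast between those two conversational loops can be sketched with a toy simulation. Everything below — the stand-in "model," the reply format, and the use of the word "actually" as a correction marker — is invented for illustration; it is not how Thinking Machines (or any real system) is implemented.

```python
def turn_based(utterance: str) -> str:
    # Today's loop: the model sees the prompt only after the
    # user has finished the entire turn, then replies once.
    return "reply to: " + utterance

def full_duplex(tokens):
    """Yield a provisional reply after every incoming token, so a
    response can begin -- and be revised -- before the turn is over."""
    heard = []
    for tok in tokens:
        if tok == "actually":  # toy correction marker: drop the stale hypothesis
            heard = []
            continue
        heard.append(tok)
        yield "reply to: " + " ".join(heard)
```

Feeding `full_duplex` the token stream "book a table actually a taxi" produces a revised provisional reply after every word and ends up answering the corrected request, while `turn_based` can only ever answer the complete, self-contradicting utterance after the fact.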

A technically difficult problem

Simultaneous input-output processing is not a trivial engineering challenge. Standard transformer architectures assume your prompt is complete before generation starts. Relaxing that assumption requires rethinking how the model attends to context, how it manages competing information streams, and how it handles the case where your new input contradicts what it's already started generating.

Thinking Machines hasn't published technical details about how it's approaching those problems, but the ambition itself is notable. Most AI labs are focused on making sequential models faster, smarter, and cheaper. Thinking Machines is questioning whether the sequence itself is the right structure.

The bigger picture for AI interaction

This kind of research has implications that extend well beyond consumer voice assistants. Real-time AI in customer service, medical consultations, education, and accessibility tools all depend on conversation feeling natural and responsive. The phone-call versus text-chain distinction isn't just a UX preference — it affects how much people trust and rely on what the AI says.

If Thinking Machines can demonstrate that simultaneous listening and talking is achievable at scale, it could push the entire industry toward rethinking the conversational loop that's been standard since the earliest chatbots.

For now, the concept is compelling enough to watch closely.

Source: TechCrunch
