Voice as the first non-screen surface
A multi-device voice interface for enterprise CRM, designed by going to where the work actually happens — in cars, between meetings, anywhere except the desk the existing software was built for.
“Voice was the first interface that asked humans to delegate to software in plain language. The lessons from designing one of the early enterprise voice products are the lessons I am still applying to AI-native work today.”
Field reps did not have a CRM problem. They had a “where the work happens” problem.
Tact.ai's bet was that the value of a CRM lives in the speed and accuracy of the data inside it — and that field sales reps, the people creating most of that data, were the worst-served users in enterprise software. They were filling in records hours after the fact, from memory, on a phone that was not designed for it.
The brief was to design a voice interface that let those reps update and retrieve CRM data hands-free, across mobile, tablet, and desktop, in a way that connected enterprise back-ends with the personal tools they already used.
Going to where reps actually update CRM
The single most useful thing I did on this project was research, and the research was useful because it left the office. I led interviews and observation across a cross-section of sales professionals — frequent travelers, inside sales, sales managers — but the interviews mattered less than understanding the moments around them. When does a rep actually update CRM? Not at their desk. In a car park between meetings. On the train home. In the ten minutes before the next call. Sometimes never, because the moment passed and the details faded.
Once you have watched that pattern enough times, the design problem reframes itself. The interface is not competing with other CRM interfaces. It is competing with not updating CRM at all. That is a much harder bar — and a much more useful one to design against.
This is the same reframing I keep coming back to in AI-native work. The competition is rarely another product. It is the version of the workflow where the human gives up and does it later, or not at all.
A parallel surface, not a feature
Once the research was clear, the design implications followed. Voice could not be a feature bolted onto the existing app — that would just give reps another reason to say “I’ll do it later.” It had to be a parallel surface with its own grammar, available everywhere they already were, one that knew what they were doing and let them act on it conversationally.
Cross-device consistency was the second hard problem. The same intent needed to work on a phone in a parking lot, on a tablet at a kitchen table, and on a desktop between calls. We solved it by treating the voice grammar as the source of truth and the visual UI on each device as a reflection of state — not as competing interfaces.
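To make the single-grammar idea concrete, here is a minimal sketch — illustrative only, not the production Tact.ai code, with all names invented — of one canonical intent model acting as the source of truth, while each device surface merely renders the resulting state:

```python
# Hypothetical sketch: one shared grammar/state, per-device projections.
from dataclasses import dataclass, field

@dataclass
class CrmState:
    """Shared state that every surface reflects."""
    activities: list = field(default_factory=list)

@dataclass
class Intent:
    """Canonical, device-independent result of parsing an utterance."""
    action: str      # e.g. "log_call"
    account: str
    attributes: dict

def apply_intent(state: CrmState, intent: Intent) -> CrmState:
    # The grammar layer mutates one shared state; no surface owns it.
    if intent.action == "log_call":
        state.activities.append({"account": intent.account, **intent.attributes})
    return state

def render_phone(state: CrmState) -> str:
    # Each renderer is a projection of state, not a competing interface.
    return f"{len(state.activities)} recent activities"

def render_desktop(state: CrmState) -> str:
    return "\n".join(f"{a['account']}: {a.get('outcome', '')}"
                     for a in state.activities)

state = apply_intent(CrmState(), Intent("log_call", "Acme", {"outcome": "productive"}))
print(render_phone(state))    # phone and desktop reflect the same update
print(render_desktop(state))
```

The design choice the sketch encodes is that the phone and desktop views can never drift apart, because neither holds interaction logic of its own.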
Conversational UI is mostly silence
The hardest design decisions were not about how the assistant talked. They were about when it stayed quiet. Reps did not want to be greeted. They did not want to be asked clarifying questions when context was obvious. They wanted the voice surface to feel like a competent colleague who already knew the deal and could be told “log a call with Acme, mark it productive, follow up next Tuesday” without ceremony.
Most voice products in that era were designed by people who had built mobile apps and were applying the same instincts to a new surface. The mobile-app instinct is to confirm, to delight, to chat. The right instinct for enterprise voice was almost the opposite — say less, do more, get out of the way.
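The “say less, do more” principle can be sketched in a few lines — a hypothetical toy parser, not the product’s actual grammar, with the account list and field names invented for illustration. It acts on every unambiguous part of a compound command and asks a question only when context genuinely fails to resolve:

```python
# Hypothetical sketch: restraint as a parsing policy.
import re

KNOWN_ACCOUNTS = {"acme", "globex"}   # assumed context from the rep's active deals

def handle(utterance: str, accounts=KNOWN_ACCOUNTS) -> dict:
    m = re.search(r"log a call with (\w+)", utterance, re.I)
    if not m or m.group(1).lower() not in accounts:
        # Only genuine ambiguity earns a question; nothing else does.
        return {"ask": "Which account?"}
    record = {"account": m.group(1), "type": "call"}
    if outcome := re.search(r"mark it (\w+)", utterance, re.I):
        record["outcome"] = outcome.group(1)
    if follow := re.search(r"follow up (next \w+)", utterance, re.I):
        record["follow_up"] = follow.group(1)
    # No greeting, no confirmation prompt: just log it and stay quiet.
    return {"logged": record}

result = handle("log a call with Acme, mark it productive, follow up next Tuesday")
```

The compound utterance resolves to a single silent write; the assistant speaks only on the failure path.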
The product shipped and was used in production by enterprise sales teams. The company reported meaningful improvements in CRM data timeliness, sales productivity, and data quality, though I would caveat that I was the lead designer on the experience, not the analyst on the numbers.
What I take from the work is less about the metrics and more about the method. The interaction principles that came out of it — that delegation is mostly about restraint, that multi-device consistency requires a single grammar, and that the real competition is the absence of the workflow rather than another product — are the principles I still apply to every AI-native surface I design.
The design layer for AI agents is a continuation of the design layer for voice agents. Different technology, same problem.
- 01 Shipped voice surface across mobile, tablet, and desktop
- 02 Reported gains in CRM data timeliness, sales productivity, and data quality
- 03 Established interaction principles I still use in AI-native work today