AI Vendors Must Have the Same SLAs You Demand from AWS
The infrastructure layer cloud providers built in 2010 is still missing from enterprise AI compliance in 2026.
Nobody tells you this when you’re signing the contract.
You ask about features. You ask about integrations. You ask about pricing and support. You get a demo with a very smooth presenter who uses the words “agentic,” “real-time,” and “compliance-native” in every other sentence.
You do not ask, “How does your system decide which model handles my request?”
And the vendor — relieved you didn’t ask — moves on.
This is the infrastructure gap in TaxTech AI. And it is costing enterprises a lot more than they think.
What Actually Happens When You Hit Send
Let me walk you through what happens the moment you submit a prompt to an AI-powered compliance tool. Somewhere, a server rack with an AI chip receives your request. Your text gets converted into tokens — numerical representations of words and concepts. A model gets allocated GPU compute to process those tokens. The response streams back. Your account gets charged.
Simple, right?
Here is the part nobody talks about: in most TaxTech deployments today, the same model handles every single request. Whether you’re asking it to validate a VAT registration number (200 tokens, done in milliseconds) or reconcile 400,000 transaction lines against twelve EU continuous transaction control regimes (tens of thousands of tokens, multiple tool calls, minutes of compute) — it all goes to the same place.
No routing. No tier selection. No “this task is simple, use the cheaper model.” No “this task is complex, escalate.”
One model to rule them all. You pay for all of it.
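To make the missing step concrete, here is a minimal sketch of what a routing decision could look like. Everything in it is an assumption for illustration: the tier names, the 2,000-token threshold, and the rough four-characters-per-token estimate are hypothetical, not any vendor’s actual logic.

```python
from dataclasses import dataclass

# Hypothetical tiers and threshold, chosen only for illustration.
SMALL_TIER = "small-model"
FRONTIER_TIER = "frontier-model"
SMALL_TIER_MAX_TOKENS = 2_000


@dataclass
class Request:
    prompt: str
    needs_tools: bool = False  # e.g. a multi-step reconciliation with tool calls


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def choose_tier(request: Request) -> str:
    """Send short, tool-free requests to the small tier; escalate the rest."""
    if request.needs_tools:
        return FRONTIER_TIER
    if estimate_tokens(request.prompt) > SMALL_TIER_MAX_TOKENS:
        return FRONTIER_TIER
    return SMALL_TIER


lookup = Request(prompt="Validate VAT registration number DE123456789")
reconciliation = Request(
    prompt="Reconcile the attached transaction lines", needs_tools=True
)

print(choose_tier(lookup))          # small-model
print(choose_tier(reconciliation))  # frontier-model
```

A production router would weigh far more than prompt length. The point is only that the decision exists somewhere, and can be written down.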
Now let me tell you what this looks like at Uber scale, because I tested it.
I Asked for One Line. I Got a Redesign.
Earlier this year I ran a test with Anthropic’s Fable 5 — their highest-capability model at the time, since restricted for enterprise use outside the United States.
The task was simple. Embarrassingly simple. Recalculate a single line in a compliance spreadsheet based on updated input parameters.
What Fable 5 did instead: redesigned the entire spreadsheet architecture.
It reformatted the structure. Added conditional logic across twelve new columns. Restructured the formula dependencies. Generated explanatory commentary for every change. Produced a masterpiece of spreadsheet engineering that absolutely nobody asked for.
The model wasn’t broken. It was doing exactly what frontier models do — applying maximum capability to every request regardless of whether that capability is needed. Maximum intelligence. Maximum tokens. Maximum compute.
At Uber’s transaction volumes — millions of tax determination events per day, across dozens of jurisdictions — a task mismatch like this isn’t a funny anecdote. It’s a Token Effectiveness Ratio disaster that compounds quietly until it detonates your quarterly compute budget.
The fix isn’t a smarter model. The fix is a routing layer that catches the mismatch before it happens.
So do TaxTech vendors have one?
What AWS Figured Out Fifteen Years Ago
AWS didn’t launch EC2 in 2006 and say “good luck figuring out the routing.”
They spent the next decade building the layer that makes the hard part invisible:
1️⃣ Load balancing. Incoming requests get distributed across multiple compute instances. The month-end close traffic spike doesn’t bring the system down. Someone already solved that.
2️⃣ Auto-scaling. Capacity expands during a VAT filing deadline and contracts at 3am on a Tuesday. You pay for what you use. Someone already solved that, too.
3️⃣ Instance tier routing. A background batch job doesn’t get the same resources as a real-time checkout event. AWS routes intelligently. The user doesn’t choose the rack, and shouldn’t have to.
4️⃣ Availability zone failover. If one data center degrades, traffic reroutes automatically. Your 99.99% SLA is enforced at the infrastructure layer. Not the application layer. The infrastructure layer.
5️⃣ Spot vs. on-demand pricing. Non-urgent workloads run on spare capacity at a fraction of the cost. Latency-sensitive operations get premium compute. The decision is automatic.
6️⃣ Request-level observability. You can audit exactly which requests ran, on what instance type, for how long, and at what cost. Every dollar is traceable.
7️⃣ Agentic governance. At the AWS New York Summit in 2026, AWS introduced release management capabilities for its DevOps agent specifically because parallel AI coding creates a compounding risk of failure. As David Yanacek, AWS Senior Principal Engineer for Agentic AI, put it: “Coding agents are getting a ton done in parallel. The more that you’re doing in parallel, the higher the probability that the resulting train will not be able to leave the station because one of these things has a bug in it.” AWS also introduced Continuum, a cybersecurity tool designed to detect vulnerabilities in AI models before they reach production. The result: a task that took eight minutes in January takes three minutes in May — not because the agents got smarter, but because the governance layer got better.
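Of the seven items above, request-level observability (item 6) is the easiest to picture in code. It boils down to one record per request that can be rolled up by tier. A minimal sketch, assuming hypothetical field names and made-up numbers; this is not AWS’s schema or any vendor’s.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestRecord:
    request_id: str
    model_tier: str   # hypothetical tier label
    tokens: int
    latency_ms: int
    cost_usd: float


def cost_by_tier(records: list[RequestRecord]) -> dict[str, float]:
    """Sum the cost of every request, grouped by the tier that served it."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.model_tier] += record.cost_usd
    return {tier: round(total, 4) for tier, total in totals.items()}


# Made-up sample data for illustration.
records = [
    RequestRecord("r1", "small", 200, 40, 0.0002),
    RequestRecord("r2", "frontier", 40_000, 90_000, 0.60),
    RequestRecord("r3", "small", 300, 55, 0.0003),
]

print(cost_by_tier(records))
```

If a vendor can produce a record like this for every request, every dollar is traceable. If they can only give you the monthly total, it is not.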
Now. Imagine asking an enterprise buyer to choose which physical rack handles their AWS request. They would look at you like you’d lost your mind. That’s not a user decision. That’s an infrastructure decision. One that AWS made for you, automatically, fifteen years ago.
In 2026, asking a CTO to “select the LLM model” for their TaxTech AI prompt — without any of the above abstraction — is the exact same thing.
They will be lost. They are right to be lost.
The TaxTech AI Gap
In 2026, none of the major TaxTech AI deployments publish anything equivalent to what AWS documents on day one.
Go ahead and try.
Ask Avalara how their ALFA framework decides whether your request goes to a large or small language model. Ask Thomson Reuters what ONESOURCE+’s failover logic looks like during a provider outage. Ask Sovos what model tier handles a simple tax code lookup versus a complex cross-border VAT determination. Ask Anrok what happens to your in-flight compliance request if their primary LLM provider has a brownout at the month-end close.
You won’t get an architecture document.
You’ll get a product brochure.
And to be fair — the product surface area is genuinely impressive. As I mapped in The 2026 AI TaxTech Map, Avalara’s ALFA framework combines large and small language models across 1,400+ application integrations. Thomson Reuters launched ONESOURCE+ as an “Intelligent Compliance Network.” Sovos shipped Sovi AI. Anrok built Atlas with Big 4-verified outputs. The teams building these products are getting there.
But “combines large and small language models” is not a routing architecture. It’s a marketing line.
What governs the routing decision? What’s the threshold? What happens on failover? What’s the uptime SLA for the AI layer specifically — not the underlying AWS or Azure layer the vendor is running on, but the AI orchestration layer?
Nobody discloses this. Probably nobody asks for it either.
That’s the gap.
A Brief, Painful Look at Vendor Reality
Let’s be specific, because specificity is the point.
Avalara deserves credit for the ALFA framework — and their MCP server integration signals genuine infrastructure thinking. But there is no published routing logic. No compute tier disclosure. No AI-specific SLA. What exactly triggers the “use the small model” decision? Unknown.
Thomson Reuters ONESOURCE+ — “Intelligent Compliance Network.” What does the intelligence routing look like? What’s the failover path during a provider degradation? Not disclosed.
Sovos Sovi AI and Anrok Atlas both lead with the output story. Anrok’s combination of AI and human tax experts is smart accuracy risk management. But it is a quality control layer, not an infrastructure layer. Those are different problems.
Xero and Intuit build AI through partnerships, including Xero’s deep integration with Anthropic, which I covered in an earlier post. Partnership architecture creates a second question: when Xero routes your request through Anthropic’s API, which model tier handles it? Which region? What’s the fallback? Those answers live inside the partnership agreement. They are not visible to you, the user.
Pattern: strong on what the AI does. Silent on how it’s governed.
Caps Are Not Architecture. They’re Surrender.
The enterprise response to AI budget overruns in 2026 has been impressively uniform.
Uber burned its entire AI budget by April — Uber’s CTO confirmed it publicly. Amazon capped it. Microsoft capped it. Meta capped it. The industry response to “AI is costing us a fortune” was: put a ceiling on it.
Respectable as a short-term move. Completely missing the point.
A cap is a circuit breaker. It stops the bleeding when you hit the limit. But it does nothing about what happened before the limit. If your AI compliance stack spends 40,000 tokens on a frontier model for tasks that a smaller model could have completed in 2,000 tokens with the exact same compliance outcome, you will hit your cap faster. And you will hit it on waste, not on value.
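The arithmetic is worth running once. The sketch below uses the 40,000-token and 2,000-token figures from above; the per-token prices and the monthly cap are hypothetical numbers I picked for illustration, not quotes from any provider.

```python
# Hypothetical prices and cap, for illustration only.
FRONTIER_USD_PER_MILLION_TOKENS = 15
SMALL_USD_PER_MILLION_TOKENS = 1
MONTHLY_CAP_USD = 10_000


def tasks_before_cap(tokens_per_task: int, usd_per_million_tokens: int) -> int:
    """How many tasks fit under the cap (integer math avoids float rounding)."""
    # tokens * (USD per million tokens) is the task cost in millionths of a dollar.
    micro_usd_per_task = tokens_per_task * usd_per_million_tokens
    return MONTHLY_CAP_USD * 1_000_000 // micro_usd_per_task


frontier_tasks = tasks_before_cap(40_000, FRONTIER_USD_PER_MILLION_TOKENS)
small_tasks = tasks_before_cap(2_000, SMALL_USD_PER_MILLION_TOKENS)

print(frontier_tasks)  # 16666
print(small_tasks)     # 5000000
```

Under these assumed prices, the same cap covers roughly 300 times as many tasks when they are routed to the smaller model. The cap is identical in both cases. Only the routing changed.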
The agentic dimension makes this worse, not better. Agentic TaxTech workflows — multiple AI agents running reconciliation in parallel, cross-referencing jurisdictions, and calling external APIs simultaneously — amplify the risk of compounding failures. AWS’s own David Yanacek flagged this exact problem at the 2026 NY Summit: the more you run in parallel, the higher the probability that one agent’s bug stops the entire output from clearing. In TaxTech terms: one misfiring reconciliation agent doesn’t just produce a bad output. It can hold up the entire compliance filing.
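The compounding effect is plain probability. If each agent fails independently with probability p, the chance that at least one of n parallel agents fails is 1 - (1 - p)^n. A small sketch, assuming independence and a hypothetical 2% per-agent failure rate:

```python
def probability_any_failure(per_agent_failure_rate: float, agent_count: int) -> float:
    """P(at least one of n independent agents fails) = 1 - (1 - p)^n."""
    return 1 - (1 - per_agent_failure_rate) ** agent_count


# 2% is an assumed failure rate, chosen only to show the shape of the curve.
for agents in (1, 5, 20, 50):
    print(agents, round(probability_any_failure(0.02, agents), 3))
```

With that assumed rate, five agents already carry roughly a one-in-ten chance that something breaks, and fifty agents are more likely than not to include a failure. Real agents are rarely fully independent, so treat this as a shape, not a forecast.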
Are TaxTech vendors addressing this?
As I detailed in The Future Cost of TaxTech, the unit economics of AI compliance are brutal at scale. A usage cap doesn’t fix unit economics. It just tells you when you’ve run out.
The AWS analogy again: imagine AWS gave you a monthly compute budget and no load balancer. You’d burn the budget on poorly routed requests, hit the ceiling, and call it a month. Nobody would accept that from a cloud provider.
A cap is not a substitute for architecture. It is an admission that architecture is missing.
(That sentence. Read it again. That’s the one to send your ELTs.)
Five Questions. Before You Sign Anything.
If you are evaluating or renewing a TaxTech AI contract in the next twelve months, your vendor should be able to answer all five of these. No exceptions. No “we’ll get back to you on that.”
If they stall, escalate to the product team. If the product team stalls, you have your answer.
1️⃣ How does your system determine which model tier handles a given request?
A mature answer: routing logic based on task complexity, instruction length, and domain. A weak answer: one model handles everything, or worse — “the user selects the model.” (Yes, this happens. No, it’s not acceptable.)
2️⃣ What is your AI-specific uptime SLA, and what is your failover path when a primary LLM provider degrades?
A mature answer: named failover path, detection latency, recovery time objective. A weak answer: “we rely on our provider’s SLA.” That is not your SLA. That is their SLA, applied to them, not to you.
3️⃣ Do you implement semantic caching for repeated compliance queries?
A mature answer: yes, with a description of how near-identical queries are matched. A weak answer: no mention of caching. (Semantic caching alone can cut compute costs by 40–60% for structured compliance workloads. If your vendor isn’t doing it, you are paying for redundant compute every single day.)
4️⃣ How do you prevent runaway token consumption in agentic workflows?
A mature answer: circuit breakers, per-task token budgets, human escalation triggers. A weak answer: usage caps at the contract level. See the previous section on why that’s not the same thing.
5️⃣ Can you provide task-level compute telemetry — model tier used, token consumption, latency, and cost — per request?
A mature answer: yes, available in account reporting. A weak answer: aggregate spend data only. (If you can’t see task-level telemetry, you cannot manage your Token Effectiveness Ratio. You’re flying blind with a very expensive instrument panel.)
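For question 4, it helps to see how small the mechanism can be. The sketch below is one possible shape for a per-task token budget with a human escalation trigger. The class names, the 5,000-token budget, and the step costs are all hypothetical.

```python
class TokenBudgetExceeded(Exception):
    """Raised when a task tries to spend more tokens than it was allotted."""


class TaskTokenBudget:
    def __init__(self, task_id: str, max_tokens: int) -> None:
        self.task_id = task_id
        self.max_tokens = max_tokens
        self.used_tokens = 0

    def charge(self, tokens: int) -> None:
        """Record usage for one model call; refuse calls that would breach the budget."""
        if self.used_tokens + tokens > self.max_tokens:
            raise TokenBudgetExceeded(
                f"{self.task_id}: {self.used_tokens + tokens} tokens requested, "
                f"budget is {self.max_tokens}"
            )
        self.used_tokens += tokens


def run_agent_steps(budget: TaskTokenBudget, step_costs: list[int]) -> str:
    """Charge each step in turn; hand off to a human when the breaker trips."""
    for cost in step_costs:
        try:
            budget.charge(cost)
        except TokenBudgetExceeded:
            return "escalated_to_human"
    return "completed"


print(run_agent_steps(TaskTokenBudget("recalc-line", 5_000), [1_200, 900]))
print(run_agent_steps(TaskTokenBudget("recalc-line", 5_000), [1_200, 900, 4_000]))
```

The first run finishes inside its budget. The second trips the breaker on its third step and escalates instead of spending. That is a per-task control, which is a different thing from a contract-level cap.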
Your cloud provider answered all five of these before you signed. Your TaxTech AI vendor should too.
If they can’t — you’re not buying managed AI infrastructure. You’re buying model access with a compliance-themed UI on top. The infrastructure risk sits on your P&L. Not theirs.
The companies that asked the right cloud infrastructure questions in 2012 built the most resilient platforms of the 2010s.
The ones signing enterprise AI contracts in 2026 without asking the infrastructure questions are building the most expensive compliance technical debt of the 2030s. And they’ll call it “AI transformation” right up until the end.
Don’t be that company. Ask the questions.
What answers have you gotten from your TaxTech AI vendors? I’m genuinely curious. Drop them in the comments.
Frequently Asked Questions
What is an AI orchestration layer, and why does it matter for enterprise tax compliance? An AI orchestration layer sits between your application and LLM providers, routing each request to the appropriate model tier based on task complexity, cost, and latency. Without it, every request — from a simple tax code lookup to a complex cross-border determination — hits the same model at the same cost. At enterprise transaction volumes, that’s not a technical inconvenience. It’s a structural margin problem.
How should I evaluate AI infrastructure when selecting a TaxTech vendor? Ask five questions before signing: How is model tier selection governed? What is the AI-specific uptime SLA and failover path? Is semantic caching implemented? How are runaway agentic workflows stopped before they exhaust compute budgets? Is task-level token telemetry available in account reporting? A vendor who cannot answer all five is selling model access, not managed AI infrastructure.
What is the difference between an AI gateway and a usage cap on LLM consumption? A usage cap stops spending when a threshold is hit. An AI gateway optimises spending before it happens: routing simple tasks to cheaper models, caching repeated queries, preventing runaway loops. A cap is a circuit breaker. A gateway is the circuit design. You need both, but the design comes first.
Why does model selection matter for automated tax calculations? Different compliance tasks have wildly different complexity profiles. Recalculating a single tax line requires a fraction of the compute needed to reconcile multi-jurisdictional indirect tax exposure across 90 countries. Without intelligent routing, both tasks hit the same frontier model. At scale, that’s millions of dollars in unnecessary compute spend per year.
What are the infrastructure risks of single-vendor LLM lock-in in TaxTech? If your TaxTech vendor integrates deeply with one LLM provider, your compliance infrastructure inherits that provider’s pricing volatility, uptime performance, and geographic access restrictions. A vendor with multi-model routing can arbitrage across providers during outages or price spikes. A single-provider vendor cannot — and when that provider has a bad quarter, so does your compliance stack.
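The multi-provider failover described in the last answer can be sketched in a few lines. The provider names, the `ProviderUnavailable` exception, and the stub functions below are hypothetical stand-ins, not a real SDK.

```python
from typing import Callable


class ProviderUnavailable(Exception):
    """Hypothetical error a provider client raises during an outage."""


Provider = tuple[str, Callable[[str], str]]


def call_with_failover(prompt: str, providers: list[Provider]) -> tuple[str, str]:
    """Try providers in priority order; return (provider name, answer)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary(prompt: str) -> str:
    # Stub that simulates a brownout at the primary provider.
    raise ProviderUnavailable("brownout")


def secondary(prompt: str) -> str:
    # Stub standing in for a healthy fallback provider.
    return f"answer to: {prompt}"


name, answer = call_with_failover(
    "Classify this invoice line",
    [("primary", primary), ("secondary", secondary)],
)
print(name)  # secondary
```

A single-provider vendor has a list of length one. When the only entry fails, the request fails with it.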
References & Further Reading
Hybrid SLM + LLM Orchestration: The 2026 Strategy for Cost-Effective Enterprise AI — Medium/GenAI Protos — Accessed June 2026
LLM Orchestration in 2026: Frameworks + Best Practices — orq.ai — Accessed June 2026
5 Enterprise AI Gateways for LLM Cost Control in 2026 — Maxim AI — Accessed June 2026
Routing, Load Balancing, and Failover in LLM Systems — DEV Community — Accessed June 2026
Avalara Agentic Tax & Compliance — Avalara — Accessed June 2026
AI Prices Are Going Up, Up, Up — Josh Bersin — May 2026
Gartner: LLM Inference Cost to Drop 90% by 2030 — Gartner — March 2026
Anrok Raises $55M Series C — Business Wire — October 2025
Agentic AI Cost Management: Stopping Margin Erosion — Kong Inc. — Accessed June 2026
The Future Cost of TaxTech: Managing the New Unit Economics of Compliance — dmihaylov.com — April 2026
Intelligent LLM Routing: How Multi-Model AI Cuts Costs by 85% — Swfte AI — Accessed June 2026
Failover Routing Strategies for LLMs in Enterprise AI Applications — Maxim AI — Accessed June 2026
AWS Introduces New Cybersecurity, Coding Tools at NY Summit — The Information (Catherine Perloff) — June 2026