Companies are deploying “AI agents” at full speed, often without clearly distinguishing what they are actually putting into production. Is it an LLM generating text? An orchestrated workflow? An autonomous agent making decisions on the fly? Salesforce has published some thought-provoking numbers: 58% success rate for supervised agents, compared to 35% for “free” agents. The gap is significant.
A recent Hacker News thread blew up around the claim that "LLMs are not suited for logic-based decisions". The debate highlighted a tension that many tech leads feel but rarely express:
LLMs are extraordinary at reasoning, synthesizing, and rephrasing. But when it comes to guaranteeing a deterministic decision that complies with business rules, they become catastrophically unpredictable.
This article will break down why delegating business decisions to an LLM is an architectural error—not a matter of model performance. We’ll talk about security, auditability, AI Act compliance, and—most importantly—how serious architectures really operate.
LLM, Workflow, Agent: Do You Really Know What You’re Deploying?
The Bare LLM: A Probabilistic Engine, Not a Decision System
An LLM is a token prediction engine. It generates the most probable next sequence given some context. Nothing more.
It doesn’t “understand” your business rules, it imitates them. When you ask, “Should I approve this €847 refund?”, it generates a plausible answer based on statistical patterns—not by executing deterministic logic.
Imagine a brilliant intern without industry experience. They’ll often answer correctly, sometimes impressively. But they might also make up a rule out of nowhere. That’s exactly what an LLM does.
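To make the contrast concrete, here is what "deterministic logic" means for the €847 refund question. This is a minimal sketch; the €1,000 threshold and the return-window rule are invented for illustration:

```python
# Hypothetical refund policy coded as a deterministic rule: the same
# input always produces the same decision. An LLM asked the same
# question generates the most probable-sounding answer instead.

def approve_refund(amount_eur: float, within_return_window: bool) -> bool:
    """Approve refunds under €1,000 made inside the return window."""
    return amount_eur < 1000 and within_return_window

print(approve_refund(847, within_return_window=True))   # True, every time
```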
AI Workflow: A Predefined Path, but the Human Stays the Decision-Maker
An AI workflow is a controlled sequence where the LLM is used at specific stages: information extraction, classification, rephrasing. But decisions are made by coded business rules or by a human. The LLM understands and suggests; another component decides.
Think of an Airbus autopilot. It executes predefined procedures with precision. But it never invents a new landing maneuver. The rules are hard-coded into the system—not generated on the fly.
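A workflow of this kind fits in a few lines. In this sketch, the `classify_intent` stub stands in for the LLM call, and the routing table is the hard-coded part that actually decides:

```python
# Minimal AI-workflow sketch: the LLM (stubbed here) classifies the
# request; hard-coded business rules make the routing decision.

def classify_intent(message: str) -> str:
    """Stand-in for an LLM call that returns an intent label."""
    return "refund_request" if "refund" in message.lower() else "other"

ROUTES = {  # decisions live in code, not in the model
    "refund_request": "billing_team",
    "other": "general_queue",
}

def route(message: str) -> str:
    intent = classify_intent(message)           # the LLM suggests...
    return ROUTES.get(intent, "general_queue")  # ...the rules decide

print(route("I want a refund for my order"))    # billing_team
```

Swap the stub for a real model call and the guarantee still holds: the model can only pick among routes you wrote down.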
The Autonomous Agent: The LLM Takes Over Decision-Making (And This Is Where It Gets Complicated)
An autonomous agent is an LLM that loops on itself, plans, uses tools, and makes chained decisions.
It can call an API, write to a database, trigger payments—all without supervision at every step.
It’s powerful for exploration and prototyping. But in production on critical processes? The data speaks for itself: 65% of enterprise AI failures stem from “context drift”, where the model gradually strays from the original intent over successive iterations.
| Architecture | Decision-Maker | Capabilities | Risk Level |
|---|---|---|---|
| Bare LLM | Human | Generation, synthesis, rephrasing | Low (if output not executed) |
| AI workflow | Business rules / Human | Controlled orchestration, LLM as support | Moderate (manageable) |
| Autonomous agent | LLM | Planning, tools, recurring actions | High (non-deterministic) |
Why An LLM Should Not Execute Your Business Logic
Non-Determinism: Same Input, Different Outputs
Ask the same question to an LLM ten times. You’ll get variations—sometimes subtle, sometimes significant.
For text generation, that’s fine. But for a compliance rule stating “if amount > €10,000 AND new customer THEN mandatory escalation”? That’s unacceptable.
A business rule must yield the same result every time it’s run. An LLM generates the most probable result—that’s not the same thing.
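The escalation rule above takes three lines of ordinary code, and those three lines are reproducible by construction:

```python
def requires_escalation(amount_eur: float, is_new_customer: bool) -> bool:
    """Compliance rule: amount > €10,000 AND new customer => escalate."""
    return amount_eur > 10_000 and is_new_customer

# Run it a million times on the same input: the answer never varies.
print(requires_escalation(12_500, is_new_customer=True))   # True
```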
CrossData demonstrated this by comparing deterministic SQL pipelines with LLM agents on validation tasks: as the number of steps grew, so did the agent’s error rate.
Logical Hallucinations: The LLM Generates a “Plausible” But Incorrect Rule
We often talk about factual hallucinations (the LLM invents a source). But logical hallucinations are even sneakier.
An LLM generates a rule that looks like a business rule, uses the right vocabulary, follows a coherent structure—but this rule doesn’t actually exist in your framework.
The likelihood of error rises with the number of steps: each reasoning iteration adds to the probability of a hallucination, creating a snowball effect on long decision chains.
Back to the banking analogy: an LLM might “understand” that international transfers require additional checks. But it may generate a validation rule that omits the OFAC check if that pattern was less common in its training data. A rules engine, on the other hand, executes exactly what is coded—no more, no less.
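Here is a sketch of what “exactly what is coded” looks like for that transfer example. The check names are illustrative, not a real compliance list:

```python
# Every required check is listed explicitly, so none can be
# "forgotten" the way an LLM might omit a pattern that was rare
# in its training data. Check names are invented for illustration.

REQUIRED_CHECKS_INTERNATIONAL = {"kyc", "aml", "sanctions_screening"}

def validate_transfer(transfer: dict, completed_checks: set) -> bool:
    """Refuse to proceed unless every mandatory check was recorded."""
    if transfer["international"]:
        missing = REQUIRED_CHECKS_INTERNATIONAL - completed_checks
        if missing:
            raise ValueError(f"missing checks: {sorted(missing)}")
    return True
```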
Business Logic Abuse: The Leading Attack Vector by 2026
BrightSec and the OWASP 2026 reports identify business logic abuse as the emerging threat for AI agent systems. The principle: manipulate the LLM so it bypasses business safeguards without technically “breaking” the system.
- Approval skipping: a malicious prompt convinces the agent that a validation step “has already been done”
- Access circumvention: the LLM generates a justification that bypasses an authorization check
- Premature execution: manipulating the sequence to trigger an action before verification steps
The system sees a “legitimate” action: the format is correct, the right tools are used. But the business intent is violated.
And this is almost impossible to detect using standard technical controls.
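One structural defense against approval skipping and premature execution is to record completed steps in the system itself, never trusting the model’s claim that a step “has already been done”. A minimal sketch, with invented step and action names:

```python
# A tiny sequence guard: an action runs only after the system itself
# has recorded its prerequisite steps. A prompt claiming a validation
# "was already done" changes nothing -- it was never recorded.

class SequenceGuard:
    PREREQS = {"execute_payment": {"validate_amount", "check_authorization"}}

    def __init__(self):
        self.completed = set()

    def record(self, step: str) -> None:
        """Called by trusted system code, never by the agent."""
        self.completed.add(step)

    def allow(self, action: str) -> bool:
        if action not in self.PREREQS:
            return False  # default deny for unregistered actions
        return self.PREREQS[action] <= self.completed
```

The default-deny on unknown actions matters: an agent improvising a new tool call is refused rather than waved through.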
Lack of Auditability: No Replay, No Certification
The AI Act requires traceability for automated decisions in high-risk systems. Simple question: Can you replay the exact reasoning behind a decision made three months ago?
With a rules engine: yes, easily. The rule is versioned, inputs are logged, the result is deterministic.
With an LLM agent: no. Even given the same inputs and prompt, the model might produce a different reasoning. Worse: if the model was updated in the meantime, you can’t even reconstruct the exact conditions.
If you can’t prove why your system made a specific decision, you can’t certify its compliance. Autonomous agents without granular logs and without a deterministic decision layer structurally fail auditability requirements.
What Serious Architectures Actually Do
Least Privilege for LLMs: Scope the Context, Restrict the Tools
An LLM does not need access to your entire database to answer a support query. Robust architectures apply the principle of least privilege: minimal context, restricted tools, granular permissions.
Every tool exposed to an agent must have runtime checks. Not just “the agent can call the payment API,” but: “the agent can initiate payments of less than €500 to pre-validated accounts, with human confirmation required for more.”
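Such a runtime check is ordinary code sitting between the agent and the payment API. The threshold and the account list below are illustrative:

```python
# Runtime guard on a payment tool, per the rule above: under €500 to
# pre-validated accounts runs directly; everything else needs a human.
# Account identifiers are invented for illustration.

VALIDATED_ACCOUNTS = {"ACME-001", "ACME-002"}

def authorize_payment(amount_eur: float, account: str,
                      human_ok: bool = False) -> bool:
    if amount_eur < 500 and account in VALIDATED_ACCOUNTS:
        return True
    return human_ok  # anything outside the envelope escalates
```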
LLM as Reasoner, Rules Engine as Decider
The split between understanding versus execution is the winning pattern. LLMs excel at extracting unstructured data, classifying intent, and generating recommendations. But final decisions should be made by a deterministic rules engine.
CrossData uses this approach: the LLM parses and interprets incoming data, generates an SQL query, then a deterministic pipeline executes the business logic.
If you’re working in the field of AI governance, this separation is your best friend.
Human-in-the-Loop Intelligence: Only on High-Cost Error Nodes
Human-in-the-loop does not mean validation at every step. That’s not scalable. The smart approach: identify nodes with a high cost of error (financial decisions, sensitive data access, irreversible actions) and focus human supervision there.
The best production agents don’t decide alone—they escalate intelligently. They ask questions rather than guess. It’s counterintuitive, but it works.
Controlled Pipelines vs. Autonomous Agents: When to Choose Which?
- Controlled pipeline: repetitive tasks, stable rules, need for auditability. Customer support, document extraction, classification.
- Autonomous agent (supervised): exploration, research, tasks where unpredictability is tolerable. Never for critical processes without safeguards.
The simple rule: if you can draw a flowchart of the logic, use a pipeline. If the “how” depends on context and can’t be predetermined, a supervised agent can help—but final decisions should remain in a layer you control.
The 3 Questions Before You Let an LLM Make a Decision
Decision-making checklist:
1. Is the environment stable or changing?
Fixed, documented business rules → rules engine. Variable context that requires interpretation → LLM as support, decision made elsewhere.
2. What is the cost of an error?
Low-impact, reversible errors → some tolerance for non-determinism. High-cost or irreversible errors (payments, compliance, security) → deterministic decision required.
3. Does the decision need to be auditable or certifiable?
Yes → the LLM cannot be the final decision-maker. You need a replayable, versioned, deterministic record.
These three questions screen out 90% of use cases. If your answers are “stable,” “costly,” and “auditable,” the LLM has no place in the decision layer.
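The gate can even be written down as code, which makes the screening explicit in a design review:

```python
def llm_may_decide(stable_rules: bool, high_cost_error: bool,
                   must_audit: bool) -> bool:
    """The LLM can be the final decision-maker only if none of the
    three screening answers (stable, costly, auditable) applies."""
    return not (stable_rules or high_cost_error or must_audit)

# "stable", "costly", "auditable" -> the LLM stays out of the loop.
print(llm_may_decide(True, True, True))   # False
```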
Conclusion
LLMs are extraordinary tools for understanding, synthesizing, and recommending. But they are not decision systems.
Confusing the two exposes you to security risks (business logic abuse), compliance risks (AI Act), and hidden costs (uncontrolled inference loops).
Good architecture draws a clear line: the LLM reasons and suggests, a rules engine or human decides and executes.
This isn’t a technical limitation—it’s a matter of responsible design. Use AI for what it does best; keep decisions in a layer you control.
FAQ
Can an LLM still be used for decisions if we turn the temperature down to 0?
Even with temperature at 0, an LLM remains non-deterministic. Variations can result from batching, model updates, or processing order. For critical decisions, only a rules engine guarantees true reproducibility.
What’s the difference between a factual hallucination and a logical hallucination?
A factual hallucination invents information (false source, wrong date). A logical hallucination generates a rule or reasoning that appears correct but doesn’t match your actual business logic. The latter is more dangerous—much harder to spot.
Does fine-tuning solve the problem of non-determinism?
No. Fine-tuning improves answer relevance for a given domain, but it won’t turn a probabilistic model into a deterministic system. The core nature of an LLM remains unchanged.
How do you implement “least privilege” for an LLM agent?
Limit the context given to only what’s strictly needed, restrict accessible tools with granular permissions, add runtime checks on every action, and implement quotas (number of calls, maximum amounts).
What is “context drift” as mentioned in the article?
Context drift occurs when, over multiple reasoning iterations, an agent strays farther from the original intent. Each step can introduce tiny deviations that accumulate, leading to outcomes far from the goal.
Does the AI Act apply to every system using LLMs?
The AI Act classifies systems by risk level. Strict traceability obligations apply to “high-risk” systems (HR, finance, healthcare, justice). However, even for others, documenting your automated decisions is always a sound practice.
Is it possible to use an LLM in production for critical tasks with systematic human-in-the-loop?
Yes, that’s the recommended approach. The LLM analyzes, synthesizes, and proposes. The human validates and decides. Focus supervision on high-cost-of-error nodes for scalability.
How can you detect business logic abuse attempts on an agent?
Monitor for unusual patterns: actions out of sequence, overly elaborate justifications that bypass a rule, repeated attempts after denial. Specialized monitoring tools are beginning to emerge for this specific threat type.
Are controlled pipelines really less expensive than autonomous agents?
Yes, often 5 to 20 times cheaper in inference costs. An agent in a loop consumes tokens with every reasoning iteration. A linear pipeline requires a fixed, predictable number of calls.
Are there frameworks for implementing LLM/rules engine separation?
Several approaches are emerging: using the LLM to generate SQL queries or structured API calls, coupling with rules engines like Drools or workflow systems like Temporal. The key is always the same: the LLM transforms, the deterministic system executes.
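Whatever the framework, the contract at the boundary looks the same: the LLM emits a structured action, and a deterministic layer validates it against an allowlist before anything executes. A minimal sketch with an invented action schema:

```python
import json

# The LLM's only output is a JSON action; this deterministic layer
# accepts or rejects it. Action and field names are illustrative.
ALLOWED_ACTIONS = {"lookup_order": {"order_id"}}

def execute(llm_output: str) -> str:
    action = json.loads(llm_output)          # the LLM transforms...
    name, params = action["name"], set(action["params"])
    if name not in ALLOWED_ACTIONS or params != ALLOWED_ACTIONS[name]:
        raise ValueError("rejected: action not in schema")
    return f"running {name}"                 # ...the system executes
```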