The Context
Many organizations sense that AI agents have potential, but struggle with where to begin. At Pantalytics we recently got the chance to make this concrete together with a logistics company. Not through a large implementation project, but through a few workshops. The goal was clear: explore how AI agents can add value on top of existing systems, without immediately overhauling everything.
In this blog we share what we did and, more importantly, what we learned together. Both from the client's perspective and from our own as a partner.
A Shared Mailbox as Starting Point
The trigger was a familiar one. Within the organization, a large part of operational communication arrives through a shared mailbox. Think of questions like "where is my order?", "what's the status of this truck?", "has damage been reported?", or a request for status information. Ticketing systems had been considered in the past, but proved difficult to implement fully.
The question was not to implement a classic ticketing system after all, but to explore whether AI agents could handle this traffic more intelligently: for example, by automatically classifying emails, enriching them with context, and, where possible, drafting a suggested response or next step.
The Approach
We set this up as a workshop with structured follow-up, together with the client's IT and data team. The approach was explicitly hands-on: no slides and grand visions, but building and testing together.
In the workshops we looked at how AI agents can be layered on top of existing systems. We experimented with n8n as a low-code workflow engine, Supabase as a lightweight database, Outlook for mailbox integration, and evaluation tooling to measure output quality. On the AI side we worked with OpenAI, but noted explicitly that a model like Mistral would also have been an option.
The core was an agent that processes emails from the shared mailbox, extracts the relevant information, structures and stores it, and generates an initial interpretation or action proposal. Routing happens through labelling: each email is assigned a category that determines the follow-up action.
An important observation was that with low-code tooling you can get surprisingly far, surprisingly fast. n8n made it possible to build a working end-to-end flow in a short time. For the team it was also valuable to see how this type of tooling compares to other platforms like MuleSoft, Make, Zapier, Microsoft Power Automate, or enterprise automation suites.
At the same time it became clear where the friction lies. Authorization and security are not an afterthought. A workflow that gets access to a mailbox needs quite extensive permissions. Customer data is read, forwarded to an AI model and partly stored in a database. That means privacy, compliance and governance are immediately in play.
The client's existing Oracle database also made integration more complex than it would have been with a modern cloud database. That's not insurmountable, but it slows down experiments and requires more coordination.
Ultimately, a proof of concept has to be adopted into the stack the organization has already chosen, and that transition comes with significant friction.
Challenges
Technically it was relatively straightforward to achieve 80 to 90 percent accuracy in classification and interpretation. The real challenge starts after that. How do you measure that accuracy structurally (answer: good evaluation)? How do you handle exceptions? How do you build feedback loops so the system learns from its mistakes?
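Measuring accuracy structurally boils down to comparing the agent's labels against a hand-checked gold set. A minimal sketch of that evaluation step (the email IDs and labels are illustrative):

```python
def evaluate(predictions: dict[str, str], gold: dict[str, str]) -> float:
    """Fraction of emails whose predicted label matches the gold label."""
    if not gold:
        raise ValueError("gold set is empty")
    correct = sum(
        1 for email_id, label in gold.items()
        if predictions.get(email_id) == label
    )
    return correct / len(gold)

# Illustrative run: two of three labels match the gold set.
gold = {"m1": "order_status", "m2": "damage_report", "m3": "truck_status"}
preds = {"m1": "order_status", "m2": "order_status", "m3": "truck_status"}
print(evaluate(preds, gold))
```

Rerunning this evaluation after every prompt change is what turns "it seems to work" into a number you can track over time.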
This is where prompt engineering, evaluation and monitoring come together. Prompts need to be refined, output needs to be assessed and edge cases need to be explicitly caught. This is not a one-time exercise, but an ongoing process. And that's precisely where the underestimation often lies.
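One common pattern for catching edge cases is a confidence threshold: outputs the model is unsure about go to a human instead of being handled automatically. A minimal sketch, where the threshold value and function names are illustrative assumptions:

```python
# Hypothetical cutoff; in practice this would be tuned per use case
# based on the evaluation results.
REVIEW_THRESHOLD = 0.8

def route(label: str, confidence: float) -> str:
    """Route low-confidence classifications to human review."""
    if confidence < REVIEW_THRESHOLD:
        return "human_review"
    return f"auto:{label}"
```

The human corrections collected this way can then feed back into the evaluation set, which is what makes the feedback loop an ongoing process rather than a one-time exercise.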
One of the most important lessons for the client was that technology is only part of the story. Ultimately the knowledge needs to land within the organization itself. The IT team must be able to understand, adapt and maintain AI agents.
That's extra challenging when the organization works with enterprise platforms like Workato, MuleSoft, SAP or Power Automate. These environments are robust, but also more complex and slower to experiment in. The skill set to properly design AI agents within them is still scarce.
We deliberately kept the ERP system out of scope. At the same time it's clear that opportunities lie there in the longer term, for example toward helpdesk, sales, and CRM modules. But that requires a next step and more fundamental choices.
What We Take Away as a Partner
For us at Pantalytics this case confirmed a number of things. Small, well-scoped experiments already transfer a lot of knowledge to an IT and data team. Start with a concrete process where friction exists. Work together with the team that will eventually own it. And be honest about what's easy and what remains structurally difficult.
AI agents are not a plug-and-play solution. They require technical maturity, but above all organizational willingness to work differently.
We also learned that a proof of concept is very different from an actual implementation. User stories typically follow the format: As a <role>, I want <feature>, so that <why>. That last part, the why, is less relevant for a proof of concept. It's about learning, and learning a lot across many different cases is more important than getting one thing 100% operational.
What's Next
The client can now continue with this independently. The foundational knowledge is in place and the first building blocks are there. At the same time, it's clear that real value creation requires further steps: a clear strategy will need to be developed.
For organizations that truly want to reap the benefits of AI agents, some foundational systems will need to be re-evaluated. That's precisely where targeted experiments can help make those choices well-informed.
If you're also interested in a workshop to gain insight into what AI and modern ERP can mean for your organization, I can be reached at Rutger@Pantalytics.com