Wednesday, March 4, 2026

Aurimas Griciūnas on AI Teams and Reliable AI Systems – O'Reilly

SwirlAI founder Aurimas Griciūnas helps tech professionals transition into AI roles and works with organizations to create AI strategy and develop AI systems. Aurimas joins Ben to discuss the changes he's seen over the past couple of years with the rise of generative AI and where we're headed with agents. Aurimas and Ben dive into some of the differences between ML-focused workloads and those implemented by AI engineers—particularly around LLMOps and agentic workflows—and explore some of the issues animating agentic systems and multi-agent systems. Along the way, they share some advice for keeping your talent pipeline moving and your skills sharp. Here's a tip: Don't dismiss junior engineers.

About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone's agenda. In 2026, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

Check out other episodes of this podcast on the O'Reilly learning platform or follow us on YouTube, Spotify, Apple, or wherever you get your podcasts.

Transcript

This transcript was created with the help of AI and has been lightly edited for clarity.

00.44
All right. So today, for our first episode of this podcast in 2026, we have Aurimas Griciūnas of SwirlAI. And he was previously at Neptune.ai. Welcome to the podcast, Aurimas. 

01.02
Hi, Ben, and thank you for having me on the podcast. 

01.07
So actually, I want to start with a little bit of culture before we get into some technical things. I noticed now it seems like you're back to teaching people some of the latest ML and AI stuff. Of course, before the advent of generative AI, the terms we were using were ML engineer, MLOps. . . Now it seems like it's AI engineer and maybe LLMOps. I'm assuming you use this terminology in your teaching and consulting as well.

So in your mind, Aurimas, what are some of the biggest distinctions in that move from ML engineer to AI engineer, from MLOps to LLMOps? What are two or three of the biggest things that people should understand?

02.05
That’s an amazing query, and the reply will depend on the way you outline AI engineering. I feel how the general public in the present day outline it’s a self-discipline that builds programs on prime of already present giant language fashions, perhaps some fine-tuning, perhaps some tinkering with the fashions. However it’s not in regards to the mannequin coaching. It’s about constructing programs or programs on prime of the fashions that you have already got.

So the distinction is quite big because we're no longer creating models. We're reusing models that we already have. And hence the discipline itself becomes a lot more similar to software engineering than actual machine learning engineering. So we're not training models. We're building on top of the models. But some of the similarities remain, because both the systems that we used to build as machine learning engineers and the ones we now build as AI engineers are nondeterministic in their nature.

So some evaluation and practices of how we would evaluate these systems remain. Sometimes I would even go as far as to say that there are more differences than similarities in these two disciplines, and it's really, really hard to properly distinguish three main ones. Right?

03.38
So I would say software engineering, right. . . 

03.42
So, I guess, based on your description there, the personas have changed as well.

So in the earlier incarnation, you had ML teams, data science teams—they were mostly the ones responsible for doing a lot of the building of the models. Now, as you point out, at most people are doing some kind of posttraining, some fine-tuning. Maybe the more advanced teams are doing some kind of RL, but that's really limited, right?

So the persona has changed. But on the other hand, at some level, Aurimas, it's still a model, so then you still need the data scientist to interpret some of the metrics and the evals, correct? In other words, if you run with completely just "Here's a bunch of software engineers; they'll do everything," obviously you can do that, but is that something you recommend without having any ML expertise on the team? 

04.51
Yes and no. A year ago or two years ago—maybe a year and a half ago—I would say that machine learning engineers were still the best fit for AI engineering roles because we were used to dealing with nondeterministic systems.

They knew how to evaluate something the output of which is a probabilistic function. So it's more of a mindset of working with these systems, and the practices that come from actually building machine learning systems before. That's very, very helpful for dealing with these systems.

05.33
But nowadays, I think already many people—many specialists, many software engineers—have tried to upskill on this nondeterminism and learn a lot [about] how you'd evaluate these kinds of systems. And the most valuable specialist nowadays, [the one who] can actually, I would say, bring the most value to the companies building these kinds of systems, is someone who can actually build end-to-end, and so has all kinds of skills, starting from being able to figure out what kind of products to build and actually implementing some POC of that product, shipping it, exposing it to the users, and being able to react [to] the feedback [from] the evals that they built out for the system. 

06.30
But the eval part can be learned. Right. So you have to spend some time on it. But I wouldn't say that you need a dedicated data scientist or machine learning engineer specifically dealing with evals anymore. Two years ago, probably yes. 

06.48
So based on what you're seeing, people are beginning to organize accordingly. In other words, the recognition here is that if you're going to build some of these modern AI systems or agentic systems, it's really not about the model. It's a systems and software engineering problem. So therefore we need people who are of that mindset. 

However however, it’s nonetheless knowledge. It’s nonetheless a data-oriented system, so that you would possibly nonetheless have pipelines, proper? Knowledge pipelines to knowledge groups that knowledge engineers usually keep. . . And there’s all the time been this lamentation even earlier than the rise of generative AI: “Hey, these knowledge pipelines maintained by knowledge engineers are nice, however they don’t have the identical software program engineering rigor that, you realize, the individuals constructing net functions are used to.” What’s your sense when it comes to the rigor that these groups are bringing to the desk when it comes to software program engineering practices? 

08.09
It will depend on who’s constructing the system. AI engineers [comprise an] extraordinarily wide selection. An engineer could be an AI engineer. A software program engineer might be an AI engineer, and a machine studying engineer could be an AI engineer. . .

08.31 
Let me rephrase that, Aurimas. In your mind, [on] the best teams, what's the typical staffing pattern? 

08.39
It depends on the size of the project. If it's just a project that's starting out, then I would say a full stack engineer can actually quickly start off a project, build A, B, or C, and continue expanding it. And then. . .

08.59
Primarily relying on some kind of API endpoint for the model?

09.04
Not necessarily. So it can be a REST API-based system. It can be a stream processing-based system. It can be just a CLI script. I would never encourage [anyone] to build a system which is more complex than it needs to be, because quite often, when you have an idea, just to prove that it works, it's enough to build out, you know, an Excel spreadsheet with a column of inputs and outputs and then just give the outputs to the stakeholder and see if it's useful.

So it’s not all the time wanted to start out with a Relaxation API. However usually, relating to who ought to begin it off, I feel it’s people who find themselves very generalist. As a result of on the very starting, it’s good to perceive finish to finish—from product to software program engineering to sustaining these programs.

10.01
But once this system evolves in complexity, then very likely the next person you'd be bringing on—again, depending on the product—very likely would be someone who is good at data engineering. Because as you mentioned before, most of the systems rely on a very strong integration with those already existing data systems [that] you're building for an enterprise, for example. And that's a hard thing to do right. And the data engineers do it quite [well]. So definitely a very useful person to have on the team. 

10.43
And maybe eventually, once those evals come into play, depending on the complexity of the product, the team might benefit from having an ML engineer or data scientist in between. But this is more kind of targeting those cases where the product is complex enough that you actually need LLMs as judges, and then you need to evaluate those LLM judges so that your evals are evaluated as well.

If you just need some simple evals—because some of them can be actual assertion-based evals—those can easily be done, I think, by someone who doesn't have prior machine learning experience.
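As a rough illustration of that point, an assertion-based eval needs nothing more than plain predicates over a model's output—no ML background or LLM judge required. This is a minimal sketch; the function name and the specific criteria are made up for illustration:

```python
def run_eval(output: str) -> dict:
    # Each check is a plain boolean assertion on the output text.
    # Real eval suites would run these over a whole dataset of cases.
    return {
        "non_empty": bool(output.strip()),
        "under_length_limit": len(output) <= 500,
        "no_ai_boilerplate": "as an AI" not in output,
        "mentions_required_term": "refund" in output.lower(),
    }

result = run_eval("You are eligible for a refund within 30 days.")
print(all(result.values()))  # True: this sample output passes every check
```

Checks like these won't catch subtle quality problems, but they cheaply catch the obvious failure modes before anything fancier is needed.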

11.36
Another cultural question I have is the following. I would say two years ago, 18 months ago, most of these AI initiatives were carried out. . . Basically, it was a little more decentralized, in other words. So here's a group here. They're going to do something. They're going to build something on their own and then maybe try to deploy that. 

But now just recently I'm hearing, Aurimas—and I don't know if you're hearing the same thing—that, at least in some of these big companies, they're starting to have much more of a centralized team that can help other teams.

So in other words, there's a centralized team that somehow has the right talent and has built several of these things. And now they can kind of consolidate all those learnings and then help other teams. If I'm in one of these organizations, then I approach these specialists. . . I guess in the old, old days—I hate this term—they'd use some center of excellence type of thing. So you're going to get some kind of playbook and they'll help you get going. Kind of like in your earlier incarnation at Neptune.ai. . . It's almost like you had this centralized tool and experiment tracker where someone can go in and learn what others are doing and then learn from each other.

Is this something that you're hearing—that people are going for more of this kind of centralized approach? 

13.31
I do hear about these kinds of situations, but naturally, it's always a big enterprise that's managed to pull that off. And I believe that's the right approach, because that's also what we've been doing before GenAI. We had these centers of excellence. . . 

13.52
I guess for our audience, explain why you think this is the right approach. 

13.58
So, two things why I think it's the right approach. The first thing is that we used to have these platform teams that would build out a shared pool of software that can be reused by other teams. So we kind of defined the standards of how these systems should be operated, in production and in development. And they would decide what kind of technologies and tech stack should be used across the company. So I think it's a good idea not to spread too widely in the tools that you're using. 

Also, have template repositories that you can just pull and reuse. Because then not only is it easier to kick off and start your buildout of the project, but it also helps control how well this knowledge can actually be centralized, because. . .

14.59
And likewise there’s safety, then there’s governance as effectively. . . 

15.03
For example, yes. The platform side is one of those—just use the same stack and help others build easier and faster. And the second piece is that obviously GenAI systems are still very young. So [it's] very early, and we really do not have, as some would say, enough reps in building these kinds of systems.

So we learn as we go. With regular machine learning, we already had everything figured out. We just needed some practice. Now, if we learn in this distributed way and then don't centralize learnings, we suffer. So basically, that's why you'd have a central team that holds the knowledge. But then it should, you know, help other teams implement some new kind of system and then bring those learnings back into the central core and then spread those learnings back out to other teams.

But this is also how we used to operate in those platform teams in the old days, three years, four years ago. 

16.12
Right, right. But then, I guess, what happened with the release of generative AI is that the platform teams might have moved too slowly for the rank and file. And so hence you started hearing about what they call shadow AI, where people would use tools that weren't exactly blessed by the platform team. But now I think the platform teams are starting to arrest some of that. 

16.42
I’m wondering whether it is platform groups who’re sort of catching up, or is it the instruments that [are] maturing and the practices which are maturing? I feel we’re getting increasingly reps in constructing these programs, and now it’s simpler to meet up with every thing that’s happening. I might even go so far as to say it was not possible to be on prime of it, and perhaps it wouldn’t even make sense to have a central staff.

17.10
A lot of these demos look impressive—generative AI demos, agents—but they fail when you deploy them in the wild. So in your mind, what's the single biggest hurdle, or the most common reason, why a lot of these demos or POCs fall short or become unreliable in production? 

17.39
That, again, depends on where we're deploying the system. But one of the main reasons is that it is very easy to build a POC, and then it targets a very specific and narrow set of real-world scenarios. And we kind of believe that it solves [more than it does]. It just doesn't generalize well to other types of scenarios. And that's the biggest problem.

18.07
Of course there are security issues and all kinds of stability issues, even with the biggest labs and the biggest providers of LLMs, because those APIs are also not always stable, and you need to handle that. But that's an operational issue. I think the biggest issue is not operational. It's actually evaluation-based, and sometimes even use case-based: Maybe the use case is not the correct one. 

18.36
You know, before the advent of generative AI, ML teams and data teams were just starting to get going on observability. And then obviously generative AI comes into the picture. So what changes as far as LLMs and generative AI when it comes to observability? 

19.00
I wouldn’t even name observability of normal machine studying programs and [of] AI programs the identical factor.

Going back to an earlier parallel, generative AI observability is a lot more similar to regular software observability. It's all about tracing your application, and then on top of those traces that you collect—in the same way as you would collect them from a regular software application—you add some additional metadata so that it's useful for performing evaluation activities on your agentic AI type of system.

So I would even contrast machine learning observability with GenAI observability, because I think those are two separate things.

19.56
Especially when it comes to agents—and agents that involve some kind of tool use—you're really getting into kind of software traces and software observability at that point. 

20.13
Exactly. Tool use is just a function call. A function call is just an ordinary software span, let's say. Now what's important for GenAI is that you also know why that tool was chosen to be used. And that's where you trace outputs of your LLMs. And you know why that LLM call, that generation, decided to use this and not the other tool.

So things like prompts, token counts, and how much time to first token it took for which generation—these are the kinds of things that get traced on top of regular software tracing. 
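A minimal sketch of what that extra metadata might look like, using plain dicts instead of a real tracing SDK. The helper names are hypothetical, and the attribute keys only loosely echo OpenTelemetry's `gen_ai` semantic-convention style:

```python
import time
import uuid

def start_span(name, parent_id=None):
    # A plain-dict stand-in for an ordinary software trace span;
    # a real system would use an SDK such as OpenTelemetry.
    return {
        "span_id": uuid.uuid4().hex,
        "parent_id": parent_id,
        "name": name,
        "start": time.time(),
        "attributes": {},
    }

def record_llm_generation(span, prompt, output, model, prompt_tokens,
                          completion_tokens, time_to_first_token_s,
                          tool_choice=None):
    # The GenAI-specific layer on top of the regular span: the prompt,
    # the generation, token counts, time to first token, and -- for
    # agentic systems -- which tool this generation decided to call.
    span["attributes"].update({
        "gen_ai.model": model,
        "gen_ai.prompt": prompt,
        "gen_ai.output": output,
        "gen_ai.usage.prompt_tokens": prompt_tokens,
        "gen_ai.usage.completion_tokens": completion_tokens,
        "gen_ai.time_to_first_token_s": time_to_first_token_s,
    })
    if tool_choice is not None:
        span["attributes"]["gen_ai.tool_choice"] = tool_choice
    return span

span = start_span("agent.plan_step")
record_llm_generation(
    span,
    prompt="Which tool should handle this request?",
    output="Use the `search` tool.",
    model="example-model",  # hypothetical model name
    prompt_tokens=42,
    completion_tokens=9,
    time_to_first_token_s=0.31,
    tool_choice="search",
)
print(span["attributes"]["gen_ai.tool_choice"])  # prints "search"
```

The point of the sketch is the shape: a regular span, plus the evaluation-relevant fields an eval pipeline can later read back off the trace.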

20.58
And then, obviously, there's also. . . I guess one of the main changes this year will probably be multimodality, where there are different types of modes and data involved.

21.17
Right. For some reason I didn't touch on that, but you're right. There's a lot of difference here, because inputs and outputs—it's hard. First of all, it's hard to trace these kinds of things, like, let's say, audio input and output [or] video images. But I think [an] even harder kind of problem with this is: How do you make sure that the data that you trace is useful?

Because these observability systems that are being built out—like LangSmith, Langfuse, and all the others—you know, how do you make it so that it's convenient to actually look at the data that you trace, which is not text and not regular software spans? How [do] you even correlate two different audio inputs to each other? How do you do that? I don't think that problem is solved yet. And I don't even think we know what we want to see when it comes to evaluating this kind of data next to each other. 

22.30
So let’s speak about brokers. A good friend of mine really requested me yesterday, “So, Ben, are brokers actual, particularly on the buyer aspect?” And my good friend was saying he doesn’t assume it’s actual. So I mentioned, really, it’s extra actual than individuals assume within the following sense: Initially, deep analysis, that’s brokers. 

And then secondly, people might be using applications that involve agents, but they don't know it. So, for example, they're interacting with a system, and that system involves some kind of data pipeline that was written and is being monitored and maintained by an agent. Sure, the actual application is not an agent. But underneath, there are agents involved in the application.

So to that extent, I think agents are definitely real in the data engineering and software engineering space. But I think there might be more consumer apps where, underneath, there are agents involved that users don't know about. What's your sense? 

23.41
Pretty similar. I don't think there are real, full-fledged agents that are exposed. 

23.44
I think when people think of agents, they think of it as like they're interacting with the agent directly. And that may not be the case yet. 

24.04
Right. So then, it depends on how you define the agent. Is it a fully autonomous agent? What is an agent to you? So, GenAI in general is very useful on many occasions. It doesn't necessarily have to be a tool-using, fully autonomous agent.

24.21
So like I said, the canonical example for consumers would be deep research. Those are agents.

24.27
These are brokers, that’s for certain. 

24.30
If you think of that example, it's a bunch of agents searching across different data collections, and then maybe a central agent unifying and presenting it to the user in a coherent way.

So from that perspective, there probably are agents powering consumer apps. But they may not be the actual interface of the consumer app. So the actual interface might still be rule-based or something. 

25.07
True. Like data processing. Some automation is happening in the background. And a deep research agent—this is exposed to the user. Now, that's relatively easy to build, because you don't have to evaluate this kind of system very strongly. Because you expect the user to ultimately evaluate the results. 

25.39
Or in the case of Google, you can present both: They have the AI summary, and then they still have the search results. And then, based on the user signals of what the user is actually consuming, they can continue to improve their deep research agent. 

25.59
So let’s say the disasters that may occur from flawed outcomes weren’t that dangerous. Proper? So. 

26.06
Oh, no, it can be bad if you deploy it inside the enterprise, and you're using it to prepare your CFO for some earnings call, right?

26.17
True, true. But then, you know, whose responsibility is it? The agent's, that provided 100%. . .? 

26.24
You can argue that's still an agent, but then the finance team will take those results and scrutinize [them] and make sure they're correct. But an agent prepared the initial version. 

26.39
Exactly, exactly. So it still needs review.

26.42
Yeah. So the reason I bring up agents is: Do agents change anything from your perspective in terms of eval, observability, and anything else? 

26.55
They do a little bit. Compared to agentic workflows that aren't full agents, the only change that really happens. . . And we're talking now about multi-agent systems, where multiple agents can be chained or looped in together. So really the only difference there is that the length of the trace is not deterministic. And the number of spans is not deterministic. So in the sense of observability itself, the difference is minimal, as long as these agents and multi-agent systems are running in a single runtime.

27.44
Now, when it comes to evals and evaluation, it's different, because you evaluate different aspects of the system. You try to uncover different patterns of failures. For example, if you're just running your agentic workflow, then you know what kind of steps can be taken, and then you can be almost 100% sure that the entire path from your initial intent to the final answer is completed. 

Now with agent systems and multi-agent systems, you can still observe, let's say, input-output. But then what happens in the middle is not a black box, but it is very nondeterministic. Your agents can start looping the same questions between each other. So you also need to look for failure signals that aren't present in agentic workflows, like too many back-and-forth [responses] between the agents, which wouldn't happen in a regular agentic workflow.

Also, for tool use and planning, you need to figure out if the tools are being executed in the correct order. And similar things. 
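Both failure signals described here—excessive back-and-forth between two agents, and tools firing out of the expected order—can be scanned for in a flattened trace. This is a sketch only; the event schema and function names are hypothetical:

```python
from collections import Counter

def detect_failures(trace, max_repeats=3, expected_order=None):
    # Scan a flattened multi-agent trace (a list of event dicts) for
    # failure signals that fixed agentic workflows don't exhibit.
    failures = []

    # Signal 1: too much back-and-forth -- the same pair of agents
    # exchanging more messages than a healthy run should need.
    pair_counts = Counter()
    for event in trace:
        if event["type"] == "message":
            pair = frozenset((event["sender"], event["receiver"]))
            pair_counts[pair] += 1
            if pair_counts[pair] == max_repeats + 1:
                failures.append(f"loop between {sorted(pair)}")

    # Signal 2: tools executed out of the expected order.
    if expected_order:
        called = [e["tool"] for e in trace if e["type"] == "tool_call"]
        positions = [expected_order.index(t) for t in called
                     if t in expected_order]
        if positions != sorted(positions):
            failures.append("tools executed out of order")

    return failures

# A toy trace: two agents ping-pong four times, then the report is
# written before the search runs.
trace = [
    {"type": "message", "sender": "planner", "receiver": "researcher"},
    {"type": "message", "sender": "researcher", "receiver": "planner"},
    {"type": "message", "sender": "planner", "receiver": "researcher"},
    {"type": "message", "sender": "researcher", "receiver": "planner"},
    {"type": "tool_call", "tool": "write_report"},
    {"type": "tool_call", "tool": "search"},
]
print(detect_failures(trace, max_repeats=3,
                      expected_order=["search", "write_report"]))
```

In practice, these checks would run offline over collected traces as part of an eval suite, flagging runs for human review rather than blocking them.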

29.09
And that’s why I feel in that state of affairs, you undoubtedly want to gather fine-grained traces, as a result of there’s additionally the communication between the brokers. One agent is likely to be mendacity to a different agent in regards to the standing of completion and so forth and so forth. So it’s good to actually sort of have granular degree traces at that time. Proper? 

29.37
I would even say that you always need to have the lower-level pieces written out. Even if you're running a simple RAG system, which you can study through the generation step, you still need these granular traces for each of the actions.

29.52
But definitely, interagent communication introduces additional points of failure that you want to make sure you also capture. 

So in closing, I guess, this is a fast-moving topic, right? So there's the challenge for you, the individual, in your professional development. But then there's also the challenge for you as an AI team in how you keep up. So any tips at both the individual level and the team level, other than going to SwirlAI and taking courses? [laughs] What other practical tips would you give an individual on the team? 

30.47
So for individuals, for sure, learn fundamentals. Don't rely on frameworks alone. Understand how everything is really working under the hood; understand how these systems are actually connected.

Just think about how these prompts and context [are] actually glued together and passed from agent to agent. Don't assume that you will be able to just mount a framework right on top of your system, write [a] few prompts, and everything will magically work. You need to understand how the system works from first principles.

So yeah. Go deep. That's for individual practitioners. 

31.32
When it comes to teams, well, that's a good question and a very hard question. Because, you know, in the upcoming one or two years, everything can change so much. 

31.44
And then one of the challenges, Aurimas, for example, in the data engineering space. . . It used to be, several years ago: I have a new data engineer on the team. I have them build some basic pipelines. Then they get confident, [and] then they build more complex pipelines, and so on and so forth. And then that's how you get them up to speed and get them more experience.

However the problem now’s lots of these primary pipelines could be constructed with brokers, and so there’s some quantity of entry-level work that was once the place the place you may practice your entry-level individuals. These are disappearing, which additionally impacts your expertise pipeline. In the event you don’t have individuals at the start, then you definitely gained’t have skilled individuals in a while.

So any tips for teams and the challenge of the pipeline for talent?

32.56
That’s such a tough query. I wish to say, don’t dismiss junior engineers. Practice them. . .

33.09
Oh, yeah, I agree completely. I agree completely.

33.14
However that’s a tough choice to make, proper? As a result of it’s good to be fascinated by the longer term.

33.26
I think, Aurimas, the mindset people have to [have is to] say: OK, so the traditional training grounds we had, in this example of the data engineer, were those basic pipelines. Those are gone. Well, then we find a different way for them to enter. It might be that they start managing some agents instead of building pipelines from scratch. 

33.56
We’ll see. We’ll see. However we don’t know. 

33.58
Yeah. Yeah. We don’t know. The brokers even within the knowledge engineering house are nonetheless human-in-the-loop. So in different phrases a human nonetheless wants to watch [them] and ensure they’re working. In order that might be the entry-level for junior knowledge engineers. Proper? 

34.13
Right. But you know, that's the hard part about this question. The answer is: That could be, but we do not know, and for now maybe it doesn't make sense. . .

34.28
My point is that if you stop hiring these juniors, I think that's going to hurt you down the road. So you just hire a junior, and you hire the junior and then stick them in a different track, and then, as you say, things might change, but then they can adapt. If you hire the right people, they will be able to adapt. 

34.50
I agree, I agree, but then there are also people who are not really right for that role, let's say, and you know, what I. . . 

35.00
However that’s true even whenever you employed them and also you assigned them to construct pipelines. So similar factor, proper? 

35.08
The same thing. But the thing I see with the juniors and less senior people who are currently building is that we're relying too much on vibe coding. I would also suggest looking for some ways to onboard someone new and make sure that the person actually learns the craft, and not just comes in and vibe codes his or her way around, creating more issues for senior engineers than actually helping. 

35.50
Yeah, this is a big topic, but one of the challenges—all I can say is that, you know, the AI tools are getting better at coding at some level because the people building these models are using reinforcement learning, and the signal in reinforcement learning is "Does the code run?" So then what people are ending up with now, with this newer generation of these models, is [that] they vibe code and they'll get code that runs, because that's what the reinforcement learning is optimizing for.

However that doesn’t imply that that code doesn’t introduce correct to the best. However on the face of it, it’s operating, proper? An skilled individual clearly can in all probability deal with that. 

But anyway, so final word—you get the last word, but take us out on a positive note. 

36.53
[laughs] I do believe that the future is bright. It's not grim, not dark. I'm very excited about what is happening in the AI space. I do believe that it will not be as fast. . . All this AGI and AI taking over human jobs—it will not happen as fast as everyone is saying. So you shouldn't be worried about that, especially when it comes to enterprises. 

I believe that we already had [very powerful] technology a year or a year and a half ago. [But] for enterprises to even take advantage of that kind of technology, which we already had a year and a half ago, it will still take another five years or so to really get the most out of it. So there will be enough work and jobs for at least the upcoming 10 years. And I think people shouldn't be worried too much about it.

38.06
But in general, eventually, even those who do lose their jobs will probably respecialize over that long term into some more valuable role. 

38.18
I suppose I’ll shut with the next recommendation: The primary factor that you are able to do is simply preserve utilizing these instruments and continue to learn. I feel the excellence might be more and more between those that know use these instruments effectively and people who don’t.

And with that, thank you, Aurimas.
