How to become an AI Agents Developer?
How can I learn to build AI Agents? I've been getting this question a lot in the past few months, so I built this guide to help you navigate your journey. This roadmap is pretty comprehensive, so you can always refer back to it to guide you along the way.
While this is only a theoretical roadmap, you can find videos explaining how to implement anything I talk about here just by searching for the keyword + the language that you use.
Prerequisites
To follow this guide, I assume you already have some programming knowledge either with Python or JavaScript, or you're willing to learn a no-code tool that's used in AI agents development like n8n/Flowise/Langflow.
A. Beginner: LLM Chains
Step 1: Get an LLM API Key
Unfortunately, LLMs aren't free. To go down this path, you'll need access to an LLM API. Most providers are paid, but assuming you're just starting out, you'll probably want a free way to use an LLM.
The most generous free offer is Microsoft for Startups, which gives you $1k in Azure AI credits, but it takes a while to get approved. A good alternative to get started right now is Cloudflare AI (Llama) or Google AI (Gemini) while you wait for your Azure plan.
Step 2: An LLM Call
This is the base of all LLM apps - an LLM API call. You can get started by chatting with the LLM or building a simple question-answer chatbot.
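To make this concrete, here's a minimal sketch of a single LLM call using the OpenAI Python SDK. The model name and API key are placeholders; most providers expose a similar chat-completions interface, so swap in whatever your provider gives you.

```python
# A minimal chat completion using the OpenAI Python SDK.
# The model name and API key are placeholders for whichever provider you use.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model your provider exposes
    messages=[{"role": "user", "content": "Explain what an AI agent is in one sentence."}],
)

print(response.choices[0].message.content)
```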
Step 3: Customizing the LLM
The next phase would be to use code to inject instructions or variables inside the LLM prompt. This will allow you to create agentic apps and workflows. You can also use the System prompt to customize the LLM for your needs.
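For example, a small sketch of variable injection plus a system prompt might look like this (again assuming the OpenAI SDK; the model name and the summarize use case are just illustrative):

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

def summarize(text: str, tone: str) -> str:
    # Steer behavior with a system prompt and inject variables into the user prompt.
    system_prompt = f"You are a helpful assistant. Always answer in a {tone} tone."
    user_prompt = f"Summarize the following text in two sentences:\n\n{text}"

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

print(summarize("AI agents are LLM-powered programs that can take actions...", tone="casual"))
```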
Step 4: JSON Mode (Structured Outputs)
The goal here is to be able to extract data from the LLM response as variables to use in your apps. You can use these variables to perform logical operations with booleans, gather info with strings, etc.
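A minimal sketch of this with the OpenAI SDK's JSON mode (the schema and example inputs are made up for illustration):

```python
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},  # ask the model to return valid JSON
    messages=[
        {"role": "system", "content": "Extract data from the user's message. "
         "Return JSON with keys: name (string), is_complaint (boolean)."},
        {"role": "user", "content": "Hi, I'm Sarah and my order arrived broken."},
    ],
)

data = json.loads(response.choices[0].message.content)
print(data["name"], data["is_complaint"])  # use the extracted fields as normal variables
```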
Project: AI Helper App
Now you can start building AI apps that help you with various tasks. As an example, the first AI app I built was a Chrome extension that blocked websites that were irrelevant to my todo list. You explain to the LLM what to do in the system prompt, give it the website content and the todo list as user input, and have it return a boolean indicating whether to block the website.
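The core decision step of that project could be sketched roughly like this (the function and prompt are hypothetical; the browser-extension plumbing is left out):

```python
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

def should_block(page_text: str, todo_list: list[str]) -> bool:
    # Ask the LLM for a single boolean decision via JSON mode.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": "You decide whether a website is relevant to the "
             "user's todo list. Return JSON: {\"block\": true|false}."},
            {"role": "user", "content": f"Todo list: {todo_list}\n\nWebsite content:\n{page_text[:2000]}"},
        ],
    )
    return json.loads(response.choices[0].message.content)["block"]

print(should_block("Top 10 cat videos of the week...", ["Finish quarterly report"]))
```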
B. Intermediate: AI Agents
In this phase, we'll transition from static LLM apps (also called chains in the LangChain ecosystem) that need to be specifically called, to more autonomous "agents" that are less rigid and have some freedom in what actions to perform.
Step 1: A Chatbot
You might have noticed that the LLM can't remember what you said to it before. The next step is to give the LLM memory of the conversation so you can build a ChatGPT-like chatbot. The key is to let it read past messages along with the latest user prompt: instead of providing a single prompt, you provide a list containing the full message history plus the newest message.
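A bare-bones sketch of that idea, keeping the history in a plain Python list (no framework yet, model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

# The conversation history lives in a plain list of messages.
history = [{"role": "system", "content": "You are a friendly assistant."}]

while True:
    user_input = input("You: ")
    history.append({"role": "user", "content": user_input})

    response = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    reply = response.choices[0].message.content

    history.append({"role": "assistant", "content": reply})  # remember the answer too
    print("Bot:", reply)
```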
Step 2: Choose an LLM Framework
Although you can technically do everything in this roadmap without a framework, using one will get rid of boilerplate and repetition and let you focus on the high-level design and behavior of your LLM agents.
The most popular base LLM frameworks are LangChain and OpenAI Assistants API. Both are available in Python, JavaScript, and no-code tools such as n8n. If you're using code, I recommend using Assistants API as its syntax is a little cleaner. If you're not using OpenAI or you're using a no-code tool, then use LangChain.
Your choice won't matter a lot because in the long run you'll probably have to learn both when working on different projects.
Your goal here is to recreate the LLM apps that we previously created in part A but with this new framework.
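As a reference point, here's roughly what the "customized LLM" app from part A looks like in LangChain's expression syntax. This assumes the langchain-core and langchain-openai packages; LangChain's API changes often, so treat it as a sketch rather than the canonical way.

```python
# Recreating the customized-LLM app from part A with LangChain.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Always answer in a {tone} tone."),
    ("user", "Summarize this text in two sentences:\n\n{text}"),
])

chain = prompt | llm  # the prompt feeds into the model

result = chain.invoke({"tone": "casual", "text": "AI agents are LLM-powered programs..."})
print(result.content)
```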
Step 3: RAG
Retrieval Augmented Generation (RAG) allows the LLM to have knowledge that's outside its training data and too big to fit in its context window (prompt). This can be enterprise-specific knowledge, laws in a particular country, a CSV containing product info for a retail shop, or a self-help book. This would allow you to create specialized GPTs like you see in many products like AI lawyer, AI doctor, etc.
Basic RAG needs two things: embeddings and a vector store. If you're using the Assistants API, this is pretty easy to set up, but it's a little trickier with LangChain. In LangChain, you have to handle things like the embedding model, the vector store provider, and document loaders yourself, but doing so will give you a fundamental understanding of how RAG works under the hood.
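Here's a bare-bones RAG sketch with LangChain: embed a couple of documents, store them, retrieve the most relevant ones, and stuff them into the prompt. It assumes the langchain-openai, langchain-community, and faiss-cpu packages are installed; the documents and question are made up.

```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

docs = [
    Document(page_content="Our return policy allows refunds within 30 days."),
    Document(page_content="Shipping takes 3-5 business days within the EU."),
]

# Embed the documents and index them in a local vector store.
vector_store = FAISS.from_documents(docs, OpenAIEmbeddings())
retriever = vector_store.as_retriever(search_kwargs={"k": 2})

question = "How long do I have to return a product?"
context = "\n".join(d.page_content for d in retriever.invoke(question))

llm = ChatOpenAI(model="gpt-4o-mini")
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```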
Step 4: Function Calling (Tool Calling)
This will allow the LLM to take actions on its own. It can call a function that will be run, and the LLM will be informed of the response of that function.
The first use case is similar to structured outputs in that it lets you extract data from the LLM and then do something with it, such as extracting the name of a lead and storing it in a database. The difference is that with tool calling, the agent chooses when to call the tool, not just what to pass to it, whereas a structured output is always produced. So with function calling, the function is only called once the lead provides their name; with structured outputs, the LLM always tries to return a name, which stays empty until the lead provides it.
The other use case of function calling is real-time and structured RAG. Instead of dumping a large dataset into the knowledge base, you can use functions to let the LLM fetch very specific parts of the knowledge base, call an API to get the most relevant data, or use natural-language-to-SQL query functions.
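Here's a minimal sketch of the lead-capture example with the OpenAI SDK's tool calling. The save_lead function is a stand-in for a real database write; the model decides on its own whether to call it.

```python
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

def save_lead(name: str) -> str:
    print("Saving lead:", name)  # in a real app this would write to a database
    return "saved"

tools = [{
    "type": "function",
    "function": {
        "name": "save_lead",
        "description": "Store a lead's name once they have provided it.",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi, I'm Alex, I'd like a demo."}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:  # the model decided a tool call is needed
    args = json.loads(message.tool_calls[0].function.arguments)
    save_lead(**args)
```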
Project: Specialized Conversational Agent
This would be the project that encompasses everything we learned in this part. It's a chatbot that has specialized information about a task with a well-crafted prompt, the ability to do stuff using function calling, and relevant data using RAG or real-time RAG with function calling.
There are many examples of chatbots that look like this; most of them fall in line with automating a customer-facing job like sales rep, customer support, Airbnb host assistant, receptionist, etc.
After this step, you'll be pretty confident in your AI skills, and you can probably navigate the rest on your own or choose a different path altogether. The advanced section might not be relevant to you then, but I've covered the most popular paths to take next in part C.
C. Advanced: Iterative and Multi-Agent Systems
As your agents grow more complex, you might need to split the workflow into multiple specialized agents. As LLMs have limited context windows and cognitive capacity, it will be hard for a single agent to handle a large set of tools. So we can split each related set of tools with their related part of the prompt into separate specialized agents.
This is the part that I'm hesitant about outlining because I consider myself still learning this part currently. So I appreciate feedback if you have any.
Step 1: Supervisor-Workers Architecture
This is the most common multi-agent approach: a main agent (usually called the supervisor, orchestrator, or manager) handles the input/output, such as talking to users, and routes requests to different specialized agents depending on the task at hand.
For example, a chatbot agent for a retail store might have a supervisor agent that talks to clients and then either routes the client's inquiry to the customer support agent or the product recommendation agent.
This is the most natural architecture to evolve into, and you might implement it yourself without realizing it. A common way to do it is to make the tools an agent calls themselves be agents: the supervisor calls tools that are specialized agents, and those agents return their outputs to the supervisor so it can relay them to the user.
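A rough sketch of the retail-store example in plain Python: each "worker" is just a specialized prompt wrapped in a function, and for brevity the supervisor routes with a simple classification call rather than formal tool calling, but the idea is the same.

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

def ask(system_prompt: str, user_input: str) -> str:
    # Helper: one LLM call with a given specialization.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_input}],
    )
    return response.choices[0].message.content

def support_agent(query: str) -> str:
    return ask("You are a customer support agent for a retail store.", query)

def recommendation_agent(query: str) -> str:
    return ask("You recommend products from a retail store's catalog.", query)

def supervisor(query: str) -> str:
    # The supervisor only decides which worker should handle the request.
    route = ask("Reply with exactly 'support' or 'recommend' depending on whether "
                "the message is a support issue or a product question.", query).strip().lower()
    return support_agent(query) if "support" in route else recommendation_agent(query)

print(supervisor("My order arrived damaged, what can I do?"))
```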
This approach might evolve to where a specialized agent will act more as a supervisor agent for even more specialized agents. This is called hierarchical architecture.
Step 2: Advanced/Graph Agents
While the supervisor agent is the simplest and most common architecture, it's not the most reliable one for production-ready apps. The supervisor agent architecture works well in a chatbot setting because the role of the supervisor is pretty simple. But in more complex workflows where there is a predefined flow that the agent needs to follow that has routes and loops, it will be necessary to declare the routing logic ourselves.
We can do this by writing custom routing logic ourselves, combining function calling or structured outputs with if/else statements and for/while loops, as in the sketch below.
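For example, here's a small hand-rolled iterative flow: draft, review, and revise in a loop until a reviewer call approves the result or we hit a retry limit. The prompts and task are made up; the point is that the routing and looping live in ordinary Python, not inside the LLM.

```python
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

def llm_json(system: str, user: str) -> dict:
    # One structured-output call used as a routing decision.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{"role": "system", "content": system}, {"role": "user", "content": user}],
    )
    return json.loads(response.choices[0].message.content)

task = "Write a short product description for a solar-powered lamp."
draft = ""

# A predefined flow with an explicit loop: draft, critique, revise.
for attempt in range(3):
    draft = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"{task}\nPrevious draft (if any): {draft}"}],
    ).choices[0].message.content

    review = llm_json("You are a strict reviewer. Return JSON: {\"approved\": true|false, \"feedback\": string}.",
                      f"Review this draft against the task '{task}':\n{draft}")
    if review["approved"]:
        break
    task = f"{task}\nApply this feedback: {review['feedback']}"

print(draft)
```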
Step 3: Learn an Advanced Framework
While you can build advanced agents yourself as described in the steps above, a cleaner solution is often to use a framework dedicated to this. There are many agent frameworks out there, such as CrewAI, AutoGen, and OpenAI Swarm, but most of them are fairly basic and usually implement what's called the network architecture, which isn't well suited for production-ready apps. The framework I recommend is LangGraph. It's the most low-level of the bunch and lets you build the most customizable agents.
The other good thing about LangGraph is that it supports both JavaScript and Python and can be used in no-code with FlowiseAI.
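To give a feel for it, here's a tiny LangGraph sketch of the supervisor-style routing from earlier, assuming a recent version of the langgraph package. The router here is a plain function for brevity; in a real graph that decision would typically come from an LLM call.

```python
from typing import TypedDict, Literal
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    query: str
    answer: str

def router(state: State) -> Literal["support", "recommend"]:
    # Placeholder routing decision; a real graph would ask an LLM here.
    return "support" if "broken" in state["query"].lower() else "recommend"

def support(state: State) -> dict:
    return {"answer": f"Support agent handling: {state['query']}"}

def recommend(state: State) -> dict:
    return {"answer": f"Recommendation agent handling: {state['query']}"}

builder = StateGraph(State)
builder.add_node("support", support)
builder.add_node("recommend", recommend)
builder.add_conditional_edges(START, router)  # route straight from the entry point
builder.add_edge("support", END)
builder.add_edge("recommend", END)

graph = builder.compile()
print(graph.invoke({"query": "My lamp arrived broken"})["answer"])
```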
Step 4: Human in the Loop
As you may notice while building AI agents, they are not always reliable; in fact, they're reliably unreliable. So unless you're building a basic agent, it's always better to have a human check your agents' outputs. That's far less tedious than having a human do everything manually, and it avoids the drop in effectiveness you get with fully autonomous AI agents.
You can either implement the human in the loop logic yourself or use LangGraph.
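LangGraph has built-in support for pausing a graph for human review, but the bare idea can be sketched in a few lines of plain Python: gate any consequential action behind an explicit approval. The send_email function is a placeholder for a real side effect.

```python
def execute_with_approval(action_name: str, action_fn, *args):
    # Pause and ask a human before running any consequential action.
    print(f"Agent wants to run: {action_name}{args}")
    if input("Approve? (y/n): ").strip().lower() == "y":
        return action_fn(*args)
    return "Action rejected by human reviewer."

def send_email(to: str, body: str) -> str:
    return f"Email sent to {to}"  # placeholder for a real side effect

print(execute_with_approval("send_email", send_email, "lead@example.com", "Hi there!"))
```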
Project: Capstone Project
Honestly, you can build whatever you imagine here. Just think of the most complex AI agent you need and try to build it; you'll learn a lot along the way. I'm currently in this phase myself, building a LinkedIn outreach agent that does everything from lead scraping and connection requests to outreach and booking meetings.