I have a simple AI app (written in JavaScript) with a form that sends a request to OpenAI and then builds a set of LinkedIn Ads from the response it gets back. I want to decouple my prompts from my source code, so I’ve been reviewing my options for prompt management systems. I’m after some sort of CMS that I can share with non-technical prompt engineers, who can then help me improve my prompts.
This post is a review of Agenta AI, a prompt management tool that should be able to get the job done.
The Setup
Once you sign up to Agenta and create a new project, your dashboard for creating a new prompt looks like this.
When you publish your prompt, a new endpoint will appear under the ‘Deployments’ tab in the left-hand panel. This endpoint tab will have some implementation code that you can copy directly into your project. You get a choice of Python, TypeScript, or cURL implementations. The TypeScript implementation relied on axios, which I didn’t want to add to my project, so I wrote my own implementation below.
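For reference, here’s a minimal sketch of the sort of fetch-based call I ended up with. The endpoint URL is a placeholder and the exact request body shape depends on your deployment, so treat this as illustrative rather than copy-paste ready:

```javascript
// Placeholder URL: copy the real one from your Deployments tab.
const AGENTA_ENDPOINT = "https://<your-agenta-host>/<your-app>/generate";

async function generateAds(inputs) {
  const response = await fetch(AGENTA_ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Unsecured at the time of writing, but included for when auth ships.
      Authorization: `Bearer ${process.env.AGENTA_API_KEY}`,
    },
    // Whether your inputs sit at the top level or under a wrapper key
    // depends on your deployment; check the cURL sample Agenta gives you.
    body: JSON.stringify(inputs),
  });
  const data = await response.json();
  return data.message; // Agenta puts the LLM output in a `message` field
}
```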
There is no Agenta client; they simply publish your prompt to an HTTP endpoint. At the time of writing, the endpoints are unsecured, so adding the API key is somewhat redundant. I have spoken to their support team, who were tremendously helpful while I was setting everything up, and they have assured me that they are releasing a secured version of their endpoints.
The JSON response from this endpoint is an object with a message field that contains your response from the LLM, as well as fields for token usage, cost, and latency information. So if you’re used to working with OpenAI’s API and pulling the choices[0].message.content value from the completion, you don’t have to do that anymore.
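For illustration, a response looks roughly like the object below. The message field is the important one; the names I’ve used for the usage, cost, and latency fields are my own guesses, so check a real response before relying on them:

```javascript
// Rough shape of the endpoint's JSON response (only `message` is
// confirmed; the other field names here are illustrative guesses).
const exampleResponse = {
  message: "Headline: ...\nDescription: ...", // the LLM's output
  usage: { prompt_tokens: 312, completion_tokens: 480, total_tokens: 792 },
  cost: 0.0041,  // USD
  latency: 2.31, // seconds
};
```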
Agenta’s endpoints work with their own default LLM keys out of the box for easy setup. You can replace these with your own keys once you are ready to use the endpoints in production.
Gently Stress Testing Their System
If I’m going to work with a prompting team via a prompt manager, I want to be reasonably confident that the prompt manager is helping both of us avoid the obvious errors and mistakes that would force me to waste ages debugging.
Changing the shape of the payload ❌
So I switched the shape of the payload I’m sending Agenta to see how it would handle this type of catastrophic change.
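To reproduce this, I did something like the following. The field names are invented for this sketch, but the idea is taking what used to be flat top-level keys and nesting them, so none of the prompt’s input variables line up anymore:

```javascript
// Original flat payload: one top-level key per form field (names invented).
const goodPayload = {
  product: "CRM software",
  audience: "B2B sales teams",
  tone: "professional",
  offer: "14-day free trial",
  cta: "Sign up today",
};

// The "catastrophic" variant: same data, nested under a wrapper key,
// so the prompt's input variables no longer match anything.
const badPayload = { formData: goodPayload };
```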
In both cases the call failed silently.
Prompt Manager - The prompt doesn’t get called
The prompt did not run in the prompt manager. They have an observability tab that shows you all of your prompt calls, much like a debugger. I would have liked to see the failed call logged with an associated error; that would at least let me tell the development team that something critical changed on their end.
Codebase - The function returns undefined.
There were no errors in the codebase. I wrapped the fetch call in a try/catch block and still nothing: the function ran normally and just returned undefined.
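My best guess at why the catch never fired: fetch only rejects on network failures, not on HTTP error statuses, so as long as the endpoint responds at all the promise resolves, and a missing message field simply propagates as undefined. A defensive version of the earlier sketch would need an explicit status check:

```javascript
async function generateAdsSafely(inputs) {
  try {
    const response = await fetch(AGENTA_ENDPOINT, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(inputs),
    });
    // fetch resolves even on 4xx/5xx responses, so without this guard
    // nothing ever reaches the catch block.
    if (!response.ok) {
      throw new Error(`Agenta endpoint returned ${response.status}`);
    }
    const data = await response.json();
    return data.message; // undefined if the response shape is unexpected
  } catch (err) {
    console.error("Agenta call failed:", err);
  }
}
```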
Sending too much information 🙈
Next I tried adding an extra key-value pair to the object I was passing into my prompt. So instead of sending a payload object with data from 5 form fields, I sent an object with 6 keys.
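In code terms, something like this, building on the invented goodPayload from earlier:

```javascript
// Same five fields as before, plus one key the prompt knows nothing about.
const oversizedPayload = {
  ...goodPayload,
  campaignBudget: "$500/day", // the extra sixth key
};
```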
Prompt Manager - The prompt worked as normal and just ignored the extra information. However, the extra input variable did show up in the observability panel.
Codebase - No errors or warnings, the redundant data was just ignored.
Not sending enough information 🤐
I added a new input variable to the prompt in the prompt manager to see if calling a prompt with insufficient data would throw any kind of warning.
Prompt Manager - The prompt worked as normal and just ignored the missing information. I could not find any way to mark an input variable as required; they all seem to be optional by default.
Codebase - Same, there were no errors or warnings in my console.
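In other words, the call side stayed exactly the same while the prompt’s requirements changed underneath it, assuming the earlier sketch:

```javascript
// The prompt now defines a sixth input variable in Agenta, but the app
// still sends the original five keys. No warning surfaces anywhere.
const result = await generateAds(goodPayload);
```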
Other notes
Rolling back to a previous version of the prompt once I’d made a mess of things was super easy. There is a history tab in the endpoints section under the Deployments tab that lets you revert to whichever version you want.
I tried to set the response output to JSON and then reduce the max tokens to force the JSON to come back incomplete, just to see how the error would be handled. Unfortunately, it told me JSON mode doesn’t work with gpt-4o (it does), and the max tokens setting was also unresponsive: I set the token limit to 4000, but it shows up as -1 in the observability panel’s trace details (despite the responses returning plenty of tokens).
Agenta is a useful platform, but it has quite a lot of rough edges to clean up. What I like most about it is that the interface isn’t overly technical; I would feel comfortable sharing it with a non-technical contributor. A lot of the other prompt managers I’ve been reviewing are built almost as developer tools and completely overwhelm non-technical users.