
On the unpredictable nature of LLM output and type safety in LangChain TS


Posted on Oct 11 • Originally published at octomind.dev

At Octomind, we are using Large Language Models (LLMs) to interact with web app UIs and extract test case steps that we want to generate. We use the LangChain library to build interaction chains with LLMs. The LLM receives a task prompt, and we as developers provide tools the model can utilize to solve the task.

The unpredictable and non-deterministic nature of LLM output makes ensuring type safety quite a challenge. LangChain's approach to parsing input and handling errors often leads to unexpected and inconsistent outcomes within the type system. I'd like to share what I learned about parsing and error handling in LangChain.

Note: all code examples use LangChain TS on the main branch as of September 22nd, 2023 (roughly version 0.0.153).

LangChain supports two languages: Python and JS/TypeScript. TypeScript came with some pros and some cons for us, and we decided to go for the TypeScript version of LangChain to implement parts of our AI-based test discoveries.

Full disclosure: I didn't look into how the Python version handles the issues described below. Have you found similar issues in the Python version? Feel free to share them directly in the GitHub issue I created; the link is at the end of the article.

In LangChain, you can provide a set of tools that may be called by the model if it deems it necessary. For our purposes, a tool is simply a class with a _call function that does something the model cannot do on its own, like clicking a button on a web page.
The arguments to that function are provided by the model.

When your tool implementation depends on the developer knowing the input format (in contrast to just doing something with text generated by the model), LangChain provides a class called StructuredTool. The StructuredTool adds a zod schema to the tool, which is used to parse whatever the model decides to call the tool with, so that we can use this knowledge in our code.

Let's build our "click" example under the assumption that we want the model to give us a query selector to click on: a StructuredTool named click whose schema, ClickSchema, is an object with a single string field called selector.

When you look at such a class, it seems reasonably simple, without much potential for things to go wrong. But how does the model actually know what schema to supply? It has no intrinsic functionality for this; it just generates a string response to a prompt.

When LangChain informs the model about the tools at its disposal, it generates format instructions for each tool. These instructions explain what JSON Schema is and which specific input schema the model should generate to use a tool.

For this, LangChain generates an addition to your own prompt that looks something like this:

You have access to the following tools.
You must format your inputs to these tools to match their "JSON schema" definitions below.

"JSON Schema" is a declarative language that allows you to annotate and validate JSON documents.

For example, the example "JSON Schema" instance {"properties": {"foo": {"description": "a list of test words", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}
would match an object with one required property, "foo". The "type" property specifies "foo" must be an "array", and the "description" property semantically describes it as "a list of test words". The items within "foo" must be strings.
Thus, the object {"foo": ["bar", "baz"]} is a well-formatted instance of this example "JSON Schema". The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here are the JSON Schema instances for the tools you have access to:

click: left click on an element on a web page represented by a query selector, args: {"selector":{"type":"string","description":"The query selector to click on."}}

Now we have a best-effort way to make the model call our tool with inputs in the correct schema. Best effort unfortunately does not guarantee anything. It is entirely possible that the model generates input that does not adhere to the schema.

So let's take a look at the implementation of StructuredTool to see how it deals with that issue. StructuredTool.call is the function that eventually calls our _call method from above.

It starts with a signature whose arg parameter is interpreted as follows: if the output of parsing with the tool's schema can be just a string, arg may also be a string; otherwise it is whatever object the schema defines as input. The string case applies if you define your schema as schema = z.string().

In our case, the schema cannot be parsed to a string, so this simplifies to the type { selector: string }, or ClickSchema.

But is this actually the case?

According to the implementation, we only check that the input actually adheres to the schema inside of call. The signature reads as if we had already made some assumptions about the input.

So one might replace the signature with something that reflects this. But looking at it further, even this has issues. The only thing we know for certain is that the model will give us a string. This means there are two options:

1. call really should take the raw string as its argument.

2. There is another element to this: something must have already decided that the string returned by the model is valid JSON and have parsed it. In the case that z.output extends string, something somewhere must have already decided that a string is an acceptable input format for the tool, and that we do not need to parse JSON.
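The gap between what the signature promises and where validation actually happens can be sketched without LangChain or zod. Everything below is hypothetical and only illustrates the argument: `ClickInput` stands in for ClickSchema, `parseClickInput` plays the role of schema parsing, and `callOptimistic` and `callHonest` are invented names for the two styles of signature.

```typescript
// Hypothetical stand-in for the article's ClickSchema; no LangChain or zod involved.
type ClickInput = { selector: string };

// Runtime check playing the role of schema parsing:
// accepts anything, returns a typed value or throws.
function parseClickInput(raw: unknown): ClickInput {
  if (typeof raw === "object" && raw !== null) {
    const selector = (raw as Record<string, unknown>)["selector"];
    if (typeof selector === "string") {
      return { selector };
    }
  }
  throw new Error(`Invalid click input: ${JSON.stringify(raw)}`);
}

// Optimistic signature: claims the input is already valid,
// although the only validation happens inside the function.
function callOptimistic(arg: ClickInput): string {
  const parsed = parseClickInput(arg);
  return `clicked ${parsed.selector}`;
}

// More honest signature: the caller may hand over anything, and the type says so.
function callHonest(arg: unknown): string {
  const parsed = parseClickInput(arg);
  return `clicked ${parsed.selector}`;
}
```

With the `unknown` version, a caller that passes a bare string still gets a runtime error, but the compiler no longer suggests that someone upstream has already validated the value.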
Note that a string by itself is not valid JSON: JSON.parse("foo") results in a SyntaxError.

Of course, the second option is what is happening. For this use case, LangChain provides a concept called OutputParser.

Let's take a look at the default one (StructuredChatOutputParser) and its [parse method](https://github.com/langchain-ai/langchainjs/blob/main/langchain/src/agents/structured_chat/outputParser.ts#L112) in particular.

We don't need to understand every detail, but we can see that this is where the string that the model produces is parsed to JSON, and errors are thrown if it is not valid JSON. From this we get either an AgentAction or an AgentFinish. We don't need to concern ourselves with AgentFinish, since it is just a special case to indicate that the interaction with the model is done.

AgentAction carries, among other fields, a toolInput. By now you might have already seen it: neither AgentAction nor the StructuredChatOutputParserWithRetries is generic, and there is no way to connect the type of toolInput with our ClickSchema.

Since we don't know which tool the agent has actually selected, we cannot (easily) use generics to represent the actual type, so this is expected. But worse, toolInput is typed as string, even though we just used JSON.parse to get it!

Consider the positive case where the model produced output that matches our schema, let's say the string "{\"selector\": \"myCoolButton\"}" (wrapped in all the extra fluff LangChain requires to parse it correctly). Using JSON.parse, this will deserialize to an object { selector: "myCoolButton" } and not a string.

But because JSON.parse's return type is any, the TypeScript compiler has no chance of realizing this. Unfortunately for us, this also means that we, as developers, have a hard time realizing it.

To understand why this is troublesome, we need to look into the execution loop where the AgentActions are used to actually invoke the tool. This happens in AgentExecutor._call.
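The way JSON.parse lets an object slip into a field declared as string can be reproduced in a few lines. This is a minimal sketch, not LangChain's code: `AgentActionLike` is a hypothetical type that only mirrors the string-typed toolInput described above, and the "action" and "action_input" keys in the sample text are assumptions made for illustration.

```typescript
// Hypothetical shape; the only detail taken from the article is that toolInput is typed as string.
interface AgentActionLike {
  tool: string;
  toolInput: string;
}

// Text the model might return for the click tool (key names are assumed).
const modelOutput =
  '{"action": "click", "action_input": {"selector": "myCoolButton"}}';

// JSON.parse returns `any`, so the assignment below compiles without complaint.
const parsedOutput = JSON.parse(modelOutput);
const action: AgentActionLike = {
  tool: parsedOutput.action,
  toolInput: parsedOutput.action_input,
};

// The static type says string, but the runtime value is an object.
const runtimeType: string = typeof action.toolInput;
console.log(runtimeType); // prints "object"
```

Annotating the result of JSON.parse as `unknown` instead would make the compiler reject the assignment until the value has been checked, which is one way to surface this mismatch early.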
We don't really need to understand everything that this class does. Think of it as the wrapper that handles the interaction of the model with the tool implementations to actually call them.

The _call method is quite long, so it is best read in a reduced form that contains only the parts relevant to our problem (simplified pieces of _call that do not exist as separate methods in the actual LangChain code base).

The first thing that happens in the loop is to look for the next action to execute. This is where the parsing using the OutputParser comes in, and where its exceptions are handled.

In the case of an error, the toolInput field will always be a string (if this.handleParsingErrors is a function, its return type is also string). But we have just seen above that in the non-error case toolInput will be parsed JSON! This is inconsistent behavior: we never parse the output of handleParsingErrors to JSON.

Let's look at how the loop continues. The next step is to call the selected tool with the given input. We simply pass the previously computed output on to the tool in tool.call(action.toolInput)! And in case this causes another error, we re-use the same function to handle parsing errors, which returns a string that is supposed to be the tool output in the error case.

At Octomind, we are using the AgentSteps to extract the test case steps that we want to generate. We noticed that the model often makes the same errors with the tool input format.

Recall our ClickSchema, which is just { selector: string }. In our clicking example, the model would either generate input according to the schema, or { element: string }, or just a string containing the value we want, like "myCoolButton".

So we built an auto-fixer for these common error cases. The fixer basically just checks whether it can fix the input using either of the options above.
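A fixer for the three input shapes just listed could look roughly like the following. This is a sketch under assumptions, not the code we run at Octomind: `ClickInput`, `isRecord`, and `fixClickInput` are hypothetical names, and the function deliberately takes `unknown` because nothing about the incoming value can be trusted.

```typescript
// Hypothetical stand-in for ClickSchema.
type ClickInput = { selector: string };

const isRecord = (value: unknown): value is Record<string, unknown> =>
  typeof value === "object" && value !== null && !Array.isArray(value);

// Returns a corrected input, or undefined when none of the known fixes apply.
function fixClickInput(raw: unknown): ClickInput | undefined {
  // The model sent only the value we want, as a bare string.
  if (typeof raw === "string") {
    const trimmed = raw.trim();
    return trimmed === "" ? undefined : { selector: trimmed };
  }
  if (isRecord(raw)) {
    // The model followed the schema.
    const selector = raw["selector"];
    if (typeof selector === "string") return { selector };
    // Right value, wrong key: { element: string }.
    const element = raw["element"];
    if (typeof element === "string") return { selector: element };
  }
  return undefined;
}
```

Returning undefined for everything else leaves room to fall back to the regular error handling when the input cannot be repaired.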
The earliest we can inject this code without overriding a lot of the planning logic that LangChain provides is in StructuredTool.call. We cannot handle it using handleParsingErrors, since that receives only the error as input, and not the original input. Once you are overriding StructuredTool.call, you are relying on the signature of that function to be correct, which we just saw is not the case.

At this point, I was stuck having to figure out all of the above to see why I was getting wrongly typed inputs. While these hurdles can be frustrating, they also present opportunities to take a deep dive into the library and come up with possible solutions instead of complaining.

I have opened two issues at LangChain JS/TS to discuss ideas on how to solve these problems. Their text is reproduced below.

This issue requires a bit of a lengthy explanation, but the overall problem is: the types of StructuredTool, AgentAction, parsing error handling in AgentExecutor and StructuredChatOutputParser don't fit together, and it only typechecks kind of by accident at the moment. See also the summary at the bottom.

I am going to explain the issue with an example. Let's assume we have a StructuredTool that can click on an element on a web page.

When you look at the signature and implementation of StructuredTool.call, it seems like we already know what the input is, but in reality, the validation only happens inside of that function. In our case, our schema cannot be a string, so the argument type simplifies to { selector: string }.

The signature reads as if we had already made some assumptions about the input, whereas in reality nothing has been validated at that point. But even a signature that reflects this has more issues. This is where the OutputParser comes in. The part that we really care about is the parse method.

This is where the string that the model produces is parsed to JSON, and errors are thrown if it is not valid JSON. A non-JSON string will throw a syntax error if passed into JSON.parse.

From parsing, we get an AgentAction (we can ignore AgentFinish for now) whose toolInput is typed as string, even though we just used JSON.parse to get it!

Consider the positive case where the model produced output that matches our schema, let's say the string "{\"selector\": \"myCoolButton\"}". Using JSON.parse, this will deserialize to an object { selector: "myCoolButton" }, and not a string. But because JSON.parse's return type is any, the TypeScript compiler has no chance of realizing this.

To understand why this is troublesome, we need to look into the execution loop where the AgentActions are used to actually invoke the tool. This happens in AgentExecutor._call. I've split the relevant parts of the method into two smaller methods and simplified a bit to show my point.

The first is where the parsing using the OutputParser comes in, and where its exceptions are handled. In the case of an error, the toolInput field will always be a string (if this.handleParsingErrors is a function, its return type is also string). But we have just seen above that in the non-error case toolInput will be parsed JSON!

This is inconsistent behavior: we never parse the output of handleParsingErrors to JSON, so we are now in a state where toolInput is sometimes a string and sometimes parsed JSON.

The next step is to call the selected tool with the given input. We simply pass the previously computed output on to the tool in tool.call(action.toolInput)!
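The two paths that produce toolInput can be contrasted in a small, self-contained sketch. This is a hypothetical reduction and not LangChain's implementation: `AgentActionLike`, `nextAction`, the "action" and "action_input" keys, and the "_Exception" placeholder tool name are all assumptions made for illustration.

```typescript
// Field types mirror what the text reports: toolInput is declared as string.
interface AgentActionLike {
  tool: string;
  toolInput: string;
  log: string;
}

// Hypothetical reduction of the "find the next action" step.
function nextAction(
  modelOutput: string,
  handleParsingErrors: (error: Error) => string,
): AgentActionLike {
  try {
    // Success path: JSON.parse returns `any`, so an object can land in toolInput.
    const parsed = JSON.parse(modelOutput);
    return { tool: parsed.action, toolInput: parsed.action_input, log: modelOutput };
  } catch (error) {
    // Error path: the handler's string becomes toolInput and is never parsed as JSON.
    const observation = handleParsingErrors(error as Error);
    return { tool: "_Exception", toolInput: observation, log: modelOutput };
  }
}

const runtimeTypeOf = (action: AgentActionLike): string => typeof action.toolInput;

const succeeded = nextAction(
  '{"action": "click", "action_input": {"selector": "myCoolButton"}}',
  () => "unused",
);
const failed = nextAction("not json at all", (e) => `Could not parse: ${e.message}`);

console.log(runtimeTypeOf(succeeded), runtimeTypeOf(failed)); // prints "object string"
```

Both results satisfy the same declared type, yet whatever receives them next has to cope with two different runtime shapes.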
We do not actually have any guarantees for the input types to the tool! And in case this causes another error, we re-use the same function to handle parsing errors, which returns a string that is supposed to be the tool output in the error case.

We noticed that the model often makes the same errors with the tool input format. Recall our ClickSchema, which is just { selector: string }. In our clicking example, the model would either generate input according to the schema, or { element: string }, or just a string containing the value we want, like "myCoolButton".

So we built an auto-fixer for these common error cases. The fixer basically just checks whether it can fix the input using either of the options above.

The earliest we can inject this code without overriding a lot of the planning logic that LangChain provides is in StructuredTool.call. We cannot handle it using handleParsingErrors, since that receives only the error as input, and not the text that caused it. Once you are overriding StructuredTool.call, you are relying on the signature of that function to be correct, which we just saw is not the case.

It would also be great if the corrected tool input could be serialized in the intermediate steps, which we can only do through some hacks at the moment, because the steps are not part of the error handling process. Separate issue for this: #2711.

At this point, you are stuck having to figure out all of the above to see why you are getting wrongly typed inputs to call and in the resulting intermediateSteps.

Unfortunately, anything that really fixes this is a breaking change.
Nonetheless, I have a proposal, and I would be willing to contribute here if we can find a good solution.

I have not looked into the Python code for this, but assume it has the same problem.

This ticket is related to #2710 in the sense that I stumbled into that issue when trying to implement what I am proposing here, and some of the improvements go hand in hand.

I was trying to build an error correction for structured tool inputs that automatically fixes common errors the model makes over and over, without calling the model again. Since each tool has different errors of this kind, the correction needs to happen within StructuredTool.call so that I can customize it per tool. At the moment, this means overriding call in each tool.

My first proposal is to extract the parsing part into its own method that can be overridden in subclasses, and only pass its result to call afterward. That would improve the typing issues in #2710.

Then comes the next issue: inside AgentExecutor._call, we aggregate the AgentSteps that get returned to the user when returnIntermediateSteps=true. We cannot modify these steps at the moment. So if it is possible to auto-correct the input to the tool, I would like this to be reflected in the toolInput field of AgentAction.

For this we have two options: either pass the current step into the parsing method to be modified in place, or overwrite the field inside AgentExecutor._call with the return value of the parsing method. This would also help to solve the typing issue described in #2710.

In case we detect that we cannot use the auto-fixing in a specific error case, we need to fall back to the default error handling, which is currently done inside AgentExecutor._call as well. For this, we need to be able to throw ToolInputParsingException, which currently cannot be imported.
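The first proposal above, a separate parsing method that subclasses can override, might be sketched as follows. None of this is LangChain API: `ParsingTool`, `parseInput`, `run`, `ClickTool`, and `InputParsingError` are hypothetical names, and the local error class merely stands in for an exception that a tool could throw to trigger the default error handling.

```typescript
// Hypothetical stand-in for ClickSchema.
type ClickInput = { selector: string };

// Hypothetical local error type, used to signal "fall back to default handling".
class InputParsingError extends Error {}

// Hypothetical base class: parsing lives in its own overridable method,
// and call only connects parsing with execution.
abstract class ParsingTool<T> {
  protected abstract parseInput(raw: unknown): T;
  protected abstract run(input: T): string;

  call(raw: unknown): string {
    return this.run(this.parseInput(raw));
  }
}

class ClickTool extends ParsingTool<ClickInput> {
  protected parseInput(raw: unknown): ClickInput {
    // Tool-specific auto-fix: accept a bare string as the selector.
    if (typeof raw === "string" && raw !== "") return { selector: raw };
    if (typeof raw === "object" && raw !== null) {
      const selector = (raw as Record<string, unknown>)["selector"];
      if (typeof selector === "string") return { selector };
    }
    // No fix applies: throw so the caller can use its default error handling.
    throw new InputParsingError(`Cannot parse click input: ${JSON.stringify(raw)}`);
  }

  protected run(input: ClickInput): string {
    return `clicked ${input.selector}`;
  }
}
```

Because `run` only ever sees the return value of `parseInput`, its parameter type is backed by an actual check, and an executor could also record that return value as the corrected tool input.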
ToolInputParsingException is marked as export in tools/base, but when you try to import it, it only semi-works from the dist folder and results in an immediate crash. So this needs to be exposed so that it can actually be used by the user.

This would also be beneficial for AgentExecutor.handleParsingErrors. At the moment, the callback you can pass in for that receives the error, so you can do something different depending on whether it is an output parser error or a tool input error. But you cannot use instanceof ToolInputParsingException as a switch, because you can't import the error type. You also cannot do anything meaningful here, because you only receive the error, but not the input that caused the error.

So another option to implement this kind of custom error fix would be to change the input arguments for handleParsingErrors so that it also receives the input that caused the error.

Please provide some feedback; I would be willing to contribute here. This should be solved together with #2710 to get the typing correct.

Feel free to jump in!

Veith Röthlingshöfer, ML engineer at Octomind



This post first appeared on VedVyas Articles, please read the original post: here
