Examples
🚧 Docs under construction 🚧
Below are some examples of how to inspect and check different chains.
📄️ Agent VectorDB Question Answering Benchmarking
Here we go over how to benchmark performance on a question answering task using an agent to route between multiple vector databases.
📄️ Comparing Chain Outputs
Suppose you have two different prompts (or LLMs). How do you know which will generate "better" results?
📄️ Data Augmented Question Answering
This notebook uses some generic prompts/language models to evaluate a question answering system that uses other sources of data besides what is in the model. For example, this can be used to evaluate a question answering system over your proprietary data.
📄️ Evaluating an OpenAPI Chain
This notebook goes over ways to semantically evaluate an OpenAPI Chain, which calls an endpoint defined by the OpenAPI specification using purely natural language.
📄️ Question Answering Benchmarking: Paul Graham Essay
Here we go over how to benchmark performance on a question answering task over a Paul Graham essay.
📄️ Question Answering Benchmarking: State of the Union Address
Here we go over how to benchmark performance on a question answering task over a State of the Union address.
📄️ QA Generation
This notebook shows how to use the QAGenerationChain to come up with question-answer pairs over a specific document (see the first sketch after this list).
📄️ Question Answering
This notebook covers how to evaluate generic question answering problems. This is a situation where you have an example containing a question and its corresponding ground truth answer, and you want to measure how well the language model does at answering that question (see the second sketch after this list).
📄️ SQL Question Answering Benchmarking: Chinook
Here we go over how to benchmark performance on a question answering task over a SQL database.
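To ground the QA Generation entry above, here is a minimal sketch of generating question-answer pairs over a plain-text file with QAGenerationChain. The file path and the choice of ChatOpenAI with temperature 0 are illustrative assumptions, not details taken from the linked notebook.

```python
from langchain.chains import QAGenerationChain
from langchain.chat_models import ChatOpenAI

# Load the source document; the file path is a placeholder assumption.
with open("state_of_the_union.txt") as f:
    doc_text = f.read()

# Build the chain from a chat model and generate question-answer pairs.
chain = QAGenerationChain.from_llm(ChatOpenAI(temperature=0))
qa_pairs = chain.run(doc_text)

# Each item is a dict with "question" and "answer" keys.
print(qa_pairs[0])
```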
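Similarly, for the Question Answering entry, this is a hedged sketch of grading predictions against ground truth answers with QAEvalChain. The example data, the key names, and the use of OpenAI as the grading model are assumptions for illustration.

```python
from langchain.evaluation.qa import QAEvalChain
from langchain.llms import OpenAI

# Illustrative examples: each pairs a question with its ground truth answer.
examples = [
    {"question": "What is the capital of France?", "answer": "Paris"},
]
# Predictions from the chain being evaluated (assumed to live under "result").
predictions = [
    {"result": "The capital of France is Paris."},
]

# Use an LLM to grade each prediction against the reference answer.
eval_chain = QAEvalChain.from_llm(OpenAI(temperature=0))
graded = eval_chain.evaluate(
    examples,
    predictions,
    question_key="question",
    answer_key="answer",
    prediction_key="result",
)
print(graded[0])  # the LLM's grade for the first example, e.g. CORRECT / INCORRECT
```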