Examples
🚧 Docs under construction 🚧
Below are some examples of how to inspect and check different chains.
📄️ Agent VectorDB Question Answering Benchmarking
Here we go over how to benchmark performance on a question answering task using an agent to route between multiple vector databases.
📄️ Comparing Chain Outputs
Suppose you have two different prompts (or LLMs). How do you know which will generate "better" results?
📄️ Data Augmented Question Answering
This notebook uses some generic prompts/language models to evaluate a question answering system that uses other sources of data besides what is in the model. For example, this can be used to evaluate a question answering system over your proprietary data.
📄️ Evaluating an OpenAPI Chain
This notebook goes over ways to semantically evaluate an OpenAPI Chain, which calls an endpoint defined by the OpenAPI specification using purely natural language.
📄️ Question Answering Benchmarking: Paul Graham Essay
Here we go over how to benchmark performance on a question answering task over a Paul Graham essay.
📄️ Question Answering Benchmarking: State of the Union Address
Here we go over how to benchmark performance on a question answering task over a State of the Union address.
📄️ QA Generation
This notebook shows how to use the QAGenerationChain to come up with question-answer pairs over a specific document (see the first sketch after this list).
📄️ Question Answering
This notebook covers how to evaluate generic question answering problems. This is a situation where you have an example containing a question and its corresponding ground truth answer, and you want to measure how well the language model does at answering that question (see the second sketch after this list).
📄️ SQL Question Answering Benchmarking: Chinook
Here we go over how to benchmark performance on a question answering task over a SQL database.
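To ground the QA Generation entry above, here is a minimal sketch of generating question-answer pairs over a plain-text file with QAGenerationChain. The file path and the choice of ChatOpenAI with temperature 0 are illustrative assumptions, not details taken from the linked notebook.

```python
from langchain.chains import QAGenerationChain
from langchain.chat_models import ChatOpenAI

# Load the source document; the file path is a placeholder assumption.
with open("state_of_the_union.txt") as f:
    doc_text = f.read()

# Build the chain from a chat model and generate question-answer pairs.
chain = QAGenerationChain.from_llm(ChatOpenAI(temperature=0))
qa_pairs = chain.run(doc_text)

# Each item is a dict with "question" and "answer" keys.
print(qa_pairs[0])
```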
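Similarly, for the Question Answering entry, this is a hedged sketch of grading predictions against ground truth answers with QAEvalChain. The example data, the key names, and the use of OpenAI as the grading model are assumptions for illustration.

```python
from langchain.evaluation.qa import QAEvalChain
from langchain.llms import OpenAI

# Illustrative examples: each pairs a question with its ground truth answer.
examples = [
    {"question": "What is the capital of France?", "answer": "Paris"},
]
# Predictions from the chain being evaluated (assumed to live under "result").
predictions = [
    {"result": "The capital of France is Paris."},
]

# Use an LLM to grade each prediction against the reference answer.
eval_chain = QAEvalChain.from_llm(OpenAI(temperature=0))
graded = eval_chain.evaluate(
    examples,
    predictions,
    question_key="question",
    answer_key="answer",
    prediction_key="result",
)
print(graded[0])  # the LLM's grade for the first example, e.g. CORRECT / INCORRECT
```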