Getting Started
The EvidenceSeeker Boilerplate is in an early stage of development. Currently, we offer:
- The EvidenceSeeker Demo App: An EvidenceSeeker instance that uses all 2024 APuZ editions as its knowledge base and comes with a minimalistic user interface.
- The evidence-seeker Python package: The EvidenceSeeker Boilerplate is a Python package that you can use to build your own EvidenceSeeker instance.
For subsequent releases, we are working on (hosted) versions of the boilerplate, which should make the setup and integration of EvidenceSeeker instances even easier.
EvidenceSeeker Demo App
We have set up a small EvidenceSeeker instance with all 2024 APuZ editions as its knowledge base. It is accessible via a small Gradio app that provides a minimal user interface.
Relevant links:
- Our APuZ-EvidenceSeeker Demo App on Hugging Face: If you are interested in experimenting with this demo app, contact us for access information!
- EvidenceSeeker Demo App Results: This is a collection of fact-checks and their results performed by our demo app. Visit this site to get a first impression of the capabilities of the EvidenceSeeker demo app.
Side note: The EvidenceSeeker demo app is part of our Python package. You can run the demo app locally if you want to experiment with your own knowledge base or use a different language model.
The evidence-seeker Python package
The evidence-seeker Python package is available on PyPI. Follow the steps below to set up your own EvidenceSeeker instance.
1. Prerequisites
You need to have Python (3.11 or 3.12) and pip installed.
- Installing Python: You can choose between different ways to install Python, depending on your operating system. You can find instructions for installing Python, for instance,
- on the Python wiki or
- in the Real-Python installation guide.
- Installing pip: If you have installed Python, you should also have pip installed. You can check this by running pip --version in your terminal. If you do not have pip installed, you can find instructions on installing it here.
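To check both prerequisites from Python itself, a small stand-alone sketch like the following can help. It is not part of the evidence-seeker package; python_version_supported and pip_available are hypothetical helper names, and looking for the pip module of the running interpreter is just one of several possible checks.

```python
import importlib.util
import sys

# Python versions named in the prerequisites (3.11 and 3.12).
SUPPORTED_VERSIONS = {(3, 11), (3, 12)}


def python_version_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version is one of the supported ones."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


def pip_available() -> bool:
    """Return True if the pip module is importable by this interpreter."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} supported: {python_version_supported()}")
    print(f"pip available: {pip_available()}")
```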
2. Installing the evidence-seeker package
Open a terminal and use pip to install the evidence-seeker package from PyPI:
pip install evidence-seeker
3. Preparation
Before you begin configuring your EvidenceSeeker instance, it can be helpful to create a directory structure that holds your configuration files, the knowledge base, the index, and possibly other resources. This is not strictly necessary, as long as you specify the respective locations in the configuration files.
The EvidenceSeeker Boilerplate comes with a command-line interface, the evse CLI, which can create a directory structure for your EvidenceSeeker instance. Calling the evse CLI with:
evse init --name name_of_your_evidence_seeker
creates the following directory structure in the current working directory, including configuration files with default values:
name_of_your_evidence_seeker/
├── config/                        # Directory for configuration files
│   ├── preprocessor.yaml
│   ├── retriever.yaml
│   ├── confirmation_analysis.yaml
│   ├── demo_app_config.yaml
│   └── api_keys.txt               # File for API keys
├── knowledge_base/
│   ├── metadata.json              # Metadata for the knowledge base files
│   └── data_files/                # Directory for the knowledge base files (e.g., PDF files)
│       ├── file1.pdf
│       ├── file2.pdf
│       └── ...
├── logs/                          # Directory for logging
└── embeddings/                    # Directory for the index
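If you would rather not use evse init, for example because you plan to point the configuration files to these locations yourself, the following minimal sketch recreates only the folder layout shown above with Python's pathlib. create_skeleton is a hypothetical helper, not part of the evidence-seeker package, and it does not generate the default configuration files that evse init writes.

```python
from pathlib import Path

# Sub-directories from the layout above. This sketch creates folders only;
# unlike `evse init`, it does NOT write configuration files with default values.
SUBDIRECTORIES = ("config", "knowledge_base/data_files", "logs", "embeddings")


def create_skeleton(base_dir: Path) -> Path:
    """Create the folder layout under base_dir; safe to call repeatedly."""
    for sub in SUBDIRECTORIES:
        (base_dir / sub).mkdir(parents=True, exist_ok=True)
    return base_dir
```

For example, create_skeleton(Path("name_of_your_evidence_seeker")) creates the folders in the current working directory; thanks to exist_ok=True, existing folders are left as they are.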
For the following steps, navigate to the directory of your EvidenceSeeker instance:
cd name_of_your_evidence_seeker
4. Configuration
You can configure your EvidenceSeeker instance to your needs either in your Python code or via YAML configuration files. At a minimum, you have to specify the language models used in the different steps of the EvidenceSeeker pipeline and the indexed knowledge base used for fact-checking. For all other settings, sensible defaults let you start with a minimal configuration.
For details, see the Configuration section.
5. Building the index
EvidenceSeeker instances fact-check statements relative to a specified knowledge base. The knowledge base can, for instance, comprise a set of PDF files.
For an EvidenceSeeker instance to work, you need to create a searchable index from the documents of your knowledge base. This process uses an embedding model to convert your documents (like PDFs) into vector embeddings, numerical representations that allow the system to find semantically similar content when fact-checking statements. Think of it like creating a detailed catalog: instead of manually searching through hundreds of documents, the system can quickly identify which parts of your knowledge base are most relevant to a given claim.
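To make the idea of semantic search more concrete, here is a toy sketch in plain Python. Hand-made 3-dimensional vectors stand in for the output of an embedding model, and cosine similarity serves as the similarity measure. This is a conceptual illustration only, not how the evidence-seeker package computes or stores its index; the chunk names and vector values are made up.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for all-zero vectors
    return dot / (norm_a * norm_b)


def rank_chunks(query: list[float], chunks: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the IDs of the top_k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda cid: cosine_similarity(query, chunks[cid]), reverse=True)
    return ranked[:top_k]


# Toy 3-dimensional vectors; real embedding models typically produce hundreds of dimensions.
chunk_vectors = {
    "climate_report_p3": [0.9, 0.1, 0.0],
    "budget_debate_p7": [0.1, 0.8, 0.3],
    "energy_policy_p2": [0.7, 0.2, 0.1],
}
claim_vector = [0.8, 0.15, 0.05]
print(rank_chunks(claim_vector, chunk_vectors))
```

The claim vector points in almost the same direction as the climate and energy chunks, so those two are returned, while the budget chunk is ranked last.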
If you used evse init to create the directory structure, you can use the evse CLI to create an index in the following way:
- Copy all PDF files you want to use as the knowledge base into the knowledge_base/data_files directory of your EvidenceSeeker instance.
- If you want to provide the fact checker with metadata for the files in your knowledge base, create a file meta_data.json in the knowledge_base/ directory that contains metadata for each PDF file. The metadata should be in JSON format and can include fields like title, author, date, etc. For example:
{
"file1.pdf": {
"title": "Title of File 1",
"author": "Author of File 1",
"date": "2024-01-01"
},
"file2.pdf": {
"title": "Title of File 2",
"author": "Author of File 2",
"date": "2024-02-01"
}
}
- If not already done, navigate to the directory of your EvidenceSeeker instance (e.g., cd name_of_your_evidence_seeker/).
- Make sure the environment variables for your API keys are set in the file config/api_keys.txt of your EvidenceSeeker instance.
- Now you can build the index using evse build-index. This creates an index of the PDF files in the knowledge_base/data_files directory and stores it in the embeddings/ directory, using the embedding model specified in the configuration file.
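Writing the metadata file by hand gets tedious for larger knowledge bases. A minimal sketch, using only the Python standard library, that pre-fills one entry per PDF in knowledge_base/data_files with the fields from the example above (title, author, date) could look like this. write_metadata_template is a hypothetical helper, and this page does not say whether empty placeholder values are accepted, so fill them in before building the index. The file name is a parameter because this page refers to the file both as meta_data.json and as metadata.json; use the name your configuration expects.

```python
import json
from pathlib import Path

# Fields taken from the metadata example; fill them in after generation.
PLACEHOLDER_FIELDS = ("title", "author", "date")


def write_metadata_template(kb_dir: Path, filename: str = "meta_data.json") -> Path:
    """Add an entry with empty placeholder fields for every PDF in kb_dir/data_files.

    Entries that already exist in the metadata file are left untouched.
    """
    metadata_path = kb_dir / filename
    metadata: dict = {}
    if metadata_path.exists():
        metadata = json.loads(metadata_path.read_text(encoding="utf-8"))

    for pdf in sorted((kb_dir / "data_files").iterdir()):
        # Match .pdf case-insensitively so files like REPORT.PDF are included too.
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            metadata.setdefault(pdf.name, {field: "" for field in PLACEHOLDER_FIELDS})

    metadata_path.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return metadata_path
```

For example, call write_metadata_template(Path("knowledge_base")) from your instance directory and then edit the generated entries. Re-running it after adding new PDFs only appends entries for the new files.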
Alternatively, you can use and adapt the following Python snippet to create an index:
from evidence_seeker import RetrievalConfig, IndexBuilder

config = RetrievalConfig(
    # Using a local embedding model (via the Hugging Face API)
    embed_backend_type="huggingface",
    embed_model_name="sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
    # Path to the directory containing your PDF files
    document_input_dir="path/to/your/pdf_files",
    # Path where the index will be stored
    index_persist_path="path/to/your/index",
)

index_builder = IndexBuilder(config=config)
index_builder.build_index()
For further details on configuring the EvidenceSeeker Retriever component, see here and here.
6. Executing the Pipeline
Once you have built the index, you can run the EvidenceSeeker pipeline to fact-check statements against the knowledge base.
Using the evse CLI
With the evse CLI, you can run the pipeline as follows:
evse run --input "your statement to fact-check"
The output will be written as a Markdown file to the logs/ directory of your EvidenceSeeker instance.
You can also specify the location and name of the output file with the --output option:
evse run --input "your statement to fact-check" --output "path/to/output/file.md"
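To fact-check several statements from a script, you can call the CLI repeatedly. The following sketch builds one evse run call per statement, using only the --input and --output options shown above, and executes it with subprocess. run_batch, build_run_command, slugify, and the output file-naming scheme are hypothetical; the sketch assumes it is run from your instance's directory and that evse is on your PATH. With dry_run=True it only returns the commands without executing them.

```python
import re
import subprocess
from pathlib import Path


def build_run_command(statement: str, output_file: Path) -> list[str]:
    """Assemble an `evse run` call using only the --input and --output options."""
    return ["evse", "run", "--input", statement, "--output", str(output_file)]


def slugify(text: str, max_len: int = 40) -> str:
    """Turn a statement into a file-name-safe stem (hypothetical naming scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "statement"


def run_batch(statements: list[str], out_dir: Path, dry_run: bool = False) -> list[list[str]]:
    """Fact-check several statements, one `evse run` call (and one Markdown file) each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for i, statement in enumerate(statements, start=1):
        cmd = build_run_command(statement, out_dir / f"{i:03d}_{slugify(statement)}.md")
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if evse exits with a non-zero status.
            subprocess.run(cmd, check=True)
    return commands
```

Passing the arguments as a list (rather than one shell string) means statements containing quotes or other special characters are handed to evse unchanged.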
Using the EvidenceSeeker Demo App
You can also run the pipeline via the EvidenceSeeker Demo App, a small Gradio-based user interface that you can access in your web browser.
You can run the app locally with:
evse demo-app
This will start the app on your local machine, and you can access it via http://localhost:7860 in your web browser.
Programmatically using the evidence-seeker package
Alternatively, you can run the pipeline with the following Python snippet:
import asyncio

from evidence_seeker import EvidenceSeeker

pipeline = EvidenceSeeker(
    retrieval_config_file="path/to/retrieval_config.yaml",
    confirmation_analysis_config_file="path/to/confirmation_analysis_config.yaml",
    preprocessing_config_file="path/to/preprocessing_config.yaml",
)
# run the pipeline
results = asyncio.run(pipeline("your statement to fact-check"))
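Each call pipeline(statement) returns an awaitable, which the snippet above runs with asyncio.run. If you want to fact-check several statements in one go, a minimal sketch could bound the number of simultaneous calls with asyncio.Semaphore, for example to stay within your model provider's rate limits. It assumes that one EvidenceSeeker instance may safely be called concurrently, which this page does not state; fact_check_many and max_concurrency are hypothetical names, and the demo uses a stand-in coroutine instead of a real pipeline.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def fact_check_many(
    pipeline: Callable[[str], Awaitable[Any]],
    statements: list[str],
    max_concurrency: int = 2,
) -> list[Any]:
    """Run pipeline(statement) for every statement, at most max_concurrency at a time.

    Results are returned in the same order as `statements`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def check_one(statement: str) -> Any:
        async with semaphore:  # wait here while max_concurrency calls are running
            return await pipeline(statement)

    return await asyncio.gather(*(check_one(s) for s in statements))


if __name__ == "__main__":
    async def fake_pipeline(statement: str) -> str:
        await asyncio.sleep(0.01)  # stands in for the real fact-checking work
        return f"checked: {statement}"

    print(asyncio.run(fact_check_many(fake_pipeline, ["claim A", "claim B", "claim C"])))
```

With a real instance, the call would look like asyncio.run(fact_check_many(pipeline, ["first statement", "second statement"])).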
7. Integration into Existing Workflows
Currently, there are two ways to integrate EvidenceSeeker into your existing workflows:
- programmatically, by using the evidence-seeker package in your own code, or
- by exposing the EvidenceSeeker Demo App as an MCP server (for details, see this HF blog post).
Development Version
If you want more control over your EvidenceSeeker instance or want to implement an additional feature, you can use the development version of the EvidenceSeeker Boilerplate by cloning the EvidenceSeeker repository:
git clone git@github.com:debatelab/evidence-seeker.git
By default, we use hatch for Python package and environment management. The repository contains the description of a development environment with pinned dependencies. You can create and spawn a corresponding Python environment with
hatch -v shell evse-dev_env.py3.11
or
hatch -v shell evse-dev_env.py3.12