## ⚙️ Minimal Configuration
There are sensible defaults for most of the configuration parameters. However, you have to specify at least:
- The language models used during the different steps of the EvidenceSeeker pipeline (including API keys if necessary).
- Where to find and/or store the indexed knowledge base used for fact-checking.
The following sections describe how to configure the EvidenceSeeker pipeline minimally for the preprocessor, confirmation analyser, and retriever components.
The minimal configuration requires that your LLM provider supports the following features:
- Constrained decoding/structured output via JSON schemata: Some steps in the EvidenceSeeker pipeline require the language model to return structured output in a specific format (typically JSON).
- Log probabilities: The EvidenceSeeker pipeline relies on the language model to return log probabilities (logprobs) for the generated tokens.
If your LLM (provider) does not support these features, use an alternative configuration as described under “Advanced Configuration”.
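If you are unsure whether your provider offers both features, a quick smoke test against an OpenAI-compatible endpoint can help. The following sketch uses the `openai` Python package with placeholder values for the endpoint, model, and API key name; it is not part of the EvidenceSeeker API:

```python
import os

from openai import OpenAI

# placeholder endpoint and key name -- substitute your provider's values
client = OpenAI(
    base_url="https://your-provider.example/v1",
    api_key=os.environ["YOUR_API_KEY"],
)

response = client.chat.completions.create(
    model="your-model-identifier",  # placeholder model name
    messages=[{"role": "user", "content": 'Return {"ok": true} as JSON.'}],
    response_format={"type": "json_object"},  # structured output support
    logprobs=True,                            # logprob support
)

choice = response.choices[0]
print(choice.message.content)      # should be valid JSON
print(choice.logprobs.content[0])  # should contain per-token logprobs
```

If either the `response_format` or the `logprobs` parameter raises an error, your provider likely does not support the corresponding feature.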
### Preprocessor and Confirmation Analyser
You can configure the language models and their API keys for both the preprocessor and the confirmation analyser via configuration files: create two YAML files, `preprocessor_config.yaml` and `confirmation_analysis_config.yaml`, each with the following content:
```yaml
used_model_key: your_model_identifier
# optional, if you want to use a file for setting up API keys
env_file: path/to/your/api_keys.txt
models:
  your_model_identifier:
    api_key_name: name_of_your_api_key
    backend_type: backend_type_of_your_model
    base_url: base_url_of_your_language_model
    description: description_of_your_model
    max_tokens: 1024
    model: model_name_or_identifier
    name: name_of_your_model
    temperature: 0.2
    timeout: 260
```
Clarifications:

- Both `your_model_identifier` and `name_of_your_model` are arbitrary identifiers that you can choose.
- Both components expect the API key to be set as an environment variable with the name specified by `api_key_name`. If you use a file with environment variables via `env_file`, the file should contain a line like this: `name_of_your_api_key=your_api_key`. Alternatively, you can set the environment variable directly in your shell or script before running the EvidenceSeeker pipeline.
- If you do not need an API key (e.g., if you use a local model), you can omit the `env_file` and `api_key_name` parameters.
- Both `base_url` and `model` are important since they specify the endpoint and the model. For instance, if you use HuggingFace as inference provider with Llama-3.3-70B-Instruct, you would set `base_url` to `https://router.huggingface.co/hf-inference/models/meta-llama/Llama-3.3-70B-Instruct/v1` and `model` to `meta-llama/Llama-3.3-70B-Instruct` (see the filled-in example after this list).
- The `backend_type` determines which API client to use (e.g., OpenAI, HuggingFace). For a list of supported backends, see the documentation.
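Putting these values together, a complete `preprocessor_config.yaml` for the HuggingFace setup above might look like the following sketch. The `backend_type` value and the key name `HF_TOKEN` are illustrative assumptions; substitute the backend identifier from the list of supported backends and the environment variable name that apply to your setup:

```yaml
used_model_key: llama-3.3-70b
env_file: path/to/your/api_keys.txt   # contains a line: HF_TOKEN=hf_...
models:
  llama-3.3-70b:
    api_key_name: HF_TOKEN            # assumption: env variable holding your HF token
    backend_type: openai              # assumption: an OpenAI-compatible client
    base_url: https://router.huggingface.co/hf-inference/models/meta-llama/Llama-3.3-70B-Instruct/v1
    description: Llama-3.3-70B-Instruct served via HuggingFace inference
    max_tokens: 1024
    model: meta-llama/Llama-3.3-70B-Instruct
    name: Llama-3.3-70B-Instruct
    temperature: 0.2
    timeout: 260
```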
### Retriever Component
The retriever component uses an embedding model to create and search your indexed knowledge base. It can be minimally configured using a YAML configuration file `retrieval_config.yaml` with the following content:
```yaml
env_file: path/to/your/api_keys.txt
api_key_name: api_key_name_for_your_embedding_model
embed_backend_type: huggingface_inference_api
embed_base_url: base_url_of_your_embedding_model
embed_model_name: model_name_or_identifier_of_your_embedding_model
# path to the documents of your knowledge base,
# which are used to create the index
document_input_dir: path/to/your/knowledge_base
# path to the directory where the index is stored
index_persist_path: path/to/your/index
```
Clarifications:

- Both `embed_base_url` and `embed_model_name` are important since they specify the endpoint and the model. If you use embedding models hosted by HuggingFace and choose, for instance, `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` as the model, you would set `embed_base_url` to `https://router.huggingface.co/hf-inference/models/sentence-transformers/paraphrase-multilingual-mpnet-base-v2` and `embed_model_name` to `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`, as in the example below.
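For concreteness, here is a filled-in `retrieval_config.yaml` for that setup. The key name `HF_TOKEN` and the local paths are illustrative placeholders:

```yaml
env_file: path/to/your/api_keys.txt   # contains a line: HF_TOKEN=hf_...
api_key_name: HF_TOKEN                # assumption: env variable holding your HF token
embed_backend_type: huggingface_inference_api
embed_base_url: https://router.huggingface.co/hf-inference/models/sentence-transformers/paraphrase-multilingual-mpnet-base-v2
embed_model_name: sentence-transformers/paraphrase-multilingual-mpnet-base-v2
document_input_dir: data/knowledge_base   # placeholder path
index_persist_path: data/index            # placeholder path
```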
### Executing the Pipeline
Using these configuration files, you can fact-check a statement against your knowledge base as follows:
```python
import asyncio

from evidence_seeker import EvidenceSeeker

pipeline = EvidenceSeeker(
    retrieval_config_file="path/to/retrieval_config.yaml",
    confirmation_analysis_config_file="path/to/confirmation_analysis_config.yaml",
    preprocessing_config_file="path/to/preprocessor_config.yaml",
)

# run the pipeline
results = asyncio.run(pipeline("your statement to fact-check"))
```
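Since `asyncio.run` is used here, calling the pipeline returns a coroutine. If you are already inside a running event loop (for instance, in a Jupyter notebook cell), await the call directly instead:

```python
# inside an async context (e.g., a Jupyter notebook cell),
# await the pipeline directly instead of wrapping it in asyncio.run
results = await pipeline("your statement to fact-check")
```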