# Example Configuration: Local Models with LMStudio
The EvidenceSeeker pipeline can be configured to use local models. This allows you to run the pipeline without relying on external APIs, which can be beneficial for privacy, cost, and performance reasons. We illustrate how to set up the EvidenceSeeker pipeline using local models with LMStudio and Hugging Face. We will use:
- LMStudio to serve Llama-3.2-1B-Instruct as the LLM for the preprocessor and confirmation analyser components, and
- paraphrase-multilingual-mpnet-base-v2 as the embedding model, which is loaded locally via the Hugging Face Transformers library.
## Prerequisites
To run the EvidenceSeeker pipeline with local models served by LMStudio, you have to install LMStudio, download the Llama-3.2-1B-Instruct model, and load it in LMStudio (see here for details). LMStudio then exposes the model via a local HTTP server, which the EvidenceSeeker pipeline can access.
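Before wiring the model into the pipeline, it can be useful to check that the local server is reachable. The following is a minimal sketch using the `openai` Python package against LMStudio's OpenAI-compatible endpoint; it assumes LMStudio runs on its default port 1234 (adjust the URL otherwise).

```python
# Minimal sanity check for the local LMStudio server.
# Assumes LMStudio's OpenAI-compatible server runs on the default port 1234.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:1234/v1",
    api_key="not_needed",  # LMStudio does not validate the API key
)

# List the models currently loaded in LMStudio;
# llama-3.2-1b-instruct should appear in the output.
for model in client.models.list():
    print(model.id)
```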
## Configuration

### Retriever Component
The retriever component can be configured with the following YAML configuration file:
```yaml
embed_backend_type: huggingface
embed_model_name: sentence-transformers/paraphrase-multilingual-mpnet-base-v2
env_file: path/to/your/api_keys.txt
# Hugging Face Hub Path (optional), from where the index can be loaded. We use
# EvidenceSeeker's example index hub path: DebateLabKIT/apuz-index-es
# see: https://huggingface.co/datasets/DebateLabKIT/apuz-index-es
index_hub_path: DebateLabKIT/apuz-index-es
# token name for accessing index from Hugging Face Hub
hub_key_name: your_hub_key_name
# Path where the index is stored locally and/or loaded from.
index_persist_path: path/for/storing/the/index
```
Clarifications:
- Setting `embed_backend_type` to `huggingface` tells the retriever component to use the Hugging Face Transformers library to load the embedding model locally (a quick local check is sketched after this list).
- `hub_key_name` is the name of the environment variable that contains your API key for accessing the index from the Hugging Face Hub.
- Here, we load the APUZ example index from a Hugging Face Hub repository (via `index_hub_path`) and store it locally at the path specified by `index_persist_path` (for alternative configurations see here).
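Independently of EvidenceSeeker, you can verify that the embedding model can be fetched from the Hugging Face Hub and run locally. The following sketch uses the `sentence-transformers` library (which builds on Hugging Face Transformers); the test sentences are arbitrary.

```python
# Quick local check of the embedding model used by the retriever.
# This only verifies that the model downloads and produces embeddings;
# it is not part of the EvidenceSeeker code.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")
embeddings = model.encode(["Ein kurzer Testsatz.", "A short test sentence."])
print(embeddings.shape)  # (2, 768) for this model
```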
### Preprocessor
As with the configuration for Hugging Face's Inference Providers, the preprocessor component is configured by specifying the model and endpoint in the configuration file. The following example shows how to configure the preprocessor component with the Llama-3.2-1B-Instruct model served by LMStudio:
```yaml
timeout: 1200
used_model_key: lmstudio
models:
  lmstudio:
    name: llama-3.2-1b-instruct
    description: Local model served via LMStudio
    base_url: http://127.0.0.1:1234/v1/
    model: llama-3.2-1b-instruct
    backend_type: openai
    max_tokens: 1024
    temperature: 0.2
    api_key: not_needed
    timeout: 260
```
Clarifications:
- We define a model with the identifier `lmstudio`. The defined model is set as the global model for the preprocessor component via `used_model_key`. (Alternatively, you can assign models to specific steps in the preprocessor pipeline. See here for details.)
- `backend_type` is set to `openai` since LMStudio allows inference calls to be made via an OpenAI-compatible API (see the request sketch after this list).
- `base_url` is set to the local URL where LMStudio serves the model. If you have configured LMStudio to use a different port, make sure to adjust the port number.
- `model` is set to the model's name as it is configured in LMStudio.
- `name` and `description` are arbitrary identifiers you can choose to describe the model.
### Confirmation Analyser
The configuration of the confirmation analyser component is trickier, since it requires correctly configuring how degrees of confirmation are calculated (see here for details) within the confirmation analyser's pipeline:
```yaml
timeout: 1200
used_model_key: lmstudio
models:
  lmstudio:
    name: llama-3.2-1b-instruct
    description: Local model served via LMStudio
    base_url: http://127.0.0.1:1234/v1/
    model: llama-3.2-1b-instruct
    backend_type: openai
    max_tokens: 1024
    temperature: 0.2
    api_key: not_needed
    timeout: 260
# step configuration of the multiple choice task
multiple_choice_confirmation_analysis:
  description: Multiple choice RTE task given CoT trace.
  name: multiple_choice_confirmation_analysis
  llm_specific_configs:
    lmstudio:
      guidance_type: json
      logprobs_type: estimate
      n_repetitions_mcq: 30
```
Clarifications:
- The model configuration is the same as that of the preprocessor component.
- The `multiple_choice_confirmation_analysis` section configures the multiple choice step of the confirmation analyser component. With `lmstudio` under `llm_specific_configs`, we specify a model-specific step configuration.
- Since LMStudio does not return logprobs, we set `logprobs_type` to `estimate` and `n_repetitions_mcq` to \(30\) to estimate the logprobs by repeating the inference request \(30\) times (for details, see here).
There is a trade-off when estimating logprobs by repeating inference requests: to increase the accuracy of the logprob estimation, you should set a sufficiently high value for `n_repetitions_mcq` (>100). However, this will also increase inference time and cost. Using `n_repetitions_mcq=30` should be considered a mere proof of concept. Using a model that supports explicit logprobs output is always preferable.
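To make the estimation idea concrete: instead of reading token logprobs from the API response, the answer distribution of the multiple choice task can be approximated by repeating the same request and counting how often each answer option is returned. The following sketch illustrates this principle with hypothetical answer options and prompt; it is not the confirmation analyser's actual implementation.

```python
# Sketch: estimating answer probabilities by repeated sampling instead of logprobs.
# Prompt, answer options, and sampling settings are illustrative assumptions only.
from collections import Counter
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="not_needed")

options = ["A", "B", "C"]   # hypothetical multiple choice options
n_repetitions = 30          # corresponds to n_repetitions_mcq in the config
counts = Counter()

for _ in range(n_repetitions):
    response = client.chat.completions.create(
        model="llama-3.2-1b-instruct",
        temperature=1.0,    # sample instead of always taking the most likely answer
        max_tokens=5,
        messages=[{"role": "user", "content": "Answer with A, B or C: ..."}],
    )
    answer = response.choices[0].message.content.strip()[:1]
    if answer in options:
        counts[answer] += 1

# Relative frequencies approximate the probabilities that logprobs would provide.
total = sum(counts.values()) or 1
print({opt: counts[opt] / total for opt in options})
```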