LLMs
Module for Large Language Models.
api_llm
Module to interface with various language models through their respective APIs.
APILLM
Bases: BaseLLM
A class to interface with language models through their respective APIs.
This class provides a unified interface for making API calls to language models using the OpenAI client library. It handles rate limiting through semaphores and supports both synchronous and asynchronous operations.
Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `model_id` | `str` | Identifier for the model to use. |
| `client` | `AsyncOpenAI` | The initialized API client. |
| `max_tokens` | `int` | Maximum number of tokens in model responses. |
| `semaphore` | `Semaphore` | Semaphore to limit concurrent API calls. |
Source code in promptolution/llms/api_llm.py
__init__(api_url=None, model_id=None, api_key=None, max_concurrent_calls=50, max_tokens=512, config=None)
Initialize the APILLM with a specific model and API configuration.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `api_url` | `str` | The base URL for the API endpoint. | `None` |
| `model_id` | `str` | Identifier for the model to use. | `None` |
| `api_key` | `str` | API key for authentication. | `None` |
| `max_concurrent_calls` | `int` | Maximum number of concurrent API calls. | `50` |
| `max_tokens` | `int` | Maximum number of tokens in model responses. | `512` |
| `config` | `ExperimentConfig` | Configuration for the LLM, overriding defaults. | `None` |
Raises:

| Type | Description |
| --- | --- |
| `ImportError` | If required libraries are not installed. |
Source code in promptolution/llms/api_llm.py
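A minimal usage sketch; the endpoint URL, API key, and model identifier below are placeholders, not values shipped with the library:

```python
from promptolution.llms.api_llm import APILLM

# Placeholder endpoint, key, and model id; substitute your provider's details.
llm = APILLM(
    api_url="https://api.example.com/v1",
    model_id="my-model",
    api_key="sk-...",
    max_concurrent_calls=10,
    max_tokens=256,
)

# get_response accepts a single prompt or a list of prompts.
responses = llm.get_response(["Summarize the plot of Hamlet in one sentence."])
print(responses[0])
```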
base_llm
Base module for LLMs in the promptolution library.
BaseLLM
Bases: ABC
Abstract base class for Language Models in the promptolution library.
This class defines the interface that all concrete LLM implementations should follow. It's designed to track which configuration parameters are actually used.
Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `config` | `LLMModelConfig` | Configuration for the language model. |
| `input_token_count` | `int` | Count of input tokens processed. |
| `output_token_count` | `int` | Count of output tokens generated. |
Source code in promptolution/llms/base_llm.py
__init__(config=None)
Initialize the LLM with a configuration or direct parameters.
This constructor supports both config-based and direct parameter initialization for backward compatibility.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `config` | `ExperimentConfig` | Configuration for the LLM, overriding defaults. | `None` |
Source code in promptolution/llms/base_llm.py
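Concrete implementations subclass `BaseLLM` and supply the actual generation logic. A minimal sketch, assuming the hook called by `get_response` below is `_get_response` and that it receives the prompt and system-prompt lists; the echoing behavior is purely illustrative:

```python
from typing import List

from promptolution.llms.base_llm import BaseLLM


class EchoLLM(BaseLLM):
    """Toy subclass that echoes its prompts; for illustration only."""

    def _get_response(self, prompts: List[str], system_prompts: List[str]) -> List[str]:
        # A real implementation would call a model or an API here.
        return [f"echo: {p}" for p in prompts]
```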
get_response(prompts, system_prompts=None)
Generate responses for the given prompts.
This method calls the _get_response method to generate responses for the given prompts. It also updates the token count for the input and output tokens.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `prompts` | `str` or `List[str]` | Input prompt(s). If a single string is provided, it's converted to a list containing that string. | required |
| `system_prompts` | `str` or `List[str]`, optional | System prompt(s) to provide context to the model. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `List[str]` | A list of generated responses, one for each input prompt. |
Source code in promptolution/llms/base_llm.py
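A short usage sketch with any concrete `BaseLLM` instance (such as the `APILLM` created above); the prompts are illustrative:

```python
# Assumes `llm` is a concrete BaseLLM instance, e.g. the APILLM created above.

# A single string is wrapped into a one-element list internally.
single = llm.get_response("What is 2 + 2?")

# A list of prompts yields one response per prompt, in order.
batch = llm.get_response(
    ["Translate 'hello' to French.", "Translate 'hello' to German."],
    system_prompts="You are a concise translator.",
)
print(single[0], batch)
```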
get_token_count()
Get the current count of input and output tokens.
Returns:

| Type | Description |
| --- | --- |
| `dict` | A dictionary containing the input and output token counts. |
Source code in promptolution/llms/base_llm.py
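For example (the dictionary's exact key names are not shown on this page and are assumptions here):

```python
counts = llm.get_token_count()
# Holds the input and output token counts; key names below are illustrative.
print(counts.get("input_tokens"), counts.get("output_tokens"))
```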
reset_token_count()
Reset the input and output token counts to zero.
set_generation_seed(seed)
Set the random seed for reproducibility per request.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `seed` | `int` | Random seed value. | required |
update_token_count(inputs, outputs)
Update the token count based on the given inputs and outputs.
In the base class, tokens are counted with a simple whitespace-splitting tokenization.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `inputs` | `List[str]` | A list of input prompts. | required |
| `outputs` | `List[str]` | A list of generated responses. | required |
Source code in promptolution/llms/base_llm.py
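The whitespace-based counting described above can be sketched as follows; this is an approximation of the documented behavior, not the library source:

```python
def update_token_count(self, inputs, outputs):
    """Approximate sketch: count whitespace-separated tokens and accumulate."""
    self.input_token_count += sum(len(text.split()) for text in inputs)
    self.output_token_count += sum(len(text.split()) for text in outputs)
```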
local_llm
Module for running LLMs locally using the Hugging Face Transformers library.
LocalLLM
Bases: BaseLLM
A class for running language models locally using the Hugging Face Transformers library.
This class sets up a text generation pipeline with specified model parameters and provides a method to generate responses for given prompts.
Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `pipeline` | `Pipeline` | The text generation pipeline. |

Methods:

| Name | Description |
| --- | --- |
| `get_response` | Generate responses for a list of prompts. |
Source code in promptolution/llms/local_llm.py
__del__()
__init__(model_id, batch_size=8, config=None)
Initialize the LocalLLM with a specific model.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_id` | `str` | The identifier of the model to use (e.g., "gpt2", "facebook/opt-1.3b"). | required |
| `batch_size` | `int` | The batch size for text generation. | `8` |
| `config` | `ExperimentConfig` | Configuration for the LLM, overriding defaults. | `None` |
Note
This method sets up a text generation pipeline with bfloat16 precision, automatic device mapping, and specific generation parameters.
Source code in promptolution/llms/local_llm.py
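A minimal usage sketch, assuming the Hugging Face Transformers stack is installed and suitable hardware is available; the model identifier is one of the documented examples:

```python
from promptolution.llms.local_llm import LocalLLM

# Requires enough memory for the chosen model; smaller models work for quick tests.
llm = LocalLLM(model_id="facebook/opt-1.3b", batch_size=4)

responses = llm.get_response(["Write a haiku about autumn."])
print(responses[0])
```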
vllm
Module for running language models locally using the vLLM library.
VLLM
Bases: BaseLLM
A class for running language models using the vLLM library.
This class sets up a vLLM inference engine with specified model parameters and provides a method to generate responses for given prompts.
Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `llm` | `LLM` | The vLLM inference engine. |
| `tokenizer` | `PreTrainedTokenizer` | The tokenizer for the model. |
| `sampling_params` | `SamplingParams` | Parameters for text generation. |

Methods:

| Name | Description |
| --- | --- |
| `get_response` | Generate responses for a list of prompts. |
| `update_token_count` | Update the token count based on the given inputs and outputs. |
Source code in promptolution/llms/vllm.py
__init__(model_id, batch_size=None, max_generated_tokens=256, temperature=0.1, top_p=0.9, model_storage_path=None, dtype='auto', tensor_parallel_size=1, gpu_memory_utilization=0.95, max_model_len=2048, trust_remote_code=False, seed=42, llm_kwargs=None, config=None)
Initialize the VLLM with a specific model.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_id` | `str` | The identifier of the model to use. | required |
| `batch_size` | `int` | The batch size for text generation. | `None` |
| `max_generated_tokens` | `int` | Maximum number of tokens to generate. | `256` |
| `temperature` | `float` | Sampling temperature. | `0.1` |
| `top_p` | `float` | Top-p sampling parameter. | `0.9` |
| `model_storage_path` | `str` | Directory to store the model. | `None` |
| `dtype` | `str` | Data type for model weights. | `'auto'` |
| `tensor_parallel_size` | `int` | Number of GPUs for tensor parallelism. | `1` |
| `gpu_memory_utilization` | `float` | Fraction of GPU memory to use. | `0.95` |
| `max_model_len` | `int` | Maximum sequence length for the model. | `2048` |
| `trust_remote_code` | `bool` | Whether to trust remote code. | `False` |
| `seed` | `int` | Random seed for the model. | `42` |
| `llm_kwargs` | `dict` | Additional keyword arguments for the LLM. | `None` |
| `config` | `ExperimentConfig` | Configuration for the LLM, overriding defaults. | `None` |
Note
This method sets up a vLLM engine with specified parameters for efficient inference.
Source code in promptolution/llms/vllm.py
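A minimal usage sketch, assuming vLLM is installed and a GPU is available; the model identifier is a placeholder:

```python
from promptolution.llms.vllm import VLLM

# Parameter names follow the signature documented above; the model id is a placeholder.
llm = VLLM(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    max_generated_tokens=256,
    temperature=0.1,
    gpu_memory_utilization=0.9,
    max_model_len=2048,
)

responses = llm.get_response(["Explain tensor parallelism in two sentences."])
print(responses[0])
```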
set_generation_seed(seed)
Set the random seed for text generation.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `seed` | `int` | Random seed for text generation. | required |
update_token_count(inputs, outputs)
Update the token count based on the given inputs and outputs.
Uses the tokenizer to count the tokens.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `inputs` | `List[str]` | A list of input prompts. | required |
| `outputs` | `List[str]` | A list of generated responses. | required |
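Unlike the base class's whitespace count, this variant counts tokens with the model's tokenizer. A sketch of that behavior (an approximation, not the library source):

```python
def update_token_count(self, inputs, outputs):
    """Approximate sketch: count tokens with the model's tokenizer and accumulate."""
    self.input_token_count += sum(len(self.tokenizer.encode(text)) for text in inputs)
    self.output_token_count += sum(len(self.tokenizer.encode(text)) for text in outputs)
```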