LLMs
Module for Large Language Models.
get_llm(model_id, *args, **kwargs)
Factory function to create and return a language model instance based on the provided model_id.
This function supports four types of language models:

1. DummyLLM: A mock LLM for testing purposes.
2. LocalLLM: For running models locally.
3. VLLM: For running models using the vLLM library.
4. APILLM: For API-based models (the default if no other type matches).
Parameters:

Name | Type | Description | Default
---|---|---|---
model_id | str | Identifier for the model to use. Special cases: "dummy" for DummyLLM; "local-{model_name}" for LocalLLM; "vllm-{model_name}" for VLLM; any other string for APILLM. | required
*args | | Variable length argument list passed to the LLM constructor. | ()
**kwargs | | Arbitrary keyword arguments passed to the LLM constructor. | {}
Returns:

Type | Description
---|---
 | An instance of DummyLLM, LocalLLM, VLLM, or APILLM, based on the model_id.
Source code in promptolution/llms/__init__.py
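A minimal usage sketch of the factory. The import path follows the source location above; the model ids and the API key placeholder are illustrative, not verified values:

```python
from promptolution.llms import get_llm

# "dummy" yields a DummyLLM: random responses, no model or API needed.
llm = get_llm("dummy")

# Prefixes select local backends; any other id falls through to APILLM:
# get_llm("local-gpt2")                         -> LocalLLM
# get_llm("vllm-meta-llama/Llama-3.1-8B")       -> VLLM
# get_llm("gpt-4o-mini", token="YOUR_API_KEY")  -> APILLM

responses = llm.get_response(["Summarize the plot of Hamlet."])
print(responses[0])
```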
api_llm
Module to interface with various language models through their respective APIs.
APILLM
Bases: BaseLLM
A class to interface with various language models through their respective APIs.
This class supports Claude (Anthropic), GPT (OpenAI), and LLaMA (DeepInfra) models. It handles API key management, model initialization, and provides methods for both synchronous and asynchronous inference.
Attributes:

Name | Type | Description
---|---|---
model | | The initialized language model instance.
Methods:

Name | Description
---|---
get_response | Synchronously get responses for a list of prompts.
get_response_async | Asynchronously get responses for a list of prompts.
Source code in promptolution/llms/api_llm.py
__init__(model_id, token=None, **kwargs)
Initialize the APILLM with a specific model.
Parameters:

Name | Type | Description | Default
---|---|---|---
model_id | str | Identifier for the model to use. | required
token | str | API key for the model. | None
Raises:

Type | Description
---|---
ValueError | If an unknown model identifier is provided.
Source code in promptolution/llms/api_llm.py
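A hedged instantiation sketch; the model id and key placeholder are illustrative:

```python
from promptolution.llms.api_llm import APILLM

# Illustrative model id; pass your provider's API key via `token`.
llm = APILLM("gpt-4o-mini", token="YOUR_API_KEY")
answers = llm.get_response(["What does prompt optimization mean?"])
print(answers[0])
```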
get_response_async(prompts, max_concurrent_calls=200) (async)
Asynchronously get responses for a list of prompts.
This method uses a semaphore to limit the number of concurrent API calls.
Parameters:

Name | Type | Description | Default
---|---|---|---
prompts | list[str] | List of input prompts. | required
max_concurrent_calls | int | Maximum number of concurrent API calls allowed. | 200
Returns:

Type | Description
---|---
list[str] | List of model responses.
Source code in promptolution/llms/api_llm.py
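A short asyncio sketch; the model id, key placeholder, and the concurrency cap of 20 are illustrative choices:

```python
import asyncio

from promptolution.llms.api_llm import APILLM

async def main():
    llm = APILLM("gpt-4o-mini", token="YOUR_API_KEY")
    prompts = [f"Give one fact about the number {i}." for i in range(100)]
    # The internal semaphore keeps at most max_concurrent_calls
    # requests in flight at any moment.
    responses = await llm.get_response_async(prompts, max_concurrent_calls=20)
    print(len(responses))

asyncio.run(main())
```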
invoke_model(prompt, model, semaphore) (async)
Asynchronously invoke a language model with retry logic.
Parameters:

Name | Type | Description | Default
---|---|---|---
prompt | str | The input prompt for the model. | required
model | | The language model to invoke. | required
semaphore | Semaphore | Semaphore to limit concurrent calls. | required
Returns:

Type | Description
---|---
str | The model's response content.
Raises:

Type | Description
---|---
ChatDeepInfraException | If all retry attempts fail.
Source code in promptolution/llms/api_llm.py
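A simplified sketch of the semaphore-plus-retry pattern this helper implements; the retry count, backoff schedule, and the `ainvoke` call are assumptions for illustration, not the library's exact code:

```python
import asyncio

async def invoke_with_limit(prompt, model, semaphore, retries=3):
    # Holding the semaphore caps how many coroutines call the API at once.
    async with semaphore:
        for attempt in range(retries):
            try:
                # `ainvoke` is a stand-in for the model's async call.
                return await model.ainvoke(prompt)
            except Exception:
                if attempt == retries - 1:
                    raise  # all retry attempts failed
                await asyncio.sleep(2 ** attempt)  # exponential backoff
```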
base_llm
Base module for LLMs in the promptolution library.
BaseLLM
Bases: ABC
Abstract base class for Language Models in the promptolution library.
This class defines the interface that all concrete LLM implementations should follow.
Methods:

Name | Description
---|---
get_response | Generate responses for the given prompts; delegates to the abstract _get_response method that subclasses implement.
Source code in promptolution/llms/base_llm.py
__init__(*args, **kwargs)
get_response(prompts)
Generate responses for the given prompts.
This method calls the _get_response method to generate responses for the given prompts. It also updates the token count for the input and output tokens.
Parameters:

Name | Type | Description | Default
---|---|---|---
prompts | str or List[str] | Input prompt(s). If a single string is provided, it's converted to a list containing that string. | required
Returns:

Type | Description
---|---
List[str] | A list of generated responses, one for each input prompt.
Source code in promptolution/llms/base_llm.py
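A toy subclass sketch, assuming _get_response(prompts) is the abstract hook as the description above implies:

```python
from promptolution.llms.base_llm import BaseLLM

class EchoLLM(BaseLLM):
    """Minimal concrete subclass for illustration."""

    def _get_response(self, prompts):
        # A real implementation would run a model here.
        return [f"echo: {p}" for p in prompts]

llm = EchoLLM()
print(llm.get_response("hello"))  # a bare string is wrapped into a list
```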
get_token_count()
Get the current count of input and output tokens.
Returns:

Type | Description
---|---
dict | A dictionary containing the input and output token counts.
Source code in promptolution/llms/base_llm.py
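Token accounting in practice; the exact dictionary keys are not documented here, so the comment hedges them:

```python
from promptolution.llms import get_llm

llm = get_llm("dummy")
llm.get_response(["First prompt", "Second prompt"])
print(llm.get_token_count())  # dict of input/output token counts (key names not shown above)
llm.reset_token_count()       # start a fresh count, e.g. per optimization run
```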
reset_token_count()
Reset the input and output token counts to zero.
update_token_count(inputs, outputs)
Update the token count based on the given inputs and outputs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inputs
|
List[str]
|
A list of input prompts. |
required |
outputs
|
List[str]
|
A list of generated responses. |
required |
Source code in promptolution/llms/base_llm.py
DummyLLM
Bases: BaseLLM
A dummy implementation of the BaseLLM for testing purposes.
This class generates random responses for given prompts, simulating the behavior of a language model without actually performing any complex natural language processing.
Source code in promptolution/llms/base_llm.py
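A sketch of how a test might use it to exercise pipeline code without a real model (assuming DummyLLM takes no required constructor arguments, consistent with BaseLLM's __init__ above):

```python
from promptolution.llms.base_llm import DummyLLM

def test_prompt_pipeline_shape():
    llm = DummyLLM()
    out = llm.get_response(["any prompt", "another prompt"])
    assert isinstance(out, list) and len(out) == 2
```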
local_llm
Module for running language models locally using the Hugging Face Transformers library.
LocalLLM
Bases: BaseLLM
A class for running language models locally using the Hugging Face Transformers library.
This class sets up a text generation pipeline with specified model parameters and provides a method to generate responses for given prompts.
Attributes:

Name | Type | Description
---|---|---
pipeline | Pipeline | The text generation pipeline.
Methods:

Name | Description
---|---
get_response | Generate responses for a list of prompts.
Source code in promptolution/llms/local_llm.py
__del__()
Cleanup method to delete the pipeline and free up GPU memory.
__init__(model_id, batch_size=8)
Initialize the LocalLLM with a specific model.
Parameters:

Name | Type | Description | Default
---|---|---|---
model_id | str | The identifier of the model to use (e.g., "gpt2", "facebook/opt-1.3b"). | required
batch_size | int | The batch size for text generation. | 8
Note
This method sets up a text generation pipeline with bfloat16 precision, automatic device mapping, and specific generation parameters.
Source code in promptolution/llms/local_llm.py
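A usage sketch; the model id is illustrative, and any Hugging Face causal LM id should work:

```python
from promptolution.llms.local_llm import LocalLLM

llm = LocalLLM("gpt2", batch_size=4)  # small model for illustration
responses = llm.get_response(["Once upon a time"])
print(responses[0])
del llm  # __del__ releases the pipeline and frees GPU memory
```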
vllm
Module for running language models locally using the vLLM library.
VLLM
Bases: BaseLLM
A class for running language models using the vLLM library.
This class sets up a vLLM inference engine with specified model parameters and provides a method to generate responses for given prompts.
Attributes:

Name | Type | Description
---|---|---
llm | LLM | The vLLM inference engine.
tokenizer | PreTrainedTokenizer | The tokenizer for the model.
sampling_params | SamplingParams | Parameters for text generation.
Methods:

Name | Description
---|---
get_response | Generate responses for a list of prompts.
update_token_count | Update the token count based on the given inputs and outputs.
Source code in promptolution/llms/vllm.py
__del__()
Cleanup method to delete the engine and free up GPU memory.
__init__(model_id, batch_size=None, max_generated_tokens=256, temperature=0.1, top_p=0.9, model_storage_path=None, dtype='auto', tensor_parallel_size=1, gpu_memory_utilization=0.95, max_model_len=2048, trust_remote_code=False, seed=42, **kwargs)
Initialize the VLLM with a specific model.
Parameters:

Name | Type | Description | Default
---|---|---|---
model_id | str | The identifier of the model to use. | required
batch_size | int | The batch size for text generation. | None
max_generated_tokens | int | Maximum number of tokens to generate. | 256
temperature | float | Sampling temperature. | 0.1
top_p | float | Top-p sampling parameter. | 0.9
model_storage_path | str | Directory to store the model. | None
dtype | str | Data type for model weights. | 'auto'
tensor_parallel_size | int | Number of GPUs for tensor parallelism. | 1
gpu_memory_utilization | float | Fraction of GPU memory to use. | 0.95
max_model_len | int | Maximum sequence length for the model. | 2048
trust_remote_code | bool | Whether to trust remote code. | False
seed | int | Random seed for the model. | 42
**kwargs | | Additional keyword arguments to pass to the LLM class initialization. | {}
Note
This method sets up a vLLM engine with specified parameters for efficient inference.
Source code in promptolution/llms/vllm.py
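An instantiation sketch; the model id and the overridden values are illustrative, and every omitted parameter falls back to the defaults in the table above:

```python
from promptolution.llms.vllm import VLLM

llm = VLLM(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # illustrative model id
    max_generated_tokens=128,
    temperature=0.2,
    tensor_parallel_size=1,       # single GPU
    gpu_memory_utilization=0.90,  # leave headroom for other processes
)
responses = llm.get_response(["Explain tensor parallelism in one sentence."])
print(responses[0])
```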
update_token_count(inputs, outputs)
Update the token count based on the given inputs and outputs.
Uses the tokenizer to count the tokens.
Parameters:

Name | Type | Description | Default
---|---|---|---
inputs | List[str] | A list of input prompts. | required
outputs | List[str] | A list of generated responses. | required
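A sketch of the tokenizer-based counting idea; the helper name and tokenizer id are hypothetical, and the real method updates counters on the VLLM instance rather than returning totals:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative tokenizer

def count_tokens(texts, tokenizer):
    # Tokenize each string and sum the token counts, as the method
    # does separately for inputs and outputs.
    return sum(len(tokenizer.encode(t)) for t in texts)

input_tokens = count_tokens(["a prompt"], tokenizer)
output_tokens = count_tokens(["a generated response"], tokenizer)
print(input_tokens, output_tokens)
```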