@@ -36,14 +36,14 @@
 @component
 class HuggingFaceLocalChatGenerator:
     """
-    A Chat Generator component that uses models available on Hugging Face Hub to generate chat responses locally.
+    Generates chat responses using models from Hugging Face that run locally.

-    The `HuggingFaceLocalChatGenerator` class is a component designed for generating chat responses using models from
-    Hugging Face's model hub. It is tailored for local runtime text generation tasks and provides a convenient interface
-    for working with chat-based models, such as `HuggingFaceH4/zephyr-7b-beta` or `meta-llama/Llama-2-7b-chat-hf`
-    etc.
+    Use this component with chat-based models,
+    such as `HuggingFaceH4/zephyr-7b-beta` or `meta-llama/Llama-2-7b-chat-hf`.
+    LLMs running locally may need powerful hardware.
+
+    ### Usage example

-    Usage example:
     ```python
     from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
     from haystack.dataclasses import ChatMessage
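The hunk cuts the usage example off after its imports. For context, here is a minimal sketch of how the documented component is typically used; the model name and prompt are illustrative, and the `warm_up()` call follows the convention of Haystack's other local generators:

```python
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.dataclasses import ChatMessage

# Illustrative model choice; any local chat model with a ChatML-style template works.
generator = HuggingFaceLocalChatGenerator(model="HuggingFaceH4/zephyr-7b-beta")
generator.warm_up()  # loads the model and tokenizer before the first run

messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
print(generator.run(messages))
```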
@@ -86,44 +86,39 @@ def __init__(
         """
         Initializes the HuggingFaceLocalChatGenerator component.

-        :param model: The name or path of a Hugging Face model for text generation,
-            for example, `mistralai/Mistral-7B-Instruct-v0.2`, `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`, etc.
-            The important aspect of the model is that it should be a chat model and that it supports ChatML messaging
+        :param model: The Hugging Face text generation model name or path,
+            for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
+            The model must be a chat model supporting the ChatML messaging
             format.
-            If the model is also specified in the `huggingface_pipeline_kwargs`, this parameter will be ignored.
-        :param task: The task for the Hugging Face pipeline.
-            Possible values are "text-generation" and "text2text-generation".
-            Generally, decoder-only models like GPT support "text-generation",
-            while encoder-decoder models like T5 support "text2text-generation".
-            If the task is also specified in the `huggingface_pipeline_kwargs`, this parameter will be ignored.
-            If not specified, the component will attempt to infer the task from the model name,
-            calling the Hugging Face Hub API.
-        :param device: The device on which the model is loaded. If `None`, the default device is automatically
-            selected. If a device/device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
+            If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
+        :param task: The task for the Hugging Face pipeline. Possible options:
+            - `text-generation`: Supported by decoder models, like GPT.
+            - `text2text-generation`: Supported by encoder-decoder models, like T5.
+            If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
+            If not specified, the component calls the Hugging Face API to infer the task from the model name.
+        :param device: The device for loading the model. If `None`, automatically selects the default device.
+            If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
         :param token: The token to use as HTTP bearer authorization for remote files.
-            If the token is also specified in the `huggingface_pipeline_kwargs`, this parameter will be ignored.
-        :param chat_template: This optional parameter allows you to specify a Jinja template for formatting chat
-            messages. While high-quality and well-supported chat models typically include their own chat templates
-            accessible through their tokenizer, there are models that do not offer this feature. For such scenarios,
-            or if you wish to use a custom template instead of the model's default, you can use this parameter to
-            set your preferred chat template.
-        :param generation_kwargs: A dictionary containing keyword arguments to customize text generation.
-            Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`, etc.
+            If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
+        :param chat_template: Specifies an optional Jinja template for formatting chat
+            messages. Most high-quality chat models have their own templates, but for models without this
+            feature or if you prefer a custom template, use this parameter.
+        :param generation_kwargs: A dictionary with keyword arguments to customize text generation.
+            Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
             See Hugging Face's documentation for more information:
             - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
             - [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
-            The only generation_kwargs we set by default is max_new_tokens, which is set to 512 tokens.
-        :param huggingface_pipeline_kwargs: Dictionary containing keyword arguments used to initialize the
+            The only `generation_kwargs` set by default is `max_new_tokens`, which is set to 512 tokens.
+        :param huggingface_pipeline_kwargs: Dictionary with keyword arguments to initialize the
             Hugging Face pipeline for text generation.
             These keyword arguments provide fine-grained control over the Hugging Face pipeline.
             In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
-            See Hugging Face's [documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task)
-            for more information on the available kwargs.
+            For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
             In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
-        :param stop_words: A list of stop words. If any one of the stop words is generated, the generation is stopped.
-            If you provide this parameter, you should not specify the `stopping_criteria` in `generation_kwargs`.
+        :param stop_words: A list of stop words. If the model generates a stop word, the generation stops.
+            If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
             For some chat models, the output includes both the new text and the original prompt.
-            In these cases, it's important to make sure your prompt has no stop words.
+            In these cases, make sure your prompt has no stop words.
         :param streaming_callback: An optional callable for handling streaming responses.
         """
         torch_and_transformers_import.check()
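To make the parameter interactions described above concrete, here is a sketch of an `__init__` call exercising the documented precedence rules. All values are illustrative, `<|end|>` is a hypothetical stop token, and the `torch_dtype` entry assumes a standard `from_pretrained` kwarg:

```python
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator

generator = HuggingFaceLocalChatGenerator(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # ignored if huggingface_pipeline_kwargs also sets "model"
    task="text-generation",                      # decoder-only model, so "text-generation"
    generation_kwargs={
        "max_new_tokens": 256,  # overrides the 512-token default
        "temperature": 0.7,
    },
    stop_words=["<|end|>"],  # hypothetical; don't also set stopping_criteria in generation_kwargs
    huggingface_pipeline_kwargs={
        "device_map": "auto",  # takes precedence over the device init parameter
        "model_kwargs": {"torch_dtype": "auto"},  # forwarded to from_pretrained
    },
)
```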
@@ -240,7 +235,7 @@ def run(self, messages: List[ChatMessage], generation_kwargs: Optional[Dict[str,
         """
         Invoke text generation inference based on the provided messages and generation parameters.

-        :param messages: A list of ChatMessage instances representing the input messages.
+        :param messages: A list of ChatMessage objects representing the input messages.
         :param generation_kwargs: Additional keyword arguments for text generation.
         :returns:
             A list containing the generated responses as ChatMessage instances.
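A short sketch of calling `run` with per-call generation overrides, reusing the `generator` from the earlier sketch. It assumes the component follows the usual Haystack convention of returning a dictionary with a `replies` list of `ChatMessage` objects:

```python
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("Summarize what a tokenizer does in one sentence.")]

# Per-call kwargs are merged with (and override) the ones passed to __init__.
result = generator.run(messages, generation_kwargs={"max_new_tokens": 64})

for reply in result["replies"]:  # assumes the standard "replies" output key
    print(reply.text)  # `.text` on recent Haystack versions; older ones expose `.content`
```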