 19 |  19 | @component
 20 |  20 | class AzureOpenAIGenerator(OpenAIGenerator):
 21 |  21 |     """
 22 |     | -   A Generator component that uses OpenAI's large language models (LLMs) on Azure to generate text.
     |  22 | +   Generates text using OpenAI's large language models (LLMs).
 23 |  23 |
 24 |     | -   It supports gpt-4 and gpt-3.5-turbo family of models.
     |  24 | +   It works with the gpt-4 and gpt-3.5-turbo family of models.
     |  25 | +   You can customize how the text is generated by passing parameters to the
     |  26 | +   OpenAI API. Use the `**generation_kwargs` argument when you initialize
     |  27 | +   the component or when you run it. Any parameter that works with
     |  28 | +   `openai.ChatCompletion.create` will work here too.
 25 |  29 |
 26 |     | -   Users can pass any text generation parameters valid for the `openai.ChatCompletion.create` method
 27 |     | -   directly to this component via the `**generation_kwargs` parameter in __init__ or the `**generation_kwargs`
 28 |     | -   parameter in `run` method.
 29 |  30 |
 30 |     | -   For more details on OpenAI models deployed on Azure, refer to the Microsoft
 31 |     | -   [documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/).
     |  31 | +   For details on OpenAI API parameters, see
     |  32 | +   [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
     |  33 | +
     |  34 | +
     |  35 | +   ### Usage example
 32 |  36 |
 33 |     | -   Usage example:
 34 |  37 |     ```python
 35 |  38 |     from haystack.components.generators import AzureOpenAIGenerator
 36 |  39 |     from haystack.utils import Secret
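
A minimal sketch of the `**generation_kwargs` flow this docstring describes, passing parameters both at initialization and at run time. The endpoint, deployment name, and environment variable below are placeholders, and the run-time override behavior follows the docstring rather than a guaranteed API contract.

```python
from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret

# Placeholder endpoint, deployment name, and environment variable.
client = AzureOpenAIGenerator(
    azure_endpoint="https://example-resource.azure.openai.com/",
    azure_deployment="gpt-35-turbo",
    api_key=Secret.from_env_var("AZURE_OPENAI_API_KEY"),
    # Any parameter accepted by `openai.ChatCompletion.create` can be set here.
    generation_kwargs={"max_tokens": 128, "temperature": 0.2},
)

# Parameters passed at run time take precedence over the ones set at initialization.
result = client.run(
    "Explain nucleus sampling in one sentence.",
    generation_kwargs={"temperature": 0.9},
)
print(result["replies"][0])
```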
@@ -69,38 +72,40 @@ def __init__(
 69 |  72 |     """
 70 |  73 |     Initialize the Azure OpenAI Generator.
 71 |  74 |
 72 |     | -   :param azure_endpoint: The endpoint of the deployed model, e.g. `https://example-resource.azure.openai.com/`
 73 |     | -   :param api_version: The version of the API to use. Defaults to 2023-05-15
     |  75 | +   :param azure_endpoint: The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
     |  76 | +   :param api_version: The version of the API to use. Defaults to 2023-05-15.
 74 |  77 |     :param azure_deployment: The deployment of the model, usually the model name.
 75 |  78 |     :param api_key: The API key to use for authentication.
 76 |     | -   :param azure_ad_token: [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id)
 77 |     | -   :param organization: The Organization ID, defaults to `None`. See
 78 |     | -   [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
 79 |     | -   :param streaming_callback: A callback function that is called when a new token is received from the stream.
 80 |     | -   The callback function accepts StreamingChunk as an argument.
 81 |     | -   :param system_prompt: The prompt to use for the system. If not provided, the system prompt will be
 82 |     | -   :param timeout: The timeout to be passed to the underlying `AzureOpenAI` client, if not set it is
 83 |     | -   inferred from the `OPENAI_TIMEOUT` environment variable or set to 30.
 84 |     | -   :param max_retries: Maximum retries to establish contact with AzureOpenAI if it returns an internal error,
 85 |     | -   if not set it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
 86 |     | -   :param generation_kwargs: Other parameters to use for the model. These parameters are all sent directly to
 87 |     | -   the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
     |  79 | +   :param azure_ad_token: [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
     |  80 | +   :param organization: Your organization ID, defaults to `None`. For help, see
     |  81 | +   [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
     |  82 | +   :param streaming_callback: A callback function called when a new token is received from the stream.
     |  83 | +   It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
     |  84 | +   as an argument.
     |  85 | +   :param system_prompt: The system prompt to use for text generation. If not provided, the Generator
     |  86 | +   omits the system prompt and uses the default system prompt.
     |  87 | +   :param timeout: Timeout for AzureOpenAI client. If not set, it is inferred from the
     |  88 | +   `OPENAI_TIMEOUT` environment variable or set to 30.
     |  89 | +   :param max_retries: Maximum retries to establish contact with AzureOpenAI if it returns an internal error.
     |  90 | +   If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
     |  91 | +   :param generation_kwargs: Other parameters to use for the model, sent directly to
     |  92 | +   the OpenAI endpoint. See [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
 88 |  93 |         more details.
 89 |  94 |     Some of the supported parameters:
 90 |  95 |     - `max_tokens`: The maximum number of tokens the output text can have.
 91 |     | -   - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
     |  96 | +   - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
 92 |  97 |         Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 93 |  98 |     - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
 94 |     | -   considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens
     |  99 | +   considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
 95 | 100 |         comprising the top 10% probability mass are considered.
 96 |     | -   - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
 97 |     | -   it will generate two completions for each of the three prompts, ending up with 6 completions in total.
     | 101 | +   - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
     | 102 | +   the LLM will generate two completions per prompt, resulting in 6 completions total.
 98 | 103 |     - `stop`: One or more sequences after which the LLM should stop generating tokens.
 99 |     | -   - `presence_penalty`: What penalty to apply if a token is already present at all. Bigger values mean
100 |     | -   the model will be less likely to repeat the same token in the text.
101 |     | -   - `frequency_penalty`: What penalty to apply if a token has already been generated in the text.
102 |     | -   Bigger values mean the model will be less likely to repeat the same token in the text.
103 |     | -   - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
     | 104 | +   - `presence_penalty`: The penalty applied if a token is already present.
     | 105 | +   Higher values make the model less likely to repeat the token.
     | 106 | +   - `frequency_penalty`: Penalty applied if a token has already been generated.
     | 107 | +   Higher values make the model less likely to repeat the token.
     | 108 | +   - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
104 | 109 |         values are the bias to add to that token.
105 | 110 |     """
106 | 111 |     # We intentionally do not call super().__init__ here because we only need to instantiate the client to interact
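
A short sketch exercising several of the `__init__` parameters documented in this hunk (`streaming_callback`, `timeout`, `max_retries`, and `generation_kwargs`). The callback, endpoint, and deployment name are illustrative placeholders, and `StreamingChunk` is assumed to come from `haystack.dataclasses` as in Haystack 2.x.

```python
from haystack.components.generators import AzureOpenAIGenerator
from haystack.dataclasses import StreamingChunk
from haystack.utils import Secret

def print_chunk(chunk: StreamingChunk) -> None:
    # Called once per streamed chunk; prints the content as soon as it arrives.
    print(chunk.content, end="", flush=True)

client = AzureOpenAIGenerator(
    azure_endpoint="https://example-resource.azure.openai.com/",  # placeholder
    azure_deployment="gpt-35-turbo",                              # placeholder
    api_key=Secret.from_env_var("AZURE_OPENAI_API_KEY"),
    streaming_callback=print_chunk,
    timeout=30,      # otherwise read from OPENAI_TIMEOUT or defaulted to 30
    max_retries=5,   # otherwise read from OPENAI_MAX_RETRIES or defaulted to 5
    generation_kwargs={"max_tokens": 256, "stop": ["\n\n"]},
)

result = client.run("Summarize what nucleus sampling does.")
print(result["meta"])
```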