
Conversation

larryhudson

@larryhudson larryhudson commented Aug 14, 2025

Closes #1041

@larryhudson larryhudson force-pushed the larryhudson/add-bedrock-prompt-caching branch from accb936 to 84c745c on August 14, 2025 19:40
@DouweM DouweM self-assigned this Aug 14, 2025
Collaborator

@DouweM DouweM left a comment

Thanks @larryhudson, I've left some notes on the implementation. I'll review the tests and examples once the implementation itself has stabilized.

@@ -449,7 +461,7 @@ class ToolReturn:


# Ideally this would be a Union of types, but Python 3.9 requires it to be a string, and strings don't work with `isinstance`.
MultiModalContentTypes = (ImageUrl, AudioUrl, DocumentUrl, VideoUrl, BinaryContent)
MultiModalContentTypes = (ImageUrl, AudioUrl, DocumentUrl, VideoUrl, BinaryContent, CachePoint)
Collaborator

I don't think this should be added here, as cache points are not multi-modal content

Author

Done, thanks

if isinstance(item, CachePoint):
if last_block is not None:
# Add cache_control to the previous content block
last_block['cache_control'] = {'type': 'ephemeral'}
Collaborator

If a ModelRequest has two UserPromptParts, or a SystemPromptPart followed by a UserPromptPart, and the cache point is the first item in the second part, I think we should apply this to the last content block of the first part. That means we should probably do this one level up in _map_message. We can allow _map_user_prompt to yield a CachePoint directly and handle it in a special way.

That will also allow putting this after a ToolReturnPart, and having it be added to the tool return part itself.

This would also simplify the example because you can always just add a new UserPromptPart with a CachePoint instead of adding it to the existing one and having to check whether it's a str etc.
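
Roughly this shape, as a sketch only (reusing the user_content_params list that _map_message already builds; not the final implementation):

async for content in self._map_user_prompt(request_part):
    if isinstance(content, CachePoint):
        if user_content_params:
            # Attach the cache marker to whichever block came last, even if it was
            # produced by an earlier part (UserPromptPart, ToolReturnPart, ...) of the same request.
            user_content_params[-1]['cache_control'] = {'type': 'ephemeral'}
    else:
        user_content_params.append(content)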

Author

Done, I've implemented this pattern in models/bedrock.py and will apply the same logic in models/anthropic.py. I think handling the case where the UserPromptPart starts with a CachePoint immediately following the SystemPromptPart makes the code a little hard to follow, so I'm open to any ideas to make it clearer / simpler for the user.

cache_creation_tokens = details.get('cache_creation_input_tokens', 0)
cache_read_tokens = details.get('cache_read_input_tokens', 0)

request_tokens = input_tokens + cache_creation_tokens + cache_read_tokens

return usage.Usage(
Collaborator

Can we leave the Usage changes off for now and do them in a followup PR? We just did some refactoring there: #2378 so I'll have to check how this corresponds to the other places we're going to be using the usage data.

Author

Done, thanks

profile = self._provider.model_profile(self.model_name)
if isinstance(profile, BedrockModelProfile):
return profile.bedrock_supports_prompt_caching
return False
Collaborator

We should use BedrockModelProfile.from_profile(self.profile).bedrock_supports_prompt_caching. We can inline that where we use it and drop this helper method

Author

Done, have cleaned it up

@DouweM
Collaborator

DouweM commented Aug 16, 2025

@larryhudson Just a heads-up that I'll be out this coming week and will be back the 25th. Assuming this is not urgent I'll review it then. If it is, please ping Kludex! Appreciate the patience :)

@larryhudson larryhudson force-pushed the larryhudson/add-bedrock-prompt-caching branch 11 times, most recently from ef70980 to ac6a93c on August 16, 2025 17:59
@larryhudson
Author

@larryhudson Just a heads-up that I'll be out this coming week and will be back the 25th. Assuming this is not urgent I'll review it then. If it is, please ping Kludex! Appreciate the patience :)

Thanks @DouweM, I've spent some time on this today and will spend a little more time cleaning it up. All good to wait until you're back, have a good time off!

@@ -669,9 +731,14 @@ def model_name(self) -> str:
return self._model_name

def _map_usage(self, metadata: ConverseStreamMetadataEventTypeDef) -> usage.RequestUsage:
print('DEBUG: raw usage', metadata['usage'])

return usage.RequestUsage(
input_tokens=metadata['usage']['inputTokens'],

Same remark as above.

Author

Done

@@ -0,0 +1,479 @@
#!/usr/bin/env python3
"""Example script to test prompt caching with AWS Bedrock.
Collaborator

Could (a version of) this be under tests instead of examples?

Author

Yep I can do that, would you want me to rewrite the file so that it is a bunch of tests, following the patterns in the tests directory?

Collaborator

@larryhudson Yep, we can't merge this PR without 100% test coverage, so you can use this as a starting point to write the tests, or just write them from scratch.

@@ -387,7 +389,15 @@ async def _map_message(self, messages: list[ModelMessage]) -> tuple[str, list[Be
system_prompt_parts.append(request_part.content)
elif isinstance(request_part, UserPromptPart):
async for content in self._map_user_prompt(request_part):
user_content_params.append(content)
if isinstance(content, dict) and content.get('type') == 'ephemeral':
Collaborator

I'd rather have _map_user_prompt yield the CachePoint itself so we don't have to do dict introspection here
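
i.e. in _map_user_prompt, roughly (sketch):

if isinstance(item, CachePoint):
    # Pass the CachePoint through untouched and let _map_message decide where to attach it.
    yield item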

Author

Done, good idea

user_content_params.append(content)
if isinstance(content, dict) and content.get('type') == 'ephemeral':
if user_content_params:
# TODO(larryhudson): Ensure the last user content param supports cache_control
Collaborator

We should raise a UserError if there was no previous part to add this to
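
A sketch of what that guard could look like (the helper name here is hypothetical):

def _add_cache_control_to_last_param(params):
    if not params:
        raise UserError('CachePoint must follow other content that it can mark as cacheable')
    params[-1]['cache_control'] = {'type': 'ephemeral'}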

Author

Done, this is now done in '_add_cache_content_to_last_param'



# Supported models: https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html
ANTHROPIC_CACHING_SUPPORTED_MODELS = ['claude-3-5-sonnet', 'claude-3-5-haiku', 'claude-3-7-sonnet', 'claude-sonnet-4']
Collaborator

Let's make this and the next one sets

Suggested change
ANTHROPIC_CACHING_SUPPORTED_MODELS = ['claude-3-5-sonnet', 'claude-3-5-haiku', 'claude-3-7-sonnet', 'claude-sonnet-4']
ANTHROPIC_CACHING_SUPPORTED_MODELS = {'claude-3-5-sonnet', 'claude-3-5-haiku', 'claude-3-7-sonnet', 'claude-sonnet-4'}

Author

Done

if any(supported in model_name for supported in ANTHROPIC_CACHING_SUPPORTED_MODELS):
return BedrockModelProfile(
bedrock_supports_tool_choice=False,
bedrock_send_back_thinking_parts=True,
Collaborator

Instead of duplicating these 2 fields, can we build just one BedrockModelProfile with bedrock_supports_prompt_caching=any(supported in model_name for supported in ANTHROPIC_CACHING_SUPPORTED_MODELS)
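
i.e. roughly (sketch based only on the fields shown above):

return BedrockModelProfile(
    bedrock_supports_tool_choice=False,
    bedrock_send_back_thinking_parts=True,
    bedrock_supports_prompt_caching=any(
        supported in model_name for supported in ANTHROPIC_CACHING_SUPPORTED_MODELS
    ),
)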

Author

Done

if isinstance(part, SystemPromptPart):
system_prompt.append({'text': part.content})
elif isinstance(part, UserPromptPart):
bedrock_messages.extend(await self._map_user_prompt(part, document_count))
# Handle case where UserPromptPart starts with a CachePoint and follows the SystemPromptPart
cache_point_for_system_prompt, user_prompt_part = self._extract_leading_cache_point(
Collaborator

Instead of extracting, could we have _map_user_prompt return tuple[leading_cache_point: bool, list[MessageUnionTypeDef]]?
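
Sketch of what the call site could look like with that return shape (names as in the current diff, not the final code):

has_leading_cache_point, user_messages = await self._map_user_prompt(part, document_count)
bedrock_messages.extend(user_messages)
if has_leading_cache_point:
    system_prompt.append({'cachePoint': {'type': 'default'}})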

Author

Done, much better!


if (
immediately_follows_system_prompt
and isinstance(part.content, list)
Collaborator

part.content could also itself be a CachePoint right? So maybe we should check if isinstance(list(part.content)[0], CachePoint)

Author

Done

if isinstance(item, CachePoint):
if supports_caching:
# TODO: update the boto3 bedrock type defs so 'cachePoint' is available
content.append({'cachePoint': {'type': 'default'}})

Something worth noting about cachePoint in Bedrock is that BinaryContent should always come first and the user prompt text last. Otherwise, the cachePoint we add to the messages yields the following error: The model returned the following errors: messages.0.content.2.type: Field required.

I'm not sure if Pydantic AI should work around this issue by rearranging the order of the messages to ensure that BinaryContent is always first. This issue was really hard for me to figure out and fix, so I think it might be worth doing; as you can see, the error AWS returns is extremely vague and there's no information about it online. I reported it to AWS and they simply responded by telling me that prompt caching works for them and refused to actually look into it properly :/

Collaborator

@rany2 Hmm, reordering the messages would be very unexpected I think. Can you show me the payload that triggers that error? Because the BinaryContent blocks do have a type field unlike the error message suggests:

if item.is_image:
yield BetaImageBlockParam(
source={'data': io.BytesIO(item.data), 'media_type': item.media_type, 'type': 'base64'}, # type: ignore
type='image',
)
elif item.media_type == 'application/pdf':
yield BetaBase64PDFBlockParam(
source=BetaBase64PDFSourceParam(
data=io.BytesIO(item.data),
media_type='application/pdf',
type='base64',
),
type='document',
)


@DouweM This reproduces the issue I'm describing.

import asyncio
import os

from pydantic_ai import Agent, BinaryContent
from pydantic_ai.messages import CachePoint
from pydantic_ai.models.bedrock import BedrockConverseModel
from pydantic_ai.providers.bedrock import BedrockProvider

agent = Agent(
    model=BedrockConverseModel(
        model_name=os.environ["MODEL"],
        provider=BedrockProvider(
            region_name=os.environ["AWS_REGION"],
            aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
            aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        ),
    ),
)


async def amain():
    # This works.
    response = await agent.run(
        [
            "What is 2+2? Provide the answer only.",
            CachePoint(),
        ]
    )
    print(response)

    # This also works.
    response = await agent.run(
        [
            BinaryContent(
                data="What is 2+2? Provide the answer only.",
                media_type="text/plain",
            ),
            "Process the attached text file. Return the answer only.",
            CachePoint(),
        ]
    )
    print(response)

    # botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the Converse operation:
    # The model returned the following errors: messages.0.content.2.type: Field required
    response = await agent.run(
        [
            "Process the attached text file. Return the answer only.",
            BinaryContent(
                data="What is 2+2? Provide the answer only.",
                media_type="text/plain",
            ),
            CachePoint(),
        ]
    )
    print(response)


def main():
    asyncio.run(amain())


if __name__ == "__main__":
    main()


To be clear, this is only an issue when you use CachePoint(). Without prompt caching these all work just fine.

Collaborator

@rany2 The error message is incorrect then, right, because item at index 2 (the cache point) does have a type: {'cachePoint': {'type': 'default'}}.

Collaborator

@rany2 Can you please share the request payload that reproduces this? I'll see if I can get this to the attention of someone at Amazon.

Returning a more useful error message sounds like the best option for now


@DouweM This results in a 400 response when sent to the converse endpoint:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "Process the attached text file. Return the answer only."
        },
        {
          "document": {
            "name": "Document 1",
            "format": "txt",
            "source": {
              "bytes": "V2hhdCBpcyAyKzI/IFByb3ZpZGUgdGhlIGFuc3dlciBvbmx5Lg=="
            }
          }
        },
        {
          "cachePoint": {
            "type": "default"
          }
        }
      ]
    }
  ],
  "system": [],
  "inferenceConfig": {}
}

Collaborator

@rany2 Thanks, I've sent it to our contacts at Amazon, let's see what they say.

Author

Thanks both :) I haven't done anything regarding binary content yet - will wait for the response from Amazon

Collaborator

This is the response I got:


I’ve raised this with service team, they’re now looking into that. In the meantime, leaving the text/ document sequence as is and adding another text right before the cachePoint could be a potential workaround:

[
    {"text": "Good example:"},
    {"document": {"name": "Doc_1", ...}}
    {"text": "Bad example:"},
    {"document": {"name": "Doc_2", ...}},
    {"text": "Which is the good example?"},
    {"cachePoint": {"type": "default"}}
]

I wonder if the API is happy if we just include an empty {"text": ""}, that'd be worth a try!
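
For reference, that untested workaround payload would look roughly like this (purely illustrative, reusing the repro above):

content = [
    {'text': 'Process the attached text file. Return the answer only.'},
    {'document': {'name': 'Document 1', 'format': 'txt', 'source': {'bytes': b'...'}}},
    {'text': ''},  # empty text block right before the cache point, if the API accepts it
    {'cachePoint': {'type': 'default'}},
]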

if isinstance(part, SystemPromptPart):
system_prompt.append({'text': part.content})
elif isinstance(part, UserPromptPart):
bedrock_messages.extend(await self._map_user_prompt(part, document_count))
# Handle case where UserPromptPart starts with a CachePoint and follows the SystemPromptPart
Collaborator

If there is no SystemPromptPart, I suppose we should add a leading CachePoint to the toolConfig like in the "tools checkpoint" example here: https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html

Author

Done

@larryhudson larryhudson force-pushed the larryhudson/add-bedrock-prompt-caching branch 4 times, most recently from f6c4db7 to 300871f on September 6, 2025 16:42
@larryhudson larryhudson force-pushed the larryhudson/add-bedrock-prompt-caching branch from 300871f to a2222e1 on September 6, 2025 16:45
@larryhudson
Author

Hey @DouweM, sorry for the delay on this. I've gone through and addressed your comments. I've also updated the boto3 + mypy-boto3 runtime dependencies so that 'CachePoint' works as expected. I think this is ready for another review when you have time.

@larryhudson larryhudson requested a review from DouweM September 6, 2025 16:48
tools = self._get_tools(model_request_parameters)
if not tools:
return None

if should_add_cache_point:
tools[-1]['cachePoint'] = {'type': 'default'}
Collaborator

This is causing test_bedrock_anthropic_tool_with_thinking to fail with this error:

Invalid number of parameters set for tagged union structure toolConfig.tools[0]. Can only set one of the following keys: toolSpec, cachePoint.

It sounds like the cachePoint has to be a separate item in tools, not a field on the last item.
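
i.e. presumably something like this instead (sketch, inferred from the error message and the Bedrock docs linked above):

if should_add_cache_point:
    # The cachePoint has to be its own entry in the tools list, not a key on the last toolSpec entry.
    tools.append({'cachePoint': {'type': 'default'}})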


@@ -46,7 +46,13 @@ class ThinkingPart(TypedDict):
content: NotRequired[str]


MessagePart: TypeAlias = 'TextPart | ToolCallPart | ToolCallResponsePart | MediaUrlPart | BinaryDataPart | ThinkingPart'
class CachePointPart(TypedDict):
Collaborator

I don't think we need this here; as mentioned at the top of the file, these are supposed to match the OpenTelemetry GenAI spec, which I don't think has cache points.

@@ -637,6 +649,8 @@ def otel_message_parts(self, settings: InstrumentationSettings) -> list[_otel_me
if settings.include_content and settings.include_binary_content:
converted_part['content'] = base64.b64encode(part.data).decode()
parts.append(converted_part)
elif isinstance(part, CachePoint):
parts.append(_otel_messages.CachePointPart(type=part.kind))
Collaborator

See above, I think we should skip it instead -- unless you've confirmed that the OTel spec does have cache points.

| BetaImageBlockParam
| BetaToolResultBlockParam
)
"""Content block parameter types that support cache_control."""
Collaborator

Can you please link to the doc this came from?

tool_config = self._map_tool_config(model_request_parameters)
tool_config = self._map_tool_config(
model_request_parameters,
should_add_cache_point=(
Collaborator

This is now always adding a cache point, independent of has_leading_cache_point. I think we should have map_messages return that value if it wasn't already used by adding it to the system prompt, and then use it here. That may require some reordering.
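
Roughly this shape (hypothetical, just to illustrate the reordering; the flag name is made up):

system_prompt, bedrock_messages, has_unused_leading_cache_point = await self._map_messages(messages)
tool_config = self._map_tool_config(
    model_request_parameters,
    should_add_cache_point=has_unused_leading_cache_point,
)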

part, document_count, profile.bedrock_supports_prompt_caching
)
if has_leading_cache_point:
system_prompt.append({'cachePoint': {'type': 'default'}})
Collaborator

We should only add it to the system prompt if this is the first ModelRequest -- if there were previous ModelResponses from the assistant, we should add it to the last assistant message, I think.
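
i.e. something like this (sketch; the exact message/content shape may differ):

if has_leading_cache_point:
    if not bedrock_messages:
        # First ModelRequest: the cache point can go at the end of the system prompt.
        system_prompt.append({'cachePoint': {'type': 'default'}})
    else:
        # Otherwise attach it to the content of the last message already mapped (the assistant's).
        bedrock_messages[-1]['content'].append({'cachePoint': {'type': 'default'}})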

elif isinstance(item, CachePoint):
if supports_caching:
content.append({'cachePoint': {'type': 'default'}})
continue
Collaborator

I don't think we need this continue

if isinstance(part.content, str):
content.append({'text': part.content})
else:
for item in part.content:
if part.content and isinstance(part.content[0], CachePoint):
Collaborator

You can move this to the isinstance(item, CachePoint) below by changing for item in part.content to for i, item in enumerate(part.content) and then checking if i == 0 to know if it was the first element
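
i.e. roughly (sketch):

for i, item in enumerate(part.content):
    if isinstance(item, CachePoint):
        if i == 0:
            # A leading cache point is reported to the caller instead of emitting a block here.
            has_leading_cache_point = True
        elif supports_caching:
            content.append({'cachePoint': {'type': 'default'}})
    else:
        ...  # handle the other content item types as before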

if isinstance(part.content, str):
content.append({'text': part.content})
else:
for item in part.content:
if part.content and isinstance(part.content[0], CachePoint):
has_leading_cache_point = True
Collaborator

Are we sure that Bedrock needs this special behavior of moving this to the system prompt? Does it not support a cachePoint at the start of user content?


Successfully merging this pull request may close these issues.

Anthropic prompt caching (inc. Anthropic on Bedrock)