FEAT: Add Image functionality to TAP #1036

awksrj · 2025-07-30T15:52:29Z

Description

This PR adds a code cell to tree_of_attacks_with_pruning.ipynb that demonstrates an image target example, and modifies tree_of_attacks.py to adapt the Tree of Attacks orchestrator for image targets. In particular, it adds a dictionary that maps content_policy_violation errors to an objective_score of 0.0. This ensures nodes are kept in the completed nodes list until the branch width limit is exceeded to prevent premature pruning.

Related Issue

Closes: #585

Tests and Documentation

No tests included in this commit.

nina-msft · 2025-07-30T23:05:07Z

doc/code/orchestrators/tree_of_attacks_with_pruning.py

@@ -45,6 +46,60 @@
 await result.print_conversation_async()  # type: ignore
 print(result.tree_visualization)

+# %% [markdown]
+# ## Image Target Example


Thanks for using the TreeOfAttacksWithPruningAttack 🫡 We likely do not want to merge this doc update as it stands right now because we are working on updating all documentation to use Attack > Orchestrator and this change will mix the two techniques.

If you would like to showcase that this functionality works with images, I'd suggest simply calling the TreeOfAttacksWithPruningOrchestrator that is in the first code block with objective_target=dalle_target to validate that it works as expected.

In order to execute the code in the Jupyter Notebook, you should run jupytext --execute --to notebook ./doc/code/orchestrators/tree_of_attacks_with_pruning.py

Counterpoint:
... aren't we going to switch everything to the attack one shortly anyway?

Ok so now that we have the notebook under executor/attacks we can probably migrate these changes there. @awksrj can you do that?

nina-msft · 2025-07-30T23:10:47Z

pyrit/attacks/multi_turn/tree_of_attacks.py

@@ -482,6 +484,29 @@ async def _score_response_async(self, *, response: PromptRequestResponse, object
            Higher scores indicate more successful attacks and influence which branches
            the TAP algorithm explores in subsequent iterations.
        """
+        response_piece = response.request_pieces[0]


It's not quite clear to me how this change adds image support for TAP - could you elaborate? I see in your description that by adding a dictionary to map scores with errors that this "ensures nodes are kept in the completed nodes list until the branch width limit is exceeded to prevent premature pruning". Was the issue previously with images that the nodes would be removed prematurely when set against image targets?

Any branch that encounters errors is pruned. @awksrj found that this can prune all branches if the initial prompts are too aggressive. Unlike Crescendo, TAP cannot correct course later if it has no branches left. Our idea here is to keep those nodes, with the error (e.g. content filter) scored as 0, so that they only get pruned if we exceed the width. Getting blocked is actually useful feedback.

Maybe I'm missing something, but if the idea is that the score is always 0 for these errors, why do we need a map? Can't we just keep track of the error strings themselves in a set?

Well, it depends on the type of error. Content filter errors would be non-prune. A rejected connection would not be, so it would still get pruned.

@hannahwestra25 @nina-msft does this make sense?
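A minimal sketch of the idea discussed in this thread, not the actual PyRIT implementation. `Node`, `score_node`, and `prune_to_width` are hypothetical stand-ins, and the error strings `"blocked"` and `"connection_rejected"` are assumed for illustration. Errors listed in the map receive the mapped score and compete with other nodes during width pruning; any other error still drops the branch.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed mapping: only error types listed here keep their branch alive.
ERROR_SCORE_MAP: dict[str, float] = {"blocked": 0.0}


@dataclass
class Node:
    """Hypothetical stand-in for a TAP tree node."""

    name: str
    response_error: str = "none"
    scorer_value: Optional[float] = None  # what a real scorer would return
    objective_score: Optional[float] = None


def score_node(node: Node, error_score_map: dict[str, float]) -> None:
    """Assign an objective score, or leave it as None to mark the node for pruning."""
    if node.response_error == "none":
        node.objective_score = node.scorer_value
    elif node.response_error in error_score_map:
        # Non-prune error (e.g. content filter): keep the node with the mapped score.
        node.objective_score = error_score_map[node.response_error]
    else:
        # Any other error (e.g. a rejected connection): no score, so the node is dropped.
        node.objective_score = None


def prune_to_width(nodes: list[Node], width: int) -> list[Node]:
    """Drop unscored nodes, then keep at most `width` nodes with the highest scores."""
    scored = [n for n in nodes if n.objective_score is not None]
    scored.sort(key=lambda n: n.objective_score, reverse=True)
    return scored[:width]


nodes = [
    Node("a", scorer_value=0.7),
    Node("b", response_error="blocked"),
    Node("c", response_error="connection_rejected"),
    Node("d", response_error="blocked"),
]
for n in nodes:
    score_node(n, ERROR_SCORE_MAP)

survivors = prune_to_width(nodes, width=2)
print([n.name for n in survivors])
```

With this shape, a tree whose first prompts are all blocked still has branches to expand, because blocked nodes are only removed once the width limit forces a choice. If every non-prune error always scores 0, a set of error strings would behave the same; a map leaves room for per-error scores, which the thread does not settle.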

nina-msft · 2025-07-30T23:11:25Z

pyrit/attacks/multi_turn/tree_of_attacks.py

@@ -482,6 +484,29 @@ async def _score_response_async(self, *, response: PromptRequestResponse, object
            Higher scores indicate more successful attacks and influence which branches
            the TAP algorithm explores in subsequent iterations.
        """
+        response_piece = response.request_pieces[0]


are we able to validate the behavior change through a unit test?

hannahwestra25 · 2025-07-31T15:12:42Z

doc/code/orchestrators/tree_of_attacks_with_pruning.ipynb

+    "    AttackScoringConfig,\n",
+    ")\n",
+    "from pyrit.attacks.multi_turn.tree_of_attacks import (\n",
+    "    TreeOfAttacksWithPruningAttack as TAPAttack,\n",


I think we should just leave it as TreeOfAttacksWithPruningAttack rather than using an alias to be pedantic

Ironically, there is an alias TAPAttack already (see last line of the file that defines the attack). You just need to import it.

hannahwestra25 · 2025-07-31T15:15:28Z

doc/code/orchestrators/tree_of_attacks_with_pruning.ipynb

+    "    )\n",
+    ")\n",
+    "\n",
+    "tap = TAPAttack(\n",


Also would rename this variable to tree_of_attacks_with_pruning_attack to follow our general naming schema

awksrj · 2025-08-01T16:13:42Z

Thanks for all the comments. I'll go through them and push changes soon!

awksrj · 2025-08-06T15:19:38Z

I added two unit tests to cover the pruning logic, ensuring blocked responses are scored as 0.0 and pruning only occurs when we exceed tree_width. I also updated the example in tree_of_attacks_with_pruning.py, which used to show how the old TreeOfAttacksWithPruningOrchestrator worked with text targets. I replaced it with the new TAPAttack class to reflect the current implementation, which hopefully makes the documentation more complete.

romanlutz

One of the maintainers should run the notebook as well once it exists. Just to make sure we aren't missing anything

romanlutz · 2025-08-31T03:35:09Z

pyrit/attacks/multi_turn/tree_of_attacks.py

+            self.objective_score = Score(
+                score_value=str(assigned_score),  # Convert float to string
+                score_value_description=f"Assigned score {assigned_score} for {response_piece.response_error} response",
+                score_type="float_scale",  # Adjust if ScoreType is an enum


Suggested change

-                score_type="float_scale",  # Adjust if ScoreType is an enum
+                score_type="float_scale",

It can't be

romanlutz · 2025-08-31T18:28:45Z

pyrit/attacks/multi_turn/tree_of_attacks.py

@@ -211,6 +211,7 @@ def __init__(
        memory_labels: Optional[dict[str, str]] = None,
        parent_id: Optional[str] = None,
        prompt_normalizer: Optional[PromptNormalizer] = None,
+        error_score_map: dict[str, float],


This should be optional but have a reasonable default. That is

Suggested change

-        error_score_map: dict[str, float],
+        error_score_map: Optional[dict[str, float]] = None

And if None is passed (default) then we just catch content filter errors and make them 0
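A minimal sketch of how the optional argument could fall back to a default, assuming the content filter error is reported as `"blocked"` (the string used in the PR's unit test). `TapConfigSketch` and `DEFAULT_ERROR_SCORE_MAP` are hypothetical names, not part of PyRIT.

```python
from typing import Optional

# Assumed default: only content filter errors are kept, with a score of 0.0.
DEFAULT_ERROR_SCORE_MAP: dict[str, float] = {"blocked": 0.0}


class TapConfigSketch:
    """Hypothetical stand-in for the part of __init__ that handles error_score_map."""

    def __init__(self, *, error_score_map: Optional[dict[str, float]] = None) -> None:
        # Compare against None explicitly so that an empty dict is respected
        # as "keep no errors" instead of silently falling back to the default.
        source = DEFAULT_ERROR_SCORE_MAP if error_score_map is None else error_score_map
        # Copy so that later changes to the caller's dict do not leak in.
        self._error_score_map: dict[str, float] = dict(source)


default_cfg = TapConfigSketch()
custom_cfg = TapConfigSketch(error_score_map={"blocked": 0.0, "hypothetical_error": 0.1})
disabled_cfg = TapConfigSketch(error_score_map={})
```

Using `None` as the default also avoids a mutable default argument, which would otherwise be shared across all instances.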

romanlutz · 2025-08-31T18:35:46Z

tests/unit/attacks/test_tree_of_attacks.py

+    def test_prune_blocked_nodes_with_score_zero(self, basic_attack, node_factory, helpers):
+        """Test that nodes with 'blocked' response are assigned objective_score=0 and only pruned when width exceeded."""
+        # Configure error_score_map to assign 0.0 for blocked responses
+        basic_attack._error_score_map = {"blocked": 0.0}


We should update the AttackBuilder at the top of this file to support the new arg rather than editing the object after creation
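One way this could look, as a sketch only: the real `AttackBuilder` in the test file is not shown in this thread, so `AttackBuilderSketch`, `with_error_score_map`, `build`, and `FakeAttack` below are all hypothetical names used to illustrate passing the argument at construction time instead of patching `_error_score_map` afterwards.

```python
from typing import Optional


class FakeAttack:
    """Hypothetical stand-in for the attack under test."""

    def __init__(self, *, error_score_map: Optional[dict[str, float]] = None) -> None:
        # Assumed default: content filter ("blocked") errors score 0.0.
        self._error_score_map = (
            {"blocked": 0.0} if error_score_map is None else dict(error_score_map)
        )


class AttackBuilderSketch:
    """Hypothetical test helper; the real AttackBuilder's methods may differ."""

    def __init__(self) -> None:
        self._error_score_map: Optional[dict[str, float]] = None

    def with_error_score_map(self, error_score_map: dict[str, float]) -> "AttackBuilderSketch":
        # Store the value and return self so that calls can be chained.
        self._error_score_map = error_score_map
        return self

    def build(self) -> FakeAttack:
        # The map goes through the constructor, so tests exercise the public path.
        return FakeAttack(error_score_map=self._error_score_map)


attack = AttackBuilderSketch().with_error_score_map({"blocked": 0.0}).build()
```

A test fixture could then request a pre-configured attack from the builder and never touch private attributes.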

awksrj and others added 4 commits June 21, 2025 01:43

ran pre commit successfully

0231945

Merge remote-tracking branch 'upstream/main'

98aa9ac

update tap and add code cell to tap notebook

d8d686f

Merge branch 'main' into feature/tap-image-target

26ff5c1

romanlutz self-assigned this Jul 30, 2025

nina-msft reviewed Jul 30, 2025

hannahwestra25 reviewed Jul 31, 2025

awksrj added 2 commits August 6, 2025 11:13

add unit tests and update tap notebook

9e328da

Merge branch 'feature/tap-image-target' of https://github.com/awksrj/…

055565a

…PyRIT into feature/tap-image-target

romanlutz reviewed Aug 31, 2025

resolved comments

0d8bc70
