Skip to content

Conversation

awksrj
Copy link
Contributor

@awksrj awksrj commented Jul 30, 2025

Description

This PR adds a code cell to tree_of_attacks_with_pruning.ipynb to demonstrate an image target example and modifies tree_of_attacks.py to adapt the Tree of Attacks orchestrator for image targets, particularly adding a dictionary to map content_policy_violation errors to an objective_score of 0.0, which ensure nodes are kept in the completed nodes list until the branch width limit is exceeded to prevent premature pruning.

Related Issue

Closes: #585

Tests and Documentation

No tests included in this commit.

@romanlutz romanlutz self-assigned this Jul 30, 2025
@@ -45,6 +46,60 @@
await result.print_conversation_async() # type: ignore
print(result.tree_visualization)

# %% [markdown]
# ## Image Target Example
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for using the TreeOfAttacksWithPruningAttack 🫡 We likely do not want to merge this doc update as it stands right now because we are working on updating all documentation to use Attack > Orchestrator and this change will mix the two techniques.

If you would like to showcase that this functionality works with image, I'd suggest simply calling the TreeOfAttacksWithPruningOrchestrator that is in the first code block with an objective_target=dalle_target to validate that it works as expected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to execute the code in the Jupyter Notebook, you should run jupytext --execute --to notebook ./doc/code/orchestrators/tree_of_attacks_with_pruning.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counterpoint:
... aren't we going to switch everything to the attack one shortly anyway?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so now that we have the notebook under executor/attacks we can probably migrate these changes there. @awksrj can you do that?

@@ -482,6 +484,29 @@ async def _score_response_async(self, *, response: PromptRequestResponse, object
Higher scores indicate more successful attacks and influence which branches
the TAP algorithm explores in subsequent iterations.
"""
response_piece = response.request_pieces[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not quite clear to me how this change adds image support for TAP - could you elaborate? I see in your description that by adding a dictionary to map scores with errors that this "ensures nodes are kept in the completed nodes list until the branch width limit is exceeded to prevent premature pruning". Was the issue previously with images that the nodes would be removed prematurely when set against image targets?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any branch that encounters errors is pruned. @awksrj found that this can prune all branches if the initial prompts are too aggressive. Unlike crescendo, it cannot correct later if it has no branches left. Our idea here is to keep them with the error (eg content filter) as score 0 and then they only get pruned if we exceed the width. Getting blocked is actually useful feedback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I'm missing something but if the idea is that the score is always 0 for these errors, why do we need a map ? can't we just keep track of the error strings themselves in a set ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it depends on the type of error. Content filters would be non-prune. connection rejected would not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hannahwestra25 @nina-msft does this make sense?

@@ -482,6 +484,29 @@ async def _score_response_async(self, *, response: PromptRequestResponse, object
Higher scores indicate more successful attacks and influence which branches
the TAP algorithm explores in subsequent iterations.
"""
response_piece = response.request_pieces[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we able to validate the behavior change through a unit test?

" AttackScoringConfig,\n",
")\n",
"from pyrit.attacks.multi_turn.tree_of_attacks import (\n",
" TreeOfAttacksWithPruningAttack as TAPAttack,\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just leave it as TreeOfAttacksWithPruningAttack rather than using an alias to be pedantic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ironically, there is an alias TAPAttack already (see last line of the file that defines the attack). You just need to import it.

" )\n",
")\n",
"\n",
"tap = TAPAttack(\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also would rename this variable to tree_of_attacks_with_pruning_attack to follow our general naming schema

@awksrj
Copy link
Contributor Author

awksrj commented Aug 1, 2025

Thanks for all the comments. I'll go through them and push changes soon!

@awksrj
Copy link
Contributor Author

awksrj commented Aug 6, 2025

I added two unit tests to cover the pruning logic, ensuring blocked responses are scored as 0.0 and pruning only occurs when we exceed tree_width. I also updated the example in tree_of_attacks_with_pruning.py, which used to show how the old TreeOfAttacksWithPruningOrchestrator worked with text targets. I replaced it with the new TAPAttack class to reflect the current implementation, which hopefully makes the documentation more complete.

Copy link
Contributor

@romanlutz romanlutz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the maintainers should run the notebook as well once it exists. Just to make sure we aren't missing anything

self.objective_score = Score(
score_value=str(assigned_score), # Convert float to string
score_value_description=f"Assigned score {assigned_score} for {response_piece.response_error} response",
score_type="float_scale", # Adjust if ScoreType is an enum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
score_type="float_scale", # Adjust if ScoreType is an enum
score_type="float_scale",

It can't be

@@ -211,6 +211,7 @@ def __init__(
memory_labels: Optional[dict[str, str]] = None,
parent_id: Optional[str] = None,
prompt_normalizer: Optional[PromptNormalizer] = None,
error_score_map: dict[str, float],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be optional but have a reasonable default. That is

Suggested change
error_score_map: dict[str, float],
error_score_map: Optional[dict[str, float]] = None

And if None is passed (default) then we just catch content filter errors and make them 0

def test_prune_blocked_nodes_with_score_zero(self, basic_attack, node_factory, helpers):
"""Test that nodes with 'blocked' response are assigned objective_score=0 and only pruned when width exceeded."""
# Configure error_score_map to assign 0.0 for blocked responses
basic_attack._error_score_map = {"blocked": 0.0}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update the AttackBuilder at the top of this file to support the new arg rather than editing the object after creation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FEAT: Add Image functionality to TAP
4 participants