-
Notifications
You must be signed in to change notification settings - Fork 560
FEAT: Add Image functionality to TAP #1036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -45,6 +46,60 @@ | |||
await result.print_conversation_async() # type: ignore | |||
print(result.tree_visualization) | |||
|
|||
# %% [markdown] | |||
# ## Image Target Example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for using the TreeOfAttacksWithPruningAttack
🫡 We likely do not want to merge this doc update as it stands right now because we are working on updating all documentation to use Attack > Orchestrator and this change will mix the two techniques.
If you would like to showcase that this functionality works with image, I'd suggest simply calling the TreeOfAttacksWithPruningOrchestrator
that is in the first code block with an objective_target=dalle_target
to validate that it works as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to execute the code in the Jupyter Notebook, you should run jupytext --execute --to notebook ./doc/code/orchestrators/tree_of_attacks_with_pruning.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counterpoint:
... aren't we going to switch everything to the attack one shortly anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so now that we have the notebook under executor/attacks we can probably migrate these changes there. @awksrj can you do that?
@@ -482,6 +484,29 @@ async def _score_response_async(self, *, response: PromptRequestResponse, object | |||
Higher scores indicate more successful attacks and influence which branches | |||
the TAP algorithm explores in subsequent iterations. | |||
""" | |||
response_piece = response.request_pieces[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not quite clear to me how this change adds image support for TAP - could you elaborate? I see in your description that by adding a dictionary to map scores with errors that this "ensures nodes are kept in the completed nodes list until the branch width limit is exceeded to prevent premature pruning". Was the issue previously with images that the nodes would be removed prematurely when set against image targets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any branch that encounters errors is pruned. @awksrj found that this can prune all branches if the initial prompts are too aggressive. Unlike crescendo, it cannot correct later if it has no branches left. Our idea here is to keep them with the error (eg content filter) as score 0 and then they only get pruned if we exceed the width. Getting blocked is actually useful feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I'm missing something but if the idea is that the score is always 0 for these errors, why do we need a map ? can't we just keep track of the error strings themselves in a set ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it depends on the type of error. Content filters would be non-prune. connection rejected would not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hannahwestra25 @nina-msft does this make sense?
@@ -482,6 +484,29 @@ async def _score_response_async(self, *, response: PromptRequestResponse, object | |||
Higher scores indicate more successful attacks and influence which branches | |||
the TAP algorithm explores in subsequent iterations. | |||
""" | |||
response_piece = response.request_pieces[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we able to validate the behavior change through a unit test?
" AttackScoringConfig,\n", | ||
")\n", | ||
"from pyrit.attacks.multi_turn.tree_of_attacks import (\n", | ||
" TreeOfAttacksWithPruningAttack as TAPAttack,\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just leave it as TreeOfAttacksWithPruningAttack rather than using an alias to be pedantic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ironically, there is an alias TAPAttack already (see last line of the file that defines the attack). You just need to import it.
" )\n", | ||
")\n", | ||
"\n", | ||
"tap = TAPAttack(\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also would rename this variable to tree_of_attacks_with_pruning_attack to follow our general naming schema
Thanks for all the comments. I'll go through them and push changes soon! |
…PyRIT into feature/tap-image-target
I added two unit tests to cover the pruning logic, ensuring blocked responses are scored as 0.0 and pruning only occurs when we exceed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the maintainers should run the notebook as well once it exists. Just to make sure we aren't missing anything
self.objective_score = Score( | ||
score_value=str(assigned_score), # Convert float to string | ||
score_value_description=f"Assigned score {assigned_score} for {response_piece.response_error} response", | ||
score_type="float_scale", # Adjust if ScoreType is an enum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
score_type="float_scale", # Adjust if ScoreType is an enum | |
score_type="float_scale", |
It can't be
@@ -211,6 +211,7 @@ def __init__( | |||
memory_labels: Optional[dict[str, str]] = None, | |||
parent_id: Optional[str] = None, | |||
prompt_normalizer: Optional[PromptNormalizer] = None, | |||
error_score_map: dict[str, float], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be optional but have a reasonable default. That is
error_score_map: dict[str, float], | |
error_score_map: Optional[dict[str, float]] = None |
And if None is passed (default) then we just catch content filter errors and make them 0
def test_prune_blocked_nodes_with_score_zero(self, basic_attack, node_factory, helpers): | ||
"""Test that nodes with 'blocked' response are assigned objective_score=0 and only pruned when width exceeded.""" | ||
# Configure error_score_map to assign 0.0 for blocked responses | ||
basic_attack._error_score_map = {"blocked": 0.0} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should update the AttackBuilder at the top of this file to support the new arg rather than editing the object after creation
Description
This PR adds a code cell to
tree_of_attacks_with_pruning.ipynb
to demonstrate an image target example and modifiestree_of_attacks.py
to adapt the Tree of Attacks orchestrator for image targets, particularly adding a dictionary to mapcontent_policy_violation
errors to anobjective_score
of 0.0, which ensure nodes are kept in the completed nodes list until the branch width limit is exceeded to prevent premature pruning.Related Issue
Closes: #585
Tests and Documentation
No tests included in this commit.