
Commit 8e61161

fix(llmobs): [MLOB-3863] openai agents support reasoning messages (#14522)
While QA-ing our agentic integrations, I noticed a bug in the current OpenAI Agents output message parsing. The `content` field on OpenAI's `ResponseReasoningItem` is optional ([ref](https://github.com/openai/openai-python/blob/2adf11112988e998fcf5adb805bae38501d22318/src/openai/types/responses/response_reasoning_item.py#L27-L51)), which we were not handling properly, leading to potential `NoneType` errors like the one below:

```
File "/Users/nicole.cybul/go/src/github.com/DataDog/dd-trace-py/ddtrace/llmobs/_integrations/utils.py", line 1161, in llmobs_output_messages
    for content in item.content:
                   ^^^^^^^^^^^^
TypeError: 'NoneType' object is not iterable
```

While investigating this issue, I noticed that our OpenAI integration had a nearly identical implementation for parsing input and output messages, so I extracted the common logic into a helper function that can be reused across the OpenAI and OpenAI Agents integrations. This fixes the `NoneType` error, since we no longer try to iterate over `item.content` for reasoning message types.
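The failure mode is easy to reproduce in isolation. Below is a minimal sketch of the bug and the defensive pattern used in the fix; `ReasoningItem` here is a hypothetical stand-in for OpenAI's `ResponseReasoningItem`, not the real type:

```python
from typing import List, Optional


class ReasoningItem:
    """Stand-in for OpenAI's ResponseReasoningItem, whose `content` is Optional."""

    def __init__(self, content: Optional[List[str]] = None):
        self.content = content


def collect_text_buggy(item):
    # Mirrors the old behavior: iterating item.content directly
    # raises TypeError when content is None.
    return [c for c in item.content]


def collect_text_fixed(item):
    # The fix: fall back to an empty list before iterating,
    # so a None content is skipped instead of crashing.
    return [c for c in (getattr(item, "content", []) or [])]


item = ReasoningItem(content=None)
print(collect_text_fixed(item))  # []
try:
    collect_text_buggy(item)
except TypeError as e:
    print(e)  # 'NoneType' object is not iterable
```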
## Manual Testing

Here is a [trace](https://app.datadoghq.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&sp=%5B%7B%22sp%22%3A%7B%22width%22%3A%22min%28100%25%2C%20max%28calc%28100%25%20-%20var%28--ui-page-left-offset%29%20-%2016px%29%2C%20900px%29%29%22%7D%2C%22p%22%3A%7B%22eventId%22%3A%22AwAAAZkvInhdwYmtigAAABhBWmt2SW5oZEFBRExtZktfRkZzZEFBQUEAAAAkMDE5OTJmMjMtOTQxOC00OTliLTg1ZWYtYjc5ZjBmNGU5YjkzAABCrQ%22%7D%2C%22i%22%3A%22llm-obs-panel%22%7D%5D&spanId=9743349247239047563&start=1757431678372&end=1757432578372&paused=false) that I created with this feature branch, which can be compared with this [trace](https://app.datadoghq.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&sp=%5B%7B%22sp%22%3A%7B%22width%22%3A%22min%28100%25%2C%20max%28calc%28100%25%20-%20var%28--ui-page-left-offset%29%20-%2016px%29%2C%20900px%29%29%22%7D%2C%22p%22%3A%7B%22eventId%22%3A%22AwAAAZkvJpU3H4D3EQAAABhBWmt2SnBVM0FBQlh3NUdHTDY2SEFBQUEAAAAkZjE5OTJmMjYtY2MyMy00YTQxLThlZWQtZWJlNmZlNWYyOTE4AAAI6g%22%7D%2C%22i%22%3A%22llm-obs-panel%22%7D%5D&spanId=10719453409381675278&start=1757431833983&end=1757432733983&paused=false) from the main branch.

| | Before | After |
|---|---|---|
| OpenAI Agents LLM spans were missing output messages due to the `NoneType` error | <img width="1056" height="992" alt="image" src="https://github.com/user-attachments/assets/3e94b71f-6d39-4044-af4d-bba901e3f850" /> | <img width="1236" height="994" alt="image" src="https://github.com/user-attachments/assets/266b4085-14d4-487e-946d-a5ea0af990f6" /> |
| Tool results for OpenAI Agents LLM spans were not being captured | <img width="542" height="116" alt="image" src="https://github.com/user-attachments/assets/8829ae88-add1-415c-964e-c2253478a595" /> | <img width="1206" height="104" alt="image" src="https://github.com/user-attachments/assets/26e829ff-efc8-4228-a289-1c1f316411f9" /> |
| Spans were not being linked properly due to missing output messages | <img width="1690" height="864" alt="image" src="https://github.com/user-attachments/assets/b7c290a1-7474-4b7a-8cab-fca1e75732f5" /> | <img width="548" height="1104" alt="image" src="https://github.com/user-attachments/assets/b5542829-784d-4780-b33c-3a9843753bcb" /> |

## Checklist

- [x] PR author has checked that all the criteria below are met
  - The PR description includes an overview of the change
  - The PR description articulates the motivation for the change
  - The change includes tests OR the PR description describes a testing strategy
  - The PR description notes risks associated with the change, if any
  - Newly-added code is easy to change
  - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
  - The change includes or references documentation updates if necessary
  - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Reviewer has checked that all the criteria below are met
  - Title is accurate
  - All changes are related to the pull request's stated goal
  - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
  - Testing strategy adequately addresses listed risks
  - Newly-added code is easy to change
  - Release note makes sense to a user of the library
  - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
1 parent 2b8bd63 commit 8e61161

File tree

4 files changed: +116 −132 lines changed


ddtrace/llmobs/_integrations/openai_agents.py

Lines changed: 5 additions & 4 deletions

```diff
@@ -37,6 +37,7 @@
 from ddtrace.llmobs._utils import _get_nearest_llmobs_ancestor
 from ddtrace.llmobs._utils import _get_span_name
 from ddtrace.llmobs._utils import load_data_value
+from ddtrace.llmobs._utils import safe_json
 from ddtrace.trace import Span


@@ -232,13 +233,13 @@ def _llmobs_set_response_attributes(self, span: Span, oai_span: OaiSpanAdapter)
         if oai_span.response and oai_span.response.output:
             messages, tool_call_outputs = oai_span.llmobs_output_messages()

-            for tool_id, tool_name, tool_args in tool_call_outputs:
+            for tool_call_output in tool_call_outputs:
                 core.dispatch(
                     DISPATCH_ON_LLM_TOOL_CHOICE,
                     (
-                        tool_id,
-                        tool_name,
-                        tool_args,
+                        tool_call_output["tool_id"],
+                        tool_call_output["name"],
+                        safe_json(tool_call_output["arguments"]),
                         {
                             "trace_id": format_trace_id(span.trace_id),
                             "span_id": str(span.span_id),
```
ddtrace/llmobs/_integrations/utils.py

Lines changed: 61 additions & 117 deletions

```diff
@@ -550,13 +550,35 @@ def openai_get_input_messages_from_response_input(
     Returns:
         - A list of processed messages
     """
+    processed, _ = _openai_parse_input_response_messages(messages)
+    return processed
+
+
+def _openai_parse_input_response_messages(
+    messages: Optional[Union[str, List[Dict[str, Any]]]], system_instructions: Optional[str] = None
+) -> Tuple[List[Dict[str, Any]], List[str]]:
+    """
+    Parses input messages from the openai responses api into a list of processed messages
+    and a list of tool call IDs.
+
+    Args:
+        messages: A list of output messages
+
+    Returns:
+        - A list of processed messages
+        - A list of tool call IDs
+    """
     processed: List[Dict[str, Any]] = []
+    tool_call_ids: List[str] = []
+
+    if system_instructions:
+        processed.append({"role": "system", "content": system_instructions})

     if not messages:
-        return processed
+        return processed, tool_call_ids

     if isinstance(messages, str):
-        return [{"role": "user", "content": messages}]
+        return [{"role": "user", "content": messages}], tool_call_ids

     for item in messages:
         processed_item: Dict[str, Union[str, List[ToolCall], List[ToolResult]]] = {}
@@ -574,7 +596,7 @@ def openai_get_input_messages_from_response_input(
             processed_item["role"] = item["role"]
         elif "call_id" in item and ("arguments" in item or "input" in item):
             # Process `ResponseFunctionToolCallParam` or ResponseCustomToolCallParam type from input messages
-            arguments_str = item.get("arguments", "{}") or item.get("input", "{}")
+            arguments_str = item.get("arguments", "") or item.get("input", OAI_HANDOFF_TOOL_ARG)
             arguments = safe_load_json(arguments_str)

             tool_call_info = ToolCall(
@@ -585,7 +607,7 @@ def openai_get_input_messages_from_response_input(
             )
             processed_item.update(
                 {
-                    "role": "user",
+                    "role": "assistant",
                     "tool_calls": [tool_call_info],
                 }
             )
@@ -607,10 +629,11 @@ def openai_get_input_messages_from_response_input(
                     "tool_results": [tool_result_info],
                 }
             )
+            tool_call_ids.append(item["call_id"])
         if processed_item:
             processed.append(processed_item)

-    return processed
+    return processed, tool_call_ids


 def openai_get_output_messages_from_response(response: Optional[Any]) -> List[Dict[str, Any]]:
@@ -630,15 +653,33 @@ def openai_get_output_messages_from_response(response: Optional[Any]) -> List[Di
     if not messages:
         return []

+    processed_messages, _ = _openai_parse_output_response_messages(messages)
+
+    return processed_messages
+
+
+def _openai_parse_output_response_messages(messages: List[Any]) -> Tuple[List[Dict[str, Any]], List[ToolCall]]:
+    """
+    Parses output messages from the openai responses api into a list of processed messages
+    and a list of tool call outputs.
+
+    Args:
+        messages: A list of output messages
+
+    Returns:
+        - A list of processed messages
+        - A list of tool call outputs
+    """
     processed: List[Dict[str, Any]] = []
+    tool_call_outputs: List[ToolCall] = []

     for item in messages:
         message = {}
         message_type = _get_attr(item, "type", "")

         if message_type == "message":
             text = ""
-            for content in _get_attr(item, "content", []):
+            for content in _get_attr(item, "content", []) or []:
                 text += str(_get_attr(content, "text", "") or "")
                 text += str(_get_attr(content, "refusal", "") or "")
             message.update({"role": _get_attr(item, "role", "assistant"), "content": text})
@@ -656,26 +697,29 @@ def openai_get_output_messages_from_response(response: Optional[Any]) -> List[Di
                 }
             )
         elif message_type == "function_call" or message_type == "custom_tool_call":
-            arguments = _get_attr(item, "input", "") or _get_attr(item, "arguments", "{}")
-            arguments = safe_load_json(arguments)
+            call_id = _get_attr(item, "call_id", "")
+            name = _get_attr(item, "name", "")
+            raw_arguments = _get_attr(item, "input", "") or _get_attr(item, "arguments", OAI_HANDOFF_TOOL_ARG)
+            arguments = safe_load_json(raw_arguments)
             tool_call_info = ToolCall(
-                tool_id=_get_attr(item, "call_id", ""),
+                tool_id=call_id,
                 arguments=arguments,
-                name=_get_attr(item, "name", ""),
+                name=name,
                 type=_get_attr(item, "type", "function"),
             )
+            tool_call_outputs.append(tool_call_info)
             message.update(
                 {
                     "tool_calls": [tool_call_info],
                     "role": "assistant",
                 }
             )
         else:
-            message.update({"role": "assistant", "content": "Unsupported content type: {}".format(message_type)})
+            message.update({"content": str(item), "role": "assistant"})

         processed.append(message)

-    return processed
+    return processed, tool_call_outputs


 def openai_get_metadata_from_response(
@@ -1071,126 +1115,26 @@ def llmobs_input_messages(self) -> Tuple[List[Dict[str, Any]], List[str]]:
             - A list of processed messages
             - A list of tool call IDs for span linking purposes
         """
-        messages = self.input
-        processed: List[Dict[str, Any]] = []
-        tool_call_ids: List[str] = []
-
-        if self.response_system_instructions:
-            processed.append({"role": "system", "content": self.response_system_instructions})
-
-        if not messages:
-            return processed, tool_call_ids
-
-        if isinstance(messages, str):
-            return [{"content": messages, "role": "user"}], tool_call_ids
-
-        for item in messages:
-            processed_item: Dict[str, Union[str, List[Dict[str, str]]]] = {}
-            # Handle regular message
-            if "content" in item and "role" in item:
-                processed_item_content = ""
-                if isinstance(item["content"], list):
-                    for content in item["content"]:
-                        processed_item_content += content.get("text", "")
-                        processed_item_content += content.get("refusal", "")
-                else:
-                    processed_item_content = item["content"]
-                if processed_item_content:
-                    processed_item["content"] = processed_item_content
-                    processed_item["role"] = item["role"]
-            elif "call_id" in item and "arguments" in item:
-                """
-                Process `ResponseFunctionToolCallParam` type from input messages
-                """
-                try:
-                    arguments = json.loads(item["arguments"])
-                except json.JSONDecodeError:
-                    arguments = item["arguments"]
-                processed_item["tool_calls"] = [
-                    {
-                        "tool_id": item["call_id"],
-                        "arguments": arguments,
-                        "name": item.get("name", ""),
-                        "type": item.get("type", "function_call"),
-                    }
-                ]
-            elif "call_id" in item and "output" in item:
-                """
-                Process `FunctionCallOutput` type from input messages
-                """
-                output = item["output"]
-
-                if isinstance(output, str):
-                    try:
-                        output = json.loads(output)
-                    except json.JSONDecodeError:
-                        output = {"output": output}
-                tool_call_ids.append(item["call_id"])
-                processed_item["role"] = "tool"
-                processed_item["content"] = item["output"]
-                processed_item["tool_id"] = item["call_id"]
-            if processed_item:
-                processed.append(processed_item)
+        return _openai_parse_input_response_messages(self.input, self.response_system_instructions)

-        return processed, tool_call_ids
-
-    def llmobs_output_messages(self) -> Tuple[List[Dict[str, Any]], List[Tuple[str, str, str]]]:
+    def llmobs_output_messages(self) -> Tuple[List[Dict[str, Any]], List[ToolCall]]:
         """Returns processed output messages for LLM Obs LLM spans.

         Returns:
             - A list of processed messages
-            - A list of tool call data (name, id, args) for span linking purposes
+            - A list of tool calls for span linking purposes
         """
         if not self.response or not self.response.output:
             return [], []

         messages: List[Any] = self.response.output
-        processed: List[Dict[str, Any]] = []
-        tool_call_outputs: List[Tuple[str, str, str]] = []
         if not messages:
-            return processed, tool_call_outputs
+            return [], []

         if not isinstance(messages, list):
             messages = [messages]

-        for item in messages:
-            message = {}
-            # Handle content-based messages
-            if hasattr(item, "content"):
-                text = ""
-                for content in item.content:
-                    if hasattr(content, "text") or hasattr(content, "refusal"):
-                        text += getattr(content, "text", "")
-                        text += getattr(content, "refusal", "")
-                message.update({"role": getattr(item, "role", "assistant"), "content": text})
-            # Handle tool calls
-            elif hasattr(item, "call_id") and hasattr(item, "arguments"):
-                tool_call_outputs.append(
-                    (
-                        item.call_id,
-                        getattr(item, "name", ""),
-                        item.arguments if item.arguments else OAI_HANDOFF_TOOL_ARG,
-                    )
-                )
-                message.update(
-                    {
-                        "tool_calls": [
-                            {
-                                "tool_id": item.call_id,
-                                "arguments": (
-                                    json.loads(item.arguments) if isinstance(item.arguments, str) else item.arguments
-                                ),
-                                "name": getattr(item, "name", ""),
-                                "type": getattr(item, "type", "function"),
-                            }
-                        ]
-                    }
-                )
-            else:
-                message.update({"content": str(item)})
-            processed.append(message)
-
-        return processed, tool_call_outputs
+        return _openai_parse_output_response_messages(messages)

     def llmobs_trace_input(self) -> Optional[str]:
         """Converts Response span data to an input value for top level trace.
```
Lines changed: 5 additions & 0 deletions

```diff
@@ -0,0 +1,5 @@
+---
+fixes:
+  - |
+    LLM Observability: Fixes an issue where reasoning message types were not being handled correctly in the OpenAI Agents integration, leading to
+    output messages being dropped on LLM spans.
```
