anakin87
Member

@anakin87 anakin87 commented Aug 20, 2025

Related Issues

Proposed Changes:

  • reasoning refinements

    • store reasoning in the ChatMessage content instead of meta
    • return reasoning correctly when streaming
    • support different thinking levels (for GPT-oss)
  • refactoring: removing reasoning from meta is a breaking change that requires a major release, so I took the opportunity to do some refactoring

    • change the default model in the Chat Generator: orca-mini is very outdated, while qwen3:0.6b is a small model capable of tool calling and reasoning
    • refactor the Chat Generator tests: better structure, with several tests parametrized to reduce duplication
    • use a 5x smaller embedding model for tests, which reduces download and inference times
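
As an illustration of the first refinement (not code from this PR), the sketch below moves reasoning out of a message's meta dictionary into a dedicated field. The `Message` class is a hypothetical stand-in, not Haystack's `ChatMessage`, and the `"thinking"` meta key is an assumed name for where the older layout kept the reasoning.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Message:
    """Hypothetical stand-in for a chat message; not the Haystack ChatMessage class."""

    text: str
    reasoning: Optional[str] = None  # reasoning kept as part of the message content
    meta: Dict[str, Any] = field(default_factory=dict)


def lift_reasoning(message: Message, meta_key: str = "thinking") -> Message:
    """Return a copy with reasoning moved out of meta (meta_key is an assumed name)."""
    meta = dict(message.meta)  # copy so the input message is not mutated
    reasoning = meta.pop(meta_key, None)
    return Message(text=message.text, reasoning=reasoning or message.reasoning, meta=meta)


old = Message(text="4", meta={"thinking": "2 + 2 is 4.", "model": "qwen3:0.6b"})
new = lift_reasoning(old)
print(new.reasoning)  # 2 + 2 is 4.
print(new.meta)       # {'model': 'qwen3:0.6b'}
```

Code that previously read the reasoning from meta would no longer find it there after such a change, which is why the description treats this as breaking.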

How did you test it?

CI

I have manually verified and written tests for:

  • reasoning with multi-turn conversations
  • reasoning with streaming
  • reasoning with tool calls
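
A minimal sketch of what "reasoning with streaming" has to get right, assuming each streamed delta is a plain dict with optional `"thinking"` and `"content"` keys (an assumed shape chosen for the example, not the integration's actual types): the reasoning fragments and the answer fragments are accumulated separately so that neither leaks into the other.

```python
from typing import Dict, Iterable, Tuple


def merge_stream(deltas: Iterable[Dict[str, str]]) -> Tuple[str, str]:
    """Concatenate reasoning and answer fragments from streamed deltas.

    The delta shape (optional "thinking" and "content" keys) is an
    assumption made for this example.
    """
    reasoning_parts = []
    text_parts = []
    for delta in deltas:
        if delta.get("thinking"):
            reasoning_parts.append(delta["thinking"])
        if delta.get("content"):
            text_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(text_parts)


deltas = [
    {"thinking": "The user asks "},
    {"thinking": "for a sum."},
    {"content": "2 + 2 "},
    {"content": "= 4"},
]
reasoning, text = merge_stream(deltas)
print(reasoning)  # The user asks for a sum.
print(text)       # 2 + 2 = 4
```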

Checklist

@github-actions github-actions bot added the integration:ollama and type:documentation labels Aug 20, 2025
@anakin87 anakin87 changed the title from "Refine ollama thinking" to "chore: OllamaChatGenerator - refine reasoning support" Aug 20, 2025
@anakin87 anakin87 marked this pull request as ready for review August 20, 2025 11:29
@anakin87 anakin87 requested a review from a team as a code owner August 20, 2025 11:29
@anakin87 anakin87 requested review from sjrl and removed request for a team August 20, 2025 11:29
@anakin87 anakin87 changed the title from "chore: OllamaChatGenerator - refine reasoning support" to "refactor!: OllamaChatGenerator - refine reasoning support + refactoring" Aug 20, 2025
Comment on lines +176 to +181
content=content,
meta=meta,
index=index,
finish_reason=finish_reason,
component_info=component_info,
tool_calls=tool_calls_list,
Contributor

I realized we aren't utilizing the start in StreamingChunk. Any reason for that?

Contributor

I see that we set start in _handle_streaming_response but only for tool_calls. It should also be set to True for the beginnings of normal text blocks and I can't immediately tell if that's happening or not.

Member Author

I now set start
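
The behaviour discussed in this thread can be sketched as follows, under stated assumptions: `Chunk` is a hypothetical stand-in rather than Haystack's `StreamingChunk`, and `mark_starts` is an illustrative helper, not the PR's `_handle_streaming_response`. The idea is that `start` is True only on the first chunk of each text block or tool call.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for a streaming chunk; not Haystack's StreamingChunk."""

    content: str = ""
    tool_call_id: Optional[str] = None
    start: bool = False


def mark_starts(chunks: List[Chunk]) -> List[Chunk]:
    """Set start=True on the first chunk of each text block or tool call."""
    previous_block = None
    for chunk in chunks:
        if chunk.tool_call_id is not None:
            block = ("tool", chunk.tool_call_id)
        elif chunk.content:
            block = ("text", None)
        else:
            block = previous_block  # empty chunk: stays in the current block
        chunk.start = block is not None and block != previous_block
        previous_block = block
    return chunks


chunks = mark_starts(
    [
        Chunk(content="Hel"),
        Chunk(content="lo"),
        Chunk(tool_call_id="call_1"),
        Chunk(tool_call_id="call_1"),
        Chunk(tool_call_id="call_2"),
    ]
)
print([c.start for c in chunks])  # [True, False, True, False, True]
```

A consumer of the stream can then use `start` to decide when to open a new block in its output, for text as well as for tool calls.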

@anakin87 anakin87 requested a review from sjrl August 22, 2025 14:03
Contributor

@sjrl sjrl left a comment

Looks good!

@anakin87 anakin87 merged commit 312f02a into main Aug 25, 2025
7 checks passed
@anakin87 anakin87 deleted the refine-ollama-thinking branch August 25, 2025 09:54
Labels
integration:ollama, topic:CI, type:documentation
Development

Successfully merging this pull request may close these issues.

Ollama - refine reasoning support
2 participants