Skip to content

[FR]: New Evaluaton Metric "LLM Sycophancy" (SycEval) #2520

@vincentkoc

Description

@vincentkoc

Proposal summary

We like to extend the existing LLM-as-a-judge evaluation metrics to include a new judge metric called "Sycophancy". Full paper with the methodology and prompts can be found here: https://arxiv.org/pdf/2502.08177

Example of an existing judge metric (Hallucination) is defined here:

Expectation is the new judge is added to the frontend for using LLM-as-a-judge from the UI (Online Evaluation tab) as well as in the Python SDK. The appropriate docs needs to be updated and a video attached of the metric working.

Motivation

I would like to see more robust set of metrics and evaluations based on recent research

Metadata

Metadata

Assignees

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions