[FR]: New Evaluaton Metric "LLM Sycophancy" (SycEval)

### Proposal summary

We like to extend the existing LLM-as-a-judge evaluation metrics to include a new judge metric called "Sycophancy". Full paper with the methodology and prompts can be found here: https://arxiv.org/pdf/2502.08177

Example of an existing judge metric (Hallucination) is defined here:
- Docs: https://www.comet.com/docs/opik/evaluation/metrics/hallucination
- Docs Code: https://github.com/comet-ml/opik/blob/main/apps/opik-documentation/documentation/fern/docs/evaluation/metrics/hallucination.mdx
- Python SDK: https://github.com/comet-ml/opik/tree/main/sdks/python/src/opik/evaluation/metrics/llm_judges/hallucination
- Python Examples: https://github.com/comet-ml/opik/blob/main/sdks/python/examples/metrics.py
- Frontend: https://github.com/comet-ml/opik/blob/main/apps/opik-frontend/src/constants/llm.ts

Expectation is the new judge is added to the frontend for using LLM-as-a-judge from the UI (Online Evaluation tab) as well as in the Python SDK. The appropriate docs needs to be updated and a video attached of the metric working.

### Motivation

I would like to see more robust set of metrics and evaluations based on recent research

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[FR]: New Evaluaton Metric "LLM Sycophancy" (SycEval) #2520

Proposal summary

Motivation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[FR]: New Evaluaton Metric "LLM Sycophancy" (SycEval) #2520

Description

Proposal summary

Motivation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions