support ShareGPT dataset as data file #305

tukwila · 2025-09-08T04:46:25Z

Summary

Details

I hope data file can support ShareGPT as benchmark test data such as: ShareGPT_V3_unfiltered_cleaned_split.json; In this PR, user can abstract testing prompts from origin file and filter human prompts (10 < words < 1000) to save into local file, refer to:

[ ]

Test Plan

Related Issues

Resolves #

"I certify that all code in this PR is my own, except as noted below."

Use of AI

Includes AI-assisted code completion
Includes code generated by an AI application
Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

tukwila · 2025-09-09T10:10:17Z

related to:#138

https://github.com/LMCache/LMCache/tree/dev/benchmarks/multi-round-qa

sjmonson · 2025-09-10T18:18:42Z

This seems external to the GuideLLM. Can you please move all code and documentation to /contrib/sharegpt_preprocess.

tukwila · 2025-09-12T01:39:23Z

This seems external to the GuideLLM. Can you please move all code and documentation to /contrib/sharegpt_preprocess.

Done

sjmonson · 2025-09-15T16:05:52Z

Sorry I forgot about this PR due to the sudden flurry of new PRs. Can you also move the changes in docs/datasets.md to contrib/sharegpt_preprocess/README.md.

Signed-off-by: guangli.bao <[email protected]>

tukwila · 2025-09-16T03:55:04Z

Sorry I forgot about this PR due to the sudden flurry of new PRs. Can you also move the changes in docs/datasets.md to contrib/sharegpt_preprocess/README.md.

Done

jaredoconnell

Is the requirements.txt supposed to include all dependencies? I had to install datasets and transformers for it to work.

It may be beneficial to also note that you need to run it with the HF_TOKEN value set.

Once I addressed these it appears to have worked.

jaredoconnell · 2025-09-16T04:25:16Z

contrib/sharegpt_preprocess/preprocessing_sharegpt_data.py

+                        # except special characters
+                        not re.search(r"[<>{}[\]\\]", prompt_text)
+                        and not prompt_text.isdigit()
+                    ):  # except pure numbers


I think this comment belongs above the line that's above it.

tukwila changed the title ~~support ShareGPT dataset as data file~~ draft: support ShareGPT dataset as data file Sep 8, 2025

tukwila force-pushed the support_sharegpt branch from f8c2231 to d246dee Compare September 8, 2025 06:30

tukwila changed the title ~~draft: support ShareGPT dataset as data file~~ support ShareGPT dataset as data file Sep 8, 2025

tukwila force-pushed the support_sharegpt branch 3 times, most recently from 1cf7e56 to e98bd0e Compare September 9, 2025 04:21

tukwila force-pushed the support_sharegpt branch from 1d840bc to a347948 Compare September 12, 2025 01:38

support ShareGPT dataset as data file

d904a7e

Signed-off-by: guangli.bao <[email protected]>

tukwila force-pushed the support_sharegpt branch from 394f505 to d904a7e Compare September 16, 2025 03:54

jaredoconnell reviewed Sep 16, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

support ShareGPT dataset as data file #305

support ShareGPT dataset as data file #305

tukwila commented Sep 8, 2025 •

edited

Loading

Uh oh!

tukwila commented Sep 9, 2025 •

edited

Loading

Uh oh!

sjmonson commented Sep 10, 2025

Uh oh!

tukwila commented Sep 12, 2025

Uh oh!

sjmonson commented Sep 15, 2025

Uh oh!

tukwila commented Sep 16, 2025

Uh oh!

jaredoconnell left a comment

Uh oh!

jaredoconnell Sep 16, 2025

Uh oh!

Uh oh!

support ShareGPT dataset as data file #305

Are you sure you want to change the base?

support ShareGPT dataset as data file #305

Conversation

tukwila commented Sep 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Details

Test Plan

Related Issues

Use of AI

Uh oh!

tukwila commented Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sjmonson commented Sep 10, 2025

Uh oh!

tukwila commented Sep 12, 2025

Uh oh!

sjmonson commented Sep 15, 2025

Uh oh!

tukwila commented Sep 16, 2025

Uh oh!

jaredoconnell left a comment

Choose a reason for hiding this comment

Uh oh!

jaredoconnell Sep 16, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tukwila commented Sep 8, 2025 •

edited

Loading

tukwila commented Sep 9, 2025 •

edited

Loading