-
Notifications
You must be signed in to change notification settings - Fork 81
support ShareGPT dataset as data file #305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
f8c2231
to
d246dee
Compare
1cf7e56
to
e98bd0e
Compare
This seems external to the GuideLLM. Can you please move all code and documentation to |
1d840bc
to
a347948
Compare
Done |
Sorry I forgot about this PR due to the sudden flurry of new PRs. Can you also move the changes in |
Signed-off-by: guangli.bao <[email protected]>
394f505
to
d904a7e
Compare
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the requirements.txt supposed to include all dependencies? I had to install datasets
and transformers
for it to work.
It may be beneficial to also note that you need to run it with the HF_TOKEN value set.
Once I addressed these it appears to have worked.
# except special characters | ||
not re.search(r"[<>{}[\]\\]", prompt_text) | ||
and not prompt_text.isdigit() | ||
): # except pure numbers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment belongs above the line that's above it.
Summary
Details
I hope data file can support ShareGPT as benchmark test data such as: ShareGPT_V3_unfiltered_cleaned_split.json; In this PR, user can abstract testing prompts from origin file and filter human prompts (10 < words < 1000) to save into local file, refer to:
Test Plan
Related Issues
Use of AI
## WRITTEN BY AI ##
)