I think there is a problem with Benchmark when it is used with the chat API format (formats.chat_api).
from unitxt.benchmark import Benchmark
from unitxt.api import DatasetRecipe, load_dataset
from unitxt.inference import OpenAiInferenceEngine

benchmark = Benchmark(
    # format="formats.user_agent",
    format="formats.chat_api",
    max_samples_per_subset=5,
    loader_limit=5,
    subsets={
        "cola": DatasetRecipe(
            card="cards.cola",
            template="templates.classification.multi_class.instruction",
        ),
    },
)

print("*****Load as BENCHMARK ")
dataset = list(benchmark()["test"])
print(dataset[0])
print(type(dataset[0]['source']))

print("*****Load as DATASET ")
dataset = load_dataset(
    card="cards.cola",
    template="templates.classification.multi_class.instruction",
    format="formats.chat_api",
    loader_limit=5,
    split="test",
)
print(dataset[0])
print(type(dataset[0]['source']))
We can see that when loading as a benchmark, the 'source' field holds the messages as a JSON string, while when loading as a dataset it correctly holds a list of messages:
*****Load as BENCHMARK
Loading limited to 5 instances by setting LoadHF.loader_limit;
{'metrics': ['metrics.matthews_correlation'], 'data_classification_policy': ['public'], 'media': {'images': [], 'audios': []}, 'postprocessors': ['processors.take_first_non_empty_line', 'processors.lower_case_till_punc'], 'target': 'acceptable', 'references': ['acceptable'], 'source': '[{"role": "system", "content": "Classify the grammatical acceptability of the following text to one of these options: unacceptable, acceptable."}, {"role": "user", "content": "text: The sailors rode the breeze clear of the rocks."}]', 'task_data': '{"text": "The sailors rode the breeze clear of the rocks.", "text_type": "text", "classes": ["unacceptable", "acceptable"], "type_of_class": "grammatical acceptability", "metadata": {"data_classification_policy": ["public"], "num_demos": 0, "demos_pool_size": 0, "template": "templates.classification.multi_class.instruction"}, "label": "acceptable"}', 'groups': [], 'subset': ['cola']}
<class 'str'>
*****Load as DATASET
Loader line limit was set to 5
Generating test split: 5 examples [00:00, 1284.00 examples/s]
/Users/yoavkatz/miniforge3/envs/fme/lib/python3.10/site-packages/datasets/builder.py:1243: FutureWarning: 'ignore_verifications' was deprecated in favor of 'verification' in version 2.9.1 and will be removed in 3.0.0.
You can remove this warning by passing 'verification_mode=all_checks' instead.
warnings.warn(
{'metrics': ['metrics.matthews_correlation'], 'data_classification_policy': ['public'], 'media': {'audios': [], 'images': []}, 'postprocessors': ['processors.take_first_non_empty_line', 'processors.lower_case_till_punc'], 'target': 'acceptable', 'references': ['acceptable'], 'source': [{'role': 'system', 'content': 'Classify the grammatical acceptability of the following text to one of these options: unacceptable, acceptable.'}, {'role': 'user', 'content': 'text: The sailors rode the breeze clear of the rocks.'}], 'task_data': '{"text": "The sailors rode the breeze clear of the rocks.", "text_type": "text", "classes": ["unacceptable", "acceptable"], "type_of_class": "grammatical acceptability", "metadata": {"data_classification_policy": ["public"], "num_demos": 0, "demos_pool_size": 0, "template": "templates.classification.multi_class.instruction"}, "label": "acceptable"}', 'groups': [], 'subset': []}
<class 'list'>
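Until this is resolved, a possible stopgap on the caller side is to decode 'source' whenever it arrives as a string. The sketch below is only an illustration based on the output above: normalize_source is a hypothetical helper, not part of unitxt, and it assumes the string form is valid JSON, as it appears to be in the benchmark output.

```python
import json


def normalize_source(source):
    """Return chat messages as a list of dicts.

    Hypothetical helper (not a unitxt API). Accepts either a list of
    messages or a JSON-encoded string of such a list. Strings that are
    not JSON lists (e.g. plain-text prompts) are returned unchanged.
    """
    if isinstance(source, str):
        try:
            parsed = json.loads(source)
        except json.JSONDecodeError:
            return source  # not JSON: treat as a plain-text prompt
        # Only accept the decoded value if it is a list of messages.
        return parsed if isinstance(parsed, list) else source
    return source


# 'source' as a string, shortened from the benchmark output above.
as_string = (
    '[{"role": "system", "content": "Classify the grammatical acceptability."}, '
    '{"role": "user", "content": "text: The sailors rode the breeze clear of the rocks."}]'
)
messages = normalize_source(as_string)
print(type(messages))
print(messages[1]["role"])
```

This only hides the symptom for downstream code; the two loading paths would still disagree on the type of 'source'.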