
Commit fda0c94

Support multiple datasets from MBPP; Fix missing commas in python list; Fix doc typos;
1 parent 6116c6a commit fda0c94
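The "Fix missing commas in python list" item in the commit message concerns the `LANGUAGES` list in `bigcode_eval/tasks/multiple.py`, where `"clj"` and `"ml"` lacked trailing commas. Python concatenates adjacent string literals at compile time. The old list therefore contained `"cljcpp"` and `"mlpl"` in place of four separate entries, and no error was raised. Here is a minimal reproduction on a shortened list:

```python
# Adjacent string literals are joined at compile time, so a missing comma
# silently merges two list entries instead of raising a SyntaxError.
broken = [
    "sh",
    "clj"  # missing comma: merged with the next literal
    "cpp",
    "cs",
]
fixed = [
    "sh",
    "clj",
    "cpp",
    "cs",
]

print(broken)  # ['sh', 'cljcpp', 'cs']
print(fixed)   # ['sh', 'clj', 'cpp', 'cs']
print(len(broken), len(fixed))  # 3 4
```

Because the merged entry is still a valid string, the bug only shows up later, when a task such as `multiple-cljcpp` tries to load a dataset that does not exist.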


2 files changed: 17 additions & 9 deletions


bigcode_eval/tasks/multiple.py

Lines changed: 15 additions & 7 deletions
@@ -41,7 +41,7 @@
 LANGUAGES = [
     "py",
     "sh",
-    "clj"
+    "clj",
     "cpp",
     "cs",
     "d",
@@ -53,7 +53,7 @@
     "js",
     "jl",
     "lua",
-    "ml"
+    "ml",
     "pl",
     "php",
     "r",
@@ -71,13 +71,19 @@ def create_all_tasks():
     :return: {task_name: task}
     e.g. {multiple-py: Task, multiple-java: Task}
     """
-    return {f"multiple-{language}": create_task(language) for language in LANGUAGES}
+    # The root dataset is HumanEval
+    tasks = {f"multiple-{language}": create_task("humaneval", language) for language in LANGUAGES}
+
+    # The root dataset is MBPP
+    for language in LANGUAGES:
+        tasks[f"multiple-{language}-mbpp"] = create_task("mbpp", language)
 
+    return tasks
 
-def create_task(language):
+def create_task(source, language):
     class MultiPLE(GeneralMultiPLE):
         def __init__(self):
-            super().__init__(language)
+            super().__init__(source, language)
 
     return MultiPLE
 
@@ -91,9 +97,9 @@ class GeneralMultiPLE(Task):
     DATASET_NAME = None
     DATASET_REVISION = "ff5c146da05f10bc69b9ce393b77f381b3825d1b"
 
-    def __init__(self, language):
+    def __init__(self, source, language):
         self.language = language
-        self.DATASET_NAME = f"humaneval-{language}"
+        self.DATASET_NAME = f"{source}-{language}"
         # we need the dataset to get stop words for each language
         self.dataset = load_dataset(
             GeneralMultiPLE.DATASET_PATH,
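Taken together, the `create_all_tasks`, `create_task` and `__init__` hunks register two task families per language and derive each dataset config name from `f"{source}-{language}"`. Below is a minimal offline sketch of that factory pattern. It rests on two assumptions. First, `GeneralMultiPLEStub` is a hypothetical stand-in that skips `load_dataset`. Second, the language list is passed in as an argument, whereas the real `create_all_tasks` reads the module-level `LANGUAGES` and takes no arguments.

```python
class GeneralMultiPLEStub:
    """Hypothetical stand-in for GeneralMultiPLE: records names, loads nothing."""

    def __init__(self, source, language):
        self.language = language
        self.DATASET_NAME = f"{source}-{language}"


def create_task(source, language):
    # Each call defines a fresh zero-argument subclass bound to this
    # particular (source, language) pair.
    class MultiPLE(GeneralMultiPLEStub):
        def __init__(self):
            super().__init__(source, language)

    return MultiPLE


def create_all_tasks(languages):
    # HumanEval-rooted tasks keep their original names.
    tasks = {f"multiple-{lang}": create_task("humaneval", lang) for lang in languages}
    # MBPP-rooted tasks get a "-mbpp" suffix so the two families do not collide.
    for lang in languages:
        tasks[f"multiple-{lang}-mbpp"] = create_task("mbpp", lang)
    return tasks


tasks = create_all_tasks(["py", "sh"])  # small subset of LANGUAGES
for name, task_cls in tasks.items():
    print(name, "->", task_cls().DATASET_NAME)
```

Routing `source` and `language` through `create_task`'s parameters gives each generated class its own bindings. If the class were defined directly inside the loop and read the loop variable, Python's late-binding closures would leave every class with the last language.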
@@ -194,3 +200,5 @@ def process_results(self, generations, references):
             if k <= len(generations[0])
         }
         return results
+
+print(create_all_tasks().keys())
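The context lines of this hunk show that `process_results` builds its results dict only for `k <= len(generations[0])`. In other words, pass@k is reported only when each problem has at least k generations. The sketch below uses the unbiased estimator 1 - C(n-c, k) / C(n, k) from the Codex paper (Chen et al., 2021). The helper names `pass_at_k` and `report` and the default `ks` tuple are hypothetical. This diff also does not show how the harness itself computes the per-problem scores.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k for one problem.

    n: generations sampled, c: generations that passed, k: sample budget.
    Returns 1 - C(n - c, k) / C(n, k), which is only defined for k <= n.
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if k > n:
        raise ValueError("pass@k needs at least k generations")
    # math.comb returns 0 when n - c < k, which yields pass@k = 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def report(num_correct, n, ks=(1, 10, 100)):
    # Mirror the `if k <= len(generations[0])` filter: skip any k larger
    # than the number of generations, then average over problems.
    return {
        f"pass@{k}": sum(pass_at_k(n, c, k) for c in num_correct) / len(num_correct)
        for k in ks
        if k <= n
    }


# Three problems, 10 generations each, with 3, 0 and 10 passing samples.
print(report([3, 0, 10], n=10))
```

With 10 generations per problem, pass@100 is simply omitted, which matches the filtering in the hunk above.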

docs/README.md

Lines changed: 2 additions & 2 deletions
@@ -184,7 +184,7 @@ For [StarChat-Beta](https://huggingface.co/HuggingFaceH4/starchat-beta) for exam
 [MBPP](https://huggingface.co/datasets/mbpp): consists of around 1,000 crowd-sourced Python programming problems,
 designed to be solvable by entry-level programmers. Each problem consists of a task description in English, a code solution and 3 automated test cases. We evaluate on the test set of samples from index 11 to 511.
 
-* Prompts and generation: We use a few-shot setting in InCoder style prompt: we feed the prompt to the model as a doctring and only include one solution, to help the model catch the function name which is required in the unit tests.
+* Prompts and generation: We use a few-shot setting in InCoder style prompt: we feed the prompt to the model as a doctring and only include one test case, to help the model catch the function name which is required in the unit tests.
 ```python
 prompt = f'"""\n{description}\n{test_example}\n"""\n'
 ```
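The README's prompt template can be exercised on a toy problem to see what the model receives. This is a minimal sketch. It assumes the Hugging Face `mbpp` field names `text` (task description) and `test_list` (assert statements). The helper `build_mbpp_prompt` and the sample problem are hypothetical and not taken from the dataset.

```python
def build_mbpp_prompt(description, test_list):
    # Docstring-style prompt holding the description and only the first
    # test case, which reveals the function name the unit tests will call.
    test_example = test_list[0]
    return f'"""\n{description}\n{test_example}\n"""\n'


# Hypothetical MBPP-style sample (not from the dataset).
sample = {
    "text": "Write a function to return the smaller of two numbers.",
    "test_list": [
        "assert smaller(1, 2) == 1",
        "assert smaller(5, 3) == 3",
        "assert smaller(4, 4) == 4",
    ],
}

prompt = build_mbpp_prompt(sample["text"], sample["test_list"])
print(prompt)
```

The other two asserts stay out of the prompt, so they can still serve as held-out checks on the generated function.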
@@ -207,7 +207,7 @@ accelerate launch main.py \
 Low temperatures generally work better for small $k$ in pass@k.
 
 ### MBPP+
-[MBPP+](https://huggingface.co/datasets/evalplus/mbppplus): MBPP with additional unit tests (35x of the original MBPP) for each of the 164 problems.
+[MBPP+](https://huggingface.co/datasets/evalplus/mbppplus): MBPP with additional unit tests (35x of the original MBPP) for each of the problems.
 
 The generation and evaluation follows the same approach as [MBPP](#mbpp). One only needs to change the task name to `mbppplus` to run the evaluation on MBPP+, such as:
 