Starting kit
A starting kit with an end-to-end submission flow is available here: Starting kit
Please join us on Discord for discussions and up-to-date announcements: Discord
Open evaluation datasets
| Dataset | Dimension | Source |
| --- | --- | --- |
| CommonsenseQA | Knowledge | https://www.tau-nlp.sites.tau.ac.il/commonsenseqa |
| BIG-Bench Hard | Reasoning | https://github.com/suzgunmirac/BIG-Bench-Hard |
| GSM8K | Math | https://github.com/openai/grade-school-math |
| LongBench | Long-Context | https://github.com/THUDM/LongBench |
| HumanEval | Programming | https://github.com/openai/human-eval |
| TruthfulQA | Knowledge | https://github.com/sylinrl/TruthfulQA |
| CHID | Language | https://github.com/chujiezheng/ChID-Dataset |