Starting kit

A starting kit that walks through the end-to-end submission flow is available here: Starting kit

Please join us on Discord for discussions and up-to-date announcements: Discord

Open evaluation datasets

Dataset	Dimension	Source
CommonsenseQA	Knowledge	https://www.tau-nlp.sites.tau.ac.il/commonsenseqa
BIG-Bench Hard	Reasoning	https://github.com/suzgunmirac/BIG-Bench-Hard
GSM8K	Math	https://github.com/openai/grade-school-math
LongBench	Long-Context	https://github.com/THUDM/LongBench
HumanEval	Programming	https://github.com/openai/human-eval
TruthfulQA	Knowledge	https://github.com/sylinrl/TruthfulQA
CHID	Language	https://github.com/chujiezheng/ChID-Dataset
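
For anyone scripting against this list, the table above can be held as a small in-memory structure. The following is only a minimal sketch: the names `OPEN_EVAL_DATASETS` and `group_by_dimension` are hypothetical and are not part of the starting kit. The rows are copied from the table as-is, and the code does not download or evaluate anything.

```python
from collections import defaultdict

# Rows copied from the table above: (dataset, dimension, source URL).
# The variable name is an illustrative assumption, not a starting-kit API.
OPEN_EVAL_DATASETS = [
    ("CommonsenseQA", "Knowledge", "https://www.tau-nlp.sites.tau.ac.il/commonsenseqa"),
    ("BIG-Bench Hard", "Reasoning", "https://github.com/suzgunmirac/BIG-Bench-Hard"),
    ("GSM8K", "Math", "https://github.com/openai/grade-school-math"),
    ("LongBench", "Long-Context", "https://github.com/THUDM/LongBench"),
    ("HumanEval", "Programming", "https://github.com/openai/human-eval"),
    ("TruthfulQA", "Knowledge", "https://github.com/sylinrl/TruthfulQA"),
    ("CHID", "Language", "https://github.com/chujiezheng/ChID-Dataset"),
]


def group_by_dimension(rows):
    """Return {dimension: [dataset names]}, keeping the table's row order."""
    grouped = defaultdict(list)
    for name, dimension, _source in rows:
        grouped[dimension].append(name)
    return dict(grouped)


if __name__ == "__main__":
    # Print one line per dimension with the datasets that cover it.
    for dimension, names in group_by_dimension(OPEN_EVAL_DATASETS).items():
        print(f"{dimension}: {', '.join(names)}")
```

Grouping by dimension makes it easy to see which capability each dataset is meant to probe, for example that Knowledge is covered by two datasets while the other dimensions have one each.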