Release v2.1.0
What's changed
Added features:
We added Reward and LLM-as-a-Judge to our task family:
- Reward allows you to write a custom function that scores the prediction, without requiring ground truth
- LLM-as-a-Judge allows you to delegate the task of scoring a prediction to a judge LLM, optionally accepting ground truth
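A Reward task is built around a user-supplied scoring function. A minimal sketch of such a function (the function name, signature, and scoring heuristic are illustrative assumptions, not the library's actual API):

```python
def length_penalty_reward(prediction: str) -> float:
    """Score a prediction without ground truth: reward non-empty,
    concise answers (1.0 best, 0.0 worst).

    Hypothetical example of a custom reward function; the signature
    the library expects may differ.
    """
    if not prediction.strip():
        return 0.0
    # Linearly penalize predictions longer than 100 characters.
    return max(0.0, 1.0 - max(0, len(prediction) - 100) / 400)
```

Any heuristic that maps a prediction to a score works here; the point of the Reward task is that no ground-truth labels are needed.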
Changes to CAPO, to make it applicable to the new tasks:
- CAPO now accepts the input parameter "check_fs_accuracy" (default True). For reward tasks, accuracy cannot be evaluated, so the prediction of the downstream_llm is used as the few-shot target.
- CAPO also accepts "create_fs_reasoning" (default True). If set to False, the input-output pairs from df_few_shots are used directly.
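One plausible reading of how the two flags interact when few-shot examples are built. Everything below (the helper name, the dict keys, the prompt wording) is an assumption for illustration, not CAPO's actual internals:

```python
def build_few_shots(df_few_shots, downstream_predict,
                    check_fs_accuracy=True, create_fs_reasoning=True):
    """Sketch of few-shot construction under the two new CAPO flags.

    downstream_predict stands in for a call to the downstream LLM.
    """
    shots = []
    for row in df_few_shots:
        if check_fs_accuracy:
            # Ground truth exists and can be checked for accuracy.
            target = row["target"]
        else:
            # Reward tasks have no ground truth, so the downstream
            # LLM's own prediction serves as the few-shot target.
            target = downstream_predict(row["input"])
        shot = {"input": row["input"], "output": target}
        if create_fs_reasoning:
            # Optionally attach a generated reasoning trace.
            shot["reasoning"] = downstream_predict(
                "Explain step by step: " + row["input"])
        shots.append(shot)
    return shots
```

With create_fs_reasoning=False this degenerates to plain input-output pairs, matching the behavior described above.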
- Introduces a tag-extraction function to centralize the repeated code for extracting content wrapped in tags (e.g. an answer such as "5" enclosed in XML-style tags)
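A minimal sketch of what such a centralized helper could look like; the function name and return convention are assumptions, not the library's actual API:

```python
import re

def extract_tag_content(text: str, tag: str):
    """Return the content of the first <tag>...</tag> pair in text,
    or None if the tag is absent.

    Illustrative tag-extraction helper; non-greedy match, DOTALL so
    the content may span multiple lines.
    """
    match = re.search(
        rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>", text, re.DOTALL)
    return match.group(1).strip() if match else None
```

Centralizing this avoids each task re-implementing its own regex for pulling answers out of tagged model output.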
Further changes:
- We now use mypy for automated type checking
- Core functionality of the classification task has been moved to the base task to prevent code duplication in other tasks
- Test coverage is now above 90%
Full Changelog: here