Release v2.1.0

What's changed

Added features:

  • We added Reward and LLM-as-a-Judge to our task family

    • Reward lets you write a custom function that scores the prediction, without requiring ground truth
    • LLM-as-a-Judge delegates the scoring of a prediction to a judge LLM, optionally accepting ground truth
  • Changes to CAPO, to make it applicable to the new tasks:

    • CAPO now accepts the input parameter "check_fs_accuracy" (default True). For reward tasks, accuracy cannot be evaluated, so the prediction of the downstream_llm is used as the few-shot target instead.
    • CAPO also accepts "create_fs_reasoning" (default True). If set to False, only the input-output pairs from df_few_shots are used.
  • Introduced a tag-extraction function to centralize the repeated code for extracting tagged values (e.g. "5") from model output
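
A minimal sketch of what a reward-style task can look like. All names here (`RewardTask`, `brevity_reward`) are illustrative and not the library's actual API; the point is that the scoring function receives only the prediction, so no ground truth is needed.

```python
# Hypothetical sketch of a reward-style task; RewardTask and
# brevity_reward are illustrative names, not the library's real API.
from typing import Callable, Sequence


class RewardTask:
    """Scores predictions with a user-supplied function; no ground truth needed."""

    def __init__(self, score_fn: Callable[[str], float]):
        self.score_fn = score_fn

    def evaluate(self, predictions: Sequence[str]) -> float:
        # Average the per-prediction reward over the batch.
        scores = [self.score_fn(p) for p in predictions]
        return sum(scores) / len(scores)


# Example reward: prefer short answers that mention a keyword.
def brevity_reward(prediction: str) -> float:
    base = 1.0 if "answer" in prediction.lower() else 0.0
    return base / (1 + len(prediction) / 100)


task = RewardTask(brevity_reward)
print(task.evaluate(["The answer is 5.", "I am not sure."]))
```

An LLM-as-a-Judge task would follow the same shape, with the scoring function replaced by a call to a judge LLM that optionally also receives the ground truth.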

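A tag-extraction helper of the kind described above might look like the following sketch. The function name `extract_tag` and the tag name `final_answer` are assumptions for illustration; the release's actual helper may differ.

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the content of the first <tag>...</tag> span, or None if absent.

    Illustrative sketch; the actual helper in this release may differ.
    """
    match = re.search(
        rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>", text, re.DOTALL
    )
    return match.group(1).strip() if match else None


# Hypothetical usage: pull "5" out of a tagged model response.
print(extract_tag("The result is <final_answer>5</final_answer>.", "final_answer"))
```
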
Further changes:

  • We now use mypy for automated type checking
  • Core functionality of the classification task has been moved to the base task to prevent code duplication across tasks
  • Test coverage has been boosted to >90%

Full Changelog: here