Model Evaluation Job Submission
Launch an evaluation job using one of three options:

- **LLM as a Judge (LLMAJ) Evaluation** - Use a large language model to assess model outputs
- **Custom Scorer Evaluation** - Apply evaluator functions you defined previously
- **Benchmark Evaluation** - Run standardized performance benchmarks
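
As a rough illustration of how these three options could map onto a job-submission call, here is a minimal Python sketch. The `EvalType`, `EvalJob`, and `to_payload` names are hypothetical, invented for this example, and do not correspond to the platform's actual API:

```python
from dataclasses import dataclass, field
from enum import Enum


class EvalType(Enum):
    """The three evaluation options listed above (hypothetical identifiers)."""
    LLM_AS_JUDGE = "llmaj"
    CUSTOM_SCORER = "custom_scorer"
    BENCHMARK = "benchmark"


@dataclass
class EvalJob:
    """A hypothetical evaluation job description."""
    model_id: str
    eval_type: EvalType
    config: dict = field(default_factory=dict)

    def to_payload(self) -> dict:
        # Flatten the job into a submission payload; the extra keys in
        # `config` (e.g. which judge model to use) ride along unchanged.
        return {"model": self.model_id, "type": self.eval_type.value, **self.config}


job = EvalJob("my-model", EvalType.LLM_AS_JUDGE, {"judge_model": "gpt-4o"})
print(job.to_payload())
# → {'model': 'my-model', 'type': 'llmaj', 'judge_model': 'gpt-4o'}
```

Each option would differ only in its `config`: an LLMAJ job names the judge model, a custom-scorer job references a previously defined evaluator, and a benchmark job names the benchmark suite to run.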