Evaluations

EvaluationsResource

Methods

create() ->
post /v5/evaluations

Create Evaluation

list() -> SyncCursorPage[]
get /v5/evaluations

List Evaluations

retrieve(, ) ->
get /v5/evaluations/{evaluation_id}

Get Evaluation

archive() ->
delete /v5/evaluations/{evaluation_id}

Archive Evaluation

update(, ) ->
patch /v5/evaluations/{evaluation_id}

Update or Restore Evaluation

retrieve_schema(, ) ->
get /v5/evaluations/{evaluation_id}/schema

Get schema information for evaluation item data, including field names, types, and occurrence counts.

filter() -> SyncCursorPage[]
post /v5/evaluations/filter

Filter evaluations by metadata and other criteria. Accepts up to 10 filters, which are combined with AND logic.

Parameters
filters: Iterable[]

List of metadata filters to apply (maximum 10)

key: str

The metadata key to filter on

operator: Literal["==", "!=", ">=", "<=", "IN", "NOT_IN"]

The comparison operator to use

value: str

The value to compare against (always passed as a string, regardless of the metadata value's type)

object: Optional[Literal["metadata_filter"]]
(default: "metadata_filter")
ending_before: Optional[str]
include_archived: Optional[]
limit: Optional[int]
(maximum: 10000, minimum: 1, default: 100)
sort_by: Optional[str]
sort_order: Optional[]
starting_after: Optional[str]
views: Optional[List[]]
(default: [])
Allowed values: "tasks"
Returns
id: str
created_at: datetime
(format: date-time)
created_by:

The identity that created the entity.

datasets: List[]
name: str
status: Literal["failed", "completed", "running"]
tags: List[str]

The tags associated with the entity

archived_at: Optional[datetime]
(format: date-time)
description: Optional[str]
error_count: Optional[int]

Number of task errors across all items in this evaluation.

metadata: Optional[Dict[str, ]]

Metadata key-value pairs for the evaluation

object: Optional[Literal["evaluation"]]
(default: "evaluation")
progress: Optional[EvaluationTasksProgressSchema]

Progress of the evaluation's underlying async job

status_reason: Optional[str]

Reason for evaluation status

tasks: Optional[List[]]

Tasks executed during the evaluation. Populated only when the optional "tasks" view is requested.
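
The constraints documented for filter() (at most 10 filters, a fixed operator set, string-only values, and a limit between 1 and 10000 with a default of 100) can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: the helper names metadata_filter and build_filter_request and the metadata keys "team" and "priority" are hypothetical, and the dict it produces only mirrors the parameter names listed above.

```python
from typing import Any, Dict, List, Optional

# Values taken from the parameter reference above.
ALLOWED_OPERATORS = {"==", "!=", ">=", "<=", "IN", "NOT_IN"}
MAX_FILTERS = 10
MIN_LIMIT, MAX_LIMIT, DEFAULT_LIMIT = 1, 10000, 100


def metadata_filter(key: str, operator: str, value: str) -> Dict[str, str]:
    """Build one filter entry (hypothetical helper, not an SDK function)."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    if not isinstance(value, str):
        raise TypeError("value must be a string for every metadata type")
    return {
        "key": key,
        "operator": operator,
        "value": value,
        "object": "metadata_filter",
    }


def build_filter_request(
    filters: List[Dict[str, str]],
    limit: int = DEFAULT_LIMIT,
    starting_after: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a request body for filter() (hypothetical helper)."""
    if len(filters) > MAX_FILTERS:
        raise ValueError(f"at most {MAX_FILTERS} filters are supported")
    if not MIN_LIMIT <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between {MIN_LIMIT} and {MAX_LIMIT}")
    body: Dict[str, Any] = {"filters": filters, "limit": limit}
    if starting_after is not None:
        body["starting_after"] = starting_after
    return body


# The two filters are combined with AND logic on the server side.
request = build_filter_request(
    [
        metadata_filter("team", "==", "search"),
        metadata_filter("priority", ">=", "3"),
    ]
)
print(request)
```

Validating locally keeps malformed requests from reaching the API, but the server remains the source of truth for what it accepts.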

retrieve_taxonomy() ->
get /v5/evaluations/{evaluation_id}/taxonomy

Get taxonomy JSON for contributor evaluation question tasks.
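
list() and filter() both return cursor pages and accept limit and starting_after. The sketch below illustrates how cursor pagination of this kind is typically walked. It assumes that starting_after takes the id of the last item on the previous page, which is a common convention but is not confirmed by this reference, and it uses a hypothetical in-memory fake_fetch in place of a real API call. The SDK's SyncCursorPage may already handle this iteration.

```python
from typing import Callable, Dict, Iterator, List, Optional

Page = List[Dict[str, str]]


def iterate_all(
    fetch_page: Callable[[Optional[str], int], Page],
    limit: int = 100,
) -> Iterator[Dict[str, str]]:
    """Yield every item by following the cursor from page to page."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, limit)
        if not page:
            return
        yield from page
        if len(page) < limit:
            # A short page means there is nothing left to fetch.
            return
        # Assumption: the next page starts after the last id seen.
        cursor = page[-1]["id"]


# Hypothetical in-memory stand-in for the evaluations endpoint.
_FAKE_EVALUATIONS = [{"id": f"eval_{i}", "name": f"run {i}"} for i in range(5)]


def fake_fetch(starting_after: Optional[str], limit: int) -> Page:
    start = 0
    if starting_after is not None:
        ids = [e["id"] for e in _FAKE_EVALUATIONS]
        start = ids.index(starting_after) + 1
    return _FAKE_EVALUATIONS[start:start + limit]


all_ids = [e["id"] for e in iterate_all(fake_fetch, limit=2)]
print(all_ids)
```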

Domain types

class AutoEvaluationAgentTaskRequestWithItemLocator: ...
class Evaluation: ...
Dict[str, ]
class EvaluationSchemaResponse: ...

Schema information for an evaluation's item data structure

class EvaluationTasksProgressSchema: ...
Literal["tasks"]
str
str
class PaginatedListEvaluation: ...
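
retrieve_schema() is described as reporting field names, types, and occurrence counts for evaluation item data. The sketch below shows one way such a summary could be derived from a list of items. It is an illustration only: the function summarize_schema, the sample items, and the shape of the result are hypothetical and are not the actual structure of EvaluationSchemaResponse.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, Set


def summarize_schema(items: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Collect, per field, the observed value types and how many items have it."""
    counts: Counter = Counter()
    types: Dict[str, Set[str]] = defaultdict(set)
    for item in items:
        for field, value in item.items():
            counts[field] += 1
            types[field].add(type(value).__name__)
    return {
        field: {"types": sorted(types[field]), "count": counts[field]}
        for field in counts
    }


# Hypothetical evaluation items with partially overlapping fields.
items = [
    {"prompt": "2+2?", "expected": "4"},
    {"prompt": "Capital of France?", "expected": "Paris", "difficulty": 2},
    {"prompt": "Largest planet?"},
]

schema = summarize_schema(items)
for field, info in schema.items():
    print(field, info)
```

Occurrence counts make it easy to spot fields that are present on only some items before a task is configured against them.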

EvaluationsResource.TasksResource

Methods

add(, ) ->
post /v5/evaluations/{evaluation_id}/tasks

Add a new test criterion (LLM judge, contributor question, etc.) to an existing evaluation. Gated: the request is rejected if any contributor annotation task has been claimed or completed. Kicks off the evaluation workflow so that the new task runs against existing items.

update(, ) ->
patch /v5/evaluations/{evaluation_id}/tasks/{alias}

Replace the configuration of a single test criterion, identified by its alias. Gated: the request is rejected if any contributor annotation task for the evaluation has been claimed or completed.
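
The gating rule shared by add() and update() can be expressed as a simple predicate. This is a sketch of the rule as described above, not the service's implementation: the function name task_changes_allowed and the state name "pending" are hypothetical, and only "claimed" and "completed" come from the descriptions.

```python
from typing import Iterable

# States that block task changes, per the method descriptions above.
BLOCKING_STATES = {"claimed", "completed"}


def task_changes_allowed(annotation_task_states: Iterable[str]) -> bool:
    """Return True only if no contributor annotation task is claimed or completed."""
    return not any(state in BLOCKING_STATES for state in annotation_task_states)


print(task_changes_allowed(["pending", "pending"]))
print(task_changes_allowed(["pending", "claimed"]))
print(task_changes_allowed([]))
```

Under this reading, an evaluation with no contributor annotation tasks at all is never gated.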