Evaluations

EvaluationsResource

Methods

create() ->
POST /v5/evaluations

Create Evaluation

Parameters
data: List[Dict[str, ]]

Items to be evaluated

name: str
description: Optional[str]
metadata: Optional[Dict[str, ]]

Optional metadata key-value pairs for the evaluation

tags: Optional[List[str]]

The tags associated with the entity

tasks: Optional[List[]]

Tasks allow you to augment and evaluate your data

class ChatCompletionEvaluationTask: ...
class GenericInferenceEvaluationTask: ...
class ApplicationVariantV1EvaluationTask: ...
class MetricEvaluationTask: ...
class AutoEvaluationQuestionTask: ...
class AutoEvaluationGuidedDecodingEvaluationTask: ...
class ContributorEvaluationQuestionTask: ...
Returns
id: str
created_at: datetime
(format: date-time)
created_by:

The identity that created the entity.

datasets: List[]
name: str
status: Literal["failed", "completed", "running"]
tags: List[str]

The tags associated with the entity

archived_at: Optional[datetime]
(format: date-time)
description: Optional[str]
metadata: Optional[Dict[str, ]]

Metadata key-value pairs for the evaluation

object: Optional[Literal["evaluation"]]
(default: "evaluation")
tasks: Optional[List[]]

Tasks executed during the evaluation. Populated only when the optional task view is requested.

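As a sketch of the request shape described above (the evaluation name, item fields inside `data`, and other values are assumptions; the API accepts arbitrary dict items), a create payload might look like:

```python
# Hypothetical create-evaluation request body, using only the parameters
# documented above. Item fields inside `data` are assumptions.
payload = {
    "name": "summarization-eval",                  # required
    "data": [                                      # items to be evaluated
        {"input": "Summarize: ...", "expected": "..."},
    ],
    "description": "Nightly summarization check",  # optional
    "metadata": {"owner": "qa-team"},              # optional key-value pairs
    "tags": ["nightly", "summarization"],          # optional
}

# The request targets:
method, path = "POST", "/v5/evaluations"
```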
delete() ->
DELETE /v5/evaluations/{evaluation_id}

Archive Evaluation
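Note that `delete()` archives rather than hard-deletes: per the response schema above, an archived evaluation gains an `archived_at` timestamp. A minimal sketch of the request (the id value is an assumption):

```python
# delete() issues DELETE /v5/evaluations/{evaluation_id}; the evaluation is
# archived (archived_at is set) rather than erased. The id is hypothetical.
evaluation_id = "eval_123"
method, path = "DELETE", f"/v5/evaluations/{evaluation_id}"
```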

list() -> SyncCursorPage[]
GET /v5/evaluations

List Evaluations
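`SyncCursorPage` implies cursor-based pagination. A minimal sketch of the underlying loop, assuming a cursor query parameter (`fetch_page` here is a stand-in stub, not the SDK's transport; the real page class handles this internally):

```python
# Stand-in for GET /v5/evaluations with a pagination cursor (parameter name
# and payload shape are assumptions for illustration).
def fetch_page(cursor=None):
    pages = {
        None: {"items": [{"id": "eval_1"}, {"id": "eval_2"}], "next": "c1"},
        "c1": {"items": [{"id": "eval_3"}], "next": None},
    }
    return pages[cursor]

def iter_evaluations():
    """Yield items across pages, following the cursor until exhausted."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page["next"]
        if cursor is None:
            break

ids = [e["id"] for e in iter_evaluations()]  # ["eval_1", "eval_2", "eval_3"]
```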

retrieve(, ) ->
GET /v5/evaluations/{evaluation_id}

Get Evaluation

update(, ) ->
PATCH /v5/evaluations/{evaluation_id}

Update Evaluation
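As a PATCH endpoint, update sends only the fields being changed. A hypothetical sketch (the field choices are assumptions drawn from the create parameters above):

```python
# update() issues PATCH /v5/evaluations/{evaluation_id} with a partial body;
# omitted fields are left unchanged. The id and values are hypothetical.
evaluation_id = "eval_123"
patch_body = {
    "description": "Updated description",
    "tags": ["nightly", "reviewed"],
}
method, path = "PATCH", f"/v5/evaluations/{evaluation_id}"
```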

Domain types

class Evaluation: ...
EvaluationTask = Union[
    ChatCompletionEvaluationTask,
    GenericInferenceEvaluationTask,
    ApplicationVariantV1EvaluationTask,
    MetricEvaluationTask,
    AutoEvaluationQuestionTask,
    AutoEvaluationGuidedDecodingEvaluationTask,
    ContributorEvaluationQuestionTask,
]
ItemLocator = str
ItemLocatorTemplate = str
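For orientation, an illustrative stand-in for the Evaluation domain type can be assembled from the Returns fields documented above (the real SDK class may differ; `created_by`, `datasets`, and `tasks` are omitted here, and the `Dict` value type is assumed):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Literal, Optional

# Sketch of the Evaluation shape per the create() Returns section; not the
# SDK's actual class. created_by, datasets, and tasks are omitted.
@dataclass
class Evaluation:
    id: str
    created_at: datetime
    name: str
    status: Literal["failed", "completed", "running"]
    tags: List[str] = field(default_factory=list)
    archived_at: Optional[datetime] = None
    description: Optional[str] = None
    metadata: Optional[Dict[str, object]] = None
    object: str = "evaluation"  # default per the schema above

ev = Evaluation(
    id="eval_123",
    created_at=datetime(2024, 1, 1),
    name="demo",
    status="running",
)
```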