The REST API is now versioned. For more information, see "About API versioning".

REST API endpoints for models inference

Use the REST API to submit a chat completion request to a specified model, with or without organizational attribution.

About GitHub Models inference

You can use the REST API to run inference requests using the GitHub Models platform.

The API supports:

  • Accessing top models from OpenAI, DeepSeek, Microsoft, Llama, and more.
  • Running chat-based inference requests with full control over sampling and response parameters.
  • Streaming or non-streaming completions.
  • Organizational attribution and usage tracking.

Run an inference request attributed to an organization

This endpoint allows you to run an inference request attributed to a specific organization. You must be a member of the organization and have enabled models to use this endpoint. The token used to authenticate must have the models: read permission if using a fine-grained PAT or GitHub App minted token. The request body should contain the model ID and the messages for the chat completion request. The response will include either a non-streaming or streaming response based on the request parameters.

Parameters for "Run an inference request attributed to an organization"

Headers
Name, Type, Description
content-type string Required

Setting to application/json is required.

accept string

Setting to application/vnd.github+json is recommended.

Path parameters
Name, Type, Description
org string Required

The login of the organization to which the request is attributed.

Query parameters
Name, Type, Description
api-version string

The API version to use. Optional, but required for some features.

Body parameters
Name, Type, Description
model string Required

ID of the specific model to use for the request. The model ID should be in the format of {publisher}/{model_name} where "openai/gpt-4.1" is an example of a model ID. You can find supported models in the catalog/models endpoint.
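
The {publisher}/{model_name} format described above can be checked on the client before a request is sent. The following is a minimal Python sketch; the helper name split_model_id is hypothetical and not part of the API, and the check only reflects the format stated here, not whether the model actually exists in the catalog.

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split a model ID of the form {publisher}/{model_name}."""
    # partition() splits on the first "/" only, so the publisher is
    # everything before it and the model name is everything after it.
    publisher, sep, model_name = model_id.partition("/")
    if not sep or not publisher or not model_name:
        raise ValueError(f"expected {{publisher}}/{{model_name}}, got {model_id!r}")
    return publisher, model_name


print(split_model_id("openai/gpt-4.1"))
```

A malformed ID such as "gpt-4.1" raises ValueError locally instead of producing a failed API call.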

messages array of objects Required

The collection of context messages associated with this chat completion request. Typical usage begins with a chat message for the System role that provides instructions for the behavior of the assistant, followed by alternating messages between the User and Assistant roles.

Name, Type, Description
role string Required

The chat role associated with this message

Can be one of: assistant, developer, system, user

content string Required

The content of the message
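
The typical conversation layout described above (one system message, then alternating user and assistant messages) might be assembled as in the sketch below. The helper make_message and the example texts are illustrative assumptions; only the role and content fields and the four role values come from this reference.

```python
ALLOWED_ROLES = {"assistant", "developer", "system", "user"}


def make_message(role: str, content: str) -> dict:
    """Build one message object with the two required fields."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return {"role": role, "content": content}


# A system instruction first, then alternating user/assistant turns.
messages = [
    make_message("system", "You are a concise geography tutor."),
    make_message("user", "What is the capital of France?"),
    make_message("assistant", "The capital of France is Paris."),
    make_message("user", "And of Italy?"),
]

print([m["role"] for m in messages])
```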

frequency_penalty number

A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values will make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2].

max_tokens integer

The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length. For example, if your prompt is 100 tokens and you set max_tokens to 50, the API will return a completion with a maximum of 50 tokens.
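
The constraint above is simple arithmetic: prompt tokens plus max_tokens must not exceed the context length. A small sketch of that check follows. The context length of 8192 is an assumed value chosen for illustration, since context lengths vary by model, and both function names are hypothetical.

```python
def fits_context(context_length: int, prompt_tokens: int, max_tokens: int) -> bool:
    """True when prompt tokens plus max_tokens stay within the context length."""
    return prompt_tokens + max_tokens <= context_length


def max_completion_budget(context_length: int, prompt_tokens: int) -> int:
    """Largest max_tokens value that still fits next to the prompt."""
    return max(context_length - prompt_tokens, 0)


ASSUMED_CONTEXT_LENGTH = 8192  # hypothetical; check the model you use

# The example from the text: a 100-token prompt with max_tokens set to 50.
print(fits_context(ASSUMED_CONTEXT_LENGTH, 100, 50))
print(max_completion_budget(ASSUMED_CONTEXT_LENGTH, 100))
```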

modalities array of strings

The modalities that the model is allowed to use for the chat completions response. The default modality is text. Indicating an unsupported modality combination results in a 422 error. Supported values are: text, audio

presence_penalty number

A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values will make tokens less likely to appear when they already exist and increase the model's likelihood to output new tokens. Supported range is [-2, 2].

response_format object

The desired format for the response.

Name, Type, Description
Object object
Name, Type, Description
type string

Can be one of: text, json_object

Schema for structured JSON response object Required
Name, Type, Description
type string Required

The type of the response.

Value: json_schema

json_schema object Required

The JSON schema for the response.
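
To make the two response_format variants concrete, here is a hedged Python sketch of both shapes. The inner layout of json_schema (a name plus a schema object) is an assumption borrowed from common chat completions APIs; this reference only states that json_schema is a required object. The name capital_answer and the schema itself are made up for the example.

```python
import json

# Variant 1: free-form JSON output.
plain_json = {"type": "json_object"}

# Variant 2: structured output constrained by a JSON schema.
structured = {
    "type": "json_schema",
    "json_schema": {
        # Assumed inner layout; not spelled out in the parameter list above.
        "name": "capital_answer",
        "schema": {
            "type": "object",
            "properties": {"capital": {"type": "string"}},
            "required": ["capital"],
        },
    },
}

payload = {
    "model": "openai/gpt-4.1",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "response_format": structured,
}

print(json.dumps(payload, indent=2))
```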

seed integer

If specified, the system will make a best effort to sample deterministically such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed.

stream boolean

A value indicating whether chat completions should be streamed for this request.

Default: false

stream_options object

Options for streamed responses, such as whether to include usage information. Requires stream to be set to true.

Name, Type, Description
include_usage boolean

Whether to include usage information in the response.

Default: false
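
Because stream_options only applies when stream is true, a client can enforce that dependency before sending. The sketch below shows one way to do it; both helper names are hypothetical and the validation is client-side only, not a description of how the API itself reports the error.

```python
def with_streaming(payload: dict, include_usage: bool = False) -> dict:
    """Return a copy of payload with streaming enabled."""
    updated = dict(payload, stream=True)
    if include_usage:
        updated["stream_options"] = {"include_usage": True}
    return updated


def check_stream_options(payload: dict) -> None:
    """Reject stream_options on a request that does not set stream to true."""
    if "stream_options" in payload and not payload.get("stream", False):
        raise ValueError("stream_options requires stream to be true")


base = {
    "model": "openai/gpt-4.1",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
streaming = with_streaming(base, include_usage=True)
check_stream_options(streaming)  # passes: stream is true
print(streaming["stream"], streaming["stream_options"])
```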

stop array of strings

A collection of textual sequences that will end completion generation.

temperature number

The sampling temperature to use that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completion request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Decimal values are supported.
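
The supported ranges documented so far (frequency_penalty and presence_penalty in [-2, 2], temperature in [0, 1]) can be verified locally before a request is sent. This is a minimal sketch with a hypothetical helper name; it covers only the ranges listed above and says nothing about how the service responds to out-of-range values.

```python
SUPPORTED_RANGES = {
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "temperature": (0.0, 1.0),
}


def check_ranges(params: dict) -> list[str]:
    """Return one message per out-of-range value; an empty list means no problems."""
    problems = []
    for name, (low, high) in SUPPORTED_RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name}={params[name]} is outside [{low}, {high}]")
    return problems


print(check_ranges({"temperature": 0.7, "frequency_penalty": 0.5}))
print(check_ranges({"temperature": 1.5}))
```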

tool_choice string

If specified, controls which of the provided tools the model can use for the chat completions response.

Can be one of: auto, required, none

tools array of objects

A list of tools the model may request to call. Currently, only functions are supported as a tool. The model may respond with a function call request and provide the input arguments in JSON format for that function.

Name, Type, Description
function object
Name, Type, Description
name string

The name of the function to be called.

description string

A description of what the function does. The model will use this description when selecting the function and interpreting its parameters.

parameters

The parameters the function accepts, described as a JSON Schema object.

type string

Value: function
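
Put together, a tool entry is an object with type set to function and a function object carrying name, description, and parameters. The sketch below shows one such entry in a request payload. The get_weather function and its schema are invented for illustration and do not exist in the API.

```python
import json

TOOL_CHOICES = {"auto", "required", "none"}

# Hypothetical tool: the name, description, and schema are examples only.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {  # JSON Schema object describing the arguments
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "openai/gpt-4.1",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",
}

assert payload["tool_choice"] in TOOL_CHOICES
print(json.dumps(payload["tools"], indent=2))
```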

top_p number

An alternative to sampling with temperature called nucleus sampling. This value causes the model to consider the results of tokens with the provided probability mass. As an example, a value of 0.15 will cause only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p for the same request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Decimal values are supported.
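
To illustrate what "probability mass" means here, the sketch below applies the general nucleus sampling idea to a toy distribution: tokens are taken in order of decreasing probability until their cumulative mass reaches top_p. This shows the technique in general and is not a description of how any particular model implements it; the token probabilities are made up.

```python
def nucleus(probs: dict[str, float], top_p: float) -> list[str]:
    """Return the smallest set of most likely tokens whose mass reaches top_p."""
    kept, mass = [], 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(token)
        mass += p
        if mass >= top_p:
            break
    return kept


# Toy next-token distribution (invented numbers that sum to 1).
toy = {"Paris": 0.70, "Lyon": 0.15, "Nice": 0.10, "Rome": 0.05}

print(nucleus(toy, 0.15))  # the single most likely token already covers 15%
print(nucleus(toy, 0.8))   # two tokens are needed to reach 80%
```

Lower top_p values shrink the candidate set, which is why they tend to make output more focused.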

HTTP response status codes for "Run an inference request attributed to an organization"

Status code, Description
200

OK

Code samples for "Run an inference request attributed to an organization"

Request example

POST /orgs/{org}/inference/chat/completions
curl -L \
  -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer <YOUR-TOKEN>" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  -H "Content-Type: application/json" \
  https://models.github.ai/orgs/ORG/inference/chat/completions \
  -d '{"model":"openai/gpt-4.1","messages":[{"role":"user","content":"What is the capital of France?"}]}'

Response

Status: 200
{ "choices": [ { "message": { "content": "The capital of France is Paris.", "role": "assistant" } } ] }
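
The same organization-attributed request can be prepared from Python's standard library. The sketch below builds the request object without sending it, so it runs offline. The URL and headers mirror the curl example above; the helper name build_org_request, the organization my-org, and the token placeholder are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://models.github.ai"


def build_org_request(org: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) an organization-attributed chat completion request."""
    # Percent-encode the org login so it is safe to place in the URL path.
    url = f"{API_BASE}/orgs/{urllib.parse.quote(org, safe='')}/inference/chat/completions"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_org_request(
    "my-org",        # hypothetical organization login
    "<YOUR-TOKEN>",  # placeholder, as in the curl example
    {
        "model": "openai/gpt-4.1",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
    },
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send it; that step is left out here because it needs a real token and network access.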

Run an inference request

This endpoint allows you to run an inference request. The token used to authenticate must have the models: read permission if using a fine-grained PAT or GitHub App minted token. The request body should contain the model ID and the messages for the chat completion request. The response will include either a non-streaming or streaming response based on the request parameters.

Parameters for "Run an inference request"

Headers
Name, Type, Description
content-type string Required

Setting to application/json is required.

accept string

Setting to application/vnd.github+json is recommended.

Query parameters
Name, Type, Description
api-version string

The API version to use. Optional, but required for some features.

Body parameters
Name, Type, Description
model string Required

ID of the specific model to use for the request. The model ID should be in the format of {publisher}/{model_name} where "openai/gpt-4.1" is an example of a model ID. You can find supported models in the catalog/models endpoint.

messages array of objects Required

The collection of context messages associated with this chat completion request. Typical usage begins with a chat message for the System role that provides instructions for the behavior of the assistant, followed by alternating messages between the User and Assistant roles.

Name, Type, Description
role string Required

The chat role associated with this message

Can be one of: assistant, developer, system, user

content string Required

The content of the message

frequency_penalty number

A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values will make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2].

max_tokens integer

The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length. For example, if your prompt is 100 tokens and you set max_tokens to 50, the API will return a completion with a maximum of 50 tokens.

modalities array of strings

The modalities that the model is allowed to use for the chat completions response. The default modality is text. Indicating an unsupported modality combination results in a 422 error. Supported values are: text, audio

presence_penalty number

A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values will make tokens less likely to appear when they already exist and increase the model's likelihood to output new tokens. Supported range is [-2, 2].

response_format object

The desired format for the response.

Name, Type, Description
Object object
Name, Type, Description
type string

Can be one of: text, json_object

Schema for structured JSON response object Required
Name, Type, Description
type string Required

The type of the response.

Value: json_schema

json_schema object Required

The JSON schema for the response.

seed integer

If specified, the system will make a best effort to sample deterministically such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed.

stream boolean

A value indicating whether chat completions should be streamed for this request.

Default: false

stream_options object

Options for streamed responses, such as whether to include usage information. Requires stream to be set to true.

Name, Type, Description
include_usage boolean

Whether to include usage information in the response.

Default: false

stop array of strings

A collection of textual sequences that will end completion generation.

temperature number

The sampling temperature to use that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completion request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Decimal values are supported.

tool_choice string

If specified, controls which of the provided tools the model can use for the chat completions response.

Can be one of: auto, required, none

tools array of objects

A list of tools the model may request to call. Currently, only functions are supported as a tool. The model may respond with a function call request and provide the input arguments in JSON format for that function.

Name, Type, Description
function object
Name, Type, Description
name string

The name of the function to be called.

description string

A description of what the function does. The model will use this description when selecting the function and interpreting its parameters.

parameters

The parameters the function accepts, described as a JSON Schema object.

type string

Value: function

top_p number

An alternative to sampling with temperature called nucleus sampling. This value causes the model to consider the results of tokens with the provided probability mass. As an example, a value of 0.15 will cause only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p for the same request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Decimal values are supported.

HTTP response status codes for "Run an inference request"

Status code, Description
200

OK

Code samples for "Run an inference request"

Request example

POST /inference/chat/completions
curl -L \
  -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer <YOUR-TOKEN>" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  -H "Content-Type: application/json" \
  https://models.github.ai/inference/chat/completions \
  -d '{"model":"openai/gpt-4.1","messages":[{"role":"user","content":"What is the capital of France?"}]}'

Response

Status: 200
{ "choices": [ { "message": { "content": "The capital of France is Paris.", "role": "assistant" } } ] }
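
Reading the assistant's reply out of a non-streaming response like the one above takes only a few lines. The sketch below parses the sample body shown in this section; the helper name first_reply is hypothetical, and the code assumes the response contains at least one choice with a message, as in the example.

```python
import json

# The sample response body from the example above.
raw_body = (
    '{ "choices": [ { "message": { "content": "The capital of France is Paris.", '
    '"role": "assistant" } } ] }'
)


def first_reply(body: str) -> str:
    """Return the content of the first choice's message."""
    data = json.loads(body)
    return data["choices"][0]["message"]["content"]


print(first_reply(raw_body))
```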