POST /v1/chat/completions

Chat Completions

Creates a model response for a given conversation: you send a list of messages and receive the model's next reply. Use this endpoint to power chat interfaces, AI assistants, content generation, and other text-generation workflows.

This page documents the OpenAI-compatible Chat Completions endpoint. Use the openai Python package or any OpenAI-compatible SDK by setting OPENAI_BASE_URL=https://api.linkharbor.ai/v1. For the Anthropic Messages API, use /anthropic/v1/messages instead.

curl https://api.linkharbor.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "your-model-name",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7
  }'
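The curl call can be reproduced with Python's standard library. This minimal sketch uses only json and urllib.request. It builds the same POST request without sending it, so it can be inspected offline. `build_chat_request` is a hypothetical helper name and "sk-example" is a placeholder key. Actually sending the request requires network access and a real OPENAI_API_KEY.

```python
import json
import urllib.request

BASE_URL = "https://api.linkharbor.ai/v1"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


payload = {
    "model": "your-model-name",  # replace with a real ID from GET /v1/models
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0.7,
}
req = build_chat_request("sk-example", payload)  # placeholder key

# To send it (needs network access and a valid key):
# import os
# req = build_chat_request(os.environ["OPENAI_API_KEY"], payload)
# with urllib.request.urlopen(req) as resp:
#     completion = json.load(resp)
```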

Request Body

model · string · required

The ID of the model to use. Retrieve available models from GET /v1/models and replace your-model-name with a real ID from the live catalog.
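This page names GET /v1/models but does not document its response. The sketch below therefore assumes the OpenAI-style list object, `{"object": "list", "data": [{"id": ...}, ...]}`. Check the real response before relying on this shape. `model_ids` is a hypothetical helper, and the sample IDs are made up.

```python
def model_ids(models_response: dict) -> list[str]:
    """Return model IDs from a parsed GET /v1/models response.

    Assumes the OpenAI-style shape {"object": "list", "data": [{"id": ...}, ...]}.
    """
    return [entry["id"] for entry in models_response.get("data", [])]


# Hypothetical response body; real IDs come from the live catalog.
sample = {
    "object": "list",
    "data": [
        {"id": "example-model-a", "object": "model"},
        {"id": "example-model-b", "object": "model"},
    ],
}
ids = model_ids(sample)
print(ids)  # ['example-model-a', 'example-model-b']
```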

messages · array · required

A list of messages that make up the conversation. The model uses this history to generate the next reply.

messages[].role · string · required

The role of the message author. One of: system (sets assistant behavior), user (human input), or assistant (prior model replies).

messages[].content · string · required

The text content of the message.
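The model sees only the history you send, so multi-turn chat means resending the whole messages array with each request. This sketch maintains that array on the client. `add_turn` is a hypothetical helper. The assistant reply shown is a made-up stand-in for the text you would copy from the previous response.

```python
VALID_ROLES = {"system", "user", "assistant"}


def add_turn(history: list[dict], role: str, content: str) -> list[dict]:
    """Return a copy of the conversation with one message appended."""
    if role not in VALID_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    if not isinstance(content, str):
        raise TypeError("content must be a string")
    return history + [{"role": role, "content": content}]


history = [{"role": "system", "content": "You are a helpful assistant."}]
history = add_turn(history, "user", "What is the capital of France?")
# Hypothetical reply: in practice, copy choices[0].message.content from the response.
history = add_turn(history, "assistant", "The capital of France is Paris.")
history = add_turn(history, "user", "What is its population?")
# `history` is now the messages array for the next request.
```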

stream · boolean · optional · Default: false

If true, the response is streamed back as Server-Sent Events (SSE) instead of a single JSON object. Each chunk contains a delta with the incremental content. The stream ends with data: [DONE].
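When stream is true, the client reassembles the reply from the SSE events. This sketch assumes OpenAI-style chunks with the following properties:

- Each event line has the form `data: <json>`.
- New text is in `choices[0].delta.content`.
- The stream ends with `data: [DONE]`.

The sample events are hypothetical. `iter_stream_text` is an illustrative helper, not part of any SDK.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield incremental text from the SSE lines of a streamed chat completion.

    Assumes OpenAI-style chunks: "data: <json>" lines whose text is in
    choices[0].delta.content, terminated by "data: [DONE]".
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if choices:
            piece = (choices[0].get("delta") or {}).get("content")
            if piece:  # the first chunk often carries only the role
                yield piece


sample_events = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Paris"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " is the capital."}}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_stream_text(sample_events))
print(text)  # Paris is the capital.
```

With urllib, iterating the response object yields raw byte lines. You could pass `(raw.decode("utf-8") for raw in resp)` to the function.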

temperature · number · optional · Default: 1 · Range: 0–2

Sampling temperature. Higher values (e.g. 0.9) produce more creative, varied output. Lower values (e.g. 0.2) make responses more focused and deterministic. Adjust this or top_p, not both.
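The effect of temperature is easiest to see in its textbook formulation. Logits are divided by the temperature before a softmax, so lower values concentrate probability on the top token. This illustrates the general technique only; it is not a description of how this service's sampler works. The logits are hypothetical. A temperature of exactly 0, which the range allows, is commonly handled as greedy decoding; this sketch does not model that case.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn logits into probabilities after dividing them by the temperature.

    Textbook formulation for illustration; the server's sampler may differ.
    """
    if temperature <= 0:
        raise ValueError("this sketch needs temperature > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
focused = softmax_with_temperature(logits, 0.2)  # top token gets ~99% of the mass
varied = softmax_with_temperature(logits, 0.9)   # probability is spread more evenly
```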

max_tokens · integer · optional

Maximum number of tokens to generate. The number of input tokens plus this value cannot exceed the model's context window. Omit it to use the model's default maximum.
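The context-window constraint is simple arithmetic: max_tokens ≤ context_window − prompt_tokens. This page does not say how to look up a model's window or count prompt tokens in advance. Both are therefore inputs you supply, for example an estimate or usage.prompt_tokens from a similar earlier request. The 8192-token window below is hypothetical, and `completion_budget` is an illustrative helper.

```python
from typing import Optional


def completion_budget(context_window: int, prompt_tokens: int,
                      requested: Optional[int] = None) -> int:
    """Return a max_tokens value that keeps prompt + completion inside the window.

    If requested is None, return the largest value that still fits.
    """
    available = context_window - prompt_tokens
    if available <= 0:
        raise ValueError("the prompt already fills the context window")
    if requested is None:
        return available
    return min(requested, available)


# Hypothetical 8192-token window:
print(completion_budget(8192, 28, 512))    # 512 (fits as requested)
print(completion_budget(8192, 8000, 512))  # 192 (capped by the window)
```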

Example request body
{
  "model": "your-model-name",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain async/await in Python."}
  ],
  "temperature": 0.7,
  "max_tokens": 512
}
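Before sending a body like this, a client can check it against the constraints listed above. This sketch covers required fields, allowed roles, the 0–2 temperature range, and the types of stream and max_tokens. Two checks go beyond what this page states and are assumptions: that messages is non-empty and that max_tokens is positive. `validate_chat_request` is a hypothetical helper.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_request(body: dict) -> list[str]:
    """Return human-readable problems with a request body; empty means none found."""
    problems = []
    model = body.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model must be a non-empty string")
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:  # non-empty is an assumption
        problems.append("messages must be a non-empty array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict):
                problems.append(f"messages[{i}] must be an object")
                continue
            if msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"messages[{i}].role must be system, user, or assistant")
            if not isinstance(msg.get("content"), str):
                problems.append(f"messages[{i}].content must be a string")
    temperature = body.get("temperature")
    if temperature is not None:
        if not isinstance(temperature, (int, float)) or not 0 <= temperature <= 2:
            problems.append("temperature must be a number from 0 to 2")
    max_tokens = body.get("max_tokens")
    if max_tokens is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens <= 0:
            problems.append("max_tokens must be a positive integer")
    if "stream" in body and not isinstance(body["stream"], bool):
        problems.append("stream must be a boolean")
    return problems


example_body = {
    "model": "your-model-name",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain async/await in Python."},
    ],
    "temperature": 0.7,
    "max_tokens": 512,
}
print(validate_chat_request(example_body))  # []
```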

Response

Returns a chat completion object. On success, the HTTP status is 200. On error, a JSON object with error type and message is returned instead.
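A client should tell the two outcomes apart before reading choices. This page says only that errors carry a type and a message. The sketch assumes the OpenAI-style nesting `{"error": {"type": ..., "message": ...}}`; adjust the keys if this service differs. `ChatAPIError` and `unwrap_completion` are hypothetical names, and the error values in the test are made up.

```python
class ChatAPIError(Exception):
    """Raised when the response body is an error object instead of a completion."""

    def __init__(self, status: int, error_type: str, message: str):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message


def unwrap_completion(status: int, body: dict) -> dict:
    """Return the completion object, or raise ChatAPIError for an error response.

    Assumes errors look like {"error": {"type": ..., "message": ...}}.
    """
    if status == 200 and "error" not in body:
        return body
    err = body.get("error") or {}
    raise ChatAPIError(status, err.get("type", "unknown"), err.get("message", "no message"))
```

With urllib, non-2xx statuses raise urllib.error.HTTPError. Its `.code` attribute and `.read()` method supply the status and body to pass in.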

id · string

Unique identifier for this completion, prefixed with chatcmpl-.

model · string

The model that generated this completion.

choices · array

Array of generated choices. Each choice has an index, a message with role and content, and a finish_reason (e.g. stop when the model completes naturally, length when max_tokens is reached).
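A finish_reason of length means the reply was truncated, so clients usually check it before using the text. This sketch reads the first choice and reports truncation. `first_reply` is a hypothetical helper. The two sample completions are trimmed, made-up objects in the shape shown below.

```python
def first_reply(completion: dict) -> tuple[str, bool]:
    """Return (text, truncated) for the first choice of a chat completion.

    truncated is True when finish_reason is "length": generation stopped at
    max_tokens, so the text may end mid-sentence.
    """
    choice = completion["choices"][0]
    return choice["message"]["content"], choice.get("finish_reason") == "length"


done = {"choices": [{"index": 0, "message": {"role": "assistant", "content": "Paris."},
                     "finish_reason": "stop"}]}
cut = {"choices": [{"index": 0, "message": {"role": "assistant", "content": "async/await lets"},
                    "finish_reason": "length"}]}

text, truncated = first_reply(cut)
if truncated:
    # One option: resend with a larger max_tokens.
    print("reply was cut off:", text)
```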

usage · object

Token usage statistics for this request.

usage.prompt_tokens · integer

Number of tokens in the input messages.

usage.completion_tokens · integer

Number of tokens in the generated response.

usage.total_tokens · integer

Total tokens used (prompt + completion).
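Since total_tokens = prompt_tokens + completion_tokens, a client can sanity-check each usage object and keep running totals across requests, for example for its own quota tracking. `UsageTotals` is a hypothetical client-side helper, not something the API returns. The first figures match the example response below; the second request's figures are made up.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running token counts accumulated from several usage objects."""
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add(self, usage: dict) -> None:
        # The documented invariant: total = prompt + completion.
        if usage["total_tokens"] != usage["prompt_tokens"] + usage["completion_tokens"]:
            raise ValueError(f"inconsistent usage object: {usage}")
        self.prompt_tokens += usage["prompt_tokens"]
        self.completion_tokens += usage["completion_tokens"]

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


totals = UsageTotals()
totals.add({"prompt_tokens": 28, "completion_tokens": 94, "total_tokens": 122})
totals.add({"prompt_tokens": 40, "completion_tokens": 10, "total_tokens": 50})
print(totals.total_tokens)  # 172
```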

Example response object
{
  "id": "chatcmpl-A9f3k2mX8y",
  "object": "chat.completion",
  "created": 1715284800,
  "model": "your-model-name",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "async/await lets you write asynchronous code..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 94,
    "total_tokens": 122
  }
}
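The example response can be parsed into a small typed record. The field list above does not document `created`. This sketch assumes it is a Unix timestamp in seconds, as in OpenAI's format, and converts it to a UTC datetime. `ChatCompletion` and `parse_completion` are hypothetical names, and only the first choice is kept.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ChatCompletion:
    id: str
    model: str
    created: datetime
    content: str
    finish_reason: str
    total_tokens: int


def parse_completion(raw: str) -> ChatCompletion:
    """Parse a non-streamed response body; assumes `created` is Unix seconds."""
    obj = json.loads(raw)
    choice = obj["choices"][0]  # keep only the first choice
    return ChatCompletion(
        id=obj["id"],
        model=obj["model"],
        created=datetime.fromtimestamp(obj["created"], tz=timezone.utc),
        content=choice["message"]["content"],
        finish_reason=choice["finish_reason"],
        total_tokens=obj["usage"]["total_tokens"],
    )


example = """{
  "id": "chatcmpl-A9f3k2mX8y", "object": "chat.completion", "created": 1715284800,
  "model": "your-model-name",
  "choices": [{"index": 0, "message": {"role": "assistant",
    "content": "async/await lets you write asynchronous code..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 28, "completion_tokens": 94, "total_tokens": 122}
}"""
completion = parse_completion(example)
print(completion.created.isoformat())  # 2024-05-09T20:00:00+00:00
```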