Models

Mixlayer hosts a curated set of open-source models served behind the OpenAI-compatible API. Pass a model’s identifier as the model field on any request to https://models.mixlayer.ai/v1/chat/completions.

| Model | Identifier | Context Window | Capabilities | Status |
| --- | --- | --- | --- | --- |
| Qwen 3.5 4B (Free) | qwen/qwen3.5-4b-free | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 9B | qwen/qwen3.5-9b | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 27B | qwen/qwen3.5-27b | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 35B (MoE, 3B active) | qwen/qwen3.5-35b-a3b | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 122B (MoE, 10B active) | qwen/qwen3.5-122b-a10b | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 397B (MoE, 17B active) | qwen/qwen3.5-397b-a17b | 131K | Tools, Reasoning | Stable |
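A minimal request sketch using only the Python standard library. The endpoint URL and model identifier come from this page; the MIXLAYER_API_KEY environment variable name and the example prompt are assumptions for illustration.

```python
import json
import os
import urllib.request

# Assumed environment variable holding your Mixlayer API key.
API_KEY = os.environ.get("MIXLAYER_API_KEY", "")

# Any identifier from the catalog table above works as the model field.
payload = {
    "model": "qwen/qwen3.5-9b",
    "messages": [
        {"role": "user", "content": "Summarize this release note in one sentence."}
    ],
}

def chat_completion(body: dict) -> dict:
    """POST a chat-completion request to the OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        "https://models.mixlayer.ai/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Uncomment to send the request:
# response = chat_completion(payload)
# print(response["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the official OpenAI SDKs should also work by pointing their base URL at https://models.mixlayer.ai/v1.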

Pricing for each model is listed on the Pricing page. To see the exact set of models authorized for your API key, call GET /v1/models.
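A sketch of the authorization check described above. It assumes GET /v1/models returns the standard OpenAI-style list shape ({"object": "list", "data": [{"id": ...}, ...]}); the MIXLAYER_API_KEY variable name is also an assumption.

```python
import json
import os
import urllib.request

API_KEY = os.environ.get("MIXLAYER_API_KEY", "")
MODELS_URL = "https://models.mixlayer.ai/v1/models"

def list_models() -> list[str]:
    """Return the model identifiers authorized for this API key."""
    req = urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Assumed OpenAI-compatible list response shape.
    return [model["id"] for model in body["data"]]

# Uncomment to print each authorized identifier:
# for model_id in list_models():
#     print(model_id)
```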

Choosing a model

Use the table below as a starting point: it lists the workloads each model is best suited to and why it's the right pick.

| Identifier | Use case & why |
| --- | --- |
| qwen/qwen3.5-4b-free | Free tier for prototyping, learning the API, and short low-stakes tasks. Smallest and lowest-latency model in the catalog; rate-limited, so not for production traffic. |
| qwen/qwen3.5-9b | The cheapest paid model. Good default for high-volume, simple chat, classification, and short-form summarization where cost dominates. |
| qwen/qwen3.5-27b | Dense general-purpose model. Stronger than 9B on multi-step reasoning and instruction-following while staying fast for single-stream use. |
| qwen/qwen3.5-35b-a3b | Fast MoE: 35B total parameters but only 3B active per token. Use when you want broader capability than 9B at similar latency. |
| qwen/qwen3.5-122b-a10b | High-capability MoE for complex reasoning, longer contexts, and harder coding tasks where 27B isn't enough but you don't need the frontier model. |
| qwen/qwen3.5-397b-a17b | Frontier model. Best choice for hard reasoning, multi-step coding, agentic loops, and anywhere quality matters more than per-token cost. |

Per-model notes

Different model families have different recommended sampling settings and behavior. The notes below come from the model authors’ published guidance, adapted to the parameters Mixlayer’s API exposes.

Qwen 3.5

Qwen 3.5 models support both a thinking mode (extended chain-of-thought reasoning, returned in reasoning_content) and a non-thinking / instruct mode. See Reasoning for how to toggle between them.
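A sketch of separating the reasoning trace from the final answer in a parsed response. The response dict below is invented sample data; the exact way to toggle thinking mode is covered on the Reasoning page and is not shown here.

```python
# Invented sample response, shaped like a parsed chat-completion body
# from a thinking-mode request.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "First, compare the two dates...",
                "content": "The answer is 42 days.",
            }
        }
    ]
}

message = response["choices"][0]["message"]
# reasoning_content is only present in thinking mode, so use .get().
reasoning = message.get("reasoning_content")
answer = message["content"]
```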

Qwen recommends the following sampling settings depending on mode and task:

| Mode | Recommended sampling |
| --- | --- |
| Thinking (general tasks) | temperature=1.0, top_p=0.95, top_k=20, presence_penalty=0.0, repetition_penalty=1.0 |
| Thinking (precise coding, e.g. WebDev) | temperature=0.6, top_p=0.95, top_k=20, presence_penalty=0.0, repetition_penalty=1.0 |
| Instruct (non-thinking) | temperature=0.7, top_p=0.80, top_k=20, presence_penalty=1.5, repetition_penalty=1.0 |

Qwen’s published guidance also recommends min_p=0.0. Mixlayer’s API does not currently expose a min_p parameter, so it can be omitted — the gateway applies a model-appropriate default.
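The presets above can be kept in one place and merged into request bodies. This is a sketch, not part of Mixlayer's API: the QWEN35_SAMPLING dict, build_payload helper, and mode names are all invented here, and it assumes the gateway accepts top_k and repetition_penalty as extra JSON fields alongside the standard OpenAI ones.

```python
# Sampling presets from the table above, keyed by invented mode names.
QWEN35_SAMPLING = {
    "thinking-general": {
        "temperature": 1.0, "top_p": 0.95, "top_k": 20,
        "presence_penalty": 0.0, "repetition_penalty": 1.0,
    },
    "thinking-coding": {
        "temperature": 0.6, "top_p": 0.95, "top_k": 20,
        "presence_penalty": 0.0, "repetition_penalty": 1.0,
    },
    "instruct": {
        "temperature": 0.7, "top_p": 0.80, "top_k": 20,
        "presence_penalty": 1.5, "repetition_penalty": 1.0,
    },
}

def build_payload(model: str, mode: str, messages: list[dict]) -> dict:
    """Merge the recommended sampling for `mode` into a request body."""
    return {"model": model, "messages": messages, **QWEN35_SAMPLING[mode]}

payload = build_payload(
    "qwen/qwen3.5-27b",
    "instruct",
    [{"role": "user", "content": "Classify this ticket: 'login page 500s'"}],
)
```

If you use an OpenAI SDK instead of raw HTTP, the non-standard fields (top_k, repetition_penalty) typically need to go through the SDK's extra-body mechanism rather than named arguments.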

For the MoE variants (qwen3.5-35b-a3b, qwen3.5-122b-a10b, qwen3.5-397b-a17b), the active-parameter count is what governs latency and cost, while the total parameter count governs capability. The 397b-a17b model is the strongest and is the recommended choice for hard reasoning, coding, and agentic workloads.