# Models

Mixlayer hosts a curated set of open-source models served behind the OpenAI-compatible API. Pass a model's identifier as the `model` field on any request to `https://models.mixlayer.ai/v1/chat/completions`.
| Model | Identifier | Context Window | Capabilities | Status |
|---|---|---|---|---|
| Qwen 3.5 4B (Free) | qwen/qwen3.5-4b-free | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 9B | qwen/qwen3.5-9b | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 27B | qwen/qwen3.5-27b | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 35B (MoE, 3B active) | qwen/qwen3.5-35b-a3b | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 122B (MoE, 10B active) | qwen/qwen3.5-122b-a10b | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 397B (MoE, 17B active) | qwen/qwen3.5-397b-a17b | 131K | Tools, Reasoning | Stable |
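For example, a minimal chat completion request against the endpoint above, using only the Python standard library. This is a sketch: it assumes your key is stored in a `MIXLAYER_API_KEY` environment variable (the variable name is illustrative, not prescribed by Mixlayer).

```python
import json
import os
import urllib.request

API_URL = "https://models.mixlayer.ai/v1/chat/completions"

def build_request(model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumes the key lives in MIXLAYER_API_KEY (illustrative name).
            "Authorization": f"Bearer {os.environ.get('MIXLAYER_API_KEY', '')}",
        },
        method="POST",
    )

req = build_request(
    "qwen/qwen3.5-9b",
    [{"role": "user", "content": "Say hello in one sentence."}],
)
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     body = json.loads(resp.read())
#     print(body["choices"][0]["message"]["content"])
```

The same request shape works with the official OpenAI client libraries by pointing their `base_url` at `https://models.mixlayer.ai/v1`.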
Pricing for each model is listed on the Pricing page. To see the exact set of models authorized for your API key, call `GET /v1/models`.
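A minimal sketch of that listing call, again assuming the key is in a `MIXLAYER_API_KEY` environment variable and that the response uses the OpenAI-style `{"data": [{"id": ...}, ...]}` list shape:

```python
import json
import os
import urllib.request

MODELS_URL = "https://models.mixlayer.ai/v1/models"

def build_models_request() -> urllib.request.Request:
    """Build a GET /v1/models request; the response lists the
    models your API key is authorized to use."""
    return urllib.request.Request(
        MODELS_URL,
        headers={
            "Authorization": f"Bearer {os.environ.get('MIXLAYER_API_KEY', '')}",
        },
        method="GET",
    )

req = build_models_request()
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     for m in json.loads(resp.read())["data"]:  # assumed OpenAI-style shape
#         print(m["id"])
```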
## Choosing a model

Use the table below as a starting point: it lists the workloads each model is best suited to and why.
| Identifier | Use case & why |
|---|---|
| qwen/qwen3.5-4b-free | Free tier for prototyping, learning the API, and short low-stakes tasks. Smallest and lowest-latency model in the catalog; rate-limited so not for production traffic. |
| qwen/qwen3.5-9b | The cheapest paid model. Good default for high-volume, simple chat, classification, and short-form summarization where cost dominates. |
| qwen/qwen3.5-27b | Dense general-purpose model. Stronger than 9B on multi-step reasoning and instruction-following while staying single-stream fast. |
| qwen/qwen3.5-35b-a3b | Fast MoE: 35B total parameters with only 3B active per token. Use when you want broader capability than 9B at similar latency. |
| qwen/qwen3.5-122b-a10b | High-capability MoE for complex reasoning, longer contexts, and harder coding tasks where 27B isn't enough but you don't need the frontier model. |
| qwen/qwen3.5-397b-a17b | Frontier model. Best choice for hard reasoning, multi-step coding, agentic loops, and anywhere quality matters more than per-token cost. |
## Per-model notes
Different model families have different recommended sampling settings and behavior. The notes below come from the model authors’ published guidance, adapted to the parameters Mixlayer’s API exposes.
### Qwen 3.5

Qwen 3.5 models support both a thinking mode (extended chain-of-thought reasoning, returned in `reasoning_content`) and a non-thinking / instruct mode. See Reasoning for how to toggle between them.
Qwen recommends the following sampling settings depending on mode and task:
| Mode | Recommended sampling |
|---|---|
| Thinking — general tasks | temperature=1.0, top_p=0.95, top_k=20, presence_penalty=0.0, repetition_penalty=1.0 |
| Thinking — precise coding (e.g. WebDev) | temperature=0.6, top_p=0.95, top_k=20, presence_penalty=0.0, repetition_penalty=1.0 |
| Instruct (non-thinking) | temperature=0.7, top_p=0.80, top_k=20, presence_penalty=1.5, repetition_penalty=1.0 |
Qwen's published guidance also recommends `min_p=0.0`. Mixlayer's API does not currently expose a `min_p` parameter, so omit it; the gateway applies a model-appropriate default.
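As a sketch, the thinking-mode defaults for general tasks could be applied by merging them into the request body. Note that `top_k`, `presence_penalty`, and `repetition_penalty` are assumed here to be accepted as top-level body fields, as is common for OpenAI-compatible gateways; check the API reference for the exact fields Mixlayer supports.

```python
import json

# Qwen's recommended thinking-mode settings for general tasks
# (from the table above).
THINKING_GENERAL = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.0,
}

payload = {
    "model": "qwen/qwen3.5-27b",
    "messages": [{"role": "user", "content": "Plan a three-step refactor."}],
    **THINKING_GENERAL,  # merge sampling settings into the request body
}
body = json.dumps(payload)
```

For instruct (non-thinking) mode, swap in the instruct row's values instead (`temperature=0.7`, `top_p=0.80`, `presence_penalty=1.5`).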
For the MoE variants (qwen3.5-35b-a3b, qwen3.5-122b-a10b, qwen3.5-397b-a17b), the active-parameter count is what governs latency and cost, while the total parameter count governs capability. The 397b-a17b model is the strongest and is the recommended choice for hard reasoning, coding, and agentic workloads.