Models

Mixlayer hosts a curated set of open-source models served behind the OpenAI-compatible API. Pass a model’s identifier as the model field on any request to https://models.mixlayer.ai/v1/chat/completions.

| Model | Identifier | Context Window | Capabilities | Status |
| --- | --- | --- | --- | --- |
| Qwen 3.5 4B (Free) | qwen/qwen3.5-4b-free | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 9B | qwen/qwen3.5-9b | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 27B | qwen/qwen3.5-27b | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 35B (MoE, 3B active) | qwen/qwen3.5-35b-a3b | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 122B (MoE, 10B active) | qwen/qwen3.5-122b-a10b | 131K | Tools, Reasoning | Stable |
| Qwen 3.5 397B (MoE, 17B active) | qwen/qwen3.5-397b-a17b | 131K | Tools, Reasoning | Stable |
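A minimal request sketch using only the Python standard library. The endpoint URL and model identifier come from this page; the MIXLAYER_API_KEY environment variable name and the example prompt are assumptions for illustration.

```python
import json
import os
import urllib.request

# Assumed environment variable holding your Mixlayer API key.
API_KEY = os.environ.get("MIXLAYER_API_KEY", "")

# Any identifier from the catalog table above works as the model field.
payload = {
    "model": "qwen/qwen3.5-9b",
    "messages": [
        {"role": "user", "content": "Summarize this release note in one sentence."}
    ],
}

def chat_completion(body: dict) -> dict:
    """POST a chat-completion request to the OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        "https://models.mixlayer.ai/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Uncomment to send the request:
# response = chat_completion(payload)
# print(response["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the official OpenAI SDKs should also work by pointing their base URL at https://models.mixlayer.ai/v1.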

Pricing for each model is listed on the Pricing page. To see the exact set of models authorized for your API key, call GET /v1/models.
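A sketch of the authorization check described above. It assumes GET /v1/models returns the standard OpenAI-style list shape ({"object": "list", "data": [{"id": ...}, ...]}); the MIXLAYER_API_KEY variable name is also an assumption.

```python
import json
import os
import urllib.request

API_KEY = os.environ.get("MIXLAYER_API_KEY", "")
MODELS_URL = "https://models.mixlayer.ai/v1/models"

def list_models() -> list[str]:
    """Return the model identifiers authorized for this API key."""
    req = urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Assumed OpenAI-compatible list response shape.
    return [model["id"] for model in body["data"]]

# Uncomment to print each authorized identifier:
# for model_id in list_models():
#     print(model_id)
```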

Choosing a model

Use the table below as a starting point: it lists the workloads each model is best suited to and why it's the right pick.

| Identifier | Use case & why |
| --- | --- |
| qwen/qwen3.5-4b-free | Free tier for prototyping, learning the API, and short low-stakes tasks. Smallest and lowest-latency model in the catalog; rate-limited, so not for production traffic. |
| qwen/qwen3.5-9b | The cheapest paid model. Good default for high-volume, simple chat, classification, and short-form summarization where cost dominates. |
| qwen/qwen3.5-27b | Dense general-purpose model. Stronger than 9B on multi-step reasoning and instruction-following while staying fast for single-stream use. |
| qwen/qwen3.5-35b-a3b | Fast MoE: 35B total parameters but only 3B active per token. Use when you want broader capability than 9B at similar latency. |
| qwen/qwen3.5-122b-a10b | High-capability MoE for complex reasoning, longer contexts, and harder coding tasks where 27B isn't enough but you don't need the frontier model. |
| qwen/qwen3.5-397b-a17b | Frontier model. Best choice for hard reasoning, multi-step coding, agentic loops, and anywhere quality matters more than per-token cost. |

Per-model notes

Different model families have different recommended sampling settings and behavior. The notes below come from the model authors’ published guidance, adapted to the parameters Mixlayer’s API exposes.

Qwen 3.5

Qwen 3.5 models support both a thinking mode (extended chain-of-thought reasoning, returned in reasoning_content) and a non-thinking / instruct mode. See Reasoning for how to toggle between them.
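A sketch of separating the reasoning trace from the final answer in a parsed response. The response dict below is invented sample data; the exact way to toggle thinking mode is covered on the Reasoning page and is not shown here.

```python
# Invented sample response, shaped like a parsed chat-completion body
# from a thinking-mode request.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "First, compare the two dates...",
                "content": "The answer is 42 days.",
            }
        }
    ]
}

message = response["choices"][0]["message"]
# reasoning_content is only present in thinking mode, so use .get().
reasoning = message.get("reasoning_content")
answer = message["content"]
```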

Qwen recommends the following sampling settings depending on mode and task:

| Mode | Recommended sampling |
| --- | --- |
| Thinking (general tasks) | temperature=1.0, top_p=0.95, top_k=20, presence_penalty=0.0, repetition_penalty=1.0 |
| Thinking (precise coding, e.g. WebDev) | temperature=0.6, top_p=0.95, top_k=20, presence_penalty=0.0, repetition_penalty=1.0 |
| Instruct (non-thinking) | temperature=0.7, top_p=0.80, top_k=20, presence_penalty=1.5, repetition_penalty=1.0 |

Qwen’s published guidance also recommends min_p=0.0. Mixlayer’s API does not currently expose a min_p parameter, so it can be omitted — the gateway applies a model-appropriate default.
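The presets above can be kept in one place and merged into request bodies. This is a sketch, not part of Mixlayer's API: the QWEN35_SAMPLING dict, build_payload helper, and mode names are all invented here, and it assumes the gateway accepts top_k and repetition_penalty as extra JSON fields alongside the standard OpenAI ones.

```python
# Sampling presets from the table above, keyed by invented mode names.
QWEN35_SAMPLING = {
    "thinking-general": {
        "temperature": 1.0, "top_p": 0.95, "top_k": 20,
        "presence_penalty": 0.0, "repetition_penalty": 1.0,
    },
    "thinking-coding": {
        "temperature": 0.6, "top_p": 0.95, "top_k": 20,
        "presence_penalty": 0.0, "repetition_penalty": 1.0,
    },
    "instruct": {
        "temperature": 0.7, "top_p": 0.80, "top_k": 20,
        "presence_penalty": 1.5, "repetition_penalty": 1.0,
    },
}

def build_payload(model: str, mode: str, messages: list[dict]) -> dict:
    """Merge the recommended sampling for `mode` into a request body."""
    return {"model": model, "messages": messages, **QWEN35_SAMPLING[mode]}

payload = build_payload(
    "qwen/qwen3.5-27b",
    "instruct",
    [{"role": "user", "content": "Classify this ticket: 'login page 500s'"}],
)
```

If you use an OpenAI SDK instead of raw HTTP, the non-standard fields (top_k, repetition_penalty) typically need to go through the SDK's extra-body mechanism rather than named arguments.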

For the MoE variants (qwen3.5-35b-a3b, qwen3.5-122b-a10b, qwen3.5-397b-a17b), the active-parameter count is what governs latency and cost, while the total parameter count governs capability. The 397b-a17b model is the strongest and is the recommended choice for hard reasoning, coding, and agentic workloads.