Groq
Groq provides ultra-fast inference on custom LPU hardware. It hosts a variety of open-weight models with very low latency. AgentXchain connects via api_proxy using Groq's OpenAI-compatible API.
Which adapter?
api_proxy with provider: "openai" and a custom base_url — Groq's API is OpenAI-compatible.
Prerequisites
- A Groq API key — get one from console.groq.com
- GROQ_API_KEY set in your environment
- agentxchain CLI installed
Configuration
{
  "runtimes": {
    "groq-dev": {
      "type": "api_proxy",
      "provider": "openai",
      "model": "gpt-oss-120b",
      "auth_env": "GROQ_API_KEY",
      "base_url": "https://api.groq.com/openai/v1/chat/completions"
    }
  },
  "roles": {
    "dev": {
      "runtime": "groq-dev",
      "mandate": "Implement features and fix bugs",
      "authority": "proposed"
    }
  }
}
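Before starting a run, it can help to sanity-check that the pieces of this configuration line up. The sketch below is not part of AgentXchain; `check_config` is a hypothetical helper that assumes only the keys shown above (`runtimes`, `roles`, `runtime`, `auth_env`) and reports roles that point at an undefined runtime or runtimes whose key variable is unset.

```python
def check_config(config: dict, env: dict) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: it only inspects the keys used in the example
    configuration and knows nothing else about AgentXchain's schema.
    """
    problems = []
    runtimes = config.get("runtimes", {})

    # Every runtime that names an auth_env needs that variable to be non-empty.
    for name, runtime in runtimes.items():
        auth_env = runtime.get("auth_env")
        if auth_env and not env.get(auth_env):
            problems.append(f"runtime {name!r}: {auth_env} is not set")

    # Every role must reference a runtime that is actually defined.
    for role, spec in config.get("roles", {}).items():
        if spec.get("runtime") not in runtimes:
            problems.append(f"role {role!r}: unknown runtime {spec.get('runtime')!r}")

    return problems


# The configuration from this page, reduced to the keys the check reads.
example = {
    "runtimes": {"groq-dev": {"type": "api_proxy", "auth_env": "GROQ_API_KEY"}},
    "roles": {"dev": {"runtime": "groq-dev"}},
}

missing_key = check_config(example, env={})
all_good = check_config(example, env={"GROQ_API_KEY": "gsk_placeholder"})
```

In practice you would load your own config file with `json.load` and pass `os.environ` as `env`.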
Available models on Groq
| Model | Provider | Best for |
|---|---|---|
| gpt-oss-120b | OpenAI (open-weight) | Strong general coding |
| kimi-k2 | Moonshot | Code generation |
| qwen3-32b | Alibaba | Efficient coding |
| llama-4-scout | Meta | Balanced performance |
| llama-3.3-70b | Meta | Proven coding model |
| codestral-mamba | Mistral | Fast code completion |
Verify the connection
export GROQ_API_KEY="gsk_..."
agentxchain connector check
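If the connector check fails, it can be useful to rule out the key itself by calling Groq directly, outside AgentXchain. The following is a minimal sketch using only the Python standard library. It assumes Groq's OpenAI-compatible model listing endpoint (`/openai/v1/models`) and the usual OpenAI-style response shape; the function names and the `User-Agent` value are illustrative.

```python
import json
import urllib.request

GROQ_BASE = "https://api.groq.com/openai/v1"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for Groq's model list."""
    return urllib.request.Request(
        f"{GROQ_BASE}/models",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Precaution: an explicit User-Agent instead of urllib's default.
            "User-Agent": "groq-key-check/0.1",
        },
        method="GET",
    )


def list_model_ids(api_key: str, timeout: float = 10.0) -> list[str]:
    """Send the request and return the model IDs the key can see.

    Assumes an OpenAI-style payload: {"data": [{"id": "..."}, ...]}.
    Raises urllib.error.HTTPError on a rejected key (for example 401).
    """
    with urllib.request.urlopen(build_models_request(api_key), timeout=timeout) as resp:
        payload = json.load(resp)
    return sorted(model["id"] for model in payload.get("data", []))
```

Calling `list_model_ids(os.environ["GROQ_API_KEY"])` (after `import os`) should return a list of IDs if the key is valid. The same list is a quick way to confirm the exact model identifier to put in the runtime's model field.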
Why Groq for governed runs?
Groq's LPU hardware delivers inference 5-10x faster than GPU-based providers. For governed runs with many sequential turns, this dramatically reduces total wall-clock time. A 20-turn governed run that takes 30 minutes with a GPU provider might complete in 5-10 minutes on Groq.
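The arithmetic behind that estimate can be sketched as follows. A turn is not pure inference, so a 5-10x inference speedup yields a smaller wall-clock speedup. The split of each 90-second turn into 80 seconds of inference and 10 seconds of fixed overhead is an assumption chosen for illustration, not a measurement.

```python
def run_minutes(turns: int, inference_s: float, overhead_s: float, speedup: float = 1.0) -> float:
    """Wall-clock minutes for a sequential run.

    Only the inference portion of each turn is divided by `speedup`;
    per-turn overhead (dispatch, validation, I/O) stays constant.
    """
    return turns * (inference_s / speedup + overhead_s) / 60.0


# Assumed split: 80 s inference + 10 s overhead per turn, 20 turns.
baseline = run_minutes(20, 80, 10)               # 30.0 minutes
at_5x = run_minutes(20, 80, 10, speedup=5)       # about 8.7 minutes
at_10x = run_minutes(20, 80, 10, speedup=10)     # 6.0 minutes
```

Under these assumptions the run lands in the 5-10 minute range quoted above. The larger the fixed overhead per turn, the less of the raw inference speedup shows up in total run time.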
Gotchas
- Rate limits: Groq applies aggressive rate limits (tokens per minute). For governed runs with large dispatch bundles, you may hit limits between turns. Consider adding delays or using timeouts.turn_timeout_ms to handle retries gracefully.
- Model availability: Groq's model catalog changes. Check console.groq.com/docs/models for current availability.
- Context window: Some Groq-hosted models have smaller context windows than their original versions. Verify the context limit for your chosen model.
- Cost rates: Supply operator-specific rates:
{
  "budget": {
    "cost_rates": {
      "gpt-oss-120b": { "input_per_million": 0.50, "output_per_million": 1.50 }
    }
  }
}
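To see what such rates imply for a single turn, here is a small worked example. It assumes the two fields mean price per one million input and output tokens, as their names suggest. How AgentXchain applies them internally is not shown here, and `turn_cost` and the token counts are hypothetical.

```python
def turn_cost(rates: dict, model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one turn from per-million-token rates.

    Assumes input_per_million / output_per_million are prices per
    1,000,000 tokens in whatever currency the operator uses.
    """
    rate = rates[model]
    total = (
        input_tokens * rate["input_per_million"]
        + output_tokens * rate["output_per_million"]
    )
    return total / 1_000_000


# Same rates as the budget example; token counts are made up.
rates = {"gpt-oss-120b": {"input_per_million": 0.50, "output_per_million": 1.50}}
cost = turn_cost(rates, "gpt-oss-120b", input_tokens=40_000, output_tokens=2_000)
# 40,000 * 0.50 + 2,000 * 1.50 = 23,000, divided by 1,000,000 gives 0.023
```

Multiplying the per-turn figure by the expected number of turns gives a rough budget for a run.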