Groq

Groq provides ultra-fast inference on custom LPU hardware. It hosts a variety of open-weight models with very low latency. AgentXchain connects via api_proxy using Groq's OpenAI-compatible API.

Which adapter?

api_proxy with provider: "openai" and a custom base_url — Groq's API is OpenAI-compatible.

Prerequisites

  • A Groq API key — get one from console.groq.com
  • GROQ_API_KEY set in your environment
  • agentxchain CLI installed

Configuration

{
  "runtimes": {
    "groq-dev": {
      "type": "api_proxy",
      "provider": "openai",
      "model": "gpt-oss-120b",
      "auth_env": "GROQ_API_KEY",
      "base_url": "https://api.groq.com/openai/v1/chat/completions"
    }
  },
  "roles": {
    "dev": {
      "runtime": "groq-dev",
      "mandate": "Implement features and fix bugs",
      "authority": "proposed"
    }
  }
}

Minimal working example

agentxchain init --governed --template api-service --goal "Build a feedback intake API" --dir my-project -y
cd my-project
# Replace the scaffolded runtime wiring in agentxchain.json with the Groq config above.
agentxchain doctor
agentxchain connector check
agentxchain connector validate groq-dev
agentxchain run

If you prefer the guided interactive scaffold, run agentxchain init --governed without -y, then update agentxchain.json with the Groq config above before agentxchain connector check and agentxchain connector validate groq-dev.

Available models on Groq

| Model | Provider | Best for |
| --- | --- | --- |
| gpt-oss-120b | OpenAI (open-weight) | Strong general coding |
| kimi-k2 | Moonshot | Code generation |
| qwen3-32b | Alibaba | Efficient coding |
| llama-4-scout | Meta | Balanced performance |
| llama-3.3-70b | Meta | Proven coding model |
| codestral-mamba | Mistral | Fast code completion |

Verify the connection

export GROQ_API_KEY="gsk_..."
agentxchain connector check
agentxchain connector validate groq-dev
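
Because Groq's API is OpenAI-compatible, the key can also be smoke-tested outside AgentXchain before running the connector checks. The following is a minimal sketch, not part of the agentxchain CLI. It assumes curl is installed and that Groq exposes the OpenAI-style model listing at https://api.groq.com/openai/v1/models (an assumption not stated in this guide; confirm it against Groq's API reference). list_groq_models is a hypothetical helper name.

```shell
# Hypothetical helper: list the models visible to the current GROQ_API_KEY.
# Assumes Groq serves the OpenAI-style GET /models endpoint under /openai/v1.
list_groq_models() {
  if [ -z "${GROQ_API_KEY:-}" ]; then
    echo "GROQ_API_KEY is not set" >&2
    return 1
  fi
  # --fail makes curl exit non-zero on HTTP errors such as 401 (bad key).
  curl -sS --fail "https://api.groq.com/openai/v1/models" \
    -H "Authorization: Bearer ${GROQ_API_KEY}"
}
```

After exporting the key, run list_groq_models. A JSON list of models suggests the key is valid, and the returned IDs show which model strings Groq currently accepts.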

Why Groq for governed runs?

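Groq's LPU hardware can deliver inference roughly 5-10x faster than GPU-based providers, depending on the model and load. For governed runs with many sequential turns, this reduces total wall-clock time substantially, because each turn must wait for the previous one to finish. A 20-turn governed run that takes 30 minutes with a GPU provider might complete in 5-10 minutes on Groq; the end-to-end gain is smaller than the raw inference speedup because part of each turn is not spent on inference.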
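
The estimate above is simple arithmetic: for a strictly sequential run, wall-clock time is roughly the number of turns multiplied by the average seconds per turn. Here is a rough sketch, assuming per-turn times of 90 s and 15-30 s. These are illustrative figures chosen to match the 30-minute and 5-10-minute totals, not measurements. run_minutes is a hypothetical helper.

```shell
# Hypothetical helper: estimated wall-clock minutes for a strictly sequential run.
# Arguments: number of turns, average seconds per turn (assumed, not measured).
run_minutes() {
  turns=$1
  secs_per_turn=$2
  awk -v t="$turns" -v s="$secs_per_turn" 'BEGIN { printf "%.1f\n", t * s / 60 }'
}

run_minutes 20 90   # slower provider at ~90 s per turn: prints 30.0
run_minutes 20 30   # ~30 s per turn: prints 10.0
run_minutes 20 15   # ~15 s per turn: prints 5.0
```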

Gotchas

  • Rate limits: Groq applies aggressive rate limits (tokens per minute). For governed runs with large dispatch bundles, you may hit limits between turns. Consider adding delays or using timeouts.turn_timeout_ms to handle retries gracefully.
  • Model availability: Groq's model catalog changes. Check console.groq.com/docs/models for current availability, and use the exact model ID listed there in the runtime's model field.
  • Context window: Some Groq-hosted models have smaller context windows than their original versions. Verify the context limit for your chosen model.
  • Cost rates: Groq-hosted models are not in the bundled defaults, so add an operator override keyed by the exact runtime model string:
    {
      "budget": {
        "cost_rates": {
          "gpt-oss-120b": { "input_per_1m": 0.50, "output_per_1m": 1.50 }
        }
      }
    }
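
The rate-limit and cost-rate gotchas both come down to quick arithmetic that is worth doing before a long run. The sketch below is illustrative only. min_turn_delay and estimate_cost are hypothetical helpers, not agentxchain commands. The 30000 tokens-per-minute limit and the token counts are made-up inputs. The rates 0.50 and 1.50 are copied from the cost_rates snippet above and are assumed to be currency units per one million tokens.

```shell
# Hypothetical helper: minimum seconds between turns to stay under a
# tokens-per-minute (TPM) limit, given the tokens consumed per turn.
min_turn_delay() {
  tokens_per_turn=$1
  tpm_limit=$2
  awk -v n="$tokens_per_turn" -v l="$tpm_limit" 'BEGIN { printf "%.1f\n", 60 * n / l }'
}

# Hypothetical helper: run cost from token counts and per-1M-token rates.
# Arguments: input tokens, output tokens, input_per_1m, output_per_1m.
estimate_cost() {
  awk -v i="$1" -v o="$2" -v ri="$3" -v ro="$4" \
    'BEGIN { printf "%.2f\n", i / 1000000 * ri + o / 1000000 * ro }'
}

# 12000 tokens per turn against an assumed 30000 TPM limit: prints 24.0
min_turn_delay 12000 30000

# 2M input and 400k output tokens at the example rates above: prints 1.60
estimate_cost 2000000 400000 0.50 1.50
```

Substitute the actual limit shown for your account and model in the Groq console, and the rates you configured under budget.cost_rates.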