Groq

Groq provides ultra-fast inference on its custom LPU (Language Processing Unit) hardware and hosts a range of open-weight models at very low latency. AgentXchain connects to it via the api_proxy runtime type, using Groq's OpenAI-compatible API.

Which adapter?

Use api_proxy with provider: "openai" and a custom base_url, since Groq's API is OpenAI-compatible.
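
Because the API is OpenAI-compatible, any OpenAI-style client can talk to Groq by pointing at its base URL. The sketch below, stdlib only, shows roughly what an adapter assembles for each call; the endpoint path and payload shape follow the OpenAI chat-completions convention, and the helper name is illustrative rather than part of AgentXchain:

```python
import json

GROQ_BASE_URL = "https://api.groq.com/openai/v1"

def build_chat_request(base_url, api_key, model, prompt):
    """Assemble URL, headers, and JSON body for an OpenAI-style
    chat-completions request against Groq's compatible endpoint."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

url, headers, body = build_chat_request(
    GROQ_BASE_URL, "gsk_example", "gpt-oss-120b", "Say hello"
)
```

Note that the client appends /chat/completions itself, which is why the base URL stops at /openai/v1.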

Prerequisites

  • A Groq API key — get one from console.groq.com
  • GROQ_API_KEY set in your environment
  • agentxchain CLI installed

Configuration

{
  "runtimes": {
    "groq-dev": {
      "type": "api_proxy",
      "provider": "openai",
      "model": "gpt-oss-120b",
      "auth_env": "GROQ_API_KEY",
      "base_url": "https://api.groq.com/openai/v1"
    }
  },
  "roles": {
    "dev": {
      "runtime": "groq-dev",
      "mandate": "Implement features and fix bugs",
      "authority": "proposed"
    }
  }
}
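
A common failure mode with configs like this is a role pointing at a runtime name that does not exist. This sketch (the helper is illustrative, not an AgentXchain API) parses a config like the one above and flags dangling references:

```python
import json

# A runtime/role config of the shape shown on this page, inlined for the check.
CONFIG = """
{
  "runtimes": {
    "groq-dev": {
      "type": "api_proxy",
      "provider": "openai",
      "model": "gpt-oss-120b",
      "auth_env": "GROQ_API_KEY",
      "base_url": "https://api.groq.com/openai/v1"
    }
  },
  "roles": {
    "dev": {
      "runtime": "groq-dev",
      "mandate": "Implement features and fix bugs",
      "authority": "proposed"
    }
  }
}
"""

def dangling_roles(cfg):
    """Names of roles whose 'runtime' does not exist under 'runtimes'."""
    runtimes = cfg.get("runtimes", {})
    return [name for name, role in cfg.get("roles", {}).items()
            if role.get("runtime") not in runtimes]

config = json.loads(CONFIG)
missing = dangling_roles(config)  # empty list means every role resolves
```
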

Available models on Groq

Model            Provider              Best for
gpt-oss-120b     OpenAI (open-weight)  Strong general coding
kimi-k2          Moonshot              Code generation
qwen3-32b        Alibaba               Efficient coding
llama-4-scout    Meta                  Balanced performance
llama-3.3-70b    Meta                  Proven coding model
codestral-mamba  Mistral               Fast code completion

Verify the connection

export GROQ_API_KEY="gsk_..."
agentxchain connector check
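
If the check fails, the first thing to rule out is a missing or malformed key. Groq keys typically start with gsk_; here is a tiny local guard you could script yourself (it is illustrative and not part of the agentxchain CLI, and it does not verify the key against the API):

```python
import os

def groq_key_looks_valid(key) -> bool:
    """Cheap local sanity check: present, non-empty, and carrying
    the usual 'gsk_' prefix. No network call is made."""
    return bool(key) and key.startswith("gsk_")

ok = groq_key_looks_valid(os.environ.get("GROQ_API_KEY"))
```
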

Why Groq for governed runs?

Groq's LPU hardware delivers inference 5-10x faster than GPU-based providers. For governed runs with many sequential turns, this dramatically reduces total wall-clock time. A 20-turn governed run that takes 30 minutes with a GPU provider might complete in 5-10 minutes on Groq.
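
The wall-clock claim is straightforward arithmetic: total time is turns times per-turn latency. The per-turn figures below are illustrative assumptions chosen to match the 30-minute and 5-10-minute scenarios above, not measured numbers:

```python
def run_minutes(turns: int, seconds_per_turn: float) -> float:
    """Total wall-clock minutes for a strictly sequential governed run."""
    return turns * seconds_per_turn / 60

gpu_estimate = run_minutes(20, 90)   # assume ~90 s/turn on a GPU provider
groq_estimate = run_minutes(20, 18)  # assume ~18 s/turn on Groq
```
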

Gotchas

  • Rate limits: Groq enforces aggressive per-minute token limits. Governed runs with large dispatch bundles can hit those limits between turns; consider adding delays between turns, or raise timeouts.turn_timeout_ms so that rate-limited retries are not cut off mid-turn.
  • Model availability: Groq's model catalog changes. Check console.groq.com/docs/models for current availability.
  • Context window: Some Groq-hosted models have smaller context windows than their original versions. Verify the context limit for your chosen model.
  • Cost rates: Supply operator-specific rates:
    {
      "budget": {
        "cost_rates": {
          "gpt-oss-120b": { "input_per_million": 0.50, "output_per_million": 1.50 }
        }
      }
    }
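
Given cost_rates like the above, per-turn spend is a linear formula: tokens divided by one million, times the per-million rate for that direction. A sketch of how a budget tracker might apply it, using the rates from the config (the function name and token counts are illustrative):

```python
RATES = {
    "gpt-oss-120b": {"input_per_million": 0.50, "output_per_million": 1.50},
}

def turn_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one turn under per-million-token rates."""
    r = RATES[model]
    return (input_tokens / 1_000_000) * r["input_per_million"] \
         + (output_tokens / 1_000_000) * r["output_per_million"]

# 40k input tokens and 8k output tokens: 0.02 + 0.012 = $0.032
cost = turn_cost("gpt-oss-120b", 40_000, 8_000)
```
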