Groq

Groq runs inference on its custom LPU (Language Processing Unit) hardware and hosts a range of open-weight models with very low latency. AgentXchain connects to Groq through the api_proxy adapter, using Groq's OpenAI-compatible API.

Which adapter?

Use api_proxy with provider: "openai" and a custom base_url. This works because Groq's API is OpenAI-compatible.

Prerequisites

A Groq API key (create one at console.groq.com)
GROQ_API_KEY set in your environment
agentxchain CLI installed

Configuration

{
  "runtimes": {
    "groq-dev": {
      "type": "api_proxy",
      "provider": "openai",
      "model": "gpt-oss-120b",
      "auth_env": "GROQ_API_KEY",
      "base_url": "https://api.groq.com/openai/v1/chat/completions"
    }
  },
  "roles": {
    "dev": {
      "runtime": "groq-dev",
      "mandate": "Implement features and fix bugs",
      "write_authority": "proposed"
    }
  }
}
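Because the adapter is OpenAI-compatible, the runtime above should correspond to an ordinary chat-completions call. The sketch below sends one such request directly to Groq. This can help separate Groq-side problems (bad key, unknown model) from AgentXchain configuration problems. It assumes curl is installed and that api_proxy authenticates with the auth_env value as a bearer token, as OpenAI-compatible APIs do. The function name `groq_chat_smoke_test` is hypothetical and not part of the agentxchain CLI.

```bash
# Hypothetical helper (not part of agentxchain): send one chat-completions
# request straight to Groq's OpenAI-compatible endpoint, bypassing AgentXchain.
groq_chat_smoke_test() {
  local model="${1:-}"  # pass the same string as "model" in the runtime config
  if [ -z "$model" ]; then
    echo "usage: groq_chat_smoke_test <model-id>" >&2
    return 2
  fi
  if [ -z "${GROQ_API_KEY:-}" ]; then
    echo "GROQ_API_KEY is not set" >&2
    return 1
  fi
  # Same URL as base_url in the runtime config; the key is sent as a bearer token.
  # -w appends the HTTP status so auth and model errors are easy to spot.
  curl -sS \
    -H "Authorization: Bearer ${GROQ_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"${model}\", \"messages\": [{\"role\": \"user\", \"content\": \"Reply with OK.\"}]}" \
    -w '\nHTTP %{http_code}\n' \
    https://api.groq.com/openai/v1/chat/completions
}
```

For example, run `groq_chat_smoke_test gpt-oss-120b`. An HTTP 401 points at the key. An error that names the model suggests the string does not match Groq's catalog.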

Minimal working example

agentxchain init --governed --template api-service --goal "Build a feedback intake API" --dir my-project -y
cd my-project
# Replace the scaffolded runtime wiring in agentxchain.json with the Groq config above.
agentxchain doctor
agentxchain connector check
agentxchain connector validate groq-dev
agentxchain run

If you prefer the guided interactive scaffold, run agentxchain init --governed without -y. Then update agentxchain.json with the Groq config above before running agentxchain connector check and agentxchain connector validate groq-dev.

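Available models on Groq

Names below are shorthand. The runtime model string must match the exact ID in Groq's catalog, which can carry a vendor prefix or variant suffix (for example, `openai/gpt-oss-120b` or `llama-3.3-70b-versatile`).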

| Model | Provider | Best for |
| --- | --- | --- |
| `gpt-oss-120b` | OpenAI (open-weight) | Strong general coding |
| `kimi-k2` | Moonshot | Code generation |
| `qwen3-32b` | Alibaba | Efficient coding |
| `llama-4-scout` | Meta | Balanced performance |
| `llama-3.3-70b` | Meta | Proven coding model |
| `codestral-mamba` | Mistral | Fast code completion |

Verify the connection

export GROQ_API_KEY="gsk_..."
agentxchain connector check
agentxchain connector validate groq-dev
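If `connector validate` fails, check the key against Groq directly and list the exact model IDs it currently serves. The sketch below calls the OpenAI-compatible `GET /models` endpoint. Both function names are hypothetical. The ID extraction is a crude text match; if jq is installed, `jq -r '.data[].id'` is more robust.

```bash
# Hypothetical helper: print every "id" value found in JSON on stdin.
# Crude text match; tolerates optional spaces after the colon.
extract_model_ids() {
  grep -o '"id": *"[^"]*"' | sed 's/"id": *"\([^"]*\)"/\1/'
}

# Hypothetical helper: list the model IDs available to GROQ_API_KEY.
groq_list_models() {
  if [ -z "${GROQ_API_KEY:-}" ]; then
    echo "GROQ_API_KEY is not set" >&2
    return 1
  fi
  local body
  # --fail makes curl exit non-zero on HTTP errors such as 401 (bad key).
  body="$(curl -sS --fail \
    -H "Authorization: Bearer ${GROQ_API_KEY}" \
    https://api.groq.com/openai/v1/models)" || return 1
  printf '%s\n' "$body" | extract_model_ids
}
```

Copy an ID from the output verbatim into the runtime `model` field. If you set a `cost_rates` override, use the same ID as its key.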

Why Groq for governed runs?

Groq's LPU hardware can deliver inference roughly 5-10x faster than GPU-based providers, depending on the model and load. Governed runs execute many turns in sequence, so per-turn latency compounds. A 20-turn governed run that takes 30 minutes with a GPU provider might complete in 5-10 minutes on Groq.
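The 5-10 minute figure is consistent with a simple Amdahl-style estimate. Assume, for illustration only, that inference accounts for a fraction $f$ of each turn's wall-clock time. Assume also that Groq speeds inference up by a factor $s$ and leaves the remaining non-inference work unchanged:

$$
T_{\text{Groq}} = T_{\text{GPU}}\left[(1 - f) + \frac{f}{s}\right]
$$

With $T_{\text{GPU}} = 30$ minutes and an assumed $f = 0.9$:

$$
\begin{aligned}
s = 5:&\quad 30 \times (0.1 + 0.18) = 8.4 \text{ min} \\
s = 10:&\quad 30 \times (0.1 + 0.09) = 5.7 \text{ min}
\end{aligned}
$$

When inference is a smaller share of each turn, the gain shrinks. At $f = 0.5$ and $s = 10$, the same run still takes $30 \times (0.5 + 0.05) = 16.5$ minutes.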

Gotchas

Rate limits: Groq enforces strict per-model rate limits, including tokens per minute. Governed runs with large dispatch bundles can hit these limits between turns. Consider adding delays between turns, or tuning timeouts.turn_timeout_ms so that retries are handled gracefully.
Model availability: Groq's model catalog changes. Check console.groq.com/docs/models for current availability.
Context window: Some Groq-hosted models have smaller context windows than their original versions. Verify the context limit for your chosen model.
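Cost rates: Groq-hosted models are not in the bundled cost defaults, so add an operator override keyed by the exact runtime model string. Treat the rates below as an example and substitute Groq's current published pricing for your model: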

{
  "budget": {
    "cost_rates": {
      "gpt-oss-120b": { "input_per_1m": 0.50, "output_per_1m": 1.50 }
    }
  }
}
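To sanity-check a budget, you can estimate the per-turn cost implied by these rates. Multiply input tokens / 1,000,000 by input_per_1m, multiply output tokens / 1,000,000 by output_per_1m, and add the two. This linear formula is an assumption inferred from the field names, not a description of AgentXchain's internal accounting. `estimate_turn_cost` is a hypothetical helper.

```bash
# Hypothetical helper: estimate one turn's cost from token counts and
# per-1M-token rates (the same numbers as in budget.cost_rates).
# Usage: estimate_turn_cost <input_tokens> <output_tokens> <input_per_1m> <output_per_1m>
estimate_turn_cost() {
  awk -v in_tok="$1" -v out_tok="$2" -v in_rate="$3" -v out_rate="$4" \
    'BEGIN { printf "%.4f\n", in_tok / 1e6 * in_rate + out_tok / 1e6 * out_rate }'
}
```

With the example rates above, `estimate_turn_cost 200000 50000 0.50 1.50` prints `0.1750`. A 20-turn run with similarly sized turns would therefore come to about 3.50 at those rates.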
