
Provider Setup Guide

Step-by-step setup instructions for every LLM provider SF supports. If you ran the onboarding wizard (sf config) and picked a provider, you may already be configured — check with /model inside a session.

Quick Reference

| Provider | Auth Method | Env Variable | Config File |
|----------|-------------|--------------|-------------|
| Anthropic | API key | ANTHROPIC_API_KEY | |
| OpenAI | API key | OPENAI_API_KEY | |
| Google Gemini | API key | GEMINI_API_KEY | |
| OpenRouter | API key | OPENROUTER_API_KEY | Optional models.json |
| Groq | API key | GROQ_API_KEY | |
| xAI | API key | XAI_API_KEY | |
| Mistral | API key | MISTRAL_API_KEY | |
| GitHub Copilot | OAuth | GH_TOKEN | |
| Amazon Bedrock | IAM credentials | AWS_PROFILE or AWS_ACCESS_KEY_ID | |
| Vertex AI | ADC | GOOGLE_APPLICATION_CREDENTIALS | |
| Azure OpenAI | API key | AZURE_OPENAI_API_KEY | |
| Ollama | None (local) | | models.json required |
| LM Studio | None (local) | | models.json required |
| vLLM / SGLang | None (local) | | models.json required |

Built-in Providers

Built-in providers have models pre-registered in SF. You only need to supply credentials.

Anthropic (Claude)

Recommended. Anthropic models have the deepest integration: built-in web search, extended thinking, and prompt caching.

Option A — API key (recommended):

export ANTHROPIC_API_KEY="sk-ant-..."

Or run sf config and paste your key when prompted.

Get a key: console.anthropic.com/settings/keys
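
To sanity-check a key outside SF, you can query Anthropic's models endpoint directly (a standard Anthropic API call, nothing SF-specific):

curl https://api.anthropic.com/v1/models \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01"

A JSON list of models means the key works; a 401 means it is wrong or not exported in this shell.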

Note: SF does not support browser-based OAuth sign-in for Anthropic. Use an API key or a configured provider/runtime adapter.

Runtime boundary: SF may use Claude Code, Codex, or Gemini CLI core as model/runtime adapters when explicitly configured. These adapters are not project MCP dependencies, and SF does not expose its own workflow as an MCP server. Run SF directly with sf or /sf autonomous; reserve MCP configuration for external tools that SF may call.

OpenAI

export OPENAI_API_KEY="sk-..."

Or run sf config and choose "Paste an API key" then "OpenAI".

Get a key: platform.openai.com/api-keys

Google Gemini

export GEMINI_API_KEY="..."

Get a key: aistudio.google.com/app/apikey

OpenRouter

OpenRouter aggregates 200+ models from multiple providers behind a single API key.

Step 1 — Get your API key:

Go to openrouter.ai/keys and create a key.

Step 2 — Set the key:

export OPENROUTER_API_KEY="sk-or-..."

Or run sf config, choose "Paste an API key", then "OpenRouter".

Step 3 — Switch to an OpenRouter model:

Inside an SF session, type /model and select an OpenRouter model. Models are prefixed with openrouter/ (e.g., openrouter/anthropic/claude-sonnet-4).
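
If you are unsure which IDs exist, OpenRouter's public models endpoint lists them (a plain OpenRouter API call, independent of SF):

curl -s https://openrouter.ai/api/v1/models -H "Authorization: Bearer $OPENROUTER_API_KEY"

Take the id field from the response and add the openrouter/ prefix when selecting it in SF.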

Optional — Add custom OpenRouter models via models.json:

If you want models not in the built-in list, add them to ~/.sf/agent/models.json:

{
  "providers": {
    "openrouter": {
      "baseUrl": "https://openrouter.ai/api/v1",
      "apiKey": "OPENROUTER_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "meta-llama/llama-3.3-70b",
          "name": "Llama 3.3 70B (OpenRouter)",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 131072,
          "maxTokens": 32768,
          "cost": { "input": 0.3, "output": 0.3, "cacheRead": 0, "cacheWrite": 0 }
        }
      ]
    }
  }
}

Note: the apiKey field here is the name of the environment variable, not the literal key. SF resolves it automatically. You can also use a literal value or a shell command (see Value Resolution).

Optional — Route through specific providers:

Use modelOverrides to control which upstream provider OpenRouter uses:

{
  "providers": {
    "openrouter": {
      "modelOverrides": {
        "anthropic/claude-sonnet-4": {
          "compat": {
            "openRouterRouting": {
              "only": ["amazon-bedrock"]
            }
          }
        }
      }
    }
  }
}

Groq

export GROQ_API_KEY="gsk_..."

Get a key: console.groq.com/keys

xAI (Grok)

export XAI_API_KEY="xai-..."

Get a key: console.x.ai

Mistral

export MISTRAL_API_KEY="..."

Get a key: console.mistral.ai/api-keys

GitHub Copilot

Uses OAuth — sign in through the browser:

sf config
# Choose "Sign in with your browser" → "GitHub Copilot"

Requires an active GitHub Copilot subscription.

Amazon Bedrock

Bedrock authenticates with AWS credentials rather than a provider API key. Any of these work:

# Option 1: Named profile
export AWS_PROFILE="my-profile"

# Option 2: IAM keys
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"

# Option 3: Bedrock API key (bearer token)
export AWS_BEARER_TOKEN_BEDROCK="..."

ECS task roles and IRSA (Kubernetes) are also detected automatically.
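
To check which identity your credentials resolve to before launching SF, the standard AWS CLI call is useful (assumes the AWS CLI is installed; this is not an SF command):

aws sts get-caller-identity

If this prints your account and ARN, SF's Bedrock calls will authenticate as the same identity. That identity still needs permission to invoke Bedrock models.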

Anthropic on Vertex AI

Uses Google Cloud Application Default Credentials:

gcloud auth application-default login
export ANTHROPIC_VERTEX_PROJECT_ID="my-project-id"

Or set GOOGLE_CLOUD_PROJECT and ensure ADC credentials exist at ~/.config/gcloud/application_default_credentials.json.
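
To verify that ADC is in place before starting SF, a standard gcloud check works (not SF-specific):

gcloud auth application-default print-access-token

If this prints a token, SF can pick up the same credentials.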

Azure OpenAI

export AZURE_OPENAI_API_KEY="..."

Local Providers

Local providers run on your machine. They require a models.json configuration file because SF needs to know the endpoint URL and which models are available.

Config file location: ~/.sf/agent/models.json

The file reloads each time you open /model — no restart needed.

Ollama

Step 1 — Install and start Ollama:

# Linux
curl -fsSL https://ollama.com/install.sh | sh
ollama serve

Step 2 — Pull a model:

ollama pull llama3.1:8b
ollama pull qwen2.5-coder:7b

Step 3 — Create ~/.sf/agent/models.json:

{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        { "id": "llama3.1:8b" },
        { "id": "qwen2.5-coder:7b" }
      ]
    }
  }
}

The apiKey is required by the config schema but Ollama ignores it — any value works.

Step 4 — Select the model:

Inside SF, type /model and pick your Ollama model.

Ollama tips:

  • Ollama does not support the developer role or reasoning_effort — always set compat.supportsDeveloperRole: false and compat.supportsReasoningEffort: false.
  • If you get empty responses, check that ollama serve is running and the model is pulled.
  • Context window and max tokens default to 128K / 16K if not specified. Override these if your model has different limits (see the sketch below).
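
For example, to declare a 32K context and an 8K output limit (placeholder numbers; use your model's real limits), extend the model entry from Step 3:

"models": [
  {
    "id": "llama3.1:8b",
    "contextWindow": 32768,
    "maxTokens": 8192
  }
]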

LM Studio

Step 1 — Install LM Studio:

Download from lmstudio.ai.

Step 2 — Start the local server:

In LM Studio, go to the "Local Server" tab, load a model, and click "Start Server". The default port is 1234.

Step 3 — Create ~/.sf/agent/models.json:

{
  "providers": {
    "lm-studio": {
      "baseUrl": "http://localhost:1234/v1",
      "api": "openai-completions",
      "apiKey": "lm-studio",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "your-model-name",
          "name": "My Local Model",
          "contextWindow": 32768,
          "maxTokens": 4096
        }
      ]
    }
  }
}

Replace your-model-name with the model identifier shown in LM Studio's server tab.
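
To see the exact IDs the server exposes, query its OpenAI-compatible models endpoint (assuming the default port from Step 2):

curl http://localhost:1234/v1/models

The id values in the response are what belong in models.json.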

LM Studio tips:

  • The model ID in models.json must match what LM Studio reports in its server API. Check the server tab for the exact string.
  • LM Studio defaults to port 1234. If you changed it, update baseUrl accordingly.
  • Increase contextWindow and maxTokens if your model supports larger contexts.

vLLM

{
  "providers": {
    "vllm": {
      "baseUrl": "http://localhost:8000/v1",
      "api": "openai-completions",
      "apiKey": "vllm",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false,
        "supportsUsageInStreaming": false
      },
      "models": [
        {
          "id": "meta-llama/Llama-3.1-8B-Instruct",
          "contextWindow": 128000,
          "maxTokens": 16384
        }
      ]
    }
  }
}

The model id must match the --model flag you passed to vllm serve.
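
For example, if you launched the server like this (one common invocation; adjust the model and flags to your setup):

vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

then the model entry above must use the id meta-llama/Llama-3.1-8B-Instruct exactly.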

SGLang

{
  "providers": {
    "sglang": {
      "baseUrl": "http://localhost:30000/v1",
      "api": "openai-completions",
      "apiKey": "sglang",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "meta-llama/Llama-3.1-8B-Instruct"
        }
      ]
    }
  }
}
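
Likewise, the model id must match what you passed when launching SGLang. A typical launch (adjust to your setup; the port matches the baseUrl above):

python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000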

Custom OpenAI-Compatible Endpoints

Any server that implements the OpenAI Chat Completions API can work with SF. This covers proxies (LiteLLM, Portkey, Helicone), self-hosted inference, and new providers.

Quickest path — use the onboarding wizard:

sf config
# Choose "Paste an API key" → "Custom (OpenAI-compatible)"
# Enter: base URL, API key, model ID

This writes ~/.sf/agent/models.json for you automatically.

Manual setup:

{
  "providers": {
    "my-provider": {
      "baseUrl": "https://my-endpoint.example.com/v1",
      "apiKey": "MY_PROVIDER_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "model-id-here",
          "name": "Friendly Model Name",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 128000,
          "maxTokens": 16384,
          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
        }
      ]
    }
  }
}

Adding custom headers (for proxies):

{
  "providers": {
    "litellm-proxy": {
      "baseUrl": "https://litellm.example.com/v1",
      "apiKey": "MY_API_KEY",
      "api": "openai-completions",
      "headers": {
        "x-custom-header": "value"
      },
      "models": [...]
    }
  }
}

Qwen models with thinking mode:

For Qwen-compatible servers, use thinkingFormat to enable thinking mode:

{
  "compat": {
    "thinkingFormat": "qwen",
    "supportsDeveloperRole": false
  }
}

Use "qwen-chat-template" instead if the server requires chat_template_kwargs.enable_thinking.

For the full reference on compat fields, modelOverrides, value resolution, and advanced configuration, see Custom Models.


Common Pitfalls

"Authentication failed" with a valid key

Cause: The key is set in your shell but not visible to SF.

Fix: Make sure the environment variable is exported in the same terminal where you run sf. Or use sf config to save the key to ~/.sf/agent/auth.json so it persists across sessions.
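
A quick way to check from the same shell (plain POSIX, nothing SF-specific):

printenv ANTHROPIC_API_KEY   # empty output means the variable is not exported

Substitute the variable name for your provider. If nothing prints, re-run the export in this shell or add it to your shell profile.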

OpenRouter models not appearing in /model

Cause: No OPENROUTER_API_KEY set, so SF hides OpenRouter models.

Fix: Set the key and restart SF:

export OPENROUTER_API_KEY="sk-or-..."
sf

Ollama returns empty responses

Cause: Ollama server isn't running, or the model isn't pulled.

Fix:

# Verify the server is running
curl http://localhost:11434/v1/models

# Pull the model if missing
ollama pull llama3.1:8b

LM Studio model ID mismatch

Cause: The id in models.json doesn't match what LM Studio exposes via its API.

Fix: Check the LM Studio server tab for the exact model identifier. It often includes the filename or quantization level (e.g., lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF).

developer role error with local models

Cause: Most local inference servers don't support the OpenAI developer message role.

Fix: Add compat.supportsDeveloperRole: false to the provider config. This makes SF send system messages instead:

{
  "compat": {
    "supportsDeveloperRole": false,
    "supportsReasoningEffort": false
  }
}

stream_options error with local models

Cause: Some servers don't support stream_options: { include_usage: true }.

Fix: Add compat.supportsUsageInStreaming: false:

{
  "compat": {
    "supportsUsageInStreaming": false
  }
}

"apiKey is required" validation error

Cause: models.json schema requires apiKey when models are defined.

Fix: For local servers that don't need auth, set a dummy value:

"apiKey": "not-needed"

Cost shows $0.00 for custom models

Expected behavior. SF defaults cost to zero for custom models. Override with the cost field if you want accurate cost tracking:

"cost": { "input": 0.15, "output": 0.60, "cacheRead": 0.015, "cacheWrite": 0.19 }

Values are per million tokens (USD). At the rates above, a 10,000-token prompt costs (10,000 / 1,000,000) × $0.15 = $0.0015 on the input side.


Verifying Your Setup

After configuring a provider:

  1. Launch SF:

    sf
    
  2. Check available models:

    /model
    

    Your provider's models should appear in the list.

  3. Switch to the model: Select it from the /model picker.

  4. Send a test message: Type anything to confirm the model responds.

If the model doesn't appear, check:

  • The environment variable is set in the current shell
  • models.json is valid JSON (check with python3 -m json.tool ~/.sf/agent/models.json)
  • The server is running (for local providers)

For additional help, see Troubleshooting or run /sf doctor inside a session.