

Using Ollama for Local LLMs

Run K8sGPT with local LLMs using Ollama for complete data privacy and air-gapped environments

Overview

Ollama enables you to run large language models locally on your machine. This tutorial shows you how to integrate Ollama with K8sGPT for completely private, self-hosted Kubernetes cluster analysis.

Benefits of Using Ollama

  • 100% data privacy - no data leaves your infrastructure
  • Works in air-gapped environments
  • No API costs or rate limits
  • No network round-trips - inference runs locally
  • Easy model switching and experimentation

Prerequisites

  • K8sGPT installed (brew install k8sgpt or see installation docs)
  • A Kubernetes cluster to analyze
  • Sufficient disk space for models (2-4GB per model)
  • For larger models: 16GB+ RAM recommended
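Before starting, it can help to confirm the required CLIs are on your PATH. A minimal sketch (the `check_cmd` helper is illustrative, not part of K8sGPT):

```shell
# Report whether a command is installed and reachable on the PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check_cmd k8sgpt
check_cmd kubectl
```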

Step 1: Install Ollama

macOS / Linux

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama service
ollama serve

Windows

Download the installer from ollama.ai/download

Docker

docker run -d -v ollama:/root/.ollama \
  -p 11434:11434 --name ollama ollama/ollama
Note: Ollama runs on port 11434 by default. Make sure this port is available.
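Before pointing K8sGPT at Ollama, you can confirm the endpoint is answering. A minimal sketch that polls Ollama's `/api/version` endpoint (the `wait_for_ollama` helper and the retry count are illustrative; adjust `OLLAMA_URL` for your setup):

```shell
# Poll the Ollama HTTP endpoint until it answers, or give up.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

wait_for_ollama() {
  tries="${1:-10}"   # number of one-second attempts
  while [ "$tries" -gt 0 ]; do
    if curl -fsS --max-time 2 "$OLLAMA_URL/api/version" >/dev/null 2>&1; then
      echo "Ollama is up at $OLLAMA_URL"
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "Ollama not reachable at $OLLAMA_URL" >&2
  return 1
}

# Example: wait_for_ollama 10
```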

Step 2: Download a Model

Choose a model based on your hardware and accuracy needs:

| Model        | Size  | RAM  | Best For           |
|--------------|-------|------|--------------------|
| llama3.2     | 2GB   | 8GB  | Fast, good balance |
| mistral      | 4.1GB | 8GB  | Great performance  |
| llama3.1:8b  | 4.7GB | 8GB  | High quality       |
| llama3.1:70b | 40GB  | 64GB | Best accuracy      |
| codellama    | 3.8GB | 8GB  | Code-focused       |

Download and Run a Model

# Recommended: Llama 3.2 (fast, good quality)
ollama pull llama3.2

# Or try Mistral
ollama pull mistral

# Test the model
ollama run llama3.2 "Explain Kubernetes pods in one sentence"
Tip: Browse all available models at ollama.ai/library
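Since K8sGPT talks to Ollama over HTTP, you can run the same smoke test against Ollama's REST API directly. A sketch using the `/api/generate` endpoint (the fallback message is illustrative; `"stream": false` returns one JSON response instead of a token stream):

```shell
# Smoke-test the model via Ollama's HTTP API instead of the CLI.
payload='{"model": "llama3.2", "prompt": "Explain Kubernetes pods in one sentence", "stream": false}'

curl -s http://localhost:11434/api/generate -d "$payload" \
  || echo "Ollama is not reachable on localhost:11434"
```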

Step 3: Configure K8sGPT

k8sgpt auth add --backend ollama \
  --model llama3.2 \
  --baseurl http://localhost:11434

Custom Ollama URL: If Ollama is running on a different host or in a container, update the baseurl accordingly (e.g., http://ollama:11434).

Step 4: Analyze Your Cluster

Basic Analysis

# Analyze all namespaces
k8sgpt analyze --explain

# Analyze specific namespace
k8sgpt analyze --namespace default --explain

# Filter by resource type
k8sgpt analyze --filter Pod --explain

Example Output

AI Provider: ollama

0 default/broken-pod(Pod)
- Error: Back-off restarting failed container
- AI Analysis: This pod is experiencing continuous restart loops. 
  Common causes:
  1. Application crash on startup
  2. Misconfigured liveness/readiness probes
  3. Resource limits too restrictive
  
  Recommended actions:
  - Check pod logs: kubectl logs broken-pod -n default
  - Review probe configuration
  - Verify resource requests/limits

Advanced Configuration

Using Different Models

Switch models based on your needs:

# Use a larger model for better accuracy
k8sgpt auth remove --backend ollama
k8sgpt auth add --backend ollama \
  --model llama3.1:70b \
  --baseurl http://localhost:11434

# Use a code-specialized model (remove the existing entry first)
k8sgpt auth remove --backend ollama
k8sgpt auth add --backend ollama \
  --model codellama \
  --baseurl http://localhost:11434

Kubernetes Operator Configuration

Use Ollama with the K8sGPT Operator:

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-ollama
spec:
  ai:
    enabled: true
    backend: ollama
    model: llama3.2
    baseUrl: http://ollama.ollama-system.svc.cluster.local:11434
  noCache: false
  version: v0.3.x

Note: Deploy Ollama in your cluster first, or point to an external Ollama instance.

Running Ollama in Kubernetes

Deploy Ollama alongside K8sGPT Operator:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        volumeMounts:
        - name: ollama-data
          mountPath: /root/.ollama
        # Optional: GPU support
        # resources:
        #   limits:
        #     nvidia.com/gpu: 1
      volumes:
      - name: ollama-data
        persistentVolumeClaim:
          claimName: ollama-data
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama-system
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434

After deploying, exec into the pod to pull your desired model:

kubectl exec -it deployment/ollama -n ollama-system -- ollama pull llama3.2
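The Deployment above mounts a PersistentVolumeClaim named ollama-data, which must exist before the pod can start. A minimal sketch (the 20Gi size is an assumption; size it for the models you plan to pull):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-data
  namespace: ollama-system
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```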

Troubleshooting

❌ "Connection refused" error

The Ollama service isn't running, or K8sGPT is configured with the wrong base URL.

# Check if Ollama is running
curl http://localhost:11434

# Start Ollama
ollama serve

❌ "Model not found" error

The specified model hasn't been downloaded.

# List downloaded models
ollama list

# Pull the model
ollama pull llama3.2

⚠️ Slow inference

Model too large for your hardware.

Try a smaller model like llama3.2 or mistral.

⚠️ Out of memory errors

Insufficient RAM for the model.

Use a smaller model or add more RAM. Llama 3.2 works well on 8GB systems.

Performance Tips

  • 🚀
    GPU Acceleration:

    Ollama automatically uses NVIDIA GPUs if available. Install CUDA drivers for 10-100x speedup.

  • 💾
    Keep Models Loaded:

    Ollama keeps models in memory after first use. Subsequent analyses are much faster.

  • ⚖️
    Balance Size vs Quality:

    Smaller models (llama3.2, mistral) deliver most of the quality of 70B models at a fraction of the compute cost.

  • 🔄
    Use Caching:

    K8sGPT caches results by default. Disable with --no-cache only when needed.

You're All Set! 🎉

You now have a fully private, self-hosted K8sGPT setup with Ollama. Your cluster analysis data never leaves your infrastructure.

Next steps: Check out Operator Configuration to set up continuous monitoring, or explore other providers.