# Using Ollama for Local LLMs

Run K8sGPT with local LLMs using Ollama for complete data privacy and support for air-gapped environments.
## Overview

Ollama lets you run large language models locally on your machine. This tutorial shows how to integrate Ollama with K8sGPT for completely private, self-hosted Kubernetes cluster analysis.
## Benefits of Using Ollama
- 100% data privacy - no data leaves your infrastructure
- Works in air-gapped environments
- No API costs or rate limits
- Low latency (local inference)
- Easy model switching and experimentation
## Prerequisites

- K8sGPT installed (`brew install k8sgpt`, or see the installation docs)
- A Kubernetes cluster to analyze
- Sufficient disk space for models (2-4GB per model)
- For larger models: 16GB+ RAM recommended
## Step 1: Install Ollama

### macOS / Linux

```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Start the Ollama service
ollama serve
```

### Windows

Download the installer from ollama.ai/download.

### Docker

```bash
docker run -d -v ollama:/root/.ollama \
  -p 11434:11434 --name ollama ollama/ollama
```

## Step 2: Download a Model
Choose a model based on your hardware and accuracy needs:
| Model | Size | RAM | Best For |
|---|---|---|---|
| llama3.2 | 2GB | 8GB | Fast, good balance |
| mistral | 4.1GB | 8GB | Great performance |
| llama3.1:8b | 4.7GB | 8GB | High quality |
| llama3.1:70b | 40GB | 64GB | Best accuracy |
| codellama | 3.8GB | 8GB | Code-focused |
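If you want to script the choice, the table above boils down to a simple RAM threshold. This is an illustrative sketch (the `pick_model` helper name and thresholds are ours, not part of K8sGPT or Ollama):

```bash
# pick_model: suggest an Ollama model from the table above, given free RAM in GB.
pick_model() {
  ram_gb=$1
  if [ "$ram_gb" -ge 64 ]; then
    echo "llama3.1:70b"   # best accuracy, ~40GB model
  elif [ "$ram_gb" -ge 8 ]; then
    echo "llama3.2"       # fast, good balance on 8GB systems
  else
    echo "none"           # below 8GB, local inference will struggle
  fi
}

pick_model 8    # -> llama3.2
pick_model 64   # -> llama3.1:70b
```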
### Download and Run a Model

```bash
# Recommended: Llama 3.2 (fast, good quality)
ollama pull llama3.2

# Or try Mistral
ollama pull mistral

# Test the model
ollama run llama3.2 "Explain Kubernetes pods in one sentence"
```

## Step 3: Configure K8sGPT
```bash
k8sgpt auth add --backend ollama \
  --model llama3.2 \
  --baseurl http://localhost:11434
```

Custom Ollama URL: If Ollama is running on a different host or in a container, update the `baseurl` accordingly (e.g., `http://ollama:11434`).
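Independently of K8sGPT, you can sanity-check the model through Ollama's own REST API. This sketch only builds the JSON body a generate request carries (`model`, `prompt`, and `stream` are fields of Ollama's `/api/generate` endpoint; the prompt text is illustrative):

```bash
# Build a request body for Ollama's /api/generate endpoint.
model="llama3.2"
payload=$(printf '{"model": "%s", "prompt": "Explain this Kubernetes error: Back-off restarting failed container", "stream": false}' "$model")

# With Ollama running locally, you could send it yourself:
#   curl -s http://localhost:11434/api/generate -d "$payload"
echo "$payload"
```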
## Step 4: Analyze Your Cluster

### Basic Analysis

```bash
# Analyze all namespaces
k8sgpt analyze --explain

# Analyze a specific namespace
k8sgpt analyze --namespace default --explain

# Filter by resource type
k8sgpt analyze --filter Pod --explain
```

### Example Output
```text
AI Provider: ollama

0 default/broken-pod(Pod)
- Error: Back-off restarting failed container
- AI Analysis: This pod is experiencing continuous restart loops.
  Common causes:
  1. Application crash on startup
  2. Misconfigured liveness/readiness probes
  3. Resource limits too restrictive
  Recommended actions:
  - Check pod logs: kubectl logs broken-pod -n default
  - Review probe configuration
  - Verify resource requests/limits
```

## Advanced Configuration
### Using Different Models

Switch models based on your needs:

```bash
# Use a larger model for better accuracy
k8sgpt auth remove --backend ollama
k8sgpt auth add --backend ollama \
  --model llama3.1:70b \
  --baseurl http://localhost:11434

# Use a code-specialized model
k8sgpt auth add --backend ollama \
  --model codellama \
  --baseurl http://localhost:11434
```

### Kubernetes Operator Configuration
Use Ollama with the K8sGPT Operator:

```yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-ollama
spec:
  ai:
    enabled: true
    backend: ollama
    model: llama3.2
    baseUrl: http://ollama.ollama-system.svc.cluster.local:11434
  noCache: false
  version: v0.3.x
```

Note: Deploy Ollama in your cluster first, or point to an external Ollama instance.
## Running Ollama in Kubernetes

Deploy Ollama alongside the K8sGPT Operator:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        volumeMounts:
        - name: ollama-data
          mountPath: /root/.ollama
        # Optional: GPU support
        # resources:
        #   limits:
        #     nvidia.com/gpu: 1
      volumes:
      - name: ollama-data
        persistentVolumeClaim:
          claimName: ollama-data
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama-system
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434
```

After deploying, exec into the pod to pull your desired model:

```bash
kubectl exec -it deployment/ollama -n ollama-system -- ollama pull llama3.2
```
## Troubleshooting

### ❌ "Connection refused" error

The Ollama service isn't running, or the URL is wrong.

```bash
# Check if Ollama is running
curl http://localhost:11434

# Start Ollama
ollama serve
```

### ❌ "Model not found" error

The specified model hasn't been downloaded.

```bash
# List downloaded models
ollama list

# Pull the model
ollama pull llama3.2
```

### ⚠️ Slow inference

The model is too large for your hardware. Try a smaller model like llama3.2 or mistral.
### ⚠️ Out of memory errors

Insufficient RAM for the model. Use a smaller model or add more RAM; Llama 3.2 works well on 8GB systems.
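As a rough rule of thumb, a 4-bit quantized model needs about 0.5 bytes per parameter, plus overhead for the KV cache and runtime. This sketch applies that estimate (the `estimate_gb` helper and the ~20% overhead figure are our approximations, but the results line up with the model sizes in the table above):

```bash
# estimate_gb: rough RAM needed for a Q4-quantized model, given its
# parameter count in billions. 0.5 bytes/param * 1.2 overhead = 0.6 GB per
# billion parameters; computed in tenths of a GB to stay in integer math.
estimate_gb() {
  params_b=$1
  tenths=$(( params_b * 6 ))
  echo "$(( tenths / 10 )).$(( tenths % 10 )) GB"
}

estimate_gb 8    # -> 4.8 GB (llama3.1:8b fits on an 8GB machine)
estimate_gb 70   # -> 42.0 GB (llama3.1:70b needs a 64GB machine)
```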
## Performance Tips

- 🚀 GPU Acceleration: Ollama automatically uses NVIDIA GPUs if available. Install CUDA drivers for a 10-100x speedup.
- 💾 Keep Models Loaded: Ollama keeps models in memory after first use, so subsequent analyses are much faster.
- ⚖️ Balance Size vs. Quality: 7B-class models (llama3.2, mistral) deliver most of the quality of 70B models at a fraction of the hardware cost.
- 🔄 Use Caching: K8sGPT caches results by default. Disable with `--no-cache` only when needed.
## You're All Set! 🎉

You now have a fully private, self-hosted K8sGPT setup with Ollama. Your cluster analysis data never leaves your infrastructure.

Next steps: Check out Operator Configuration to set up continuous monitoring, or explore other providers.