localai

Build and service custom inference servers for under €10,000 — private, on-premise AI without cloud costs.

Hardware Options

We use proven NVIDIA Tesla datacenter GPUs — reliable, widely supported, and affordable on the refurbished market.


GPU Comparison

| GPU | VRAM | Architecture | Tensor Cores | TDP | Used Price (est.) | Best For |
|-----|------|--------------|--------------|-----|-------------------|----------|
| Tesla P100 | 16GB HBM2 | Pascal | None | 250W | €200-400 | Budget builds, models up to 13B |
| Tesla V100 | 16GB HBM2 | Volta | 640 | 300W | €800-1,200 | Balanced performance |
| Tesla V100 | 32GB HBM2 | Volta | 640 | 300W | €2,000-3,000 | Larger models (30B+) |
| Tesla T4 | 16GB GDDR6 | Turing | 320 | 70W | €1,500-2,500 | Low power, efficient inference |
| Tesla A100 | 40GB HBM2e | Ampere | 6,912 | 400W | €8,000+ | Premium builds (over budget) |
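
The figures above can be cross-checked against whatever card a build actually exposes. A minimal sketch, assuming PyTorch with CUDA support is installed; the architecture mapping in the comment is for orientation only:

```python
# Sketch: report the name, VRAM, and compute capability of each installed GPU.
# Assumes PyTorch was built with CUDA support.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA-capable GPU detected")

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    vram_gb = props.total_memory / 1024**3
    # Compute capability roughly maps to architecture:
    # 6.0 = Pascal (P100), 7.0 = Volta (V100), 7.5 = Turing (T4), 8.0 = Ampere (A100)
    print(f"GPU {i}: {props.name}, {vram_gb:.1f} GB VRAM, "
          f"compute capability {props.major}.{props.minor}")
```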

Detailed Specs

Tesla P100 (Pascal)

Tesla V100 (Volta)

Tesla T4 (Turing)

Tesla A100 (Ampere)


Memory Requirements by Model Size

| Model Size | Min VRAM | Recommended VRAM |
|------------|----------|------------------|
| 7B parameters | 6GB | 8GB |
| 13B parameters | 12GB | 16GB |
| 30B parameters | 24GB | 32GB |
| 70B parameters | 48GB | 80GB (2x40GB or 1x80GB) |

Note: Using quantization (4-bit, 8-bit) can significantly reduce memory requirements.
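
As a rough rule of thumb, weight memory is parameter count times bytes per weight, plus a margin for the KV cache and activations; quantizing to 8-bit or 4-bit shrinks the first term accordingly. A minimal sketch of that arithmetic — the 20% overhead factor is an illustrative assumption, and real usage varies with context length, batch size, and runtime:

```python
# Rough VRAM estimate: model weights plus a fixed margin for KV cache and
# activations. The 1.2 overhead factor is an assumption for illustration.
def estimate_vram_gb(params_billions: float, bits_per_weight: int = 16,
                     overhead: float = 1.2) -> float:
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1024**3

for size_b in (7, 13, 30, 70):
    fp16 = estimate_vram_gb(size_b, 16)
    int4 = estimate_vram_gb(size_b, 4)
    print(f"{size_b}B params: ~{fp16:.0f} GB at FP16, ~{int4:.0f} GB at 4-bit")
```

Compare these estimates against the table above; actual requirements depend on the quantization scheme and the serving runtime.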
