- Models

Best LLM Models for 16GB VRAM in 2026 (Tested and Ranked)
Sixteen gigabytes of VRAM puts you in an interesting position. It’s enough to run something genuinely worthwhile — not just…
- Models

Best LLM Models for 12GB VRAM in 2026 (Tested and Ranked)
12GB VRAM is an interesting tier in 2026. It’s no longer the sweet spot – that’s shifted to 16GB –…
- Models

Best LLM Models for 8GB VRAM in 2026 (Tested & Ranked)
Eight gigabytes of VRAM has been the default mid-range GPU spec for so long that “can I run a decent…
- Learn

Best Prompt Settings for Local LLMs: Temperature, Top-p, Min-p
Somewhere on Reddit, three years ago, someone posted their sampling settings for a Llama-2 finetune. Those settings spread like a…
- Learn

Tokens Per Second (t/s) Explained: Beginner’s Guide to LLM Speed
You’ve watched it happen. You type a question into ChatGPT or your local LLM, hit enter, and the answer starts…
- Performance

GGUF vs EXL2 vs AWQ: Which Is Fastest on NVIDIA in 2026?
The quick answer (May 2026): For single-user inference on a modern NVIDIA card, EXL2 still has a small edge in…
- Hardware

RTX 5090 32GB AI LLM Performance Guide: 2026 Benchmarks
The RTX 5090 has been on shelves for over a year now, and the local LLM scene around it has…