Local LLM, GPUs and Self-Hosted AI Guides: Quantized.fyi

Models
Tobiasz Gromysz · May 16, 2026
Best LLM Models for 16GB VRAM in 2026 (Tested and Ranked)
Sixteen gigabytes of VRAM puts you in an interesting position. It’s enough to run something genuinely worthwhile — not just…
Models
Tobiasz Gromysz · May 7, 2026
Best LLM Models for 12GB VRAM in 2026 (Tested and Ranked)
12GB VRAM is an interesting tier in 2026. It’s no longer the sweet spot – that’s shifted to 16GB –…
Models
Tobiasz Gromysz · May 7, 2026
Best LLM Models for 8GB VRAM in 2026 (Tested & Ranked)
Eight gigabytes of VRAM has been the default mid-range GPU spec for so long that “can I run a decent…
Learn
Tobiasz Gromysz · May 7, 2026
Best Prompt Settings for Local LLMs: Temperature, Top-p, Min-p
Somewhere on Reddit, three years ago, someone posted their sampling settings for a Llama-2 finetune. Those settings spread like a…
Learn
Tobiasz Gromysz · May 7, 2026
Tokens Per Second (t/s) Explained: Beginner’s Guide to LLM Speed
You’ve watched it happen. You type a question into ChatGPT or your local LLM, hit enter, and the answer starts…
Performance
Tobiasz Gromysz · May 3, 2026
GGUF vs EXL2 vs AWQ: Which Is Fastest on NVIDIA in 2026?
The quick answer (May 2026): For single-user inference on a modern NVIDIA card, EXL2 still has a small edge in…
Hardware
Tobiasz Gromysz · May 3, 2026
RTX 5090 32GB AI LLM Performance Guide: 2026 Benchmarks
The RTX 5090 has been on shelves for over a year now, and the local LLM scene around it has…