What quantization should I use for an NVIDIA A100 SXM 80GB?

With 80GB of VRAM, Q4_K_M is the community default. Q6_K or Q8_0 are worth considering if quality matters more and the model still fits.

How much VRAM does Yi 34B need?

Yi 34B has 34.39B parameters. At Q8_0 (8-bit), it requires approximately 36.5GB for the weights plus ~5.5GB for the KV cache, totaling roughly 42GB.
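The arithmetic behind that figure can be sketched in a few lines. This is a rough illustration, not the site's actual calculator: the function names are made up for this example, the 8.5 bits-per-weight figure assumes the GGUF Q8_0 layout (32 int8 weights plus one 16-bit scale per block), and the KV cache size is simply taken from the answer above rather than derived.

```python
def weight_storage_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (decimal) for a quantized model."""
    # billions of parameters * bits per weight / 8 bits per byte = GB
    return params_billion * bits_per_weight / 8


def total_vram_gb(params_billion: float, bits_per_weight: float,
                  kv_cache_gb: float) -> float:
    """Weights plus KV cache; ignores activations and runtime overhead."""
    return weight_storage_gb(params_billion, bits_per_weight) + kv_cache_gb


# Yi 34B at Q8_0, with the ~5.5GB KV cache quoted above (assumed, not computed)
weights = weight_storage_gb(34.39, 8.5)
total = total_vram_gb(34.39, 8.5, kv_cache_gb=5.5)
print(f"weights ~{weights:.1f}GB, total ~{total:.1f}GB")
```

Real usage will be somewhat higher, since this sketch leaves out activation buffers and framework overhead, and the KV cache grows with context length.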

Can Your Hardware Run AI Models Locally?

Check if your GPU or system can run Llama 3, DeepSeek, Qwen, Mistral, and more. Real computed numbers — speed, fit, and headroom for 26,290 hardware+model combinations.

26,290 combinations, 56 models, 55 hardware options, 1 benchmark

Popular Combinations

Llama 3.1 70B on NVIDIA GeForce RTX 4090: ✗ Does not fit
Llama 3.1 8B on NVIDIA GeForce RTX 4060: ✓ 56.10 tok/s
DeepSeek-R1-Distill-Llama-70B on Apple M4 Max (128GB): ✓ 12.80 tok/s
Mixtral 8x7B (MoE) on NVIDIA GeForce RTX 4090: ✗ Does not fit
Qwen 2.5 72B on NVIDIA RTX 6000 Ada Generation: ✗ Does not fit
Llama 3.1 8B on Apple M1 Max (32GB): ✓ 82.50 tok/s
DeepSeek-R1-Distill-Qwen-32B on NVIDIA GeForce RTX 4090: ✓ 51.00 tok/s
Qwen3 30B-A3B (MoE) on NVIDIA GeForce RTX 4070: ✗ Does not fit

How It Works

Pick a model — choose from 56 open-weight LLMs
Pick your hardware — GPU, Apple Silicon, or system RAM
Get the numbers — estimated tok/s, fit quality, and headroom from real bandwidth math
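As a rough idea of what "bandwidth math" can look like, here is a minimal sketch. It assumes token generation is memory-bandwidth bound, so each generated token requires reading the full set of weights once; the efficiency factor, the function and field names, and the example bandwidth value are all illustrative assumptions, not the site's published methodology.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    fits: bool
    headroom_gb: float   # memory left after weights + KV cache
    tokens_per_s: float  # 0.0 when the model does not fit


def estimate(vram_gb: float, bandwidth_gb_s: float, weights_gb: float,
             kv_cache_gb: float, efficiency: float = 0.6) -> Estimate:
    """Bandwidth-bound decode estimate (assumed rule of thumb)."""
    headroom = vram_gb - (weights_gb + kv_cache_gb)
    fits = headroom >= 0
    # Each token reads all weights once, so speed ~ bandwidth / model size,
    # scaled by an assumed efficiency factor for real-world overhead.
    tok_s = efficiency * bandwidth_gb_s / weights_gb if fits else 0.0
    return Estimate(fits, headroom, tok_s)


# Hypothetical 80GB card with a placeholder 2000 GB/s of memory bandwidth,
# running a model with 36.5GB of weights and a 5.5GB KV cache.
result = estimate(vram_gb=80, bandwidth_gb_s=2000,
                  weights_gb=36.5, kv_cache_gb=5.5, efficiency=0.5)
print(result)
```

Mixture-of-experts models would need a different denominator, since only the active experts are read per token; this sketch treats every model as dense.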
