Blackwell Architecture
AI-optimized Tensor Cores deliver next-gen inferencing and neural graphics performance. RT Cores accelerate real-time ray tracing for photorealistic rendering.
Advanced GPU VPS- RTX Pro 5000
| Models | gpt-oss | deepseek-r1 | deepseek-r1 | deepseek-r1 | gemma3 | llama3.3 | qwen3 | qwen2.5 |
|---|---|---|---|---|---|---|---|---|
| Parameters | 20b | 14b | 32b | 70b | 27b | 70b | 32b | 72b |
| Size (GB) | 14 | 9 | 20 | 43 | 17 | 43 | 20 | 47 |
| GPU UTL | 61% | 75% | 85% | 93% | 78% | 91% | 90% | 93% |
| Eval Rate (tokens/s) | 175.26 | 110.01 | 53.53 | 26.07 | 51.10 | 26.05 | 47.58 | 23.72 |
Note: The models are all from the Ollama library. For more testing data, please visit: https://www.databasemart.com/blog/ollama-gpu-benchmark-pro5000
| Models | Llama-3.1-8B | Qwen3-8B | gemma-3-12b-it | gpt-oss-20b | DeepSeek-R1-Distill-Qwen-14B | Qwen3-14B |
|---|---|---|---|---|---|---|
| Quantization | 16 | 16 | 16 | 4 | 16 | 16 |
| Size(GB) | 15GB | 15GB | 23GB | 13GB | 28GB | 28GB |
| Request Numbers | 50 | 50 | 50 | 50 | 50 | 50 |
| Benchmark Duration(s) | 14.19 | 14.12 | 23.55 | 10.42 | 23.84 | 23.98 |
| Request (req/s) | 3.52 | 3.54 | 2.12 | 4.80 | 2.10 | 2.09 |
| Input (tokens/s) | 348.77 | 354.21 | 210.15 | 479.8 | 207.67 | 208.53 |
| Output (tokens/s) | 2113.77 | 2125.27 | 1273.66 | 2878.77 | 1258.62 | 1251.18 |
| Total Throughput (tokens/s) | 2462.54 | 2479.48 | 1483.81 | 3358.57 | 1466.29 | 1459.71 |
Note: The models are all from the Hugging Face library. For more testing data, please visit: https://www.databasemart.com/blog/vllm-gpu-benchmark-pro5000