Excellent for AI and Deep Learning
Limited-Time GPU Server Sale!

Powerful AI Servers, Supporting Diverse Models and Platforms!

AI Hosting Sale on Nvidia GPU Servers

Enjoy up to 44% off on high-performance GPU hosting servers! Develop and deploy your models with GPU Mart’s on-demand Nvidia GPUs, starting at just $0.04 per hour.

Lite GPU Dedicated Server - K620

$49.00/mo
Order Now
  • 16GB RAM
  • GPU: Nvidia Quadro K620
  • Quad-Core Xeon E3-1270v3
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Maxwell
  • CUDA Cores: 384
  • GPU Memory: 2GB DDR3
  • FP32 Performance: 0.863 TFLOPS
  • Ideal for lightweight Android emulators, small LLMs, graphic processing, and more. More powerful than a GPU VPS.

Express GPU Dedicated Server - P600

$52.00/mo
Order Now
  • 32GB RAM
  • GPU: Nvidia Quadro P600
  • Quad-Core Xeon E5-2643
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Pascal
  • CUDA Cores: 384
  • GPU Memory: 2GB GDDR5
  • FP32 Performance: 1.2 TFLOPS
Hot Sale

Express GPU Dedicated Server - P620

$34.50/mo
50% OFF Recurring (Was $69.00)
Order Now
  • 32GB RAM
  • GPU: Nvidia Quadro P620
  • Eight-Core Xeon E5-2670
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Pascal
  • CUDA Cores: 512
  • GPU Memory: 2GB GDDR5
  • FP32 Performance: 1.5 TFLOPS

Express GPU Dedicated Server - P1000

$64.00/mo
Order Now
  • 32GB RAM
  • GPU: Nvidia Quadro P1000
  • Eight-Core Xeon E5-2690
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Pascal
  • CUDA Cores: 640
  • GPU Memory: 4GB GDDR5
  • FP32 Performance: 1.894 TFLOPS
Hot Sale

Basic GPU Dedicated Server - GTX 1650

$59.50/mo
50% OFF Recurring (Was $119.00)
Order Now
  • Single GPU Specifications:
  • Microarchitecture: Turing
  • CUDA Cores: 896
  • GPU Memory: 4GB GDDR5
  • FP32 Performance: 3.0 TFLOPS

Basic GPU Dedicated Server - T1000

$99.00/mo
Order Now
  • 64GB RAM
  • GPU: Nvidia Quadro T1000
  • Eight-Core Xeon E5-2690
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Turing
  • CUDA Cores: 896
  • GPU Memory: 8GB GDDR6
  • FP32 Performance: 2.5 TFLOPS
Fast AI-Cheap GPU Server!

Professional GPU VPS - A4000

$98.45/mo
45% OFF Recurring (Was $179.00)
Order Now
  • 32GB RAM
  • 24 CPU Cores
  • 320GB SSD
  • 300Mbps Unmetered Bandwidth
  • Once per 2 Weeks Backup
  • OS: Linux / Windows 10/ Windows 11
  • Dedicated GPU: Quadro RTX A4000
  • CUDA Cores: 6,144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS

Basic GPU Dedicated Server - GTX 1660

$139.00/mo
Order Now
  • Single GPU Specifications:
  • Microarchitecture: Turing
  • CUDA Cores: 1408
  • GPU Memory: 6GB GDDR6
  • FP32 Performance: 5.0 TFLOPS

Basic GPU Dedicated Server - RTX 4060

$149.00/mo
Order Now
  • Single GPU Specifications:
  • Microarchitecture: Ada Lovelace
  • CUDA Cores: 3072
  • Tensor Cores: 96
  • GPU Memory: 8GB GDDR6
  • FP32 Performance: 15.11 TFLOPS

Basic GPU Dedicated Server - RTX 5060

$159.00/mo
Order Now
  • 64GB RAM
  • GPU: Nvidia GeForce RTX 5060
  • 24-Core Platinum 8160
  • 120GB SSD + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Blackwell 2.0
  • CUDA Cores: 4608
  • Tensor Cores: 144
  • GPU Memory: 8GB GDDR7
  • FP32 Performance: 23.22 TFLOPS

Professional GPU Dedicated Server - RTX 2060

$199.00/mo
Order Now
  • Single GPU Specifications:
  • Microarchitecture: Turing
  • CUDA Cores: 1920
  • Tensor Cores: 240
  • GPU Memory: 6GB GDDR6
  • FP32 Performance: 6.5 TFLOPS

Professional GPU Dedicated Server - P100

$169.00/mo
Order Now
  • 128GB RAM
  • GPU: Nvidia Tesla P100
  • Dual 10-Core E5-2660v2
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Pascal
  • CUDA Cores: 3584
  • GPU Memory: 16 GB HBM2
  • FP32 Performance: 9.5 TFLOPS
  • Suitable for AI, Data Modeling, High Performance Computing, etc.

Advanced GPU Dedicated Server - RTX 3060 Ti

$239.00/mo
Order Now
  • 128GB RAM
  • GPU: GeForce RTX 3060 Ti
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 4864
  • Tensor Cores: 152
  • GPU Memory: 8GB GDDR6
  • FP32 Performance: 16.2 TFLOPS

Advanced GPU Dedicated Server - A4000

$209.00/mo
Order Now
  • 128GB RAM
  • GPU: Nvidia Quadro RTX A4000
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 6144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS

Advanced GPU Dedicated Server - V100

$229.00/mo
Order Now
  • 128GB RAM
  • GPU: Nvidia V100
  • Dual 12-Core E5-2690v3
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Volta
  • CUDA Cores: 5,120
  • Tensor Cores: 640
  • GPU Memory: 16GB HBM2
  • FP32 Performance: 14 TFLOPS

Multi-GPU Dedicated Server - 2xRTX 4060

$269.00/mo
Order Now
  • Single GPU Specifications:
  • Microarchitecture: Ada Lovelace
  • CUDA Cores: 3072
  • Tensor Cores: 96
  • GPU Memory: 8GB GDDR6
  • FP32 Performance: 15.11 TFLOPS
Fast AI-Cheap GPU Server

Advanced GPU Dedicated Server - A5000

$174.50/mo
50% OFF Recurring (Was $349.00)
Order Now
  • 128GB RAM
  • GPU: Nvidia Quadro RTX A5000
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 8192
  • Tensor Cores: 256
  • GPU Memory: 24GB GDDR6
  • FP32 Performance: 27.8 TFLOPS

Multi-GPU Dedicated Server - 2xRTX 3060 Ti

$319.00/mo
Order Now
  • 128GB RAM
  • GPU: 2 x GeForce RTX 3060 Ti
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 4864
  • Tensor Cores: 152
  • GPU Memory: 8GB GDDR6
  • FP32 Performance: 16.2 TFLOPS

Multi-GPU Dedicated Server - 2xRTX A4000

$359.00/mo
Order Now
  • 128GB RAM
  • GPU: 2 x Nvidia RTX A4000
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 6144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS

Multi-GPU Dedicated Server - 3xRTX 3060 Ti

$369.00/mo
Order Now
  • 256GB RAM
  • GPU: 3 x GeForce RTX 3060 Ti
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 4864
  • Tensor Cores: 152
  • GPU Memory: 8GB GDDR6
  • FP32 Performance: 16.2 TFLOPS

Enterprise GPU Dedicated Server - RTX 4090

$409.00/mo
Order Now
  • 256GB RAM
  • GPU: GeForce RTX 4090
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ada Lovelace
  • CUDA Cores: 16,384
  • Tensor Cores: 512
  • GPU Memory: 24 GB GDDR6X
  • FP32 Performance: 82.6 TFLOPS

Enterprise GPU Dedicated Server - RTX A6000

$409.00/mo
Order Now
  • 256GB RAM
  • GPU: Nvidia Quadro RTX A6000
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS

Enterprise GPU Dedicated Server - A40

$439.00/mo
Order Now
  • 256GB RAM
  • GPU: Nvidia A40
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 37.48 TFLOPS
New Arrival

Enterprise GPU Dedicated Server - RTX 5090

$479.00/mo
Order Now
  • 256GB RAM
  • GPU: GeForce RTX 5090
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Blackwell 2.0
  • CUDA Cores: 21,760
  • Tensor Cores: 680
  • GPU Memory: 32 GB GDDR7
  • FP32 Performance: 109.7 TFLOPS

Multi-GPU Dedicated Server - 2xRTX A5000

$439.00/mo
Order Now
  • 128GB RAM
  • GPU: 2 x Quadro RTX A5000
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 8192
  • Tensor Cores: 256
  • GPU Memory: 24GB GDDR6
  • FP32 Performance: 27.8 TFLOPS
Fast AI-Cheap GPU Server

Multi-GPU Dedicated Server - 3xV100

$299.00/mo
50% OFF Recurring (Was $599.00)
Order Now
  • 256GB RAM
  • GPU: 3 x Nvidia V100
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Volta
  • CUDA Cores: 5,120
  • Tensor Cores: 640
  • GPU Memory: 16GB HBM2
  • FP32 Performance: 14 TFLOPS

Multi-GPU Dedicated Server - 3xRTX A5000

$539.00/mo
Order Now
  • 256GB RAM
  • GPU: 3 x Quadro RTX A5000
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 8192
  • Tensor Cores: 256
  • GPU Memory: 24GB GDDR6
  • FP32 Performance: 27.8 TFLOPS
Fast AI-Cheap GPU Server

Enterprise GPU Dedicated Server - A100

$399.50/mo
50% OFF First Month (Was $799.00)
Order Now
  • 256GB RAM
  • GPU: Nvidia A100
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 6912
  • Tensor Cores: 432
  • GPU Memory: 40GB HBM2
  • FP32 Performance: 19.5 TFLOPS
  • 50% off for the first month, 25% off on every renewal.

Multi-GPU Dedicated Server- 2xRTX 4090

$729.00/mo
Order Now
  • 256GB RAM
  • GPU: 2 x GeForce RTX 4090
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ada Lovelace
  • CUDA Cores: 16,384
  • Tensor Cores: 512
  • GPU Memory: 24 GB GDDR6X
  • FP32 Performance: 82.6 TFLOPS

Multi-GPU Dedicated Server - 3xRTX A6000

$899.00/mo
Order Now
  • 256GB RAM
  • GPU: 3 x Quadro RTX A6000
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS
New Arrival

Multi-GPU Dedicated Server- 2xRTX 5090

$859.00/mo
Order Now
  • 256GB RAM
  • GPU: 2 x GeForce RTX 5090
  • Dual Gold 6148
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Blackwell 2.0
  • CUDA Cores: 21,760
  • Tensor Cores: 680
  • GPU Memory: 32 GB GDDR7
  • FP32 Performance: 109.7 TFLOPS

Multi-GPU Dedicated Server - 2xA100

$1099.00/mo
Order Now
  • 256GB RAM
  • GPU: 2 x Nvidia A100
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 6912
  • Tensor Cores: 432
  • GPU Memory: 40GB HBM2
  • FP32 Performance: 19.5 TFLOPS
  • Free NVLink Included

Multi-GPU Dedicated Server - 4xRTX A6000

$1199.00/mo
Order Now
  • 512GB RAM
  • GPU: 4 x Quadro RTX A6000
  • Dual 22-Core E5-2699v4
  • 240GB SSD + 4TB NVMe + 16TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS
Fast AI-Cheap GPU Server

Enterprise GPU Dedicated Server - A100(80GB)

$1019.00/mo
40% OFF Recurring (Was $1699.00)
Order Now
  • 256GB RAM
  • GPU: Nvidia A100
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 6912
  • Tensor Cores: 432
  • GPU Memory: 80GB HBM2e
  • FP32 Performance: 19.5 TFLOPS

Multi-GPU Dedicated Server - 8xV100

$1499.00/mo
  • 512GB RAM
  • GPU: 8 x Nvidia Tesla V100
  • Dual 22-Core E5-2699v4
  • 240GB SSD + 4TB NVMe + 16TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Volta
  • CUDA Cores: 5,120
  • Tensor Cores: 640
  • GPU Memory: 16GB HBM2
  • FP32 Performance: 14 TFLOPS
Fast AI-Cheap GPU Server

Multi-GPU Dedicated Server - 4xA100

$1374.00/mo
45% OFF Recurring (Was $2499.00)
Order Now
  • 512GB RAM
  • GPU: 4 x Nvidia A100
  • Dual 22-Core E5-2699v4
  • 240GB SSD + 4TB NVMe + 16TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 6912
  • Tensor Cores: 432
  • GPU Memory: 40GB HBM2
  • FP32 Performance: 19.5 TFLOPS
Fast AI-Cheap GPU Server

Enterprise GPU Dedicated Server - H100

$1767.00/mo
32% OFF Recurring (Was $2599.00)
Order Now
  • 256GB RAM
  • GPU: Nvidia H100
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Hopper
  • CUDA Cores: 14,592
  • Tensor Cores: 456
  • GPU Memory: 80GB HBM2e
  • FP32 Performance: 183 TFLOPS

Multi-GPU Dedicated Server - 8xRTX A6000

$2099.00/mo
Order Now
  • 512GB RAM
  • GPU: 8 x Quadro RTX A6000
  • Dual 22-Core E5-2699v4
  • 240GB SSD + 4TB NVMe + 16TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS

LLM Frameworks & Tools

LLM frameworks and tools simplify the complexities of working with LLMs by providing APIs, libraries, and utilities that streamline processes like training, inference, and model optimization.
Ollama Hosting
Ollama is a self-hosted AI platform designed to run open-source large language models. It provides quantized versions of popular models, significantly reducing model size and GPU requirements, making it ideal for small-scale projects, lightweight deployments, or early-stage testing. On this page, you’ll find benchmark results across various GPU servers, along with step-by-step setup guides to help you get started quickly and easily.
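As a quick illustration (not a required setup), the snippet below is a minimal Python sketch that queries a model through Ollama's local HTTP API. It assumes Ollama is already installed on your GPU server, listening on its default port 11434, and that the model tag has been pulled beforehand.

```python
import requests

# Assumes Ollama is serving on its default port and the model has already
# been pulled (e.g. `ollama pull deepseek-r1:7b`) -- both are assumptions.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "deepseek-r1:7b",   # any tag from the recommendation table below
    "prompt": "Explain GPU memory bandwidth in one paragraph.",
    "stream": False,             # return one JSON object instead of a token stream
}

resp = requests.post(OLLAMA_URL, json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["response"])
```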
vLLM Hosting
vLLM is a high-performance LLM inference engine built for speed, scalability, and production readiness. Unlike Ollama, vLLM typically runs full-size, non-quantized models downloaded from Hugging Face, offering greater accuracy and performance—ideal for enterprise-grade applications and real-time deployment. On this page, explore vLLM’s capabilities, performance benchmarks across GPU servers, and optimized setup instructions for seamless deployment.
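For orientation, here is a minimal offline-inference sketch using vLLM's Python API. The model name and sampling settings are illustrative only; pick any full-precision Hugging Face model that fits your GPU memory.

```python
# Minimal offline-inference sketch with vLLM. The model name and sampling
# settings are illustrative, not a prescribed configuration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.2-3B-Instruct",
    # tensor_parallel_size=2,  # uncomment on a multi-GPU server to shard the model
)
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Summarize the benefits of tensor parallelism."], params)
for out in outputs:
    print(out.outputs[0].text)
```

On a dedicated server you would more commonly run vLLM as a long-lived service behind its OpenAI-compatible HTTP API rather than as an offline script.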

LLM Hosting with Ollama — GPU Recommendation

Model Name | Size (4-bit Quantization) | Recommended GPUs | Tokens/s
deepseek-r1:7b | 4.7GB | T1000 < RTX3060 Ti < RTX4060 < A4000 < RTX5060 < V100 | 26.70-87.10
deepseek-r1:8b | 5.2GB | T1000 < RTX3060 Ti < RTX4060 < A4000 < RTX5060 < V100 | 21.51-87.03
deepseek-r1:14b | 9.0GB | A4000 < A5000 < V100 | 30.2-48.63
deepseek-r1:32b | 20GB | A5000 < RTX4090 < A100-40gb < RTX5090 | 24.21-45.51
deepseek-r1:70b | 43GB | A40 < A6000 < 2*A100-40gb < A100-80gb < H100 < 2*RTX5090 | 13.65-27.03
deepseek-v2:236b | 133GB | 2*A100-80gb < 2*H100 | --
llama3.2:1b | 1.3GB | P1000 < GTX1650 < GTX1660 < RTX2060 < T1000 < RTX3060 Ti < RTX4060 < RTX5060 | 28.09-100.10
llama3.1:8b | 4.9GB | T1000 < RTX3060 Ti < RTX4060 < RTX5060 < A4000 < V100 | 21.51-84.07
llama3:70b | 40GB | A40 < A6000 < 2*A100-40gb < A100-80gb < H100 < 2*RTX5090 | 13.15-26.85
llama3.2-vision:90b | 55GB | 2*A100-40gb < A100-80gb < H100 < 2*RTX5090 | ~12-20
llama3.1:405b | 243GB | 8*A6000 < 4*A100-80gb < 4*H100 | --
gemma2:2b | 1.6GB | P1000 < GTX1650 < GTX1660 < RTX2060 | 19.46-38.42
gemma3:4b | 3.3GB | GTX1650 < GTX1660 < RTX2060 < T1000 < RTX3060 Ti < RTX4060 < RTX5060 | 28.36-80.96
gemma3n:e2b | 5.6GB | T1000 < RTX3060 Ti < RTX4060 < RTX5060 | 30.26-56.36
gemma3n:e4b | 7.5GB | A4000 < A5000 < V100 < RTX4090 | 38.46-70.90
gemma3:12b | 8.1GB | A4000 < A5000 < V100 < RTX4090 | 30.01-67.92
gemma3:27b | 17GB | A5000 < RTX4090 < A100-40gb < H100 = RTX5090 | 28.79-47.33
qwen3:14b | 9.3GB | A4000 < A5000 < V100 | 30.05-49.38
qwen2.5:7b | 4.7GB | T1000 < RTX3060 Ti < RTX4060 < RTX5060 | 21.08-62.32
qwen2.5:72b | 47GB | 2*A100-40gb < A100-80gb < H100 < 2*RTX5090 | 19.88-24.15
qwen3:235b | 142GB | 4*A100-40gb < 2*H100 | ~10-20
mistral:7b / openorca / lite / dolphin | 4.1–4.4GB | T1000 < RTX3060 < RTX4060 < RTX5060 | 23.79-73.17
mistral-nemo:12b | 7.1GB | A4000 < V100 | 38.46-67.51
mistral-small:22b / 24b | 13–14GB | A5000 < RTX4090 < RTX5090 | 37.07-65.07
mistral-large:123b | 73GB | A100-80gb < H100 | ~30

LLM Hosting with vLLM + Hugging Face — GPU Recommendation

Model Name | Size (16-bit Quantization) | Recommended GPU(s) | Concurrent Requests | Tokens/s
deepseek-ai/deepseek-coder-6.7b-instruct | ~13.4GB | A5000 < RTX4090 | 50 | 1375–4120
deepseek-ai/DeepSeek-R1-Distill-Llama-8B | ~16GB | 2*A4000 < 2*V100 < A5000 < RTX4090 | 50 | 1450–2769
deepseek-ai/deepseek-coder-33b-instruct | ~66GB | A100-80gb < 2*A100-40gb < 2*A6000 < H100 | 50 | 570–1470
deepseek-ai/DeepSeek-R1-Distill-Llama-70B | ~135GB | 4*A6000 | 50 | 466
meta-llama/Llama-3.2-3B-Instruct | 6.2GB | A4000 < A5000 < V100 < RTX4090 | 50–300 | 1375–7214.10
meta-llama/Llama-3.3-70B-Instruct / 3.1-70B / Meta-3-70B | 132GB | 4*A100-40gb, 2*A100-80gb, 2*H100 | 50 | ~295.52–990.61
google/gemma-3-4b-it | 8.1GB | A4000 < A5000 < V100 < RTX4090 | 50 | 2014.88–7214.10
google/gemma-2-9b-it | 18GB | A5000 < A6000 < RTX4090 | 50 | 951.23–1663.13
google/gemma-3-12b-it | 23GB | A100-40gb < 2*A100-40gb < H100 | 50 | 477.49–4193.44
google/gemma-3-27b-it | 51GB | 2*A100-40gb < A100-80gb < H100 | 50 | 1231.99–1990.61
Qwen/Qwen2-VL-2B-Instruct | ~5GB | A4000 < V100 | 50 | ~3000
Qwen/Qwen2.5-VL-3B-Instruct | ~7GB | A5000 < RTX4090 | 50 | 2714.88–6980.31
Qwen/Qwen2.5-VL-7B-Instruct | ~15GB | A5000 < RTX4090 | 50 | 1333.92–4009.29
Qwen/Qwen2.5-VL-32B-Instruct | ~65GB | 2*A100-40gb < H100 | 50 | 577.17–1481.62
Qwen/Qwen2.5-VL-72B-Instruct-AWQ | 137GB | 4*A100-40gb < 2*H100 < 4*A6000 | 50 | 154.56–449.51
mistralai/Pixtral-12B-2409 | ~25GB | A100-40gb < A6000 < 2*RTX4090 | 50 | 713.45–861.14
mistralai/Mistral-Small-3.2-24B-Instruct-2506 | ~47GB | 2*A100-40gb < H100 | 50 | ~1200–2000
mistralai/Pixtral-Large-Instruct-2411 | 292GB | 8*A6000 | 50 | ~466.32

Explanation:
Recommended GPUs: listed from left to right in order of increasing performance.
Tokens/s: taken from our benchmark data.
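For context on the Concurrent Requests column in the vLLM table above, here is a hedged client-side sketch of sending batched requests to a vLLM server through its OpenAI-compatible API. The server command, host, port, and model name are placeholders for illustration; this is not our benchmark harness.

```python
import asyncio
from openai import AsyncOpenAI

# Hypothetical load sketch. Assumes a vLLM server is already running and
# exposing its OpenAI-compatible API, e.g. started with
# `vllm serve meta-llama/Llama-3.2-3B-Instruct`.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "meta-llama/Llama-3.2-3B-Instruct"

async def one_request(i: int) -> int:
    resp = await client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": f"Question {i}: name one use of a GPU."}],
        max_tokens=64,
    )
    return resp.usage.completion_tokens

async def main(concurrency: int = 50) -> None:
    counts = await asyncio.gather(*(one_request(i) for i in range(concurrency)))
    print(f"Completed {len(counts)} requests, {sum(counts)} completion tokens total.")

asyncio.run(main())
```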

Ollama GPU Benchmarks – Model Performance

We've benchmarked LLMs on GPU servers ranging from the P1000 to the H100. These benchmarks show how different GPUs perform with Ollama across various model sizes, helping you choose the ideal AI hosting server.
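If you want to reproduce a tokens/s figure yourself, a minimal sketch is shown below. It assumes an Ollama instance is serving locally and uses the eval_count and eval_duration fields (reported in nanoseconds) that Ollama includes in its non-streaming generate response. Our published numbers come from our own benchmark runs, so treat this only as a starting point.

```python
import requests

# Rough single-run tokens/s measurement against a local Ollama instance.
# eval_count / eval_duration are fields Ollama returns in its generate
# response; durations are in nanoseconds.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "prompt": "Write a haiku about GPUs.", "stream": False},
    timeout=600,
).json()

tokens_per_second = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"{tokens_per_second:.2f} tokens/s")
```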

GPU Dedicated Server - P1000

GPU Dedicated Server - T1000

GPU Dedicated Server - GTX 1660

GPU Dedicated Server - RTX 4060

GPU Dedicated Server - RTX 2060

GPU Dedicated Server - RTX 3060 Ti

GPU Dedicated Server - A4000

GPU Dedicated Server - P100

GPU Dedicated Server - V100

GPU Dedicated Server - A5000

GPU Dedicated Server - RTX 4090

GPU Dedicated Server - RTX 5090

GPU Dedicated Server - A40

GPU Dedicated Server - RTX A6000

GPU Dedicated Server - A100(40GB)

Multi-GPU Dedicated Server - 2xRTX 5090

Multi-GPU Dedicated Server - 2xA100(2x40GB)

GPU Dedicated Server - H100

vLLM GPU Benchmarks – Model Performance

We've benchmarked LLMs on GPUs including the A5000, T1000, A40, A6000, RTX 4090, dual RTX 4090, A100 40GB, dual A100, 4xA100, 3xV100, A100 80GB, H100, and 4xA6000. Explore the results to select the ideal GPU server for your workload.

GPU Dedicated Server - A5000

GPU Dedicated Server - A40

GPU Dedicated Server - RTX A6000

GPU Dedicated Server - RTX 4090

Multi-GPU Dedicated Server - 2xRTX 4090

GPU Dedicated Server - A100 (40GB)

GPU Dedicated Server - V100

GPU Dedicated Server - A100 (80GB)

GPU Dedicated Server - H100

Multi-GPU Dedicated Server - 2xA100 (2x40GB)

GPU Dedicated Server - A100 (4x40GB)

GPU Dedicated Server - A6000 (4xA6000)

What Do Clients Say About Our AI Hosting GPU Servers?

Delivering exceptional service and support is our highest priority at GPU Mart. Here’s a glimpse of what our clients have said about their experience with our GPU server services.
We’ve been using their GPU servers to run 70B models, and the performance is incredible. The 48GB GPUs handle everything seamlessly, and the setup process was a breeze. Highly recommend for anyone working with large AI models!
We evaluated multiple GPU server providers, and they offer the best value for money. Compared to other vendors, we achieved better performance at a lower cost here. It's especially suitable for teams with limited budgets but high computing power demands.
We needed a reliable server for our 14B models, and their 16GB GPU plan was exactly what we needed. The flexibility to choose our preferred inference engine made deployment so easy. Great service!
We started with a single GPU and upgraded to a multi-GPU setup as our needs grew. The scalability is fantastic, and the pricing is very competitive. Perfect for growing AI teams!
We tested their servers with a free trial, and the performance was so impressive that we signed up immediately. Being able to benchmark our models before committing was a huge plus. Highly recommend!
We needed a server that could handle our unique environment, and they delivered. The ability to customize the setup and choose our own tools made all the difference.
We use the RTX 4090 server for AI-generated artwork, style transfer, and automated image editing, and its performance has exceeded our expectations. Compared to consumer-grade GPUs, this server offers superior stability and computing power, making it ideal for professional AI image processing tasks.

Questions About AI Hosting Promotion

Find answers to your most common questions in our FAQ section. For personalized recommendations or further assistance, don't hesitate to reach out to our online support team.

1. What is an AI hosting server, and how does it work?

GPU Mart provides GPU-powered physical servers (bare metal) with dedicated IP access. You can log in remotely, choose your preferred LLM inference engine, and deploy your AI models effortlessly.

2. Which platforms are supported?

There are no platform restrictions. However, different platforms may quantize models differently, which can affect the final model size and performance.

3. What GPU memory is required for a 14B model?

We recommend a 16GB GPU for running 14B models efficiently.

4. What GPU memory is required for a 32B model?

For 32B models, we recommend a GPU with 24GB or more memory.

5. What GPU memory is required for a 70B model?

To run 70B models smoothly, we recommend a GPU with 48GB or more memory.
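The three answers above follow a simple rule of thumb: a 4-bit quantized model needs roughly half a byte per parameter for its weights, plus headroom for the KV cache and runtime overhead. The sketch below illustrates that arithmetic; the 1.2 overhead factor is an assumption for illustration, not a measured value.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead_factor: float = 1.2) -> float:
    """Rough VRAM estimate: weight memory at the given quantization width,
    scaled by a flat factor for KV cache and runtime overhead (assumed)."""
    weight_gb = params_billion * bits_per_weight / 8  # GB needed for the weights alone
    return weight_gb * overhead_factor

for size in (14, 32, 70):
    print(f"{size}B model at 4-bit: ~{estimate_vram_gb(size):.1f} GB of VRAM")
# Lines up with the guidance above: 14B fits a 16GB card,
# 32B wants 24GB or more, and 70B wants 48GB or more.
```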

6. When should I choose a multi-GPU plan?

A multi-GPU plan is ideal when a single GPU cannot handle higher concurrency or larger model sizes. If your workload demands more power, consider upgrading to a multi-GPU setup.

7. Can I upgrade my server configuration later?

Yes! You can upgrade GPU memory and storage space. Some servers also support adding additional GPUs. Contact us for custom upgrade options.

8. Can I run benchmarks on my own models before committing?

Yes, we offer free trials for select products. Reach out to us to request a free trial and test your models.

9. Is server maintenance included, or am I responsible for it?

We handle all server maintenance, so you can focus on running your AI tasks without worrying about hardware management.

10. Can I customize the server environment to fit my needs?

Absolutely! You have full control to configure the server environment according to your requirements.

11. Can I use your servers for both inference and training tasks?

Our servers are optimized for inference and reasoning tasks. For training, please contact us to discuss your specific needs.

12. How many GPU servers can I buy with the AI hosting promotion?

This promotion is limited to 3 GPU dedicated server plans. If you require bulk purchasing, please contact our sales team for a custom discount arrangement.

13. What's the minimum duration for a GPU server order?

You can order an AI hosting GPU server for any duration of one month or longer.

14. What's the meaning of recurring discount?

'Recurring discount' means your discount will still apply when you renew an AI hosting/machine learning server.

15. Can I get a discount for my existing GPU server?

Unfortunately, AI hosting promotions are only available for new GPU server orders. However, you can contact our sales team to inquire about special renewal discounts.

16. Will the discount remain if I upgrade/downgrade the plan after the promotion?

No, the discount will not be valid if the target plan is excluded from the AI hosting GPU server promotion.

17. What payment methods do you accept?

We accept Visa, MasterCard, American Express, JCB, Discover, Diners Club, PayPal, Wire Transfer, and Check. Note that non-instant payment methods will delay service deployment until the payment clears. Wire Transfers must be over $100. Paper checks are only for U.S. clients.

18. How long will it take to set up my server?

Typically, GPU dedicated server setup takes 20-40 minutes. Customized GPU servers take longer.

19. Can I get a free trial before payment?

We offer a 24-hour free trial for new clients who wish to test our GPU server. To request a trial server, please follow these steps:


Step 1: Submit a Free Trial Request
Select a plan, click 'Order Now,' and leave a note saying 'Need free trial.' Then, click 'Check Out' and proceed to the Order Confirm page. On this page, you must click 'Confirm' to complete the free trial request.

Step 2: Security Verification
This process takes about 30 minutes to 2 hours. Once verified, you will receive the server login details in the console and can start using it. If your trial request is not approved, you will be notified via email.

Custom Servers

Can't find your ideal server? Send us your custom requirements, and our sales rep will provide a tailored solution for you.

Server Inquiry

Confused about choosing a server or have questions? Consult online support for recommendations.