Limited-Time GPU Server Sale

Premium AI Hosting: Deep Learning GPU Server Sale

Powerful bare-metal machine learning servers and GPU VPS for seamless AI model deployment — optimized for speed, scale, and cost efficiency.

Supports 20+ LLMs: DeepSeek, Llama 3, Qwen & More
Scale from 0.5B to 110B Models for Compute-Intensive Tasks
Seamless Integration: Ollama, vLLM, Hugging Face

AI Hosting Solutions: Nvidia Machine Learning Servers

Take advantage of limited-time discounts on high-performance Nvidia servers. Develop, test, and deploy AI models seamlessly with our robust LLM GPU server plans.

New AI Hosting Products
Deep Learning GPU Dedicated Servers · Machine Learning GPU VPS

Professional GPU VPS - RTX Pro 2000

$99.00/mo · 1mo / 3mo / 12mo / 24mo terms
Order Now
  • 28GB RAM
  • 16 CPU Cores
  • 240GB SSD
  • 300Mbps Unmetered Bandwidth
  • Backup: Once Every 2 Weeks
  • OS: Windows / Linux
  • Dedicated GPU: Nvidia RTX Pro 2000
  • CUDA Cores: 4,352
  • Tensor Cores: 5th Gen
  • GPU Memory: 16GB GDDR7
  • FP32 Performance: 17 TFLOPS

Advanced GPU VPS - RTX Pro 4000

$159.00/mo · 1mo / 3mo / 12mo / 24mo terms
Order Now
  • 60GB RAM
  • 24 CPU Cores
  • 320GB SSD
  • 500Mbps Unmetered Bandwidth
  • Backup: Once Every 2 Weeks
  • OS: Windows / Linux
  • Dedicated GPU: Nvidia RTX Pro 4000
  • CUDA Cores: 8,960
  • Tensor Cores: 280
  • GPU Memory: 24GB GDDR7
  • FP32 Performance: 34 TFLOPS

Advanced GPU VPS - RTX Pro 5000

$269.00/mo · 1mo / 3mo / 12mo / 24mo terms
Order Now
  • 60GB RAM
  • 24 CPU Cores
  • 320GB SSD
  • 500Mbps Unmetered Bandwidth
  • Backup: Once Every 2 Weeks
  • OS: Windows / Linux
  • Dedicated GPU: Nvidia RTX Pro 5000
  • CUDA Cores: 14,080
  • Tensor Cores: 440
  • GPU Memory: 48GB GDDR7
  • FP32 Performance: 66.94 TFLOPS

Enterprise GPU VPS - RTX Pro 6000

$479.00/mo · 1mo / 3mo / 12mo / 24mo terms
Order Now
  • 90GB RAM
  • 32 CPU Cores
  • 400GB SSD
  • 1000Mbps Unmetered Bandwidth
  • Backup: Once Every 2 Weeks
  • OS: Windows / Linux
  • Dedicated GPU: Nvidia RTX Pro 6000
  • CUDA Cores: 24,064
  • Tensor Cores: 852
  • GPU Memory: 96GB GDDR7
  • FP32 Performance: 126 TFLOPS

Inference Engines for AI Model Deployment

LLM frameworks and tools simplify the complexities of working with large language models by providing APIs, libraries, and utilities that streamline training, high-throughput inference, and seamless AI model deployment.

Ollama: Self-Hosted · Quantized Models

Ollama is a self-hosted AI platform designed to run open-source large language models. It provides quantized versions of popular models, significantly reducing model size and GPU requirements. This makes it ideal for small-scale projects, rapid AI model deployment, or early-stage testing on a cost-effective LLM GPU server. Explore benchmark results across various GPU servers with step-by-step setup guides to get started quickly.
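As a quick illustration, here is a minimal Python sketch that prompts a model served by a local Ollama instance over its default REST API. The model tag and prompt are placeholders; it assumes Ollama is already running and the model has been pulled (e.g. with `ollama pull llama3`).

```python
# Minimal sketch: prompt a quantized model running under Ollama.
# Assumes Ollama is listening on its default port (11434) and the
# model tag below has already been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",  # placeholder: any pulled model tag works
        "prompt": "Explain KV caching in one sentence.",
        "stream": False,    # return one JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```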

vLLM: Production-Grade · Full Precision

vLLM is a high-performance LLM inference engine built for speed, scalability, and production readiness. Unlike Ollama, vLLM typically runs full-size, non-quantized models from Hugging Face, offering greater accuracy and low-latency performance. It is the ultimate software stack for your AI inference server in enterprise-grade applications. Explore vLLM capabilities and performance benchmarks across our GPU servers.
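For comparison, below is a minimal sketch of vLLM's offline Python API. The Hugging Face model ID is a placeholder; the snippet assumes `vllm` is installed and your GPU(s) have enough VRAM to hold the FP16 weights.

```python
# Minimal sketch: full-precision batch inference with vLLM.
# The model ID is a placeholder; swap in any Hugging Face model
# your GPU(s) can hold in FP16.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["What is speculative decoding?"], params)
for out in outputs:
    print(out.outputs[0].text)
```

Recent vLLM releases also ship an OpenAI-compatible HTTP server (e.g. `vllm serve <model>`), which is the usual way to put a model behind many concurrent clients in production.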

LLM Hosting with Ollama — GPU Recommendation

Recommended GPUs are ordered from entry-level to high-performance. Tokens/s values are derived from real benchmark data across our Nvidia GPU servers.

| Model Name | Size (Q4) | Recommended GPUs (Low → High Performance) | Tokens/s |
|---|---|---|---|
| deepseek-r1:7B | 4.7 GB | T1000, RTX 3060 Ti, RTX 4060, A4000, RTX 5060, V100 | 26.70 – 87.10 |
| deepseek-r1:8B | 5.2 GB | T1000, RTX 3060 Ti, RTX 4060, A4000, RTX 5060, V100 | 21.51 – 87.03 |
| deepseek-r1:14B | 9.0 GB | A4000, A5000, V100 | 30.2 – 48.63 |
| deepseek-r1:32B | 20 GB | A5000, RTX 4090, A100-40GB, RTX 5090 | 24.21 – 45.51 |
| deepseek-r1:70B | 43 GB | A40, A6000, 2×A100-40GB, A100-80GB, H100, 2×RTX 5090 | 13.65 – 27.03 |
| deepseek-v2:236B | 133 GB | 2×A100-80GB, 2×H100 | |
| llama3.2:1B | 1.3 GB | P1000, GTX 1650, GTX 1660, RTX 2060, T1000, RTX 3060 Ti, RTX 4060, RTX 5060 | 28.09 – 100.10 |
| llama3.1:8B | 4.9 GB | T1000, RTX 3060 Ti, RTX 4060, RTX 5060, A4000, V100 | 21.51 – 84.07 |
| llama3:70B | 40 GB | A40, A6000, 2×A100-40GB, A100-80GB, H100, 2×RTX 5090 | 13.15 – 26.85 |
| llama3.2-vision:90B | 55 GB | 2×A100-40GB, A100-80GB, H100, 2×RTX 5090 | ~12 – 20 |
| llama3.1:405B | 243 GB | 8×A6000, 4×A100-80GB / 4×H100 | |
| gemma2:2B | 1.6 GB | P1000, GTX 1650, GTX 1660, RTX 2060 | 19.46 – 38.42 |
| gemma3:4B | 3.3 GB | GTX 1650, GTX 1660, RTX 2060, T1000, RTX 3060 Ti, RTX 4060, RTX 5060 | 28.36 – 80.96 |
| gemma3n:e2B | 5.6 GB | T1000, RTX 3060 Ti, RTX 4060, RTX 5060 | 30.26 – 56.36 |
| gemma3n:e4B | 7.5 GB | A4000, A5000, V100, RTX 4090 | 38.46 – 70.90 |
| gemma3:12B | 8.1 GB | A4000, A5000, V100, RTX 4090 | 30.01 – 67.92 |
| gemma3:27B | 17 GB | A5000, RTX 4090, A100-40GB, H100, RTX 5090 | 28.79 – 47.33 |
| qwen3:14B | 9.3 GB | A4000, A5000, V100 | 30.05 – 49.38 |
| qwen2.5:7B | 4.7 GB | T1000, RTX 3060 Ti, RTX 4060, RTX 5060 | 21.08 – 62.32 |
| qwen2.5:72B | 47 GB | 2×A100-40GB, A100-80GB, H100, 2×RTX 5090 | 19.88 – 24.15 |
| qwen3:235B | 142 GB | 4×A100-40GB, 2×H100 | ~10 – 20 |
| mistral:7B variants | 4.1–4.4 GB | T1000, RTX 3060, RTX 4060, RTX 5060 | 23.79 – 73.17 |
| mistral-nemo:12B | 7.1 GB | A4000, V100 | 38.46 – 67.51 |
| mistral-small:22B / 24B | 13–14 GB | A5000, RTX 4090, RTX 5090 | 37.07 – 65.07 |
| mistral-large:123B | 73 GB | A100-80GB, H100 | ~30 |
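If you want to verify figures like these on your own server, Ollama's non-streaming /api/generate response reports eval_count (tokens generated) and eval_duration (generation time in nanoseconds), from which a tokens/s number falls out directly. A sketch, with placeholder model tag and prompt:

```python
# Sketch: reproduce a tokens/s measurement against a local Ollama server.
# eval_count = tokens generated, eval_duration = generation time in ns.
import requests

r = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-r1:7b", "prompt": "Write a haiku about GPUs.", "stream": False},
    timeout=300,
).json()

tok_s = r["eval_count"] / r["eval_duration"] * 1e9
print(f"{r['eval_count']} tokens in {r['eval_duration'] / 1e9:.2f}s -> {tok_s:.2f} tokens/s")
```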

LLM Hosting with vLLM + Hugging Face — GPU Recommendation

vLLM runs full-precision (16-bit) models for maximum accuracy. Recommended GPUs are ordered from entry-level production-ready cards to multi-GPU enterprise clusters. Concurrent requests were tested at 50 unless noted otherwise.

| Model Name | Size (FP16) | Recommended GPU(s) (Low → High Performance) | Concurrent | Tokens/s |
|---|---|---|---|---|
| deepseek-coder-6.7B-instruct | ~13.4 GB | A5000, RTX 4090 | 50 | 1375 – 4120 |
| DeepSeek-R1-Distill-Llama-8B | ~16 GB | 2×A4000, 2×V100, A5000, RTX 4090 | 50 | 1450 – 2769 |
| deepseek-coder-33B-instruct | ~66 GB | A100-80GB, 2×A100-40GB, 2×A6000, H100 | 50 | 570 – 1470 |
| DeepSeek-R1-Distill-Llama-70B | ~135 GB | 4×A6000 | 50 | 466 |
| Llama-3.2-3B-Instruct | 6.2 GB | A4000, A5000, V100, RTX 4090 | 50–300 | 1375 – 7214 |
| Llama-3.3-70B / 3.1-70B / 3-70B | 132 GB | 4×A100-40GB, 2×A100-80GB, 2×H100 | 50 | ~296 – 991 |
| gemma-3-4b-it | 8.1 GB | A4000, A5000, V100, RTX 4090 | 50 | 2015 – 7214 |
| gemma-2-9b-it | 18 GB | A5000, A6000, RTX 4090 | 50 | 951 – 1663 |
| gemma-3-12b-it | 23 GB | A100-40GB, 2×A100-40GB, H100 | 50 | 477 – 4193 |
| gemma-3-27b-it | 51 GB | 2×A100-40GB, A100-80GB, H100 | 50 | 1232 – 1991 |
| Qwen2-VL-2B-Instruct | ~5 GB | A4000, V100 | 50 | ~3000 |
| Qwen2.5-VL-3B-Instruct | ~7 GB | A5000, RTX 4090 | 50 | 2715 – 6980 |
| Qwen2.5-VL-7B-Instruct | ~15 GB | A5000, RTX 4090 | 50 | 1334 – 4009 |
| Qwen2.5-VL-32B-Instruct | ~65 GB | 2×A100-40GB, H100 | 50 | 577 – 1482 |
| Qwen2.5-VL-72B-Instruct-AWQ | 137 GB | 4×A100-40GB, 2×H100, 4×A6000 | 50 | 155 – 450 |
| Pixtral-12B-2409 | ~25 GB | A100-40GB, A6000, 2×RTX 4090 | 50 | 713 – 861 |
| Mistral-Small-3.2-24B-Instruct | ~47 GB | 2×A100-40GB, H100 | 50 | ~1200 – 2000 |
| Pixtral-Large-Instruct-2411 | 292 GB | 8×A6000 | 50 | ~466 |
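The aggregate throughput figures above come from many requests in flight at once. A rough sketch of such a load test is shown below, assuming a vLLM OpenAI-compatible endpoint (e.g. started with `vllm serve <model>`); the URL, model ID, and prompt are assumptions to adjust for your own deployment.

```python
# Sketch: fire 50 concurrent completion requests at a vLLM
# OpenAI-compatible endpoint and report aggregate tokens/s.
import asyncio
import time

import httpx

URL = "http://localhost:8000/v1/completions"  # assumed endpoint
BODY = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model ID
    "prompt": "Summarize vLLM in one line.",
    "max_tokens": 64,
}

async def one_request(client: httpx.AsyncClient) -> int:
    r = await client.post(URL, json=BODY, timeout=300)
    r.raise_for_status()
    return r.json()["usage"]["completion_tokens"]

async def main(concurrency: int = 50) -> None:
    async with httpx.AsyncClient() as client:
        start = time.perf_counter()
        tokens = await asyncio.gather(*(one_request(client) for _ in range(concurrency)))
        elapsed = time.perf_counter() - start
        print(f"{sum(tokens)} tokens from {concurrency} requests in "
              f"{elapsed:.1f}s -> {sum(tokens) / elapsed:.0f} tokens/s aggregate")

asyncio.run(main())
```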

What Clients Say About Our AI Hosting GPU Server

Delivering exceptional service and support is our highest priority at GPU Mart. Here's a glimpse of what our clients have said about their experience with our GPU server services.

"GPU-Mart's bare-metal servers gave us the inference throughput we needed for production LLM deployment. The setup was fast and support was incredibly responsive. We run Llama 3 70B across multiple A100 instances without any issues."

Alex T.
ML Engineer, AI Startup

"Switching from other provider to GPU-Mart's dedicated servers cut our inference costs by more than half. The RTX 4090 plan is outstanding value for running DeepSeek and Qwen models with vLLM in a self-hosted environment."

Sarah K.
CTO, SaaS Platform

"The 24-hour free trial was a game-changer. I tested my Ollama setup on an A4000 server before committing, and the benchmark results matched exactly what GPU-Mart published. Transparent, reliable, and great pricing."

Marcus R.
Research Engineer

"We deploy DeepSeek-R1 70B for our enterprise RAG pipeline. GPU-Mart's H100 server handles our peak load effortlessly. The recurring discount means we're locking in excellent pricing for the long run — highly recommended."

James L.
Lead Engineer, Enterprise AI

"I started with a V100 plan for testing Qwen 2.5 7B using Ollama and the tokens-per-second performance matched the published benchmarks perfectly. Upgraded to an A100 plan within a week — seamless experience throughout."

Yuki T.
AI Developer, Japan

"Server provisioning in under 30 minutes, full root access, and the 2×A100 multi-GPU setup runs our Gemma 3 27B model at sustained throughput without a single hiccup. GPU-Mart is now our go-to for all AI inference infrastructure."

Ravi P.
Head of AI, Fintech Company

"We migrated our Mistral-7B chatbot from a cloud provider to GPU-Mart's RTX 3060 Ti plan. Latency dropped, costs dropped, and the team had full control over the environment. Couldn't be happier with the move."

Lena M.
Backend Developer, EU

"The RTX 5090 plan blew our expectations. We run quantized 32B models at speeds that rival much more expensive cloud solutions. GPU-Mart's support team helped us configure vLLM for maximum throughput in less than an hour."

David C.
Founder, AI Products Studio

Questions About the AI Hosting Promotion

Find answers to common questions below. For personalized recommendations or further assistance, reach out to our online support team.

Q: What do I get with an AI hosting GPU server?
A: GPU Mart provides GPU-powered physical servers (bare metal) with dedicated IP access. You can remotely log in, choose your preferred LLM inference engine, and deploy your AI models effortlessly.

Q: Are there restrictions on which LLM platforms I can use?
A: There are no platform restrictions. However, different platforms may quantize models differently, which can affect the final model size and performance.
Q: How much GPU memory do I need for 14B, 32B, and 70B models?
A: We recommend a 16GB GPU for running 14B models efficiently, a GPU with 24GB or more memory for 32B models, and a GPU with 48GB or more memory to run 70B models smoothly. A rough sizing sketch follows below.
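The recommendations above follow from simple arithmetic: model weights take roughly params × bytes-per-param, plus headroom for the KV cache and activations. A back-of-the-envelope sketch (the ~30% overhead factor is an assumption, not a sizing guarantee):

```python
# Back-of-the-envelope VRAM estimate: weights = params x bytes/param,
# inflated by an assumed ~30% overhead for KV cache and activations.
def vram_estimate_gb(params_b: float, bytes_per_param: float, overhead: float = 1.3) -> float:
    return params_b * bytes_per_param * overhead

for size_b in (14, 32, 70):
    est = vram_estimate_gb(size_b, 0.5)  # ~0.5 bytes/param at Q4 quantization
    print(f"{size_b}B model @ Q4: ~{est:.0f} GB VRAM")
# 14B -> ~9 GB (fits a 16GB card), 32B -> ~21 GB (24GB+), 70B -> ~46 GB (48GB+)
```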
Q: When is a multi-GPU plan the right choice?
A: A multi-GPU plan is ideal when a single GPU cannot handle higher concurrency or larger model sizes. If your workload demands more power, consider upgrading to a multi-GPU setup.

Q: Can I upgrade my server later?
A: Yes! You can upgrade GPU memory and storage space, and some servers also support adding additional GPUs. Contact us for custom upgrade options.

Q: Do you offer free trials?
A: Yes, we offer free trials for select products. Reach out to us to request a free trial and test your models.

Q: Who handles server maintenance?
A: We handle all server maintenance, so you can focus on running your AI tasks without worrying about hardware management.

Q: Can I configure the server environment myself?
A: Absolutely! You have full control to configure the server environment according to your requirements.

Q: Can I use these servers for model training?
A: Our servers are optimized for inference and reasoning tasks. For training, please contact us to discuss your specific needs.

Q: Is there a limit on how many discounted servers I can order?
A: The promotion is limited to 3 GPU dedicated server plans. If you require bulk purchasing, please contact our sales team for a custom discount arrangement.

Q: What billing terms are available?
A: You can order an AI hosting GPU server for any term of one month or longer.

Q: What does 'recurring discount' mean?
A: 'Recurring discount' means your discount will still apply when you renew an AI hosting / machine learning server.

Q: Does the promotion apply to renewals of existing servers?
A: Unfortunately, AI hosting promotions are only available for new GPU server orders. However, you can contact our sales team to inquire about special renewal discounts.

Q: Can I apply the discount to a plan outside the promotion?
A: No, the discount is not valid if the target plan is excluded from the AI hosting GPU server promotion.

Q: What payment methods do you accept?
A: We accept Visa, MasterCard, American Express, JCB, Discover, Diners Club, PayPal, Wire Transfer, and Check. Note that non-instant payment methods will delay service deployment until the payment clears. Wire transfers must be over $100, and paper checks are accepted from U.S. clients only.

Q: How long does server setup take?
A: Typically, GPU dedicated server setup takes 20–40 minutes. Customized GPU servers will take longer.

Q: How does the 24-hour free trial work?
A: We offer a 24-hour free trial for new clients who wish to test our GPU servers. To request a trial server, please follow these steps:

Step 1: Submit a Free Trial Request. Select a plan, click 'Order Now,' and leave a note saying 'Need free trial.' Then click 'Check Out' and proceed to the Order Confirm page, where you must click 'Confirm' to complete the free trial request.

Step 2: Security Verification. This process takes about 30 minutes to 2 hours. Once verified, you will receive the server login details in the console and can start using the server. If your trial request is not approved, you will be notified via email.

Limited-Time AI Hosting Deal
Don't Miss Out

Power your AI workloads with high-performance GPU hosting designed for speed, stability, and cost efficiency. Instantly deploy NVIDIA-powered servers to run LLMs, model training, and inference with ease.