GPU Infrastructure & Resource Architecture

Explore how GPU Mart designs, deploys, and manages high-performance GPU infrastructure. This overview explains our GPU allocation model, hardware selection strategy, storage architecture, and performance isolation principles that power reliable AI and compute workloads.

GPU Product Line and Selection Strategy

GPU Mart's GPU portfolio is designed and deployed based on application-specific requirements, architectural generations, long-term maintainability, and compute efficiency. We prioritize modern GPU architectures and mainstream models, and continuously update our offerings in line with NVIDIA's technology roadmap. GPU types are selected according to distinct AI and compute workloads to ensure consistent performance and operational stability.

All GPU resources are centrally procured, deployed, and managed by the Database Mart infrastructure team, and are governed under a unified lifecycle management framework, forming a standardized GPU infrastructure pool.

To meet the diverse needs of AI and compute workloads, we categorize GPU resources into the following four classes:

| Tier | Primary Use Cases | Key Characteristics | Representative Models |
| --- | --- | --- | --- |
| Entry-Level GPU | Development, testing, lightweight AI inference, remote graphics | Low power consumption, high stability, suitable for entry-level compute workloads | GTX 1650, GTX 1660 |
| Mainstream Performance GPU | Model inference, computer vision, rendering, general-purpose computing | Strong performance-to-cost ratio, mature software ecosystem, broad AI framework compatibility | RTX 2060 / 3060 Ti / 4060 / 5060 series |
| Workstation-Class GPU | Professional computing, enterprise AI development, stability-critical workloads | Professional-grade drivers, larger memory capacity, extended support lifecycles | RTX A-Series, RTX Pro Series |
| Data Center GPU | Large-scale model training, high-performance computing (HPC) | Optimized for training and HPC workloads, deployed exclusively in data center environments | A100, H100 |

GPU Allocation Model

GPU Mart assigns GPU resources using PCIe GPU passthrough, providing each GPU VPS and GPU Dedicated Server instance with a full physical GPU. Resources are provisioned with a one-to-one hardware mapping, with no partitioning, time-sharing, or oversubscription. This model ensures predictable performance and full compatibility with CUDA and AI workloads.
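A one-to-one mapping means exactly one full physical GPU is visible from inside the instance. As a minimal sketch (not an official GPU Mart tool), the inventory reported by `nvidia-smi` can be parsed to confirm this; the sample output below is illustrative:

```python
import csv
import io
import subprocess

def parse_gpu_inventory(csv_text: str) -> list[dict]:
    """Parse `nvidia-smi --query-gpu=... --format=csv` output into row dicts."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)
    return [dict(zip(header, row)) for row in reader if row]

def gpu_inventory() -> list[dict]:
    """Query the GPUs visible inside this instance (needs NVIDIA drivers)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,pci.bus_id",
         "--format=csv"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_inventory(out)

# Illustrative output on a single-GPU passthrough instance:
sample = """name, memory.total [MiB], pci.bus_id
NVIDIA GeForce RTX 4060, 8188 MiB, 00000000:01:00.0
"""
gpus = parse_gpu_inventory(sample)
assert len(gpus) == 1  # one full physical GPU, no partitioning
```

Because the passthrough GPU appears as an ordinary PCIe device, the same check works unchanged on both GPU VPS and dedicated server instances.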

GPU Virtualization & Performance Consistency

GPU VPS services are built on KVM virtualization combined with hardware-level GPU passthrough. This architecture delivers near-native GPU performance while maintaining virtualization flexibility and workload isolation, enabling customers to deploy AI and compute workloads with consistent and stable performance.
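In a libvirt-managed KVM setup, PCIe passthrough of this kind is typically expressed as a `<hostdev>` device in the guest definition. A minimal sketch (the PCI address is illustrative, not an actual GPU Mart host configuration):

```xml
<!-- Assign the physical GPU at host PCI address 01:00.0 to the guest -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```

With `managed='yes'`, libvirt detaches the device from the host driver and binds it to VFIO when the guest starts, which is what gives the guest near-native access to the GPU.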

Multi-GPU & High-Speed Interconnect Support

For distributed training and parallel compute workloads, GPU Mart supports multi-GPU dedicated server deployments with optional high-speed GPU interconnect technologies such as NVLink. These configurations are designed for large model training, multi-GPU inference, and high-performance computing workloads.
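Whether two GPUs in such a configuration are actually bridged by NVLink can be read from the topology matrix printed by `nvidia-smi topo -m`. A minimal parsing sketch (not an official tool; the sample matrix is illustrative):

```python
def nvlink_pairs(topo: str) -> set[tuple[str, str]]:
    """Find GPU pairs connected by NVLink in `nvidia-smi topo -m` output.

    A cell such as 'NV4' means the pair is bridged by 4 NVLink lanes;
    'X' marks a GPU against itself, and other codes (PIX, SYS, ...)
    indicate PCIe/NUMA paths rather than NVLink.
    """
    lines = [ln for ln in topo.splitlines() if ln.strip()]
    cols = [t for t in lines[0].split() if t.startswith("GPU")]
    pairs = set()
    for line in lines[1:]:
        parts = line.split()
        if not parts[0].startswith("GPU"):
            continue  # skip legend and CPU-affinity rows
        row = parts[0]
        for col, cell in zip(cols, parts[1:]):
            if cell.startswith("NV") and row != col:
                pairs.add(tuple(sorted((row, col))))
    return pairs

sample = """\
        GPU0    GPU1    CPU Affinity
GPU0     X      NV4     0-31
GPU1    NV4      X      0-31
"""
```

Here `nvlink_pairs(sample)` reports that GPU0 and GPU1 are NVLink-connected, which is the condition NCCL exploits for fast all-reduce traffic in distributed training.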

Platform & Software Support

GPU Mart provides a standardized compute platform designed for AI, high-performance computing, and GPU-accelerated workloads. The platform is built around virtualization efficiency, hardware-level GPU allocation, and optimized runtime environments.

Preconfigured AI Runtime Environment

To reduce environment deployment complexity, GPU Mart provides optional preconfigured runtime stacks designed for AI and content generation workloads.
Common supported environments include:

  • Pre-installed NVIDIA GPU drivers
  • CUDA and GPU runtime installation on request
  • Ollama for local LLM inference
  • ComfyUI for Stable Diffusion workflows
  • Whisper speech recognition environments

These environments are provided as deployment templates and can be customized or replaced by customer-managed configurations.
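For example, an instance provisioned with the Ollama template exposes Ollama's local HTTP API (port 11434 by default). A minimal client sketch, assuming a model such as `llama3` has already been pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address

def build_payload(prompt: str, model: str = "llama3") -> bytes:
    """Build a non-streaming /api/generate request body."""
    return json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()

def ollama_generate(prompt: str, model: str = "llama3") -> str:
    """Send one completion request to the local Ollama server."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `"stream": False` the server returns a single JSON object instead of a line-delimited stream, which keeps a first smoke test simple.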

Container & Orchestration Support

GPU Mart supports containerized GPU workloads and cluster-based compute environments.
Available support includes:

  • Kubernetes GPU cluster deployment assistance
  • GPU scheduling and orchestration integration
  • Container runtime compatibility with modern AI frameworks

Cluster environments can be deployed across both VPS and dedicated GPU infrastructure depending on workload scale and architecture requirements.
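As one sketch of what such an environment looks like, a Kubernetes pod can request a whole GPU through the NVIDIA device plugin's extended resource. The pod name and image tag below are illustrative assumptions, and the cluster must already run the device plugin:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test          # illustrative name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # illustrative tag
      command: ["nvidia-smi"]    # print the GPU the pod was scheduled onto
      resources:
        limits:
          nvidia.com/gpu: 1      # one full GPU via the NVIDIA device plugin
```

Because `nvidia.com/gpu` is an integer resource, the scheduler places the pod only on a node with a free GPU, mirroring the platform's no-oversubscription allocation model at the cluster layer.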

Storage Architecture

NVMe SSD Workload Storage

GPU Mart deploys NVMe SSD storage as high-performance workload storage across GPU VPS and most dedicated GPU server deployments. NVMe storage is primarily used for data processing, dataset storage, caching layers, and AI workload pipelines.
Advantages include:

  • High IOPS performance for dataset loading and caching
  • Low-latency data access for AI training and inference workflows
  • Improved throughput for parallel compute pipelines
  • Reduced GPU idle time caused by storage bottlenecks

NVMe storage is adopted as a baseline infrastructure standard to support modern AI and GPU-accelerated workloads.
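A rough way to judge whether storage will bottleneck a GPU pipeline is to compare the read rate the pipeline needs against the drive's sustained throughput. A back-of-the-envelope sketch (all figures illustrative):

```python
def required_read_mbps(samples_per_sec: float, avg_sample_bytes: float) -> float:
    """Sustained read rate (MB/s) a training input pipeline must deliver."""
    return samples_per_sec * avg_sample_bytes / 1e6

def storage_bound(samples_per_sec: float, avg_sample_bytes: float,
                  drive_mbps: float) -> bool:
    """True if the drive cannot sustain the pipeline's read rate."""
    return required_read_mbps(samples_per_sec, avg_sample_bytes) > drive_mbps

# e.g. 2,000 images/s at ~150 KB each needs ~300 MB/s of sustained reads:
need = required_read_mbps(2000, 150_000)        # 300.0 MB/s
assert not storage_bound(2000, 150_000, 3000)   # typical NVMe SSD keeps up
assert storage_bound(2000, 150_000, 150)        # a slow disk starves the GPU
```

Random-access patterns lower a drive's effective throughput well below its sequential rating, which is why high-IOPS NVMe matters for shuffled dataset loading.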

RAID Storage Configuration (Dedicated Servers)

For dedicated GPU servers, RAID storage configurations may be deployed depending on workload and customer requirements.

Supported objectives include:

  • Data redundancy and fault tolerance
  • Increased aggregate storage throughput
  • Improved data reliability for production workloads

RAID design is selected based on performance, redundancy, and capacity balancing requirements.
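The capacity side of that balance follows directly from the chosen RAID level. A small sketch of the standard usable-capacity arithmetic (capacity only; throughput and rebuild characteristics differ per level):

```python
def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Approximate usable capacity for common RAID levels."""
    if level == "RAID0":                 # striping, no redundancy
        return disks * disk_tb
    if level == "RAID1":                 # mirroring, any-1-disk fault tolerance
        return disk_tb
    if level == "RAID5" and disks >= 3:  # single parity, 1-disk fault tolerance
        return (disks - 1) * disk_tb
    if level == "RAID10" and disks >= 4 and disks % 2 == 0:  # striped mirrors
        return disks / 2 * disk_tb
    raise ValueError("unsupported level / disk count")

# e.g. four 4 TB disks:
assert usable_tb("RAID0", 4, 4.0) == 16.0   # max capacity, no fault tolerance
assert usable_tb("RAID5", 4, 4.0) == 12.0   # capacity/redundancy compromise
assert usable_tb("RAID10", 4, 4.0) == 8.0   # best write performance + redundancy
```

The same four-disk pool thus yields 16, 12, or 8 TB depending on how much redundancy and write performance the workload requires.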

Compute & Network Performance Optimization

To ensure stable high-throughput performance, the platform adopts optimized virtualization drivers and network architecture:

  • Virtio high-performance drivers for storage and networking
  • Low-latency data path optimization
  • Upgradeable network bandwidth up to 1 Gbps (availability depends on data center location)
  • Dual-stack public networking support (IPv4 and IPv6)

These optimizations are designed to support latency-sensitive AI inference, distributed training coordination, and data-intensive compute workloads.
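On a dual-stack instance, clients typically resolve both address families and fall back between them. A simplified connect-with-fallback sketch (a much-reduced cousin of RFC 6555 "Happy Eyeballs"; address ordering follows the OS resolver):

```python
import socket

def connect_dual_stack(host: str, port: int,
                       timeout: float = 5.0) -> socket.socket:
    """Connect via whichever resolved address answers first, in resolver
    order (usually IPv6 before IPv4 when both are available)."""
    last_err: OSError | None = None
    for family, type_, proto, _, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, type_, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
            return sock          # first reachable address wins
        except OSError as err:
            sock.close()
            last_err = err       # try the next address family
    raise last_err if last_err else OSError("no addresses resolved")
```

A sequential fallback like this trades a little first-connection latency for simplicity; true Happy Eyeballs races the families concurrently.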

Operations & Compliance

GPU Mart maintains standardized operational controls. All GPU assets are centrally procured and lifecycle-managed through the DBM infrastructure system.

GPU resources are deployed under dedicated allocation policies with no oversubscription, supported by continuous infrastructure monitoring, tenant isolation controls, and baseline security and backup mechanisms.

Security & Infrastructure Stability

Platform stability and tenant isolation are maintained through multiple infrastructure protection mechanisms:

  • Network-level tenant isolation within data center environments
  • Baseline network threat mitigation and anti-mining protection
  • Automated full-disk backups, with retention cycles scheduled according to the selected plan (typically 2-week or 4-week intervals)
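The retention cycle itself is simple calendar arithmetic; a small illustrative sketch (dates and intervals are examples, not an account's actual schedule):

```python
from datetime import date, timedelta

def backup_schedule(start: date, interval_weeks: int,
                    cycles: int) -> list[date]:
    """Dates of the next full-disk backups for a given retention interval."""
    return [start + timedelta(weeks=interval_weeks * i)
            for i in range(1, cycles + 1)]

# e.g. a 2-week cycle starting 2025-01-01:
assert backup_schedule(date(2025, 1, 1), 2, 3) == [
    date(2025, 1, 15), date(2025, 1, 29), date(2025, 2, 12)]
```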

Frequently Asked Questions

How does GPU virtualization work?

We primarily use PCIe GPU passthrough technology to assign physical GPUs directly to customer instances. Each GPU VPS or dedicated server instance is mapped to a full physical GPU, allowing workloads to run with near-native hardware performance and full compatibility with CUDA and AI frameworks.

Are GPU resources shared between customers?

No. We provision GPU resources using dedicated allocation policies. Each GPU VPS or GPU dedicated server instance receives an exclusive physical GPU, with no time-sliced sharing or performance contention between tenants.

How is workload performance isolation ensured?

Performance isolation is achieved through hardware-level GPU allocation, memory resource control, network tenant isolation, and storage performance design. This architecture prevents cross-tenant workload interference and ensures predictable compute performance.

Do you resell or broker third-party GPU capacity?

No. We deploy and operate GPU infrastructure using centrally procured hardware managed through the DBM infrastructure lifecycle system. All GPU assets are tracked, maintained, and operated as part of GPU Mart’s controlled infrastructure environment.

What types of workloads are your infrastructure designed for?

Our infrastructure supports a wide range of GPU-accelerated workloads, including AI training, model inference, rendering pipelines, simulation workloads, and high-performance computing applications. Different GPU classes are deployed to match workload requirements and performance characteristics.

Can customers customize software environments and runtime configurations?

Yes. Customers maintain full control over operating systems, applications, AI frameworks, and runtime configurations. GPU Mart provides optional preconfigured environments to simplify deployment but does not restrict customer-managed software stacks.