GPU Over-Provisioning
This page explains how to allow Curio to run multiple GPU tasks on a single GPU at the same time.
Overview
The HARMONY_GPU_OVERPROVISION_FACTOR environment variable enables GPU over-provisioning by allowing each physical GPU to present itself as multiple logical GPUs. When set to a value greater than 1, it allows a single GPU to handle multiple independent processes concurrently.
Usage
Enabling Over-provisioning
Set the HARMONY_GPU_OVERPROVISION_FACTOR environment variable to the desired over-provisioning factor.
Example
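A minimal sketch, assuming a factor of 2 (matching the effect described below) and that the variable is exported in the shell that starts Curio:

```shell
# Assumed value: 2, so each physical GPU presents itself as two logical GPUs.
export HARMONY_GPU_OVERPROVISION_FACTOR=2

# Confirm the value that a process started from this shell would inherit.
echo "HARMONY_GPU_OVERPROVISION_FACTOR=${HARMONY_GPU_OVERPROVISION_FACTOR}"
```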
Effect: Each physical GPU is treated as two logical GPUs.
Application: In a snap encode worker, this setting allows each GPU to handle two independent encode processes simultaneously.
Example with Service File
/etc/curio.env File
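A sketch of the relevant line, assuming the Curio systemd unit loads /etc/curio.env through an EnvironmentFile= directive and that a factor of 2 is wanted. Any other variables already present in the file stay as they are:

```shell
# /etc/curio.env (fragment; assumed to be read by the service unit at start)
HARMONY_GPU_OVERPROVISION_FACTOR=2
```

Because systemd reads environment files when a service starts, the service has to be restarted before a changed value takes effect.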
Considerations
Workload Compatibility: Ideal for workloads that are not heavily memory-bound.
Snap Encode Workloads: Generally suitable for over-provisioning.
SNARK Workloads: May encounter memory limitations, especially on GPUs with lower memory capacity.
GPU Specifications: Enterprise GPUs with higher memory are better suited for over-provisioning.
Performance Testing: It's important to test and validate the optimal over-provisioning factor for your specific hardware and workloads.
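As a rough way to reason about the memory considerations above, the sketch below divides a GPU's total memory by the over-provisioning factor. The helper name per_task_mib and the 24576 MiB figure are hypothetical, and the even split is an assumption for illustration only; actual memory use depends on the workload and should be measured.

```shell
# Hypothetical helper: integer estimate of the memory available to each
# concurrent task if the GPU's memory were shared evenly between them.
per_task_mib() {
  total_mib=$1   # total GPU memory in MiB
  factor=$2      # over-provisioning factor (integer >= 1)
  echo $(( total_mib / factor ))
}

# Example: a 24576 MiB (24 GiB) GPU with a factor of 2.
per_task_mib 24576 2   # prints 12288
```

If the estimate falls below what a single task is known to need, a lower factor is the safer starting point.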
Benefits
Increased Throughput: Potentially improves processing capacity per GPU.
Enhanced Utilization: Makes better use of GPU resources that might otherwise be underutilized.
Limitations
Memory Constraints: Over-provisioning can lead to memory bottlenecks on GPUs with limited memory.
Potential Instability: Running multiple processes on a single GPU may degrade system stability and performance.
Recommendations
Start with Lower Values: Begin with an over-provisioning factor of 2 and monitor system performance.
Monitor Resource Usage: Keep an eye on GPU memory usage, temperatures, and overall system load.
Increment Gradually: Adjust the over-provisioning factor incrementally to find the optimal balance.
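One way to follow the monitoring recommendation above is to sample GPU state while tasks run. This sketch assumes an NVIDIA GPU with the driver's nvidia-smi tool installed; the function name gpu_snapshot is hypothetical.

```shell
# Hypothetical helper: print per-GPU memory use, temperature, and
# utilization once, in CSV form.
gpu_snapshot() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; skipping GPU snapshot" >&2
    return 1
  fi
  nvidia-smi \
    --query-gpu=index,memory.used,memory.total,temperature.gpu,utilization.gpu \
    --format=csv
}
```

Calling gpu_snapshot at each factor, under a representative load, gives a simple basis for comparing one setting against the next.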
Feedback and Support
As this is an experimental feature, we encourage users to provide feedback on their experience. Your insights are valuable for improving GPU over-provisioning support in future releases.