As AIDCs increasingly adopt accelerators from various brands, issues such as incompatibility among heterogeneous computing resources and insufficient adaptation between model frameworks and underlying chips often arise. These issues complicate compute scheduling and hinder the effective scaling of computational resources.
Inadequate scheduling of computing power for tasks such as large-scale data processing and model training leads to uneven resource allocation, which wastes computing power and lowers utilization rates. Scheduling that is poorly matched to different AI tasks degrades overall AIDC service performance.
Optimization for AI tasks like data loading, training, fine-tuning, and inference of large models is often incomplete or only partially accelerated. This results in slow storage access, low computing and memory utilization rates, and inefficient communication, impacting the completion of AI tasks.
Data scientists often lack skills to utilize intelligent computing hardware, while enterprise IT personnel lack cluster management capabilities for large model training. There's a need for a flexible, user-friendly cluster environment and toolchain for fine-tuning large AI models.
DataCanvas Alaya NeW serves as the central management system for AIDC, handling computing resource management and scheduling. It supports various AI computing services and applications, with breakthroughs in key technologies such as heterogeneous computing power adaptation and scheduling. Simple, user-friendly, and cluster-centric, it is designed for high-performance AI computing. It supports building, training, and inference of both large and small AI models, and facilitates the integration of various AI models. Alaya NeW spans the basic software and hardware infrastructure of the AI industry, accelerating AI computing technology development and industrial ecosystem growth.
Provides a multi-tiered set of offerings, including HPC clusters, Virtual Kubernetes Clusters, GPU Virtual Machines, and AI container instances, to match computing needs of different scales precisely and fulfill them efficiently.
Dedicated to fundamental AI tasks such as training and fine-tuning large models, Alaya NeW offers an all-in-one service integrating computing, data, algorithms, and scheduling to advance AI industry innovations.
Uses hardware and software optimization to achieve a 100% improvement in cluster training efficiency, a 50% increase in single-card utilization, 4x inference speed, and 5x token throughput.
Provides management and optimization of diverse computing resources globally, significantly enhancing resource utilization across AIDC through continuous monitoring and innovative scheduling algorithms.
A DataCanvas Unit (DCU) is a unit of computing capability, billed per second of usage. 1 DCU = 312 TFLOPS × 1 hour. DCU consumption depends on the number and type of GPU instances running.
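To make the DCU definition concrete, the following minimal sketch estimates DCU consumption from per-second usage. It assumes that consumption scales linearly with each GPU's rated TFLOPS, the GPU count, and the running time. That linear model, along with the names `InstanceUsage` and `dcu_consumed` and the sample ratings, is hypothetical and is not taken from the Alaya NeW billing documentation.

```python
from dataclasses import dataclass

# From the definition above: 1 DCU = 312 TFLOPS sustained for 1 hour.
DCU_TFLOPS = 312.0
SECONDS_PER_HOUR = 3600


@dataclass(frozen=True)
class InstanceUsage:
    """Hypothetical usage record for one running GPU instance."""
    tflops_per_gpu: float  # assumed rated TFLOPS of the GPU type
    gpu_count: int         # number of GPUs in the instance
    seconds: int           # billed running time, in seconds


def dcu_consumed(usage: InstanceUsage) -> float:
    """Estimate DCUs consumed, assuming linear scaling (an assumption)."""
    tflops_hours = (
        usage.tflops_per_gpu * usage.gpu_count * usage.seconds / SECONDS_PER_HOUR
    )
    return tflops_hours / DCU_TFLOPS


if __name__ == "__main__":
    # One 312-TFLOPS GPU for one hour is 1 DCU by definition.
    print(dcu_consumed(InstanceUsage(312.0, 1, 3600)))
    # Eight such GPUs for half an hour.
    print(dcu_consumed(InstanceUsage(312.0, 8, 1800)))
```

Under these assumptions, per-second billing simply means that `seconds` need not be a multiple of 3600. The actual rating assigned to each GPU type should be taken from the platform's own pricing information.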