Energy-Efficient Cloud Infrastructure Design For Large Language Model Training And Inference
DOI:
https://doi.org/10.64252/qwxkxm32
Keywords:
Cloud, Inference, Energy, LLM, Infrastructure
Abstract
The rapid development of Large Language Models (LLMs) has placed an unprecedented burden on cloud computing infrastructure in terms of energy requirements, operating costs, and environmental impact. This paper presents an overall architectural design that targets energy efficiency at every level of LLM training and inference workloads. Through a multi-pronged approach, the design takes advantage of energy-efficient accelerators (e.g., NVIDIA H100 GPUs, TPUs), new cooling solutions (liquid and direct-to-chip), and software-level optimizations such as quantization, pruning, and knowledge distillation.
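As a point of reference for the software-level optimizations mentioned above, the sketch below shows post-training dynamic quantization in PyTorch; it is a minimal illustration under assumed layer sizes, not the configuration used in this paper.

```python
# Minimal sketch (illustrative, not from this paper): post-training dynamic
# quantization in PyTorch, one of the software-level optimizations named in
# the abstract. Layer sizes are placeholders for a transformer feed-forward block.
import torch
import torch.nn as nn

# Stand-in feed-forward block; production LLM layers are far larger.
model = nn.Sequential(
    nn.Linear(4096, 11008),
    nn.GELU(),
    nn.Linear(11008, 4096),
)

# Convert Linear weights to int8 for inference; activations are quantized
# dynamically at runtime, reducing memory traffic and energy per token.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 4096)
print(quantized(x).shape)  # torch.Size([1, 4096])
```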