Lenovo AI Validated Design for Model Training on ThinkSystem Servers
Reference Architecture

Authors
Updated: 7 Sep 2018
Form Number: LP0892
PDF size: 36 pages, 836 KB

Abstract

This document describes the reference architecture for Artificial Intelligence (AI) model training on Lenovo ThinkSystem servers. It provides a predefined, optimized hardware infrastructure for model training under various usage scenarios, along with planning guidance, design considerations, and best practices for implementing model training with Lenovo products.

One key step in the AI adoption journey is exploration and selection of models for deep learning (DL). Typical models are based on deep neural networks (DNNs) and require a significant amount of computational resources for training. Using hardware infrastructure designed as a scale-out cluster for such model training use cases is a key requirement for enabling DL adoption.

This reference architecture is intended for IT professionals, technical architects, sales engineers, and consultants who plan, design, and implement advanced analytics solutions with Lenovo hardware.

Table of Contents

1 Introduction
2 Business problem and business value
3 Requirements
4 Architectural overview
5 Component model
6 Operational model
7 Deployment considerations
8 Appendix: Bill of Material
9 Appendix: Example Training Workload
Resources
Document history

Note: The Chinese version is one level back and will be updated shortly.

Change History

Changes in the September 7 update (Version 1.7):

  • Added SR670 Training Node Configurations
  • Updated BOM tables to include configurations with SR670
  • Updated BOM tables to include configurations with 100Gb Ethernet switches

Related product families

Product families related to this document are the following: