Lenovo intelligent Computing Orchestration (LiCO) Product Guide
Lenovo intelligent Computing Orchestration (LiCO) is a software solution that simplifies the management and use of distributed clusters for High Performance Computing (HPC) workloads and Artificial Intelligence (AI) model development.
This product guide provides essential presales information to understand LiCO and its key features, specifications and compatibility. This guide is intended for technical specialists, sales specialists, sales engineers, IT architects, and other IT professionals who want to learn more about LiCO and consider its use in HPC solutions.
Changes in the August 7 update:
- Revised list of validated software components
- Added information on LiCO installation through Lenovo Professional Services
LiCO builds on an open source cluster management software stack, consolidating the management, monitoring, and scheduling functions into a single platform.
The unified platform simplifies interaction with the underlying compute resources, enabling customers to take advantage of popular open source cluster tools while reducing the effort and complexity of using them for both HPC and AI.
Did You Know?
LiCO enables a single cluster to be used for both HPC and AI workloads simultaneously, with multiple users accessing the cluster at the same time. Running more workloads can increase utilization of cluster resources, driving more value from the environment.
The following table lists the ordering information for LiCO.
| Description | LFO | Software CTO | Feature code |
|---|---|---|---|
| Lenovo HPC AI LiCO Software 90 Day Evaluation License | 7S090004WW | 7S09CTO2WW | B1YC |
| Lenovo HPC AI LiCO Software w/1 yr S&S | 7S090001WW | 7S09CTO1WW | B1Y9 |
| Lenovo HPC AI LiCO Software w/3 yr S&S | 7S090002WW | 7S09CTO1WW | B1YA |
| Lenovo HPC AI LiCO Software w/5 yr S&S | 7S090003WW | 7S09CTO1WW | B1YB |
Note: LiCO is only configurable in the x-config configurator.
For cluster users, LiCO provides the following benefits:
- A web-based portal to execute, monitor and manage HPC and AI jobs on a distributed cluster
- Enhanced end-user functionality to support AI model training and management
- Workflow templates to provide an intuitive starting point for less experienced users
- Management of private space on shared storage through the GUI
- Monitoring of job progress and log access
- Dedicated tools leveraging neural networks for image classification training (Caffe-based)
- In-flight visualizations, testing, and validation capabilities for image classification training (Caffe-based)
- Container-based user management of supported AI frameworks (through Singularity)
- Console access for advanced cluster users with command-line skills
For cluster administrators, LiCO provides the following benefits:
- A single cluster management portal consolidating monitoring, alarms, and reporting
- LiCO user management and multi-user support with user and billing groups
- Compatibility with popular shared file systems (Spectrum Scale, NFS, Lustre)
- Command-line access to the underlying open source stack components for skilled administrators
- Report generation for job activity, alarms, and actions in the cluster
- Generation of notifications and alarms based on cluster status
To meet the varying needs of an organization, the LiCO web portal supports three access roles: administrators, operators, and users.
Features for LiCO Administrators
For cluster administrators, LiCO provides a sophisticated monitoring solution, built on OpenHPC tooling. The following menus are available to administrators:
- Home menu for administrators – provides dashboards giving a global overview of the health of the cluster. Utilization is given for the CPUs, GPUs, memory, storage, and network. Node status is given, indicating which nodes are being used for I/O, compute, login, and management. Job status is also given, indicating runtime for the current job, and the order of jobs in the queue. The Home menu is shown in the following figure.
- User menu – provides dashboards to control user groups and users, determining permissions and access levels (based on LDAP) for the organization. Administrators can also control and provision billing groups for accurate accounting.
- Monitor menu – provides dashboards for interactive monitoring and reporting on cluster nodes, including a list of the nodes or a physical view of the node topology. Administrators may also use the Monitor menu to drill down to the component level, examining statistics on cluster CPUs, GPUs, jobs, and operations. Administrators can access alarms that indicate when these statistics reach unwanted values (for instance, GPU temperature reaching critical levels). These alarms are created using the Setting menu. The figures below display the component and alarm dashboards.
- Reports menu – allows administrators to generate reports on jobs, alarms, or actions for a given time interval. Administrators may export these reports as a spreadsheet, PDF, or HTML file.
- Admin menu – allows administrators to examine processes and assets, monitor VNC sessions, and download web logs.
- Setting menu – allows administrators to set up automated notifications and alarms. Administrators may enable the notifications to reach users and interested parties via email, SMS, and WeChat. Administrators may also enable notifications and alarms via uploaded scripts.
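The threshold alarms configured in the Setting menu and surfaced in the Monitor menu can be pictured with the following minimal Python sketch. It is not LiCO's implementation: the `AlarmPolicy` class, the `evaluate` function, the metric name `gpu_temp_c`, and the sample readings are all hypothetical, and the sketch assumes an alarm fires whenever a node's metric meets or exceeds its threshold.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AlarmPolicy:
    """Hypothetical alarm rule: fire when a metric meets or exceeds a threshold."""
    name: str
    metric: str                 # illustrative metric name, e.g. "gpu_temp_c"
    threshold: float
    channels: Tuple[str, ...]   # notification channels, e.g. ("email", "sms")


def evaluate(policies: List[AlarmPolicy],
             samples: Dict[str, Dict[str, float]]) -> List[dict]:
    """Return one alarm record per (node, policy) pair whose metric reaches the threshold."""
    alarms = []
    for node, metrics in sorted(samples.items()):
        for policy in policies:
            value = metrics.get(policy.metric)
            # Nodes that do not report this metric are skipped rather than alarmed.
            if value is not None and value >= policy.threshold:
                alarms.append({
                    "node": node,
                    "policy": policy.name,
                    "value": value,
                    "notify": list(policy.channels),
                })
    return alarms


# Hypothetical readings: only node "c1" exceeds the 85-degree GPU threshold.
policies = [AlarmPolicy("GPU overheat", "gpu_temp_c", 85.0, ("email", "sms"))]
samples = {"c1": {"gpu_temp_c": 91.0}, "c2": {"gpu_temp_c": 62.0}}
for alarm in evaluate(policies, samples):
    print(alarm)
```

Checking every policy against every node keeps the sketch simple; a real monitoring stack would also have to suppress repeated alarms and deliver them through the configured channels.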
Features for LiCO Operators
For staff who need to monitor the cluster without managing user access, LiCO provides the Operator role. LiCO Operators have access to a subset of the dashboards provided to Administrators, namely those in the Home, Monitor, and Reports menus:
- Home menu for operators – provides dashboards giving a global overview of the health of the cluster. Utilization is given for the CPUs, GPUs, memory, storage, and network. Node status is given, indicating which nodes are being used for I/O, compute, login, and management. Job status is also given, indicating runtime for the current job, and the order of jobs in the queue.
- Monitor menu – provides dashboards for interactive monitoring and reporting on cluster nodes, including a list of the nodes or a physical view of the node topology. Operators may also use the Monitor menu to drill down to the component level, examining statistics on cluster CPUs, GPUs, jobs, and operations. Operators can access alarms that indicate when these statistics reach unwanted values (for instance, GPU temperature reaching critical levels). These alarms are created by Administrators using the Setting menu (for more information on the Setting menu, see Features for LiCO Administrators).
- Reports menu – allows operators to generate reports on jobs, alarms, or actions for a given time interval. Operators may export these reports as a spreadsheet, PDF, or HTML file.
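Generating a report for a given time interval and exporting it as a spreadsheet amounts to filtering records by timestamp and serializing them. The Python sketch below illustrates this under stated assumptions: the job record fields (`id`, `user`, `submitted`), the helper names `jobs_in_interval` and `to_csv`, and the sample data are hypothetical, and CSV stands in for LiCO's actual spreadsheet export format, which this guide does not specify.

```python
import csv
import io
from datetime import datetime
from typing import Iterable, List


def jobs_in_interval(jobs: Iterable[dict], start: datetime, end: datetime) -> List[dict]:
    """Keep jobs whose submission time falls in the half-open interval [start, end)."""
    return [job for job in jobs if start <= job["submitted"] < end]


def to_csv(rows: List[dict], fields: List[str]) -> str:
    """Serialize report rows to CSV text that a spreadsheet application can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        # Render datetimes as ISO 8601 strings so spreadsheets can parse them.
        writer.writerow({key: (value.isoformat() if isinstance(value, datetime) else value)
                         for key, value in row.items()})
    return buffer.getvalue()


# Hypothetical job records; only job 101 falls inside the first week of the month.
jobs = [
    {"id": 101, "user": "alice", "submitted": datetime(2018, 8, 1, 9, 0)},
    {"id": 102, "user": "bob", "submitted": datetime(2018, 8, 9, 14, 30)},
]
report = jobs_in_interval(jobs, datetime(2018, 8, 1), datetime(2018, 8, 8))
print(to_csv(report, ["id", "user", "submitted"]))
```

Using a half-open interval means a job submitted exactly at the end boundary lands in the next report rather than in both.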
Features for LiCO Users
Those designated as LiCO users have access to dashboards related specifically to HPC and AI tasks. Users can add jobs to the queue and monitor their results through the dashboards. The following menus are available to users:
- Home menu for users – provides dashboards giving a global overview of the health of the cluster. Utilization is given for the CPUs, GPUs, memory, storage, and network. Jobs and job statuses are also given, indicating the runtime for the current job, and the order of jobs in the queue. Additionally, a list of recent job templates is given for both HPC and AI workloads. The figure below displays the home menu.
- Submit job menu – allows users to set up a job and add it to the queue. The user first selects a job template, then names the job, enters the relevant parameters, and adds the job to the queue. The available parameters depend on the selected template. Users can also submit jobs as scripts via the Common Job template. The figures below display two job templates.
- Jobs menu – displays a dashboard listing queued jobs and their statuses. Users can also select a job to view its results and logs, either while it is running or after it completes.
- Train model menu – displays options for running neural network AI workloads for Intel-Caffe. This includes a list of the available datasets, the neural network topologies that have been created, and the image classification models built from those datasets and topologies. Users can use existing datasets or partition new ones into training, validation, and test datasets. Users can also start quickly from an existing topology or create a new topology suited to their problem.
After creating a model, users can train it as a job. During training, accuracy is evaluated on both the training and validation datasets to help detect overfitting. LiCO provides graphs showing model statistics at each epoch, including model accuracy, training loss, and processing speed. After training finishes, the user can evaluate the model on the test dataset for an unbiased assessment or for comparison with other models.
The following figure shows an example of the graph statistics for a given model run using Intel-Caffe.
- Expert mode menu – recommended for users who are familiar with the command-line interface of the OpenHPC tools. The Expert mode menu provides console access that allows users to log in to the management node on which LiCO is installed. Users who log in through this console can submit HPC jobs and manage workloads from the CLI.
- Admin menu – allows users to manage the AI framework containers and directly access active nodes with VNC. For AI jobs, users can upload Singularity containers with their own frameworks, allowing for flexibility in their framework environment. The VNC dashboard provides a real-time display of all the VNC sessions in a cluster created by the user.
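The Train model workflow rests on two generic ideas: partitioning a dataset into training, validation, and test subsets, and comparing training accuracy against validation accuracy at each epoch to spot overfitting. The following Python sketch illustrates both. It is not LiCO's or Intel-Caffe's code; the function names, the 70/15/15 default split, the fixed random seed, and the 0.1 accuracy-gap heuristic are assumptions chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple


def partition(items: Sequence, train: float = 0.7, val: float = 0.15,
              seed: int = 0) -> Tuple[list, list, list]:
    """Shuffle items reproducibly and split them into training, validation, and test subsets.

    The test subset receives whatever remains after the train and val fractions.
    """
    if not (0 < train and 0 <= val and train + val < 1):
        raise ValueError("fractions must satisfy 0 < train, 0 <= val, train + val < 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives the same split every run
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def overfitting_epochs(train_acc: List[float], val_acc: List[float],
                       gap: float = 0.1) -> List[int]:
    """Return 1-based epochs where training accuracy exceeds validation accuracy by more than gap."""
    return [epoch for epoch, (t, v) in enumerate(zip(train_acc, val_acc), start=1)
            if t - v > gap]


train_set, val_set, test_set = partition(range(100))
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
print(overfitting_epochs([0.60, 0.80, 0.95], [0.58, 0.75, 0.80]))  # [3]
```

Keeping the test subset out of both training and validation is what allows the final evaluation to stay unbiased.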
Subscription & Support
LiCO is licensed through a per-CPU and per-GPU subscription and support model. Once the entire cluster is entitled, the customer receives access to LiCO package updates and Lenovo support for the length of the acquired term.
Lenovo will provide interoperability support for all software tools defined as validated with LiCO, and development support (Level 3) for specific Lenovo-supported tools only. Bugs and issues in open source and supported-vendor components can be logged and tracked with their respective communities or companies if desired; Lenovo does not guarantee fixes for them. Additional support options may be available; please contact your Lenovo sales representative for more information.
Support entitlements are obtained either as part of a Lenovo Scalable Infrastructure (LeSI) solution or for "roll your own" (RYO) solutions outside of the LeSI framework. LiCO software package updates are provided directly through the Lenovo Electronic Delivery system.
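Because subscriptions are counted per CPU and per GPU across the entire cluster, sizing an order comes down to tallying both across every node. The Python sketch below shows that arithmetic under loud assumptions: the `Node` record and `entitlement_counts` helper are hypothetical, a "CPU" is assumed to mean one processor socket, and every node (including management and login nodes) is assumed to count. Confirm the actual counting rules with your Lenovo sales representative.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical inventory record for one cluster node."""
    hostname: str
    cpus: int      # assumed: number of processor sockets
    gpus: int = 0


def entitlement_counts(nodes: Iterable[Node]) -> Tuple[int, int]:
    """Return (CPU entitlements, GPU entitlements), assuming one per CPU and one per GPU."""
    nodes = list(nodes)  # materialize so generators can be summed twice
    return sum(n.cpus for n in nodes), sum(n.gpus for n in nodes)


# Hypothetical cluster: two-socket management and login nodes plus four
# two-socket compute nodes with four GPUs each.
cluster = [Node("mgmt", 2), Node("login", 2)]
cluster += [Node(f"compute{i}", 2, 4) for i in range(1, 5)]
print(entitlement_counts(cluster))  # (12, 16)
```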
Validated Software Components
LiCO’s software packages depend on several software components that must be installed before LiCO for it to function properly. Each LiCO software release is validated against a defined configuration of software tools and Lenovo systems to make deployment more straightforward and to enable support. Other management tools, hardware systems, and configurations outside the defined stack may be compatible with LiCO but are not formally supported; to determine compatibility with other solutions, please check with your Lenovo sales representative.
The following software components are validated by Lenovo as part of the overall LiCO software solution entitlement:
- Lenovo Development Support (L1-L3):
  - Graphical User Interface: LiCO
  - System Management & Provisioning: xCAT/Confluent
- Lenovo Configuration Support (L1 only):
  - Job Scheduling & Orchestration: SLURM; Torque/Maui (HPC only)
  - System Monitoring: Nagios
  - Application Monitoring: Ganglia
  - Container Support (AI): Singularity
  - AI Frameworks (AI): Caffe, Intel-Caffe, TensorFlow, MXNet, Neon
The following software components are validated for compatibility with LiCO and are supported by their respective software providers:
- Operating System: CentOS/RHEL 7.4, SUSE SLES 12 SP3
- File Systems: IBM Spectrum Scale, Lustre
- Job Scheduling & Orchestration: IBM Spectrum LSF
- Development Tools: GNU compilers, Intel Cluster Toolkit
The following servers are supported to run LiCO. Each server must run one of the supported operating systems as well as the validated software stack, as described in the Validated Software Components section.
- ThinkSystem SD530 – The Lenovo ThinkSystem SD530 is an ultra-dense and economical two-socket server in a 0.5U rack form factor. With up to four SD530 server nodes installed in the ThinkSystem D2 enclosure, and the ability to cable and manage up to four D2 enclosures as one asset, you have an ideal high-density 2U four-node (2U4N) platform for enterprise and cloud workloads. The SD530 also supports a number of high-end GPU options with the optional GPU tray installed, making it an ideal solution for AI training workloads. For more information, see the product guide at https://lenovopress.com/lp0635-thinksystem-sd530-server.
- ThinkSystem SR630 – Lenovo ThinkSystem SR630 is an ideal 2-socket 1U rack server for small businesses up to large enterprises that need industry-leading reliability, management, and security, as well as maximum performance and flexibility for future growth. The SR630 server is designed to handle a wide range of workloads, such as databases, virtualization and cloud computing, virtual desktop infrastructure (VDI), infrastructure security, systems management, enterprise applications, collaboration/email, streaming media, web, and HPC. For more information, see the product guide at https://lenovopress.com/lp0643-lenovo-thinksystem-sr630-server.
- ThinkSystem SR650 – The Lenovo ThinkSystem SR650 is an ideal 2-socket 2U rack server for small businesses up to large enterprises that need industry-leading reliability, management, and security, as well as maximum performance and flexibility for future growth. The SR650 server is designed to handle a wide range of workloads, such as databases, virtualization and cloud computing, virtual desktop infrastructure (VDI), enterprise applications, collaboration/email, and business analytics and big data. For more information, see the product guide at https://lenovopress.com/lp0644-lenovo-thinksystem-sr650-server.
Additional Lenovo ThinkSystem and System x servers may be compatible with LiCO and validated upon request. Contact your Lenovo sales representative for more information.
LiCO Implementation services
Customers who do not already have the cluster management software stack that LiCO requires can engage Lenovo Professional Services to install LiCO and the necessary open source software. Lenovo Professional Services can provide comprehensive installation and configuration of the software stack, including operation verification, as well as post-installation reference documentation. Contact your Lenovo sales representative for more information.
Client PC requirements
A web browser is used to access LiCO's monitoring dashboards. To fully utilize LiCO’s monitoring and visualization capabilities, the client PC should meet the following specifications:
- Hardware: CPU of 2.0 GHz or above and 1 GB or more of RAM
- Display resolution: 1280 x 800 or higher
- Browser: Chrome or Firefox is recommended