Lenovo X6 Server RAS Features

Published: 23 Aug 2016
Form Number: LP0554
PDF size: 7 pages, 455 KB

Abstract

Server availability is critical to business operations and affects the core business, service, customers, and company reputation. This article discusses the importance of server availability, defines Reliability, Availability, and Serviceability (RAS), and shows how Lenovo X6 servers deliver the highest level of RAS available.

Introduction

When it comes to IT, the only good downtime is no downtime.

Time is money. Even a few minutes of downtime can result in significant costs and bring internal business operations to a standstill. Downtime can also adversely affect a company’s relationship with its customers, business suppliers, and partners. Poor reliability can damage a company’s reputation and result in lost business.

Figure 1. The Lenovo x3850 X6 server offers mission-critical reliability, availability, and serviceability (RAS) features

The growth of new applications has ratcheted database processing and business analytics to the top of the list for server workloads. These workloads demand continuous availability from the enterprise platforms on which they run.

The concept of always-on has become a global requirement and impacts every aspect of our lives:

  • Maximize productivity – Manufacturers need to keep their production line up and running. System downtime shouldn’t interrupt it.
  • Control access – Building security companies prevent external threats to organizations. Security application downtime shouldn’t be an internal one.
  • Protect profit – Retailers have sales targets to meet day in, day out. Transaction system downtime shouldn’t get in the way.
  • Protect lives – First responders take care of emergencies 24 x 7 x 365. Application downtime shouldn’t be one of them.
  • Ensure quality care and privacy – Healthcare institutions need to access patient information and be HIPAA compliant at all times. System downtime shouldn’t compromise either one.
  • Process transactions – Financial services organizations manage thousands of transactions a second. Processing system downtime simply can’t happen.

Looking at downtime impacts from a sector perspective:

  • Manufacturing – Revenue, quality, cost, compliance
  • Financial services – Revenue for business and client, privacy, compliance, customer trust, brand reputation, transactional processing systems
  • Retail – Transaction and data loss (sales data, customer data), lost and abandoned opportunities, tarnished reputation, compliance
  • Public safety – Lives, public safety, community trust, computer-aided dispatch, public safety answering point
  • Building automation – Safety/security risk, fear among building occupants, uncomfortable building environment, compliance
  • Health care – Quality of care, patient privacy, compliance, electronic health records

Characteristics of mission-critical workloads

What defines a mission-critical workload? The following is a list of the key characteristics:

Transaction processing

  • Thousands to more than 1,000,000 online users
  • Support large transactional databases
  • 24 x 7 operation

Business intelligence and analytics

  • Enable all users
  • Complex queries
  • Multiple data sources
  • Large data warehouse

Database

  • Large scalable enterprise databases
  • No single point of failure
  • Extremely fast operational speed

Figure 2. The System x3950 X6 with eight processors is ideal for high-performance mission-critical workloads

An hour of downtime can mean millions in lost revenue

  • The ITIC 2015-2016 Global Server Hardware and Server OS Reliability Report found that 98% of firms say hourly downtime costs exceed $100K, and 81% estimate that an hour of downtime costs their companies over $300K.
  • The IDC Storage Quickpoll 2013 study found that an hour of downtime for mission-critical applications costs $500K for companies with 5-10K employees and $1.5M for companies with more than 10K employees.
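
To put hourly figures like these in context, the following minimal sketch converts an availability percentage into expected downtime hours per year and multiplies the result by an hourly cost. The availability levels and the function names are illustrative assumptions, not values from the reports cited above; only the $100K and $300K hourly costs are taken from the ITIC figures.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_downtime_hours(availability_pct: float) -> float:
    """Expected hours of downtime per year at a given availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return HOURS_PER_YEAR * (1.0 - availability_pct / 100.0)


def annual_downtime_cost(availability_pct: float, hourly_cost_usd: float) -> float:
    """Expected yearly downtime cost: downtime hours times cost per hour."""
    return annual_downtime_hours(availability_pct) * hourly_cost_usd


if __name__ == "__main__":
    # Hypothetical availability levels, chosen only for illustration.
    for pct in (99.0, 99.9, 99.99, 99.999):
        hours = annual_downtime_hours(pct)
        low = annual_downtime_cost(pct, 100_000)   # $100K per hour
        high = annual_downtime_cost(pct, 300_000)  # $300K per hour
        print(f"{pct}% available: {hours:.3f} h/year down, "
              f"${low:,.0f} to ${high:,.0f} per year")
```

At 99.9% availability, for example, this works out to about 8.76 hours of downtime per year, or roughly $876K per year at $100K per hour.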

Server RAS defined

We know RAS stands for Reliability, Availability, and Serviceability, but what do these terms mean for enterprise servers?

Reliability = “Engineered Strategy”

  • Error Detection and Self-Healing
  • Minimizes outage opportunities
  • Correct results all the time

Availability = Keep running despite problems

  • Reduce frequency and duration of outages
  • Self-diagnosing: work around faulty components or “self-heal”
  • Never stops or slows down

Serviceability = Minimize outages/downtime

  • Avoid repeat failures with accurate diagnostics
  • Concurrent repair on higher failure rate items
  • Easy to repair and upgrade

To ensure business continuity and increase end user productivity, it is imperative that businesses maximize the reliability and uptime of their server hardware and server operating systems.
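
One common way to see how the three terms relate is the textbook steady-state availability formula, MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair. The sketch below is a generic illustration under that assumption, not a Lenovo-specific model; the function name and the hour values are hypothetical.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a system is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the direction of each effect.
    baseline = steady_state_availability(mtbf_hours=10_000, mttr_hours=8)
    # Better reliability: failures happen half as often.
    more_reliable = steady_state_availability(mtbf_hours=20_000, mttr_hours=8)
    # Better serviceability: each repair takes a quarter of the time.
    more_serviceable = steady_state_availability(mtbf_hours=10_000, mttr_hours=2)

    for label, value in (
        ("baseline", baseline),
        ("more reliable", more_reliable),
        ("more serviceable", more_serviceable),
    ):
        print(f"{label:>16}: {value * 100:.4f}% available")
```

In this simple model, both fewer failures (reliability) and faster repairs (serviceability) raise availability, which is why the three terms are treated together.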

Lenovo servers are the industry leader in server availability. The ITIC 2015-2016 Global Server Hardware and Server OS Reliability Report found that Lenovo System x and IBM Power Systems averaged the lowest percentage of server outages compared to HP ProLiant and Integrity servers, Dell PowerEdge, Oracle x86 and SPARC.

Lenovo x3850 X6 and x3950 X6 RAS

All Lenovo servers have strong RAS capabilities, but the Lenovo x3850 X6 and x3950 X6 servers offer advanced RAS features not found in other servers. The differentiated X6 self-healing technology proactively identifies potential failures and transparently takes the necessary corrective actions.

Figure 3. The X6 servers make servicing easy because components like the Compute Book are easily removed from the front or rear of the server

The Lenovo X6 servers provide mainframe-like RAS because they integrate across the hardware and software stack.

X6 provides four levels of RAS on top of the standard RAS (item 1) found on all Lenovo servers:

  1. Standard Intel-based server RAS – Strong RAS capabilities available on all Lenovo servers.
    • Strong error prevention
    • Error Detection / Correction
  2. Intel Run Sure Technology – Enterprise RAS only available on the Intel Xeon E7-4800/8800 v4 processors used in the x3850 X6 and x3950 X6.
    • MCA consumed error recovery
    • QPI faildown
    • Dual Device Data Correction for memory
    • PCIe live error recovery
  3. Lenovo platform RAS innovation – Higher availability, more platform-level RAS
    • Automated processor failover
    • Automated firmware backup
    • Automated memory page sorting and page retire
    • Advanced transaction recovery
  4. Lenovo management innovation – Greater solution-level RAS management with X6 software stack integration
    • VMware virtualization
    • Microsoft virtualization
  5. X6 modular design – reduces service time by enabling quick easy replacement of upgradeable or failed components.
    • Compute Books
    • Storage Book
    • I/O Books
    • Power supplies
    • Fans
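
As a rough illustration of the idea behind a self-healing feature such as page retire, the sketch below counts correctable memory errors per page and takes a page out of service once it crosses a threshold. This is not Lenovo's firmware logic: the class name `PageRetirePolicy`, the threshold of 3, and the page numbers are all hypothetical, chosen only to show how removing a suspect component from use early can head off a later uncorrectable failure.

```python
from collections import Counter


class PageRetirePolicy:
    """Toy model: retire a memory page after repeated correctable errors.

    Hypothetical illustration only; the threshold and interface are assumptions.
    """

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.error_counts = Counter()  # page number -> correctable error count
        self.retired = set()           # page numbers no longer in use

    def record_correctable_error(self, page: int) -> bool:
        """Log one corrected error; return True if this call retires the page."""
        if page in self.retired:
            return False
        self.error_counts[page] += 1
        if self.error_counts[page] >= self.threshold:
            self.retired.add(page)
            return True
        return False

    def is_usable(self, page: int) -> bool:
        """A page stays usable until it has been retired."""
        return page not in self.retired


if __name__ == "__main__":
    policy = PageRetirePolicy(threshold=3)
    for page in (7, 7, 12, 7):  # page 7 reports three corrected errors
        if policy.record_correctable_error(page):
            print(f"page {page} retired")
    print("page 7 usable:", policy.is_usable(7))
    print("page 12 usable:", policy.is_usable(12))
```

In this run, page 7 is retired after its third corrected error, while page 12, with a single error, stays in use.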

In addition to the X6 RAS features above, Lenovo XClarity, a new centralized systems management solution, also increases RAS for Lenovo servers. XClarity helps in the following ways:

  • Provides the tools administrators need to deploy platforms more quickly and manage them more easily.
  • Allows servers to call home when they detect an issue, so that a potential problem can be fixed before it causes an outage.
  • Collects and downloads diagnostic data, including logs, service data, and inventory to help identify the cause of the issue.

Lenovo X6 technologies drive the outstanding system availability and uninterrupted application performance needed to host mission-critical applications.

Trademarks

Intel® and Xeon® are trademarks of Intel Corporation or its subsidiaries.

Microsoft® is a trademark of Microsoft Corporation in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.