Non-functional Requirement - Hardware Resiliency

Category: Resiliency

Context: Hardware

Goals: When a hardware component fails, the repair or replacement of the failed hardware must be treatable as routine system maintenance rather than as a service-affecting outage or emergency.

Rationale: If the availability of the system is sufficiently critical, the mean time to repair (MTTR) must not depend on the response time of hardware vendors, the availability of repair parts, or the availability of staff. The availability of the system must therefore be decoupled from the availability of any single hardware component or any individual staff member.

Requirement: Failure of a single hardware component shall not cause user-detectable loss of business functionality for longer than the elapsed time specified in Metric. After an elapsed time no longer than that specified in Metric, the user will be able to continue business functionality.

Metric:

Level A:

A1. The user-detectable loss of business functionality will be no more than one minute
A2. The user will receive a visual indicator of the status of the in-flight transaction
A3. Business functionality will be available to the user without re-authentication
A4. The user application context will be preserved, restored or recovered

Level B:

B1. The user-detectable loss of business functionality will be no more than ten minutes
B2. No more than the single most recent in-flight transaction will be lost
B3. Business functionality will be available to the user after re-authentication
B4. The system will continue to meet non-functional requirements other than resiliency requirements

Level C:

C1. The user-detectable loss of business functionality will be no more than one business day
C2. No more than the most recent one business day of data modifications will be lost

Level D:

D1. The recovered system will meet all pre-failure functional and non-functional requirements
D2. The system will meet <existing internal standard>
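
Example: The duration bounds in A1, B1 and C1 can be checked mechanically against a measured outage. The following is a minimal sketch only, not part of the requirement. The names MAX_LOSS and strictest_level_met are hypothetical, and the length of a business day is not defined in this document, so the 8-hour value below is an assumption that should be replaced with the organization's own definition. Only the elapsed-time criteria are covered; the remaining criteria (A2 to A4, B2 to B4, C2) and Level D, which states no duration bound, still need separate verification.

```python
from datetime import timedelta
from typing import Optional

# ASSUMPTION: this document does not define the length of a business day.
# 8 hours is a placeholder; substitute the organization's own definition.
BUSINESS_DAY = timedelta(hours=8)

# Maximum user-detectable loss of business functionality per level,
# taken from metrics A1, B1 and C1. Level D has no duration bound.
MAX_LOSS = {
    "A": timedelta(minutes=1),
    "B": timedelta(minutes=10),
    "C": BUSINESS_DAY,
}


def strictest_level_met(observed_loss: timedelta) -> Optional[str]:
    """Return the strictest level (A, B or C) whose duration bound the
    observed loss satisfies, or None if it exceeds all three bounds.

    Only the elapsed-time criterion of each level is evaluated here.
    """
    for level in ("A", "B", "C"):  # ordered from strictest to most lenient
        if observed_loss <= MAX_LOSS[level]:
            return level
    return None
```

For instance, a measured loss of five minutes exceeds the Level A bound but falls within the Level B bound, so the function returns "B" for timedelta(minutes=5).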

Scale: Elapsed time, measured in seconds or business days

Stakeholders: System Managers, Operations

Implications: If this requirement is not met, the organization will incur decreased availability of systems, decreased flexibility for hosting and system management, and increased frequency and duration of unplanned outages.

Applicability: See Enterprise Requirements Framework

Tags: Hardware, Resiliency

Status: Approved, Requirement

Author: <Author>

Revision: <Revision>



Notes: 

This requirement incorporates the traditional concepts of redundancy, clustering, load balancing, and fault tolerance. A system's availability, recovery point objective (RPO), and recovery time objective (RTO) are derived from this and other requirements.

This requirement is intended to force the designer to leverage high availability technologies for systems in which the impact of an unavailable system reaches certain thresholds.

For more information, see NFR Summary