Wednesday, 09 September 2015 05:00

A Practical Disaster Recovery Approach for Mission Critical Identity and Access Management (IAM) Systems

Written by Seetharaman Jeganathan

Identity and access management (IAM) is one of the mission critical information systems that many organizations operate as part of their overall information security program. IAM is a combination of identity management (IDM) and access management (AM) systems. Identity management helps organizations automate the business processes for provisioning, managing, and de-provisioning user accounts (digital identities) in critical IT systems. Access management helps organizations enforce access control mechanisms and offer single sign-on (SSO) so that users can access web-based applications. According to Gartner, IAM is one of the most important components of an organization’s security infrastructure, both for protecting information assets from compromise and for staying compliant with legal and regulatory requirements.

A web access management solution controls users’ access and enables single sign-on (SSO) to critical business applications. With cloud adoption now widespread, companies are adopting software-as-a-service (SaaS) applications for e-mail communication, document management, file sharing, customer relationship management, human capital management, and several other business requirements. What happens if the access management solution becomes unavailable for longer than the maximum tolerable downtime (MTD)? Similarly, how can an organization continue operations during critical business days if the identity management solution, which automates account provisioning to critical enterprise IT systems, becomes unavailable? Let us analyze how to build practical disaster recovery approaches for mission critical IAM systems in any organization.

Large organizations typically build and run IAM solutions in on-premise datacenters. Many organizations are also adopting cloud based identity as a service (IDaaS) solutions to support their business requirements. For such organizations, building the disaster recovery plan, the required infrastructure, and the processes and procedures is largely offloaded to the service provider. However, it is vital to have an effective policy statement and standards that require the cloud service provider to comply with the organization’s policies and derived standards. Disaster recovery planning for cloud based identity and access management services differs from planning for systems deployed and operated on premise. This paper focuses on practical approaches for building a disaster recovery plan for IAM solutions built and operated on premise in datacenters owned by the organization.

IAM Disaster Recovery Plan

According to the 2014 annual report of the disaster recovery preparedness benchmark survey, three out of four companies are at risk because they have failed to prepare for disaster recovery. IT system outages and their associated costs remain a major challenge for many organizations, with losses ranging from a few thousand dollars to millions of dollars. Dun & Bradstreet has reported that 59 percent of Fortune 500 companies experience a minimum of 1.6 hours of IT system downtime each week and 73 percent of businesses have experienced revenue loss of 70 million in the past five years due to service interruptions.

High availability of a mission critical application is imperative for an organization to meet the service levels it has committed to its customers and business partners. However, high availability at the production site alone does not remove the need for a disaster recovery and business continuity plan, which remains critical for being prepared for worst case situations.

DR is an investment

For any IT initiative, obtaining support from senior management and sponsors is essential. It is the responsibility of the IAM system owner(s) to present appropriate business cases along with the results of the business impact analysis (BIA) and other risk factors, showing how downtime of IAM systems directly affects the ability of various business groups to meet their objectives. Management is more likely to respond favorably if presented with data on the revenue that business groups lose, directly or indirectly, because of IAM system downtime.

DR planning for IAM systems differs from DR planning for conventional IT systems because IAM supports and enhances the information security posture of an organization. Several laws and regulations require companies in various industries to prove that due diligence has been exercised in protecting the information obtained from customers, business partners and associates. The most commonly known regulations applicable in the United States are the Sarbanes-Oxley Act of 2002, the Health Insurance Portability and Accountability Act (HIPAA) of 1996, the Gramm-Leach-Bliley Act (GLBA) of 1999, the Basel II Accord, and industry specific standards such as the Payment Card Industry Data Security Standard (PCI DSS). The same level of information protection countermeasures must be in place in the DR environment as in the production environment; otherwise the organization will become non-compliant and could face serious damages, both tangible and intangible.

According to the 2014 annual survey report, best practices from the organizations that are better prepared for disaster recovery indicate that deriving specific metrics for recovery time objectives (RTOs) and recovery point objectives (RPOs) is a successful starting point for building an effective DR plan. Defining RTO and RPO metrics for identity manager and access manager applications is a collaborative task between the IT system owner(s) of the IAM applications and the business process owners. Other parties should also be consulted on information security compliance requirements, legal and regulatory matters, risk management, IT project management, and so on. Many “process matured” companies have specialist roles for IT systems contingency planning and operations. A formal review and approval by the contingency planning group is essential for a successful outcome from the DR planning exercise.

Several key questions must be asked and answered in order to define the RTO and RPO metrics. The most basic, but also the most important, question is: how long can the business sustain the failure of a mission critical application? For IAM applications, the workgroup should debate possible downtime scenarios and how each would negatively impact the business. Below are some essential questions to brainstorm and answer:

  1. What are the functional dependencies for other IT systems with identity manager and access manager applications?
  2. How do these dependencies negatively impact the dependent systems if IAM systems are unavailable?
  3. What are the impacts on service level agreements to business groups if IAM systems are unavailable?
  4. What is the maximum data loss the business can afford? This is a critical question that deserves significant time and discussion in order to derive an accurate value that the business can afford.

The outcome of this workgroup must be the derivation of specific metrics of RTO and RPO for mission critical identity manager and access manager applications for the organization.
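As an illustration of how the workgroup's outcome could be recorded and sanity checked, here is a minimal Python sketch. The class name, field names and the example values are hypothetical and not taken from any product or from the survey; the only rule it encodes is the common contingency planning assumption that an RTO should not exceed the MTD of the same application.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTargets:
    """Recovery metrics agreed by the workgroup for one application (hypothetical structure)."""
    application: str
    mtd: timedelta  # maximum tolerable downtime
    rto: timedelta  # recovery time objective
    rpo: timedelta  # recovery point objective (maximum affordable data loss)

    def problems(self) -> list[str]:
        """Return a list of inconsistencies; an empty list means the targets are coherent."""
        issues = []
        if self.rto > self.mtd:
            issues.append(
                f"{self.application}: RTO {self.rto} exceeds MTD {self.mtd}"
            )
        if self.rpo < timedelta(0) or self.rto < timedelta(0):
            issues.append(f"{self.application}: RTO and RPO must not be negative")
        return issues


# Example values are assumptions for illustration only.
access_manager = RecoveryTargets(
    "access-manager", mtd=timedelta(hours=4), rto=timedelta(hours=2), rpo=timedelta(hours=24)
)
identity_manager = RecoveryTargets(
    "identity-manager", mtd=timedelta(hours=8), rto=timedelta(hours=12), rpo=timedelta(hours=1)
)

for targets in (access_manager, identity_manager):
    for issue in targets.problems():
        print(issue)
```

In this example the identity manager entry is flagged, because a 12 hour RTO cannot satisfy an 8 hour MTD; the workgroup would either relax the MTD with the business owners or invest in a faster recovery approach.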

Identity Manager Components

In the most common scenarios, companies build IAM solutions with commercial off-the-shelf (COTS) products. The Gartner Magic Quadrant for user provisioning and administration is a good reference for an overview of the key market players in this domain. The general identity manager architectures discussed throughout this article were put together with reference to the book “Identity Management on a Shoestring” by Ganesh Prasad and Umesh Rajbhandari.

Table 1 – Identity Manager Technology Components

Layer: Front end (User Interface Layer)
Technology: Web technologies
Description: The user interface layer for identity manager typically has a user portal and an admin portal.

  • User portal is for end users to interact with identity manager to manage their profiles and make self-service access provisioning requests.
  • Admin portal is for identity manager system administrators to perform maintenance tasks and monitor the system performance.

Layer: Business Layer
Technology:

  • Middleware technologies
  • Business process management (BPM)

Description: This layer consists of several technologies used to build the business logic for approvals, workflow processes, and access request flows, along with core system level APIs for interoperability and connection to the end target systems.

Layer: Data Layer
Technology:

  • Database technologies
  • Directory server technologies

Description: This layer consists of components that store user identity information in either a hierarchical or a relational data model. It also stores the identity manager transactional audit data about “who has requested what permissions to whom.”

Layer: Connectors Layer
Technology:

  • XML technologies
  • Platform specific APIs (e.g. Java based or .NET based)

Description: Connector components receive commands from identity manager and convert them into the programming languages and commands understood by the target platform in order to create and manage user accounts.

Sample reference architectures for identity manager deployment involving the technical components listed in Table 1 are given below for further discussion.

Figure 1 – Identity Manager Logical Architecture (Sample Reference 1)

Figure 1 depicts a sample logical architecture for identity manager with high availability. It is assumed that user data and transactional audit data about account provisioning and approvals are stored in clustered relational database instances.

Figure 2 – Identity Manager Logical Architecture (Sample Reference 2)

Figure 2 depicts another possible logical architecture with high availability. In this architecture, the user data is stored in a directory server product and the transactional audit data about account provisioning and approvals is stored in clustered relational database instances. The directory servers are designed as a two node cluster, primary and secondary, with a peer to peer replication model. In both architectures, the identity manager application is hosted on clustered web application servers and placed behind a load balancer, which distributes traffic and fails over automatically to another instance when one of the instances goes down.
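The load balancing and automatic failover behavior described for both architectures can be sketched in a few lines of Python. This is a simplified illustration only: the class, the node names `idm-app-1` and `idm-app-2`, and the round robin policy are assumptions, and a real load balancer would determine node health through its own health checks rather than through manual calls.

```python
class LoadBalancer:
    """Round robin routing that skips instances marked as down (illustrative sketch)."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._healthy = {node: True for node in self._nodes}
        self._next = 0  # index of the node to try first on the next request

    def mark_down(self, node):
        self._healthy[node] = False

    def mark_up(self, node):
        self._healthy[node] = True

    def route(self):
        """Return the next healthy node, trying each node at most once."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if self._healthy[node]:
                return node
        raise RuntimeError("no healthy identity manager instance available")


lb = LoadBalancer(["idm-app-1", "idm-app-2"])
first, second = lb.route(), lb.route()   # traffic alternates between both nodes
lb.mark_down("idm-app-1")                # simulate a crashed instance
after_failure = [lb.route() for _ in range(3)]
```

When every instance in the production cluster is down, `route` raises an error. That is the point at which high availability is exhausted and the disaster recovery plan takes over.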

Let us analyze possible disaster recovery approaches for the identity manager sample reference architectures considered in our discussion. Figure 3 shows a practical disaster recovery model for the identity manager application that uses a relational database cluster (see Figure 1) to store user data and transactional audit data. The DR site could be in a secondary but fully functioning data center of the organization, or in the same data center in which the contingency is planned and built to meet the disaster recovery requirements. The key considerations in the DR model are given below:

  1. Manual failover at the application layer – The major advantage of manual failover at the application layer is that the incident response team can make all possible attempts to bring the system back online within the maximum tolerable downtime. Automatic failover is preferred when the MTD is so short that the incident response team would not be able to bring the system back online in time.
  2. Data backup – Backup of the identity manager database schema instances must happen at the shortest time interval that is sufficient for the business. For example, a weekly full backup of the database instances and a daily incremental backup overnight might meet the requirements. The key factor to understand here is the recovery point, which is the time interval between the last successful data backup (incremental or full) and the time the data becomes unavailable due to a disaster.
  3. Data mirroring – Most relational database products offer real time synchronization with peer database instances planned for contingency. Replicating the data between the instances in the production site and the DR site can help companies achieve the lowest possible RPO for a mission critical application.
  4. Effective change management process – An effective change management process mandates that whenever a change is made or additional components are deployed in the production environment for the identity manager application, the same change is deployed and verified in the DR environment without fail. This process ensures that the same versions and features of the software components are deployed and operated in both places.
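The first three considerations above can be turned into simple checks. The sketch below is a hedged illustration: the function names are hypothetical and the durations are assumed example values. It assumes that with periodic backups alone, the worst case data loss equals one full backup interval, and it follows the rule of thumb stated above for choosing between manual and automatic failover.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """With periodic backups only, a disaster just before the next backup
    loses everything written since the previous one: one full interval."""
    return backup_interval


def backup_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if periodic backups alone can satisfy the recovery point objective."""
    return worst_case_data_loss(backup_interval) <= rpo


def failover_mode(estimated_manual_recovery: timedelta, mtd: timedelta) -> str:
    """Manual failover is viable only if the incident response team can
    realistically restore service within the maximum tolerable downtime."""
    return "manual" if estimated_manual_recovery <= mtd else "automatic"


daily_incremental = timedelta(hours=24)  # assumed overnight incremental backup

# A 24 hour RPO is met by daily backups; a 1 hour RPO is not, which points to data mirroring.
print(backup_meets_rpo(daily_incremental, rpo=timedelta(hours=24)))  # True
print(backup_meets_rpo(daily_incremental, rpo=timedelta(hours=1)))   # False

print(failover_mode(timedelta(hours=2), mtd=timedelta(hours=4)))      # manual
print(failover_mode(timedelta(hours=2), mtd=timedelta(minutes=30)))   # automatic
```

When the backup interval cannot meet the RPO, real time replication between the production and DR database instances, as described under data mirroring, is the usual way to close the gap.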

Figure 3 – Identity Manager Production and DR Architecture (Sample Reference 1)

Figure 4 shows another disaster recovery model for the identity manager application, which uses directory server products for user data storage and a relational database cluster (see Figure 2) for storage of transactional audit data. All the key considerations discussed above apply to this model, along with additional replication of the directory server data from the production site to the DR site.

Figure 4 – Identity Manager Production and DR Architecture (Sample Reference 2)

Access Manager Components

As with identity manager, an access manager reference architecture built on COTS products will have one or more of the technology components listed in the table below. The general access manager architecture discussed in this article was put together with reference to the article “Dimensions of Identity Federation: A Case Study in Financial Services” by Manish Gupta and Raj Sharman.

Table 2 – Access Manager Technology Components

Layer: Front end (User Interface Layer)
Technology: Web technologies
Description: The user interface layer for access manager typically has a user portal and an admin portal.

  • User portal is for end users to interact with access manager to manage their profiles and gain single sign-on access to protected applications.
  • Admin portal is for access manager system administrators to perform maintenance tasks and monitor the system performance.

Layer: Web Proxy Layer
Technology:

  • Java based
  • .NET based

Description: The web proxy layer intercepts requests from clients to the protected applications and redirects them to access manager for authentication and authorization.

Layer: Business Layer
Technology:

  • Middleware technologies
  • Policy Server
  • Identity Servers

Description: This layer consists of access engine components used to build the business logic for access control policies and to design and implement role based and attribute based access control models, along with core system level APIs for interoperability and integration mechanisms for different user stores. Identity Server components play the role of identity provider to meet the business requirements for standards based federated single sign-on. Some well-known standards are:

  • Security Assertion Markup Language (SAML)
  • OpenID
  • OAuth

Layer: User Stores
Technology:

  • Database technologies
  • Directory server technologies

Description: This layer consists of components that integrate the different possible user stores, such as Active Directory, databases, RADIUS servers, and directory servers, for authentication and authorization services.
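To make the role of the web proxy layer in Table 2 more concrete, the following Python sketch shows the interception decision in its simplest form. It is an assumption-laden illustration rather than any vendor's implementation: the function name, the `/am/login` URL, the `target` query parameter and the in-memory session set are all hypothetical, and the sketch covers only the authentication redirect, not policy based authorization.

```python
from urllib.parse import quote


def handle_request(path, session_token, protected_prefixes, valid_sessions,
                   login_url="/am/login"):
    """Decide what the web proxy does with an incoming request.

    Returns ("forward", path) when the request may pass to the application,
    or ("redirect", url) when the user must first authenticate with access manager.
    """
    is_protected = any(path.startswith(prefix) for prefix in protected_prefixes)
    if not is_protected:
        return ("forward", path)
    if session_token in valid_sessions:
        # An existing single sign-on session is honored for every protected application.
        return ("forward", path)
    # Remember the original destination so the user lands there after logging in.
    return ("redirect", f"{login_url}?target={quote(path, safe='')}")


protected = ["/hr/", "/crm/"]        # hypothetical protected applications
sessions = {"token-abc"}             # hypothetical active SSO sessions

print(handle_request("/public/help", None, protected, sessions))
print(handle_request("/hr/payroll", "token-abc", protected, sessions))
print(handle_request("/hr/payroll", None, protected, sessions))
```

The sketch also shows why the access manager is mission critical: if the login endpoint or the session store is unavailable, every protected application behind the proxy becomes unreachable for users without a valid session.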

A generic architecture for access manager deployment is shown in Figure 5. The components are built with high availability in the production environment, with automatic failover to a peer when one component crashes or fails.

Figure 5 – Access Manager Logical Architecture

Figure 6 shows a practical disaster recovery model for the access manager application. The DR site could be in a secondary but fully functioning data center of the organization, or in the same data center in which the contingency is planned and built to meet the disaster recovery requirements.

Figure 6 – Access Manager Production and DR Architecture

The key considerations in the DR model are given below:

  1. Manual failover procedure at the application layer – This supports incident handling and allows the team to make an informed decision about continuing operations from the DR site.
  2. Data backup – Backup of access manager data such as policies, roles, and the configuration settings of protected applications. For example, a weekly full backup of the configuration data could help in restoring the primary site after it has crashed.
  3. Effective change management process – An effective change management process mandates that whenever a change is made in the production environment for the access manager application, the same change is deployed and verified in the DR environment without fail. This process ensures that the same versions and features of the software components are deployed and operated in both places.
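One way to verify the outcome of the change management process is to compare component inventories from the two sites. The sketch below is a minimal illustration under stated assumptions: the function name, component names and version strings are hypothetical, and in practice the inventories would come from whatever configuration management tooling the organization already uses.

```python
def find_drift(production: dict[str, str], dr: dict[str, str]) -> dict[str, tuple]:
    """Return components whose versions differ between production and DR.

    Each value is a (production_version, dr_version) pair; None means the
    component is missing from that site.
    """
    drift = {}
    for component in sorted(production.keys() | dr.keys()):
        prod_version = production.get(component)
        dr_version = dr.get(component)
        if prod_version != dr_version:
            drift[component] = (prod_version, dr_version)
    return drift


# Hypothetical inventories for illustration only.
production_site = {"policy-server": "4.2.1", "web-proxy": "4.2.1", "saml-plugin": "1.3.0"}
dr_site = {"policy-server": "4.2.1", "web-proxy": "4.1.9"}

for component, (prod_version, dr_version) in find_drift(production_site, dr_site).items():
    print(f"{component}: production={prod_version}, DR={dr_version}")
```

An empty result means both sites run the same component versions. Any entry is a change that was deployed to production but never carried over to DR, or the reverse.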

Conclusion

Organizations might regard disaster recovery as an unproductive investment. However, when a real disaster strikes, the company’s credibility with its stakeholders, business partners and customers is at stake. Instead of learning this the hard way, it is better to analyze diligently the practical approaches that are both cost effective and efficient, so that the organization can continue operating its mission critical applications and recover from the disaster. In this way, the organization can protect the interests of its mission and stakeholders and thrive and grow in an ever more challenging business environment.

About the Author:

Seetharaman Jeganathan has more than 13 years of experience in IT security consulting and program management. He is an (ISC)² Certified Information Systems Security Professional (CISSP) in good standing. He mainly focuses on information systems risk assessments, identity and access management (IAM) solution strategy and architecture definition, and the design and implementation of IAM security solutions using Oracle, IBM, NetIQ and SailPoint products for customers worldwide across several industries. He also specializes in security consulting for cloud based applications and the implementation of IAM solutions in the cloud.