Security on Cloud Pak for Data
IBM Cloud Pak® for Data supports several different mechanisms for securing your environment and your data.
Secure engineering practices
Cloud Pak for Data follows IBM Security and Privacy by Design (SPbD), a set of focused security and privacy practices that includes vulnerability management, threat modeling, penetration testing, privacy assessments, security testing, and patch management.
For more information about the IBM Secure Engineering Framework (SEF) and SPbD, see the following resources:
Basic security features on Red Hat OpenShift Container Platform
Security is required for every enterprise, especially for organizations in the government, financial services, and healthcare sectors. Red Hat OpenShift® Container Platform provides a set of security features that protect sensitive customer data with strong encryption controls and improve the oversight of access control across applications and the platform itself.
Cloud Pak for Data builds on the security features provided by OpenShift by creating Security Context Constraints (SCC), service accounts, and roles so that Cloud Pak for Data pods and users have the lowest level of privileges to the OpenShift platform that is needed for them. Cloud Pak for Data is also security hardened on the OpenShift platform and is installed in a secure and transparent manner.
For more information, see Basic security features on Red Hat OpenShift Container Platform.
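The restricted SCC described above corresponds to a concrete set of pod security settings. The following sketch shows what a compatible container `securityContext` looks like; the field names are standard Kubernetes API fields, while the pod name and image are hypothetical placeholders, not actual Cloud Pak for Data workloads:

```python
# Sketch: build a container securityContext compatible with OpenShift's
# restricted SCC (non-root, no privilege escalation, all capabilities dropped).
import json

def restricted_security_context() -> dict:
    """Return a container securityContext that the restricted SCC accepts."""
    return {
        "allowPrivilegeEscalation": False,
        "runAsNonRoot": True,                 # restricted SCC forbids running as root
        "capabilities": {"drop": ["ALL"]},    # no added Linux capabilities
        "privileged": False,
    }

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "example-pod"},      # hypothetical name
    "spec": {
        "containers": [{
            "name": "app",
            "image": "registry.example.com/app:latest",   # hypothetical image
            "securityContext": restricted_security_context(),
        }],
    },
}

print(json.dumps(pod["spec"]["containers"][0]["securityContext"], indent=2))
```

OpenShift additionally assigns the UID and SELinux context from ranges scoped to the namespace, so the manifest does not need to hard-code them.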
Authentication and authorization
By default, Cloud Pak for Data user records are stored in an internal LDAP, and the initial setup uses this internal LDAP. However, after you set up Cloud Pak for Data, it is recommended that you use an enterprise-grade solution for password management, such as SAML SSO or an LDAP provider.
- User management
- For more information, see the following resources:
- Authorization
- Cloud Pak for Data provides user management capabilities to authorize users. For more information, see Managing users.
- Tokens and API keys
- You can use tokens and API keys to securely access Cloud Pak for Data instances, services, and APIs.
- Cloud Pak for Data automatically generates a bearer token when a user signs in, and securely stores information in the user's home directory. When the user signs out, the stored bearer token is cleared.
- An analytics project requires a personal access token to connect to an external Git repository.
- Cloud Pak for Data provides an encrypted bearer token in the model deployment details that an application developer can use for evaluating models online with REST APIs. The token never expires and is limited to the model it is associated with.
- By using API keys, you can authenticate to Cloud Pak for Data instances or services with your own credentials. For more information, see Generating API keys for authentication.
You must use an API key to access Cloud Pak for Data APIs. For more information, see Developer resources.
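A typical pattern is to exchange an API key for a bearer token and then pass that token on subsequent API calls. The sketch below builds such a request against the Cloud Pak for Data `/icp4d-api/v1/authorize` endpoint; the hostname, username, and key are hypothetical placeholders:

```python
# Sketch: trade a Cloud Pak for Data API key for a bearer token.
# Builds the request without sending it, so it can be inspected offline.
import json
import urllib.request

def build_authorize_request(host: str, username: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST that exchanges an API key for a token."""
    body = json.dumps({"username": username, "api_key": api_key}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{host}/icp4d-api/v1/authorize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_authorize_request("cpd.example.com", "admin", "MY_API_KEY")  # hypothetical values

# Sending the request requires network access and the cluster's CA certificate:
# with urllib.request.urlopen(req) as resp:
#     token = json.loads(resp.read())["token"]
# Later calls then send the header: {"Authorization": f"Bearer {token}"}
```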
- Cloud Pak for Data uses a JSON Web Token (JWT) to authenticate to some services. Services that support JWT tokens can use the Cloud Pak for Data credentials to authenticate to the service. For more information, see:
- Cloud Pak for Data uses a JSON Web Token (JWT) to authenticate to some data sources. If you create a connection to a data source that supports JWT tokens, you can select the Use my platform login credentials checkbox. When the user logs in to Cloud Pak for Data with their username and password, the Cloud Pak for Data authorization service returns a JWT token to the browser. The token is forwarded to the data source to grant access to the system. The user does not need to enter credentials again for the data source. The token has a limited expiry and generally lasts only an hour unless the browser refreshes it. Following are the data sources that can use Cloud Pak for Data credentials for authentication:
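Because a JWT carries its expiry in the `exp` claim of its base64url-encoded payload, a client can check how long a token remains valid without contacting the server. The sketch below decodes the payload without verifying the signature, so it is suitable only for inspection, never for trust decisions; the sample token is fabricated for illustration:

```python
# Sketch: inspect the expiry ("exp" claim) of a JWT such as the one the
# Cloud Pak for Data authorization service returns to the browser.
import base64
import json
import time

def jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT. No signature check."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def seconds_until_expiry(token: str, now=None):
    claims = jwt_payload(token)
    return claims["exp"] - (now if now is not None else time.time())

# Fabricated token: header {"alg":"none"}, payload with a one-hour lifetime.
header = base64.urlsafe_b64encode(b'{"alg":"none"}').rstrip(b"=").decode()
payload = base64.urlsafe_b64encode(
    json.dumps({"sub": "admin", "iat": 1_700_000_000, "exp": 1_700_003_600}).encode()
).rstrip(b"=").decode()
token = f"{header}.{payload}."

print(seconds_until_expiry(token, now=1_700_000_000))  # prints 3600 (one hour left)
```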
- Idle web client session timeout
- You can configure the idle web client session timeout in accordance with your security and compliance requirements. When a user leaves their session idle in a web browser for the specified length of time, the user is automatically logged out of the web client.
- Shared credentials
- By default, connections that are created in Cloud Pak for Data are shared.
Encryption
Cloud Pak for Data supports protection of data at rest and in motion.
- Data
- In general, data security is managed by your remote data sources. OpenShift uses resources that are known as Security Context Constraints (SCCs) to enforce the security context of a pod or a container (the Kubernetes equivalent is the PodSecurityPolicy). Cloud Pak for Data containers use the restricted SCC by default. The restricted SCC denies access to all host features and requires pods to run with a UID and SELinux context that are scoped within the namespace. For more information, see Storage considerations.
- To ensure that your data in Cloud Pak for Data is stored securely, you can encrypt your storage partition. For more information, see Encrypting and mirroring disks during installation in the Red Hat OpenShift Container Platform documentation:
  - Version 4.8 (Cloud Pak for Data 4.6.0 - 4.6.2 only)
  - Version 4.10 (Cloud Pak for Data 4.6.x)
  - Version 4.12 (Cloud Pak for Data 4.6.4 or later)
- Communications
- You can use TLS or SSL to encrypt communications to and from Cloud Pak for Data.
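When a client connects to a Cloud Pak for Data route over TLS, it should verify the server certificate rather than disable checking. A minimal sketch using Python's standard `ssl` module, assuming a hypothetical CA bundle path and hostname:

```python
# Sketch: build a TLS client context with certificate and hostname
# verification enabled, suitable for HTTPS calls to a Cloud Pak for Data route.
import ssl

# create_default_context enables certificate validation and hostname checking.
# To trust a private cluster CA, pass its bundle, e.g.:
#   ssl.create_default_context(cafile="/path/to/cluster-ca.crt")  # hypothetical path
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2   # refuse legacy protocol versions

print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)

# An HTTPS request would then pass this context, for example:
# urllib.request.urlopen("https://cpd.example.com", context=context)
```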
- FIPS
- Cloud Pak for Data supports FIPS (Federal Information Processing Standard) compliant encryption for all encryption needs, for data at rest and in motion.
Network access requirements
To ensure secure transmission of network traffic to and from the Cloud Pak for Data cluster, you need to configure the communication ports used by the Cloud Pak for Data cluster.
- Primary port
- The primary port is the port that the Red Hat OpenShift router exposes.
- Communication ports for services
- When you provision a new service or integration on your Cloud Pak for Data cluster, the services might require connections to be made from outside the cluster.
- DNS service name
- When you install the Cloud Pak for Data control plane, the installation points to the default Red Hat OpenShift DNS service name. If your OpenShift cluster is configured to use a custom name for the DNS service, a project administrator or cluster administrator must update the DNS service name to prevent performance problems.
Audit logging
Audit logging provides accountability, traceability, and regulatory compliance by recording when data is accessed and modified, and by whom.
For more information, see Auditing Cloud Pak for Data.
Multitenancy and network security
To make effective use of infrastructure and reduce operational expenses, you can run Cloud Pak for Data in multi-tenant mode on a single OpenShift cluster, while still maintaining security, compliance, and independent operability.
- Setting up network policies to isolate each instance of Cloud Pak for Data
- Setting up OpenShift projects (namespaces) to align with the Principle of Least Privilege.
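A common starting point for the network policies mentioned above is a policy that allows ingress only from pods in the same namespace. The sketch below generates such a manifest using the standard Kubernetes NetworkPolicy API; the project name is a hypothetical example, and the output is JSON, which `oc apply -f` accepts as readily as YAML:

```python
# Sketch: a NetworkPolicy that isolates a Cloud Pak for Data namespace by
# permitting ingress traffic only from pods within that same namespace.
import json

def same_namespace_only_policy(namespace: str) -> dict:
    """Build a NetworkPolicy manifest restricting ingress to same-namespace pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            "podSelector": {},                             # applies to every pod in the namespace
            "ingress": [{"from": [{"podSelector": {}}]}],  # empty selector = any same-namespace pod
            "policyTypes": ["Ingress"],
        },
    }

manifest = same_namespace_only_policy("cpd-instance")  # hypothetical project name
print(json.dumps(manifest, indent=2))
```

Traffic that must enter from outside the namespace, such as the OpenShift router reaching the Cloud Pak for Data web server, then needs an additional, narrower allow rule.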
Regulatory compliance
Cloud Pak for Data is assessed against various privacy and compliance regulations, and it provides features that customers can use in preparation for privacy and compliance assessments. This is not an exhaustive list of such features; assembling one is difficult because customers can choose and configure the features in many ways, and because Cloud Pak for Data can be used either as a stand-alone product or with third-party applications and systems.
Cloud Pak for Data is not aware of the nature of data that it is handling other than at a technical level (for example, encoding, data type, size). Therefore, Cloud Pak for Data can never be aware of the presence or lack of personal data. Customers must track whether personal information is present in the data that is being used by Cloud Pak for Data.
For more information, see What regulations does Cloud Pak for Data comply with?
Additional security measures
To protect your Cloud Pak for Data instance, consider the following best practices.
- Network isolation
- As a best practice, use network isolation to isolate the Red Hat OpenShift project (Kubernetes namespace) where Cloud Pak for Data is deployed. Then, ensure that only the appropriate services are accessible outside the namespace or outside the cluster. For more information about network isolation, review the following OpenShift documentation.
- Setting up an elastic load balancer
- To filter out unwanted network traffic, such as Distributed Denial of Service (DDoS) attacks, use an elastic load balancer that accepts only full HTTP connections. An elastic load balancer that is configured with an HTTP profile inspects the packets and forwards only complete HTTP requests to the Cloud Pak for Data web server. For more information, see Protecting Against DDoS Attacks.