Deploying IBM Cloud Pak for Data on Red Hat OpenShift Service on AWS
Amazon Web Services (AWS) customers who are looking for a more intuitive way to deploy and use IBM Cloud Pak for Data (CP4D) on the AWS Cloud can now use Red Hat OpenShift Service on AWS (ROSA).
ROSA is a fully managed service, jointly supported by AWS and Red Hat. It is managed by Red Hat Site Reliability Engineers and provides a pay-as-you-go pricing model, as well as a unified billing experience on AWS.
With this, customers do not manage the lifecycle of Red Hat OpenShift Container Platform clusters. Instead, they are free to focus on developing new solutions and innovating faster, using IBM’s integrated data and artificial intelligence platform on AWS, to differentiate their business and meet their ever-changing enterprise needs.
CP4D can also be deployed from the AWS Marketplace with self-managed OpenShift clusters. This option is ideal for customers with specific requirements, such as Red Hat OpenShift Data Foundation software-defined storage, or those who prefer to manage their own OpenShift clusters.
In this post, we explain how to create a ROSA cluster and perform an express installation of CP4D.
Cloud Pak for Data architecture
Here, we are implementing a highly available ROSA cluster with three Availability Zones (AZs), three master nodes, three infrastructure nodes, and three worker nodes.
Review the AWS Regions and Availability Zones documentation and the regions where ROSA is available to choose the best region for your deployment.
Figure 1 demonstrates the solution’s architecture.
In our scenario, we are building a public ROSA cluster, with an internet-facing Classic Load Balancer providing access to Ports 80 and 443. Consider using a ROSA private cluster when you are deploying CP4D in your AWS account.
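If you do need a private cluster, ROSA supports deployment over AWS PrivateLink. The following is a minimal sketch only, not part of this walkthrough: the subnet IDs are placeholders, and the exact flags depend on the version of the ROSA CLI you are running.
# Sketch: create a multi-AZ ROSA cluster reachable only via AWS PrivateLink
# (subnet IDs are placeholders; verify flags with "rosa create cluster --help")
rosa create cluster --cluster-name my-private-cluster --sts \
  --private-link \
  --multi-az \
  --subnet-ids <PRIVATE_SUBNET_ID_1>,<PRIVATE_SUBNET_ID_2>,<PRIVATE_SUBNET_ID_3>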
We are using Amazon Elastic Block Store (Amazon EBS) and Amazon Elastic File System (Amazon EFS) for the cluster’s persistent storage. Review the IBM documentation for information about supported storage options.
Also, review the AWS prerequisites for ROSA and follow the Security best practices in IAM documentation to protect your AWS account before deploying CP4D for production workloads.
Cost
You are responsible for the cost of the AWS services used when deploying CP4D in your AWS account. For cost estimates, see the pricing pages for each AWS service you use.
Prerequisites
Before getting started, review the following prerequisites for this solution:
- This blog assumes familiarity with: CP4D, Terraform, Amazon Elastic Compute Cloud (Amazon EC2), Amazon EBS, Amazon EFS, Amazon Virtual Private Cloud, and AWS Identity and Access Management (IAM).
- Access to an AWS account, with permissions to create the resources described in the installation steps section.
- An AWS IAM user, with the permissions described in the AWS prerequisites for ROSA documentation.
- Verification of the required AWS service quotas to deploy ROSA. You can request service-quota increases from the AWS console.
- Access to an IBM entitlement API key: either a 60-day trial or an existing entitlement.
- Access to a Red Hat ROSA token; you can register on the Red Hat website to obtain one.
- A bastion host to run the CP4D installer; we used an AWS Cloud9 workspace. You can use another device, provided it supports the required software packages.
Installation steps
Complete the following steps to deploy CP4D on ROSA:
- From the AWS ROSA console, click on Enable ROSA to activate the service on your AWS account (Figure 2).
- Create an AWS Cloud9 environment to run your CP4D installation. We’ve used a t3.medium instance (Figure 3).
- After your AWS Cloud9 environment is up, close the Welcome tab, open a new Terminal tab, and install the required packages:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" unzip awscliv2.zip sudo ./aws/install sudo yum -y install jq gettext sudo wget -c https://mirror.openshift.com/pub/openshift-v4/clients/rosa/latest/rosa-linux.tar.gz -O - | sudo tar -xz -C /usr/local/bin/ sudo wget -c https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/stable/openshift-client-linux.tar.gz -O - | sudo tar -xz -C /usr/local/bin/
- Create an IAM policy named cp4d-installer-permissions with the following permissions:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "autoscaling:*", "cloudformation:*", "cloudwatch:*", "ec2:*", "elasticfilesystem:*", "elasticloadbalancing:*", "events:*", "iam:*", "kms:*", "logs:*", "route53:*", "s3:*", "servicequotas:GetRequestedServiceQuotaChange", "servicequotas:GetServiceQuota", "servicequotas:ListServices", "servicequotas:ListServiceQuotas", "servicequotas:RequestServiceQuotaIncrease", "sts:*", "support:*", "tag:*" ], "Resource": "*" } ] }
- Create an IAM role:
1. Select an AWS service and Amazon EC2, then click Next: Permissions.
2. Select the cp4d-installer-permissions policy, and click Next.
3. Name it cp4d-installer, and click Create role.
- From your AWS Cloud9 IDE, click the circle button on the top right, and select Manage EC2 Instance (Figure 4).
- On the Amazon EC2 console, select the AWS Cloud9 instance, then choose Actions / Security / Modify IAM Role.
- Choose cp4d-installer from the IAM Role drop down, and click Update IAM role (Figure 5).
- Update the IAM settings for your AWS Cloud9 workspace:
aws cloud9 update-environment --environment-id $C9_PID --managed-credentials-action DISABLE
rm -vf ${HOME}/.aws/credentials
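- To confirm that your workspace is now using the cp4d-installer IAM role rather than the AWS managed temporary credentials, check the caller identity:
# The returned Arn should reference the cp4d-installer role
aws sts get-caller-identity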
- Ensure the Elastic Load Balancing service-linked role exists in your AWS account:
aws iam get-role --role-name "AWSServiceRoleForElasticLoadBalancing" || aws iam create-service-linked-role --aws-service-name "elasticloadbalancing.amazonaws.com"
- Set up your AWS environment:
export ACCOUNT_ID=$(aws sts get-caller-identity --output text --query Account)
export AWS_REGION=$(curl -s 169.254.169.254/latest/dynamic/instance-identity/document | jq -r '.region')
aws configure set default.region ${AWS_REGION}
- Navigate to the Red Hat Hybrid Cloud Console, and copy your OpenShift Cluster Manager API Token.
- Use the token and log in to your Red Hat account:
rosa login --token=<YOUR_ROSA_API_TOKEN>
- Verify that your AWS account satisfies the quotas to deploy your cluster:
rosa verify quota
- When deploying ROSA for the first time, create the account-wide roles:
rosa create account-roles --mode auto --yes
- Create your ROSA cluster:
export ROSA_CLUSTER_NAME=<YOUR_CLUSTER_NAME>
rosa create cluster --cluster-name ${ROSA_CLUSTER_NAME} --sts \
  --multi-az \
  --region ${AWS_REGION} \
  --version 4.10.47 \
  --compute-machine-type m5.4xlarge \
  --compute-nodes 3 \
  --operator-roles-prefix ${ROSA_CLUSTER_NAME} \
  --mode auto --yes \
  --watch
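- Cluster creation can take 40 minutes or more. If you detach from the --watch output, you can check progress at any time:
# State reports "ready" when the cluster is available
rosa describe cluster --cluster ${ROSA_CLUSTER_NAME}
# Or follow the installation logs
rosa logs install --cluster ${ROSA_CLUSTER_NAME} --watch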
- Once your cluster is ready, create a cluster-admin user and take note of the cluster API URL, username, and password:
rosa create admin --cluster=${ROSA_CLUSTER_NAME}
- Log in to your cluster using the login information from the previous step. For example:
oc login https://<YOUR_CLUSTER_API_ADDRESS>:6443 \ --username cluster-admin \ --password <YOUR_CLUSTER_ADMIN_PASSWORD>
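- To verify that you are logged in as cluster-admin and that the worker nodes are healthy:
# Confirm the authenticated user and node status
oc whoami
oc get nodes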
- Create an inbound rule in your worker nodes security group, allowing NFS traffic from your cluster’s VPC CIDR:
WORKER_NODE=$(oc get nodes --selector=node-role.kubernetes.io/worker -o jsonpath='{.items[0].metadata.name}')
VPC_ID=$(aws ec2 describe-instances --filters "Name=private-dns-name,Values=$WORKER_NODE" --query 'Reservations[*].Instances[*].{VpcId:VpcId}' | jq -r '.[0][0].VpcId')
VPC_CIDR=$(aws ec2 describe-vpcs --filters "Name=vpc-id,Values=$VPC_ID" --query 'Vpcs[*].CidrBlock' | jq -r '.[0]')
SG_ID=$(aws ec2 describe-instances --filters "Name=private-dns-name,Values=$WORKER_NODE" --query 'Reservations[*].Instances[*].{SecurityGroups:SecurityGroups}' | jq -r '.[0][0].SecurityGroups[0].GroupId')
aws ec2 authorize-security-group-ingress \
  --group-id $SG_ID \
  --protocol tcp \
  --port 2049 \
  --cidr $VPC_CIDR | jq .
- Create an Amazon EFS file system:
EFS_FS_ID=$(aws efs create-file-system --performance-mode generalPurpose --encrypted --region ${AWS_REGION} --tags Key=Name,Value=ibm_cp4d_fs | jq -r '.FileSystemId')
SUBNETS=($(aws ec2 describe-subnets --filters "Name=vpc-id,Values=${VPC_ID}" "Name=tag:Name,Values=*${ROSA_CLUSTER_NAME}*private*" | jq --raw-output '.Subnets[].SubnetId'))
for subnet in ${SUBNETS[@]}; do
  aws efs create-mount-target \
    --file-system-id $EFS_FS_ID \
    --subnet-id $subnet \
    --security-groups $SG_ID
done
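- Mount targets take a few moments to come up. Before continuing, you can confirm that each one reports an available lifecycle state:
# All mount targets should eventually report "available"
aws efs describe-mount-targets \
  --file-system-id $EFS_FS_ID \
  --query 'MountTargets[*].LifeCycleState' \
  --output text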
- Log in to the Container software library on My IBM and copy your API key.
- In this blog, we are installing CP4D with IBM Watson Machine Learning and IBM Watson Studio.
- Review the IBM documentation to determine which CP4D components you need to install to support your requirements.
- Export environment variables for the CP4D installation. The COMPONENTS variable defines which services will be installed:
export OCP_URL=<https://YOUR_CLUSTER_API_ADDRESS:6443>
export OPENSHIFT_TYPE=rosa
export IMAGE_ARCH=amd64
export OCP_USERNAME=cluster-admin
export OCP_PASSWORD=<YOUR_CLUSTER_ADMIN_PASSWORD>
export PROJECT_CPFS_OPS=ibm-common-services
export PROJECT_CATSRC=openshift-marketplace
export PROJECT_CPD_INSTANCE=cpd-instance
export STG_CLASS_BLOCK=gp3-csi
export STG_CLASS_FILE=efs-nfs-client
export IBM_ENTITLEMENT_KEY=<YOUR_IBM_API_KEY>
export VERSION=4.6.1
export COMPONENTS=cpfs,scheduler,cpd_platform,ws,wml
export EFS_LOCATION=${EFS_FS_ID}.efs.${AWS_REGION}.amazonaws.com
export EFS_PATH=/
export PROJECT_NFS_PROVISIONER=nfs-provisioner
export EFS_STORAGE_CLASS=efs-nfs-client
export NFS_IMAGE=k8s.gcr.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
- Install the CP4D CLI (cpd-cli):
# Verify connectivity to the IBM Container Registry
curl -v https://icr.io
mkdir ibm-cp4d && wget https://github.com/IBM/cpd-cli/releases/download/v12.0.1/cpd-cli-linux-SE-12.0.1.tgz -O - | tar -xz -C ~/environment/ibm-cp4d --strip-components=1
export PATH=/home/ec2-user/environment/ibm-cp4d:$PATH
cpd-cli manage restart-container
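- You can confirm the CLI runs before continuing; it should print the installed release:
# Print the cpd-cli version to confirm the binary works
cpd-cli version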
- Log in to your ROSA cluster:
cpd-cli manage login-to-ocp --username=${OCP_USERNAME} \ --password=${OCP_PASSWORD} --server=${OCP_URL}
- Set up persistent storage for your cluster:
cpd-cli manage setup-nfs-provisioner \ --nfs_server=${EFS_LOCATION} --nfs_path=${EFS_PATH} \ --nfs_provisioner_ns=${PROJECT_NFS_PROVISIONER} \ --nfs_storageclass_name=${EFS_STORAGE_CLASS} \ --nfs_provisioner_image=${NFS_IMAGE}
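- After the provisioner is deployed, the new EFS-backed storage class should appear alongside the default EBS-backed classes:
# The efs-nfs-client storage class should be listed
oc get storageclass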
- Create projects to deploy the CP4D software:
oc new-project ${PROJECT_CPFS_OPS} oc new-project ${PROJECT_CPD_INSTANCE}
- Modify load balancer timeout settings to prevent connections from being closed before processes complete:
LOAD_BALANCER=$(aws elb describe-load-balancers --output text | grep $VPC_ID | awk '{ print $5 }' | cut -d- -f1 | xargs)
for lbs in ${LOAD_BALANCER[@]}; do
  aws elb modify-load-balancer-attributes \
    --load-balancer-name $lbs \
    --load-balancer-attributes "{\"ConnectionSettings\":{\"IdleTimeout\":600}}"
done
- Modify the pids_limit setting for the CRI-O container runtime on OpenShift:
cpd-cli manage apply-crio \ --openshift-type=${OPENSHIFT_TYPE}
- Configure the global image pull-secret to pull images from the IBM container repository:
cpd-cli manage add-icr-cred-to-global-pull-secret \ ${IBM_ENTITLEMENT_KEY}
- Create the operators and operator subscriptions for your CP4D installation:
cpd-cli manage apply-olm \ --release=${VERSION} \ --components=${COMPONENTS}
- Install the CP4D platform and services:
cpd-cli manage apply-cr \ --components=${COMPONENTS} \ --release=${VERSION} \ --cpd_instance_ns=${PROJECT_CPD_INSTANCE} \ --block_storage_class=${STG_CLASS_BLOCK} \ --file_storage_class=${STG_CLASS_FILE} \ --license_acceptance=true
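- The installation can take a few hours. As a sketch (assuming your cpd-cli release supports this manage subcommand), you can monitor the custom resources while it runs:
# Check the status of the CP4D custom resources in the instance project
cpd-cli manage get-cr-status \
  --cpd_instance_ns=${PROJECT_CPD_INSTANCE}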
- Get your CP4D URL and admin credentials:
cpd-cli manage get-cpd-instance-details \ --cpd_instance_ns=${PROJECT_CPD_INSTANCE} \ --get_admin_initial_credentials=true
- The command output displays the URL of your CP4D console and the password for your admin user (Figure 6).
- Using the CP4D URL, username, and admin password from the previous step, access your CP4D console.
- From the CP4D home (welcome page), click on Discover Services to be directed to the Services catalog.
- From the Services catalog, you can see all available CP4D services.
- Use the search bar to filter for Watson, and find the IBM Watson Machine Learning and IBM Watson Studio services. Note how they are displayed as Enabled (Figure 7).
Congratulations! You have successfully deployed IBM CP4D on Red Hat OpenShift on AWS.
Post-installation
Review the following topics when installing CP4D for production:
- Review the IBM system requirements documentation to calculate the size of your ROSA cluster.
- Launch a specialized installation in your environment, if you need more control than the express installation used in this post.
- Review the administrative tasks to enable security, maintenance, monitoring, managing users, and backing up your environment.
- How to setup services after you have installed the platform.
- Configure identity providers on ROSA (see the sketch after this list).
- Enable auto scaling for your ROSA cluster (also sketched below).
- Configure logging and enable monitoring for your ROSA cluster.
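As referenced above, here is a hedged sketch of two of these tasks using the ROSA CLI. The GitHub organization, machine pool ID, and replica counts are placeholder values; verify the flags supported by your CLI version with rosa --help.
# Sketch: add a GitHub identity provider (placeholder organization)
rosa create idp --cluster=${ROSA_CLUSTER_NAME} --type=github \
  --name=github-idp --organizations=<YOUR_GITHUB_ORG>
# Sketch: enable autoscaling on an existing machine pool (placeholder sizes)
rosa edit machinepool --cluster=${ROSA_CLUSTER_NAME} <MACHINE_POOL_ID> \
  --enable-autoscaling --min-replicas=3 --max-replicas=6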
Cleanup
Connect to your AWS Cloud9 workspace, and run the following commands to delete the CP4D installation, including the ROSA cluster. This avoids incurring future charges on your AWS account:
EFS_FS_ID=$(aws efs describe-file-systems \
--query 'FileSystems[?Name==`ibm_cp4d_fs`].FileSystemId' \
--output text)
MOUNT_TARGETS=$(aws efs describe-mount-targets --file-system-id $EFS_FS_ID --query 'MountTargets[*].MountTargetId' --output text)
for mt in ${MOUNT_TARGETS[@]}; do
aws efs delete-mount-target --mount-target-id $mt
done
aws efs delete-file-system --file-system-id $EFS_FS_ID
rosa delete cluster -c $ROSA_CLUSTER_NAME --yes --region $AWS_REGION
To monitor your cluster uninstallation logs, run:
rosa logs uninstall -c $ROSA_CLUSTER_NAME --watch
Once the cluster is uninstalled, remove the operator-roles and oidc-provider, as indicated in the output of the rosa delete command. For example:
rosa delete operator-roles -c <OPERATOR_ROLES_NAME> -m auto -y
rosa delete oidc-provider -c <OIDC_PROVIDER_NAME> -m auto -y
Conclusion
In summary, we explored how customers can take advantage of a fully managed OpenShift service on AWS to run IBM CP4D. With this implementation, customers can focus more on what is important to them, their workloads and their customers, and less on the day-to-day operations of managing OpenShift to run CP4D.
If you are interested in learning more about CP4D on AWS, explore the IBM Cloud Pak for Data (CP4D) on AWS Modernization Workshop.
Visit the AWS Marketplace for a complete list of offerings from IBM Data & AI.
Further reading
- Building a healthcare data pipeline on AWS with IBM Cloud Pak for Data
- IBM Cloud Pak for Data Simplifies and Automates How You Turn Data into Insights
Additional resources
- IBM on AWS Partner Page
- Red Hat OpenShift Service on AWS: architecture and networking
- Red Hat OpenShift Service on AWS: private clusters with AWS PrivateLink