AWS Architecture Blog

Deploying IBM Cloud Pak for Data on Red Hat OpenShift Service on AWS

Amazon Web Services (AWS) customers who are looking for a more intuitive way to deploy and use IBM Cloud Pak for Data (CP4D) on the AWS Cloud can now use the Red Hat OpenShift Service on AWS (ROSA).

ROSA is a fully managed service, jointly supported by AWS and Red Hat. It is managed by Red Hat Site Reliability Engineers and provides a pay-as-you-go pricing model, as well as a unified billing experience on AWS.

With ROSA, customers do not need to manage the lifecycle of Red Hat OpenShift Container Platform clusters. Instead, they are free to focus on developing new solutions and innovating faster, using IBM’s integrated data and artificial intelligence platform on AWS to differentiate their business and meet their ever-changing enterprise needs.

CP4D can also be deployed from the AWS Marketplace with self-managed OpenShift clusters. This is ideal for customers with specific requirements, such as Red Hat OpenShift Data Foundation software-defined storage, or who prefer to manage their own OpenShift clusters.

In this post, we explain how to create a ROSA cluster and perform an express installation of CP4D.

Cloud Pak for Data architecture

Here, we are implementing a highly available ROSA cluster with three Availability Zones (AZs), three master nodes, three infrastructure nodes, and three worker nodes.

Review the AWS Regions and Availability Zones documentation and the regions where ROSA is available to choose the best region for your deployment.

Figure 1 illustrates the solution’s architecture.

Figure 1. IBM Cloud Pak for Data on ROSA

In our scenario, we are building a public ROSA cluster, with an internet-facing Classic Load Balancer providing access to Ports 80 and 443. Consider using a ROSA private cluster when you are deploying CP4D in your AWS account.
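
If you choose a private cluster, a minimal sketch using the rosa CLI is shown below. The --private-link flag and the placeholder subnet IDs are assumptions based on ROSA classic with AWS PrivateLink at the time of writing; confirm the options against the current ROSA documentation:

rosa create cluster --cluster-name <YOUR_PRIVATE_CLUSTER> --sts \
  --private-link \
  --multi-az \
  --subnet-ids <PRIVATE_SUBNET_1>,<PRIVATE_SUBNET_2>,<PRIVATE_SUBNET_3>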

We are using Amazon Elastic Block Store (Amazon EBS) and Amazon Elastic File System (Amazon EFS) for the cluster’s persistent storage. Review the IBM documentation for information about supported storage options.

Also, review the AWS prerequisites for ROSA and follow the Security best practices in IAM documentation to protect your AWS account before deploying CP4D for production workloads.

Cost

You are responsible for the cost of the AWS services used when deploying CP4D in your AWS account. For cost estimates, see the pricing pages for each AWS service you use.

Prerequisites

Before getting started, review the following prerequisites for this solution:

Installation steps

Complete the following steps to deploy CP4D on ROSA:

  1. From the AWS ROSA console, click on Enable ROSA to activate the service on your AWS account (Figure 2).

    Figure 2. Enable ROSA on your AWS account

  2. Create an AWS Cloud9 environment to run your CP4D installation. We’ve used a t3.medium instance (Figure 3).

    Figure 3. Create an AWS Cloud9 environment

  3. After your AWS Cloud9 environment is up, close the Welcome tab, open a new Terminal tab, and install the required packages:
    curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
    unzip awscliv2.zip
    sudo ./aws/install
    sudo yum -y install jq gettext
    sudo wget -c https://mirror.openshift.com/pub/openshift-v4/clients/rosa/latest/rosa-linux.tar.gz -O - | sudo tar -xz -C /usr/local/bin/
    sudo wget -c https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/stable/openshift-client-linux.tar.gz -O - | sudo tar -xz -C /usr/local/bin/
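    You can confirm the tools are available before proceeding:
    rosa version
    oc version --client
    aws --version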
  4. Create an IAM policy named cp4d-installer-permissions with the following permissions:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "autoscaling:*",
                    "cloudformation:*",
                    "cloudwatch:*",
                    "ec2:*",
                    "elasticfilesystem:*",
                    "elasticloadbalancing:*",
                    "events:*",
                    "iam:*",
                    "kms:*",
                    "logs:*",
                    "route53:*",
                    "s3:*",
                    "servicequotas:GetRequestedServiceQuotaChange",
                    "servicequotas:GetServiceQuota",
                    "servicequotas:ListServices",
                    "servicequotas:ListServiceQuotas",
                    "servicequotas:RequestServiceQuotaIncrease",
                    "sts:*",
                    "support:*",
                    "tag:*"
                ],
                "Resource": "*"
            }
        ]
    }
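    If you prefer the AWS CLI, and assuming the JSON above is saved locally as cp4d-installer-permissions.json, an equivalent command is:
    aws iam create-policy --policy-name cp4d-installer-permissions --policy-document file://cp4d-installer-permissions.json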
  5. Create an IAM role:
    1. Select AWS service as the trusted entity and Amazon EC2 as the use case, then click Next: Permissions.
    2. Select the cp4d-installer-permissions policy, and click Next.
    3. Name it cp4d-installer, and click Create role.
  6. From your AWS Cloud9 IDE, click the circle button on the top right, and select Manage EC2 Instance (Figure 4).

    Figure 4. Manage the AWS Cloud9 EC2 instance

  7. On the Amazon EC2 console, select the AWS Cloud9 instance, then choose Actions / Security / Modify IAM Role.
  8. Choose cp4d-installer from the IAM Role drop-down, and click Update IAM role (Figure 5).

    Figure 5. Attach the IAM role to your workspace

  9. Update the IAM settings for your AWS Cloud9 workspace, disabling the temporary managed credentials so the instance uses the attached IAM role (AWS Cloud9 terminals set the C9_PID variable to your environment ID):
    aws cloud9 update-environment --environment-id $C9_PID --managed-credentials-action DISABLE
    rm -vf ${HOME}/.aws/credentials
  10. Ensure the Elastic Load Balancing service-linked role exists in your AWS account:
    aws iam get-role --role-name "AWSServiceRoleForElasticLoadBalancing" || aws iam create-service-linked-role --aws-service-name "elasticloadbalancing.amazonaws.com"
  11. Set up your AWS environment:
    export ACCOUNT_ID=$(aws sts get-caller-identity --output text --query Account)
    export AWS_REGION=$(curl -s 169.254.169.254/latest/dynamic/instance-identity/document | jq -r '.region')
    aws configure set default.region ${AWS_REGION}
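    Optionally, confirm that your workspace is now using the cp4d-installer role:
    aws sts get-caller-identity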
  12. Navigate to the Red Hat Hybrid Cloud Console, and copy your OpenShift Cluster Manager API Token.
  13. Use the token and log in to your Red Hat account:
    rosa login --token=<YOUR_ROSA_API_TOKEN>
  14. Verify that your AWS account satisfies the quotas to deploy your cluster:
    rosa verify quota
  15. When deploying ROSA for the first time, create the account-wide roles:
    rosa create account-roles --mode auto --yes
  16. Create your ROSA cluster:
    export ROSA_CLUSTER_NAME=<YOUR_CLUSTER_NAME>
    
    rosa create cluster --cluster-name ${ROSA_CLUSTER_NAME} --sts \
      --multi-az \
      --region ${AWS_REGION} \
      --version 4.10.47 \
      --compute-machine-type m5.4xlarge \
      --compute-nodes 3 \
      --operator-roles-prefix ${ROSA_CLUSTER_NAME} \
      --mode auto --yes \
      --watch
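    Cluster creation takes roughly 40 minutes. If you detach from the --watch output, you can check progress at any time:
    rosa describe cluster --cluster ${ROSA_CLUSTER_NAME}
    rosa logs install --cluster ${ROSA_CLUSTER_NAME} --watch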
  17. Once your cluster is ready, create a cluster-admin user and take note of the cluster API URL, username, and password:
    rosa create admin --cluster=${ROSA_CLUSTER_NAME}
  18. Log in to your cluster using the login information from the previous step. For example:
    oc login https://<YOUR_CLUSTER_API_ADDRESS>:6443 \
      --username cluster-admin \
      --password <YOUR_CLUSTER_ADMIN_PASSWORD>
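    You can confirm the login succeeded and that the cluster nodes are ready:
    oc whoami
    oc get nodes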
  19. Create an inbound rule in your worker nodes security group, allowing NFS traffic from your cluster’s VPC CIDR:
    WORKER_NODE=$(oc get nodes --selector=node-role.kubernetes.io/worker -o jsonpath='{.items[0].metadata.name}')
    VPC_ID=$(aws ec2 describe-instances --filters "Name=private-dns-name,Values=$WORKER_NODE" --query 'Reservations[*].Instances[*].{VpcId:VpcId}' | jq -r '.[0][0].VpcId')
    VPC_CIDR=$(aws ec2 describe-vpcs --filters "Name=vpc-id,Values=$VPC_ID" --query 'Vpcs[*].CidrBlock' | jq -r '.[0]')
    SG_ID=$(aws ec2 describe-instances --filters "Name=private-dns-name,Values=$WORKER_NODE" --query 'Reservations[*].Instances[*].{SecurityGroups:SecurityGroups}' | jq -r '.[0][0].SecurityGroups[0].GroupId')
    aws ec2 authorize-security-group-ingress \
      --group-id $SG_ID \
      --protocol tcp \
      --port 2049 \
      --cidr $VPC_CIDR | jq .
  20. Create an Amazon EFS file system:
    EFS_FS_ID=$(aws efs create-file-system --performance-mode generalPurpose --encrypted --region ${AWS_REGION} --tags Key=Name,Value=ibm_cp4d_fs | jq -r '.FileSystemId')
    SUBNETS=($(aws ec2 describe-subnets --filters "Name=vpc-id,Values=${VPC_ID}" "Name=tag:Name,Values=*${ROSA_CLUSTER_NAME}*private*" | jq --raw-output '.Subnets[].SubnetId'))
    for subnet in ${SUBNETS[@]}; do
      aws efs create-mount-target \
        --file-system-id $EFS_FS_ID \
        --subnet-id $subnet \
        --security-groups $SG_ID
    done
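    Mount targets take a few minutes to become available. Optionally, confirm that they all report available before continuing:
    aws efs describe-mount-targets --file-system-id $EFS_FS_ID --query 'MountTargets[*].LifeCycleState' --output text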
  21. Log in to the Container software library on My IBM and copy your API key.
  22. In this blog, we are installing CP4D with IBM Watson Machine Learning and IBM Watson Studio.
  23. Review the IBM documentation to determine which CP4D components you need to install to support your requirements.
  24. Export environment variables for the CP4D installation. The COMPONENTS variable defines which services will be installed:
    export OCP_URL=https://<YOUR_CLUSTER_API_ADDRESS>:6443
    export OPENSHIFT_TYPE=rosa
    export IMAGE_ARCH=amd64
    export OCP_USERNAME=cluster-admin
    export OCP_PASSWORD=<YOUR_CLUSTER_ADMIN_PASSWORD>
    export PROJECT_CPFS_OPS=ibm-common-services
    export PROJECT_CATSRC=openshift-marketplace
    export PROJECT_CPD_INSTANCE=cpd-instance
    export STG_CLASS_BLOCK=gp3-csi
    export STG_CLASS_FILE=efs-nfs-client
    export IBM_ENTITLEMENT_KEY=<YOUR_IBM_API_KEY>
    export VERSION=4.6.1
    export COMPONENTS=cpfs,scheduler,cpd_platform,ws,wml
    export EFS_LOCATION=${EFS_FS_ID}.efs.${AWS_REGION}.amazonaws.com
    export EFS_PATH=/
    export PROJECT_NFS_PROVISIONER=nfs-provisioner
    export EFS_STORAGE_CLASS=efs-nfs-client
    export NFS_IMAGE=k8s.gcr.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
  25. Install the CP4D CLI (cpd-cli):
    # Verify connectivity to the IBM Container Registry
    curl -v https://icr.io
    # Download and extract cpd-cli
    mkdir -p ~/environment/ibm-cp4d && wget https://github.com/IBM/cpd-cli/releases/download/v12.0.1/cpd-cli-linux-SE-12.0.1.tgz -O - | tar -xz -C ~/environment/ibm-cp4d --strip-components=1
    export PATH=/home/ec2-user/environment/ibm-cp4d:$PATH
    cpd-cli manage restart-container
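    Confirm the CLI is on your PATH:
    cpd-cli version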
  26. Log in to your ROSA cluster:
    cpd-cli manage login-to-ocp --username=${OCP_USERNAME} \
    --password=${OCP_PASSWORD} --server=${OCP_URL}
  27. Set up persistent storage for your cluster:
    cpd-cli manage setup-nfs-provisioner \
    --nfs_server=${EFS_LOCATION} --nfs_path=${EFS_PATH} \
    --nfs_provisioner_ns=${PROJECT_NFS_PROVISIONER} \
    --nfs_storageclass_name=${EFS_STORAGE_CLASS} \
    --nfs_provisioner_image=${NFS_IMAGE}
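    You can verify that the new storage class was created:
    oc get storageclass ${EFS_STORAGE_CLASS}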
  28. Create projects to deploy the CP4D software:
    oc new-project ${PROJECT_CPFS_OPS}
    oc new-project ${PROJECT_CPD_INSTANCE}
  29. Modify the load balancer timeout settings to prevent connections from being closed before long-running processes complete:
    LOAD_BALANCER=$(aws elb describe-load-balancers --output text | grep $VPC_ID | awk '{ print $5 }' | cut -d- -f1 | xargs)
    for lbs in ${LOAD_BALANCER[@]}; do
      aws elb modify-load-balancer-attributes \
        --load-balancer-name $lbs \
        --load-balancer-attributes "{\"ConnectionSettings\":{\"IdleTimeout\":600}}"
    done
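    Optionally, verify the new idle timeout on the last load balancer processed by the loop:
    aws elb describe-load-balancer-attributes --load-balancer-name $lbs --query 'LoadBalancerAttributes.ConnectionSettings.IdleTimeout'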
  30. Modify the pids_limit setting for the CRI-O container runtime on OpenShift:
    cpd-cli manage apply-crio \
    --openshift-type=${OPENSHIFT_TYPE}
  31. Configure the global image pull-secret to pull images from the IBM container repository:
    cpd-cli manage add-icr-cred-to-global-pull-secret \
    ${IBM_ENTITLEMENT_KEY}
  32. Create the operators and operator subscriptions for your CP4D installation:
    cpd-cli manage apply-olm \
    --release=${VERSION} \
    --components=${COMPONENTS}
  33. Install the CP4D platform and services:
    cpd-cli manage apply-cr \
    --components=${COMPONENTS} \
    --release=${VERSION} \
    --cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
    --block_storage_class=${STG_CLASS_BLOCK} \
    --file_storage_class=${STG_CLASS_FILE} \
    --license_acceptance=true
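    Installing all components can take a few hours. Assuming your cpd-cli version supports it, you can check the status of the CP4D custom resources with:
    cpd-cli manage get-cr-status --cpd_instance_ns=${PROJECT_CPD_INSTANCE}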
  34. Get your CP4D URL and admin credentials:
    cpd-cli manage get-cpd-instance-details \
    --cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
    --get_admin_initial_credentials=true
  35. The command output displays the URL of your CP4D console and the password for your admin user (Figure 6):

    Figure 6. CP4D URL and admin credentials

  36. Using the information from Step 35 (CP4D URL, user, and admin password), access your CP4D console.
  37. From the CP4D home (welcome page), click on Discover Services to be directed to the Services catalog.
  38. From the Services catalog, you can see all available CP4D services.
  39. Use the search bar to filter for Watson, and find the IBM Watson Machine Learning and IBM Watson Studio services. Note how they are displayed as Enabled (Figure 7).

    Figure 7. Services enabled in your CP4D catalog

    Congratulations! You have successfully deployed IBM CP4D on Red Hat OpenShift on AWS.

Post-installation

Review the following topics when installing CP4D for production:

Cleanup

Connect to your AWS Cloud9 workspace, and run the following commands to delete the CP4D installation, including the ROSA cluster. This avoids incurring future charges on your AWS account:

# Find the Amazon EFS file system created for the CP4D installation
EFS_FS_ID=$(aws efs describe-file-systems \
  --query 'FileSystems[?Name==`ibm_cp4d_fs`].FileSystemId' \
  --output text)

# Delete the EFS mount targets, then the file system itself
MOUNT_TARGETS=$(aws efs describe-mount-targets --file-system-id $EFS_FS_ID --query 'MountTargets[*].MountTargetId' --output text)

for mt in ${MOUNT_TARGETS[@]}; do
  aws efs delete-mount-target --mount-target-id $mt
done

aws efs delete-file-system --file-system-id $EFS_FS_ID

rosa delete cluster -c $ROSA_CLUSTER_NAME --yes --region $AWS_REGION

To monitor your cluster uninstallation logs, run:

rosa logs uninstall -c $ROSA_CLUSTER_NAME --watch

Once the cluster is uninstalled, remove the operator-roles and oidc-provider, as indicated in the output of the rosa delete command. For example:

rosa delete operator-roles -c <OPERATOR_ROLES_NAME> -m auto -y
rosa delete oidc-provider -c <OIDC_PROVIDER_NAME> -m auto -y

Conclusion

In summary, we explored how customers can take advantage of a fully managed OpenShift service on AWS to run IBM CP4D. With this implementation, customers can focus on what is important to them: their workloads and their customers, rather than the day-to-day operations of managing OpenShift to run CP4D.

If you are interested in learning more about CP4D on AWS, explore the IBM Cloud Pak for Data (CP4D) on AWS Modernization Workshop.

Visit the AWS Marketplace for a complete list of offerings from IBM Data & AI.


Eduardo Monich Fronza

Eduardo Monich Fronza is a Partner Solutions Architect at AWS. His experience includes Cloud, solutions architecture, application platforms, containers, workload modernization, and hybrid solutions. In his current role, Eduardo helps AWS partners and customers in their cloud adoption journey.