Setting up Apache Flink storage

To set up how Flink stores its internal state, you create a persistent volume and assign it an owner. Optionally, you also create a persistent volume claim.

About this task

Apache Flink needs a persistent volume to store its internal state and to support fault tolerance and high availability. If no dynamic provisioning has been set up, you must create the persistent volume yourself. The persistent volume must provide at least the capacity given by the Persistent Volume Capacity value that you set at installation time, which is 20Gi by default.

Procedure

  1. Create the NFS shared folder for the persistent volume. In this example, the NFS shared folder is /export/NFS.
    The Flink user is the user that runs the Flink-related containers for the job manager, task managers, and jobs. This user has the ID 9999. Because the containers need to access the persistent volume, the user and group with ID 9999 must have read and write permissions on the folder. You can set up the folder on the NFS server as follows.
    mkdir /export/NFS/ibm-bai-pv
    chown -R 9999:9999 /export/NFS/ibm-bai-pv
    chmod 770 /export/NFS/ibm-bai-pv
  2. Create the persistent volume.

    Apply the Retain reclaim policy so that the data is kept when the volume is released from its claim.

    1. Use the following YAML file to create a persistent volume and replace the placeholders with the values that are appropriate for your environment.
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: ibm-bai-pv
      spec:
        accessModes:
        - ReadWriteMany
        capacity:
          storage: <storage_capacity>
        nfs:
          path: /export/NFS/ibm-bai-pv
          server: <server-ip>
        persistentVolumeReclaimPolicy: Retain
        claimRef:
          namespace: <my-namespace>
          name: <my-pvc-name>
      Tip: The claimRef section is optional. However, in production mode, you must set it to make sure that your release always uses the same volume and that you do not lose data. If you add the claimRef section, set its namespace and name to the namespace and name of the persistent volume claim that you create in step 3.
    2. To create the persistent volume in your IBM® Cloud Private environment, run the apply command.
      kubectl apply -f <pv_sample.yaml>
  3. Optional: Create the persistent volume claim (PVC).
    1. Use the following YAML file to create a persistent volume claim by replacing the placeholders with the appropriate values.
      The value of <my-pvc-name> must match the name in the claimRef section of the persistent volume that you will use when you configure your Business Automation Insights installation. The <storage_size> value must be smaller than, or equal to, the storage capacity of the persistent volume.
      kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: <my-pvc-name>
        namespace: <my-namespace>
      spec:
        storageClassName: ""
        accessModes:
          - ReadWriteMany
        resources:
          requests:
            storage: <storage_size>
    2. To create the persistent volume claim in your IBM Cloud Private environment, run the apply command.
      kubectl apply -f <pvc_sample.yaml>
  4. Optional: If you want to refine how the persistent volumes are bound, add a storageClassName value to the YAML file of the persistent volume, and later reference this storage class name when you configure your IBM Business Automation Insights installation.

    For more information, see the Class section of the Kubernetes documentation.
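
After you complete the procedure, you might want to confirm the result before you install. The following shell snippet is a minimal sketch of such a check and is not part of the product. The function names check_pv_dir, pv_phase, and pvc_phase are hypothetical helpers introduced for illustration. The sketch assumes GNU coreutils stat on the NFS server and a kubectl client that is already configured for your cluster.

```shell
#!/bin/sh
# Sketch only: helper functions to verify the storage setup described above.
# Assumptions: GNU coreutils `stat` (for -c) and a configured `kubectl`.

# Succeeds only if the folder is owned by user and group 9999 with mode 770,
# which is what the chown and chmod commands in step 1 produce.
check_pv_dir() {
  actual="$(stat -c '%u:%g %a' "$1")" || return 1
  [ "$actual" = "9999:9999 770" ]
}

# Prints the phase of a persistent volume (for example, Available or Bound).
# $1: name of the persistent volume
pv_phase() {
  kubectl get pv "$1" -o jsonpath='{.status.phase}'
}

# Prints the phase of a persistent volume claim (for example, Pending or Bound).
# $1: name of the claim, $2: namespace of the claim
pvc_phase() {
  kubectl get pvc "$1" -n "$2" -o jsonpath='{.status.phase}'
}
```

For example, on the NFS server, check_pv_dir /export/NFS/ibm-bai-pv returns success after step 1 is done. On a machine with cluster access, pv_phase ibm-bai-pv typically prints Bound after a matching claim exists, and pvc_phase <my-pvc-name> <my-namespace> prints Bound when the claim from step 3 is attached to the volume.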