Configuring data retention and index rollover time periods

You can set a time period for data retention and index rollover using the Advanced Analytics Configuration option for an analytics service.

Before you begin

This task assumes you have registered an analytics service and associated it with a gateway service. See Registering an analytics service and Associating an analytics service with a gateway service.

One of the following roles is required to set the Data Retention and Index Rollover values:

Administrator
Topology Administrator
Owner
A custom role with the topology:manage permission

About this task

Each API call is recorded as a document in a shared index. Periodically, a new index is created and the previous index is stored. You can specify Index Rollover and Data Retention settings that control how often new indexes are created, and how long indexes are stored.

The Index Rollover settings determine how long an index accumulates data before it "rolls over" into storage and a new index is created. You can control the duration of an index in two ways: by setting a maximum age for the index (default is 1 day), and by setting a maximum number of documents that can be recorded in the index (default is 25 million documents). When the rollover occurs, a new index is created and documents are recorded there, while the previous index is stored until its age exceeds the Data Retention setting. If you change the rollover settings, consider both the amount of data stored within each index and the number of indexes being stored. Allowing indexes to grow too large, or storing too many indexes at once, might cause issues.
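The two rollover limits can be pictured with a short sketch. It assumes that rollover happens as soon as either limit is reached, which this topic does not state explicitly, and the function and constant names are illustrative only, not part of API Connect.

```python
from datetime import datetime, timedelta, timezone

# Defaults quoted in this topic; the names are hypothetical.
MAX_AGE = timedelta(days=1)
MAX_DOCS = 25_000_000


def should_roll_over(created_at, doc_count, now, max_age=MAX_AGE, max_docs=MAX_DOCS):
    """Return True when either limit is reached (assumed either/or semantics)."""
    return (now - created_at) >= max_age or doc_count >= max_docs


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
midnight = datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)
yesterday = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)

young_small = should_roll_over(midnight, 1_000_000, now)   # 12 hours old, few documents
young_full = should_roll_over(midnight, 25_000_000, now)   # document limit reached
old_small = should_roll_over(yesterday, 10, now)           # age limit reached
print(young_small, young_full, old_small)  # False True True
```

Under this assumption, a busy system rolls over on document count well before the index is a day old, which increases the number of stored indexes.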

The Data Retention setting determines how long the stored indexes (and the data they contain) are retained. Once every day, all indexes that are older than the specified retention period are purged. Retention is based on the index's age, not the age of the data within that index. When the age of the index exceeds the retention period, the index and all its data are deleted, even if some of the data stored in that index is younger than the retention period. Data is purged by deleting entire records, so you cannot choose to delete only certain fields.

The default retention period is 90 days. Reasons for changing this setting include storage constraints and your organization's data retention requirements. You might want to set this value to be less than the default if a large number of API events are stored, especially if payload logging is enabled on the APIs. Although there is no hard retention limit, we do not recommend exceeding 10 years (approximately 3650 days). If you modify the retention value, you should modify the index rollover setting as well to ensure that they remain in sync.
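To get a feel for how the two settings interact, the following sketch estimates how many indexes are stored at once. It is a rough calculation that assumes rollover is driven by index age alone; rollovers triggered by the document limit create additional indexes. The function name is hypothetical.

```python
import math
from datetime import timedelta


def estimated_stored_indexes(retention, rollover_max_age):
    """Rough count of indexes kept at once when rollover is driven by age alone."""
    if rollover_max_age <= timedelta(0):
        raise ValueError("rollover_max_age must be positive")
    return math.ceil(retention / rollover_max_age)


default_count = estimated_stored_indexes(timedelta(days=90), timedelta(days=1))
short_count = estimated_stored_indexes(timedelta(days=7), timedelta(hours=12))
long_count = estimated_stored_indexes(timedelta(days=3650), timedelta(days=1))
print(default_count, short_count, long_count)  # 90 14 3650
```

The last value shows why a long retention period is usually paired with a longer rollover age: keeping daily indexes for 10 years would mean thousands of stored indexes.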

The most recently created index (the index that is currently being written to) is not deleted, even if you set the retention period as small as 1 hour. If you regularly need to delete data quickly, adjust the Index Rollover setting so that rollover to a new index happens sooner and the old index becomes eligible for deletion.
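The retention rules described above can be sketched as follows. This is an illustration of the described behavior, not the actual purge job: the index names are made up, and the function is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def indexes_to_purge(indexes, retention, now, write_index=None):
    """Return names of indexes whose own creation date is older than retention.

    indexes maps an index name to its creation time. The age of individual
    documents is ignored, and the index currently being written to
    (write_index) is never selected.
    """
    return sorted(
        name
        for name, created_at in indexes.items()
        if name != write_index and (now - created_at) > retention
    )


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
indexes = {
    "example-000001": datetime(2024, 1, 1, tzinfo=timezone.utc),   # 152 days old
    "example-000002": datetime(2024, 4, 1, tzinfo=timezone.utc),   # 61 days old
    "example-000003": datetime(2024, 5, 31, tzinfo=timezone.utc),  # current write index
}

default_purge = indexes_to_purge(indexes, timedelta(days=90), now, "example-000003")
one_hour_purge = indexes_to_purge(indexes, timedelta(hours=1), now, "example-000003")
print(default_purge)   # ['example-000001']
print(one_hour_purge)  # ['example-000001', 'example-000002']
```

Even with a 1-hour retention period, the write index survives until a rollover replaces it, which is why a shorter rollover setting is needed to delete data quickly.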

To change the data retention and index rollover settings, you must configure your settings in the Cloud Manager, and then edit the schedule in the related cronjob as explained in the following procedure.

Procedure

Modify the Cloud Manager properties that specify the data retention and index rollover settings:
1. In the Cloud Manager, click Topology.
2. Locate the Analytics service that requires data retention and index rollover settings, then click the title to open the Edit Analytics Service page.
3. Click the Advanced Analytics Configuration link to load Kibana.
4. In Kibana, click Storage in the API Connect section.
5. For Data Retention, select a number and a unit of time, and then click Save.
6. Enter the desired value for Index Rollover and click Save.
Modify the schedule for the cronjob that manages data retention and index rollovers.

If you want to set the data retention or index rollover frequency to more than once per day, then you must also edit the schedule that determines when the related cronjob runs.

On all platforms, the retention and rollover actions are triggered using a Kubernetes cronjob, which runs once per day by default. If you want the cronjob to run more frequently, modify its schedule by completing the following steps:
1. View the current cronjob schedule by running the following command:
  - Kubernetes:
```
kubectl -n <apic-analytics-namespace> get cronjob
```
  - OpenShift and Cloud Pak for Integration:
```
oc -n <apic-analytics-namespace> get cronjob
```
  - VMware:
    1. Run the following command to connect as the API Connect administrator, replacing <ip_address> with the appropriate IP address:
      ssh <ip_address> -l apicadm
    2. When prompted, select Yes to continue connecting.
    3. When you are connected, run the following command to receive the necessary permissions for working directly on the appliance:
      sudo -i
    4. View the current cronjob schedule by running the following command:
      kubectl -n <apic-analytics-namespace> get cronjob
  The response displays the current SCHEDULE, as in the following example:
```
NAME                        SCHEDULE        SUSPEND    ACTIVE   LAST SCHEDULE   AGE
analytics-cj-retention      30 1 * * *      False      0        8h              266d
analytics-cj-rollover       15,45 * * * *   False      0        19m             266d
```
2. Run the following command to edit the cronjob so you can modify the schedules:
  - Kubernetes:
```
kubectl -n <apic-analytics-namespace> edit cronjob <cronjob_name>
```
  - OpenShift and Cloud Pak for Integration:
```
oc -n <apic-analytics-namespace> edit cronjob <cronjob_name>
```
  - VMware:
```
kubectl -n <apic-analytics-namespace> edit cronjob <cronjob_name>
```
3. Adjust the cronjob's rollover and retention schedules, and then save your changes.
  Important: Do not make any other changes to the cronjob. Other changes might result in the retention and rollover actions failing to complete successfully.
  
  To configure the job to run more than once a day, modify the schedule and update the minutes and hour values as needed (the values in positions 1 and 2).
  
  The schedule consists of 5 fields, separated with spaces, in the form * * * * *. Each * stands for one field, and the position of the field determines what it controls:
  - position 1: minutes (0 - 59)
  - position 2: hour (0 - 23)
  - position 3: day of month (1 - 31)
  - position 4: month (1 - 12)
  - position 5: day of week (0 - 6 where 0=Sunday)
  For example, the following schedule runs a job at 01:30 every day (minute 30 in position 1, hour 1 in position 2): 30 1 * * *
  
  If you want a job to run more than once an hour, you can specify multiple minute values by separating them with a comma. The following schedule runs a job at 15 minutes and 45 minutes past every hour of every day: 15,45 * * * *
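Before saving a new schedule, it can help to check which times of day it produces. The following sketch expands the minutes and hour fields of a schedule into a list of run times. It is a simplified checker, not the Kubernetes scheduler: it handles only *, single numbers, and comma-separated lists (the forms used in this topic), and it ignores positions 3 to 5. The function names are hypothetical.

```python
def expand_field(field, low, high):
    """Expand one schedule field into a sorted list of integers.

    Handles only '*', single numbers, and comma-separated lists;
    ranges and step values are not handled.
    """
    if field == "*":
        return list(range(low, high + 1))
    values = sorted({int(part) for part in field.split(",")})
    for value in values:
        if not low <= value <= high:
            raise ValueError(f"{value} is outside {low}-{high}")
    return values


def daily_run_times(schedule):
    """Return the HH:MM times at which a schedule fires on a matching day."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError("a schedule needs exactly 5 space-separated fields")
    minutes = expand_field(fields[0], 0, 59)  # position 1
    hours = expand_field(fields[1], 0, 23)    # position 2
    return [f"{hour:02d}:{minute:02d}" for hour in hours for minute in minutes]


retention_times = daily_run_times("30 1 * * *")
rollover_times = daily_run_times("15,45 * * * *")
print(retention_times)      # ['01:30']
print(len(rollover_times))  # 48
print(rollover_times[:4])   # ['00:15', '00:45', '01:15', '01:45']
```

For example, daily_run_times("0 0,12 * * *") returns ['00:00', '12:00'], a schedule that runs a job twice a day.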