Purging of poll data
Poll data is purged using database partitioning. The raw and historical poll data tables in the NCPOLLDATA database are partitioned based on time. Partitions are added and detached at regular intervals.
- Raw poll data table, pollData
- Historical poll data tables:
  - pdEwmaForDay
  - pdEwmaForWeek
  - pdEwmaForMonth
  - pdEwmaForYear
Table | Number of partitions | Length of each partition |
---|---|---|
pollData | 8 | 10 minutes |
pdEwmaForDay | 26 | 1 hour |
pdEwmaForWeek | 16 | 12 hours |
pdEwmaForMonth | 33 | 1 day |
pdEwmaForYear | 14 | 30 days |
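The time span that each table covers follows from multiplying the number of partitions by the partition length. The following Python sketch does that arithmetic using the values from the table. The helper names `total_span` and `span_excluding_spare` are illustrative and are not part of Network Manager. The second helper assumes that one partition is always the empty spare at the front of the table, as described in this topic.

```python
from datetime import timedelta

# Partition layout copied from the table: (number of partitions, partition length).
PARTITION_LAYOUT = {
    "pollData": (8, timedelta(minutes=10)),
    "pdEwmaForDay": (26, timedelta(hours=1)),
    "pdEwmaForWeek": (16, timedelta(hours=12)),
    "pdEwmaForMonth": (33, timedelta(days=1)),
    "pdEwmaForYear": (14, timedelta(days=30)),
}


def total_span(table):
    """Time range covered by all partitions of the table, spare included."""
    count, length = PARTITION_LAYOUT[table]
    return count * length


def span_excluding_spare(table):
    """Time range covered when the empty spare partition is left out (assumption)."""
    count, length = PARTITION_LAYOUT[table]
    return (count - 1) * length


if __name__ == "__main__":
    for name in PARTITION_LAYOUT:
        print(name, total_span(name), span_excluding_spare(name))
```

For example, the pollData table spans 8 × 10 minutes = 80 minutes in total, or 70 minutes if the spare partition is excluded.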
New data is always written to the penultimate partition of each table. Once the time range for a partition is complete, a new partition is added. If the table then has more than the defined number of partitions for that table, the oldest partition in the table is detached, and that data is removed from storage.
![Sequence of events for writing raw poll data to the pollData table](../../images/cd_polldatapartitions.gif)
- 1 Polling engine, ncp_poller, writes data to the pollData table
  - The polling engine, ncp_poller, writes data to the penultimate partition in the table.
- 2 New partition is added
  - Once the time range for the partition that is being written to is complete, a new empty partition is added to the "front" of the table. This ensures that there is always a spare partition at the front of the table.
- 3 The oldest partition is detached from the table
  - With the addition of the new partition, the system detects that the table has more than the defined number of partitions for that table, which in the case of the pollData table is 8. At that point, the oldest partition in the table is detached, and that raw data is removed from storage.
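The three steps above can be modeled in a few lines of Python. This is an in-memory sketch of the rotation logic only, not the database implementation: the `PartitionedTable` class and its methods are hypothetical, and the sketch assumes that timestamps arrive in non-decreasing order.

```python
from collections import deque
from datetime import datetime, timedelta


class PartitionedTable:
    """Toy model of a time-partitioned table (hypothetical, for illustration)."""

    def __init__(self, name, max_partitions, length, start):
        self.name = name
        self.max_partitions = max_partitions
        self.length = length
        # Each partition is (start_time, rows); the newest partition is on the right.
        # Begin with the partition being written to plus one empty spare in front of it.
        self.partitions = deque([(start, []), (start + length, [])])

    def _rotate(self):
        """Add a new empty spare, then detach partitions beyond the limit."""
        newest_start = self.partitions[-1][0]
        self.partitions.append((newest_start + self.length, []))
        detached = []
        while len(self.partitions) > self.max_partitions:
            detached.append(self.partitions.popleft())
        return detached

    def write(self, timestamp, row):
        """Write a row to the penultimate partition, rotating first if needed."""
        # The penultimate partition's time range ends where the spare begins.
        while timestamp >= self.partitions[-1][0]:
            self._rotate()
        self.partitions[-2][1].append(row)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    table = PartitionedTable("pollData", 8, timedelta(minutes=10), t0)
    for minute in range(200):
        table.write(t0 + timedelta(minutes=minute), minute)
    print(len(table.partitions), "partitions, oldest starts at", table.partitions[0][0])
```

In this model the table never holds more than 8 partitions, the last partition is always empty, and rows in detached partitions are simply discarded.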
![Sequence of events for writing aggregated poll data to the pdEwmaForDay table](../../images/cd_pdEwmaForDaypartitions.gif)
- 1 Apache Storm process aggregates raw poll data
  - Every 15 minutes, Apache Storm calculates the average of the last 15 minutes of data in the raw poll data table, pollData.
- 2 Storm writes data to the pdEwmaForDay table
  - Storm writes data to the penultimate partition in the pdEwmaForDay table.
- 3 New partition is added
  - Once the time range for the partition that is being written to is complete, a new empty partition is added to the "front" of the table. This ensures that there is always a spare partition at the front of the table.
- 4 The oldest partition is detached from the table
  - With the addition of the new partition, the system detects that the table has more than the defined number of partitions for that table, which in the case of the pdEwmaForDay table is 26. At that point, the oldest partition in the table is detached, and that aggregated data is removed from storage.
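The aggregation in step 1 can be illustrated as a plain average over a 15-minute window. The following Python sketch is not the Storm topology, which this topic does not show. The function name `average_last_window` and the `(timestamp, value)` sample layout are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import fmean


def average_last_window(samples, now, window=timedelta(minutes=15)):
    """Average the values whose timestamps fall in the window ending at `now`.

    `samples` is an iterable of (timestamp, value) pairs (assumed layout).
    Returns None when the window contains no samples.
    """
    recent = [value for ts, value in samples if now - window < ts <= now]
    return fmean(recent) if recent else None


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    # One sample per minute for 30 minutes; the value equals the minute number.
    samples = [(t0 + timedelta(minutes=m), m) for m in range(30)]
    print(average_last_window(samples, t0 + timedelta(minutes=29)))
```

Running the aggregation every 15 minutes with a 15-minute window means that each raw sample contributes to exactly one aggregated value.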
Network Manager monitors the historical poll data database tables and sends alerts if it detects that data in the tables is outside of age limits or that the amount of data in a table violates size limits. Data age violations are an indication that the Apache Storm process might not be running. Table size violations are an indication that the poll data storage rate is too high. In either case, you need to do some troubleshooting to get to the root of the problem.
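The two kinds of check can be sketched as follows. This is an illustration only: the function `check_table`, its parameters, and the thresholds are hypothetical and do not reflect how Network Manager implements or configures its monitoring. The sketch assumes that an age violation means the newest row in a table is older than expected, which is consistent with the Storm process not writing new data.

```python
from datetime import datetime, timedelta


def check_table(newest_row_time, row_count, now, max_age, max_rows):
    """Return the list of violated limits for one table: "age", "size", or both.

    All parameters are illustrative; the real limits are product settings.
    """
    alerts = []
    # Age check: no recent data suggests the aggregation process is not running.
    if now - newest_row_time > max_age:
        alerts.append("age")
    # Size check: too many rows suggests the storage rate is too high.
    if row_count > max_rows:
        alerts.append("size")
    return alerts


if __name__ == "__main__":
    now = datetime(2024, 1, 2)
    print(check_table(now - timedelta(hours=3), 10, now, timedelta(hours=1), 100))
```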