IBM Data Reduction Estimator Tool (DRET) for SVC, Storwize and FlashSystem products

Abstract

The Data Reduction Estimator Tool (DRET) is a command-line, host-based utility for estimating data reduction savings on block devices.

To help profile and analyze existing user workloads that need to be migrated to a new system, IBM provides a highly accurate data reduction estimation tool that supports both deduplication and compression. The tool scans target workloads on any storage array (IBM or third-party) and then merges all scan results to provide an integrated, system-level data reduction estimate.

Download Description

Overview

The Data Reduction Estimator utility uses advanced mathematical and statistical algorithms to perform its analysis with a low memory footprint. The utility runs on a host that has access to the devices to be analyzed. It performs only read operations, so it has no effect on the data stored on the device. The following sections describe how to install Data Reduction Estimator on a host and how to use it to analyze the devices that the host can access. Depending on the environment configuration, Data Reduction Estimator can in many cases be used on more than one host in order to analyze more data types.

It is important to understand block device behavior when analyzing traditional (fully allocated) volumes. Traditional volumes that were created without initially zeroing the device might contain traces of old data at the block device level. Such data is not accessible or viewable at the file system level. When Data Reduction Estimator analyzes such volumes, the expected reduction results reflect the saving rate to be achieved for all the data at the block device level, including the traces of old data.

Regardless of the block device type being scanned, it is also important to understand a few principles of common file system space management. When files are deleted from a file system, the space they occupied before deletion becomes free and available to the file system. This happens even though the data on disk was not removed, but rather the file system index and pointers were updated to reflect this change. When using Data Reduction Estimator to analyze a block device used by a file system, all underlying data in the device is analyzed, regardless of whether this data belongs to files that were already deleted from the file system. For example, you can fill a 100GB file system and make it 100% used. You can then delete all the files in the file system making it 0% used. When scanning the block device used for storing the file system in this example, Data Reduction Estimator (or any other utility for that matter) accesses the data that belongs to the files that are already deleted.

To reduce the impact of the block device and file system behavior described above, it is recommended to use Data Reduction Estimator to analyze volumes that contain as much active data as possible, rather than volumes that are mostly empty. This increases the accuracy of the estimate and reduces the risk of analyzing old data that is already deleted but might still have traces on the device.

Using Data Reduction Estimator tool

There are 3 steps that need to be performed to get the reduction ratio.

Step 1: Obtain the device list.

Step 2: Estimate the reduction ratio on each of the devices independently.

Step 3: In case there are 2 or more scanned devices, merge the results from all independent volume scans together to view the data reduction estimation across all scanned volumes.

The following sections describe how to perform each of the steps.

See the Syntax section for a detailed description of the syntax and command-line options.

Step 1: Obtain the device list

This step is performed differently on different platforms.

Linux, ESX, AIX, Solaris, and HP-UX server:

Log in to the host by using the root account.
Obtain the list of device names by using the following commands:
- Linux: "fdisk -l"
- ESX: "esxcli storage core device list | grep Dev"
- AIX: "lsdev -Cc disk"
- Solaris: "format"
- HP-UX: "ioscan -kfnC disk"

Windows server:

Log in to the server by using an account with Administrator privileges.
Open an elevated command prompt (Run as Administrator).
Run "wmic DISKDRIVE list brief":
C:\>wmic DISKDRIVE list brief
Caption                          DeviceID            Model                            Partition  Size
IBM 2145 Multi-Path Disk Device  \\.\PHYSICALDRIVE0  IBM 2145 Multi-Path Disk Device  1          256052966400
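
When many disks are listed, it can help to pull the DeviceID values out of the wmic output programmatically. The following is a minimal Python sketch, not part of the tool: the function name `device_ids` and the sample text are illustrative assumptions, and the pattern assumes that every DeviceID has the `\\.\PHYSICALDRIVE<n>` form shown above.

```python
import re
from typing import List

# Sample text based on the wmic output shown above.
SAMPLE = r"""Caption DeviceID Model Partition Size
IBM 2145 Multi-Path Disk Device \\.\PHYSICALDRIVE0 IBM 2145 Multi-Path Disk Device 1 256052966400
"""

# Matches DeviceIDs of the form \\.\PHYSICALDRIVE<n>.
DEVICE_ID = re.compile(r"\\\\\.\\PHYSICALDRIVE\d+")


def device_ids(wmic_output: str) -> List[str]:
    """Return every DeviceID found in the text, in order of appearance."""
    return DEVICE_ID.findall(wmic_output)


for device in device_ids(SAMPLE):
    print(device)
```

Each value printed can be passed to the -d parameter in the next step.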

Step 2: Estimate the reduction ratio on each of the devices independently

Run Data Reduction Estimator tool with the following parameters:

-d <device>	The device to analyze, according to the DeviceID output.
--command scan|partialscan	The scan command performs a full scan and reports the total data reduction saving estimate (deduplication + compression), while partialscan performs a short scan that estimates compression savings only, not deduplication. The default is "scan".
-o outputFileName	Keeps the output results for the next step.

For example:

Linux:
Data-Reduction-Estimator -d /dev/sda1 -o scan_Linux_RHEL7

Windows:
Data-Reduction-Estimator.exe -d \\.\PHYSICALDRIVE0 -o scan_Win7
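
When the scan is launched from a script, the parameters above can be assembled into an argument list. The following is a minimal Python sketch under stated assumptions: the helper name `scan_command` and its parameters are hypothetical, and only the flags documented in this step (-d, -o, --command) are used.

```python
from typing import List, Optional


def scan_command(device: str,
                 output: Optional[str] = None,
                 partial: bool = False,
                 binary: str = "Data-Reduction-Estimator") -> List[str]:
    """Build the argument list for scanning a single device."""
    args = [binary, "-d", device]
    if partial:
        # "scan" is the default, so --command is only added for a partial scan.
        args += ["--command", "partialscan"]
    if output:
        args += ["-o", output]
    return args


print(" ".join(scan_command("/dev/sda1", "scan_Linux_RHEL7")))
# Data-Reduction-Estimator -d /dev/sda1 -o scan_Linux_RHEL7
```

On Windows, pass binary="Data-Reduction-Estimator.exe". The returned list can be handed to subprocess.run to start the scan.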

The Data Reduction Estimator tool scans one device at a time in a single CLI command. With the --batchfile parameter, it can handle several devices in parallel; each line of the batch file represents a different device.
An example of several devices in a single batch file:
-d /dev/sda
-d /dev/sdb
-d /dev/sdc --command partialscan
In this example, devices /dev/sda and /dev/sdb are fully scanned, while device /dev/sdc is partially scanned.

During the scan process, checkpoints are stored in a file (with the *.rdb extension). If the scan is terminated before completion, the next execution continues from the checkpoint.

Sketches with the .dat extension are written for devices whose scan has completed.
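
A batch file like the one in the example can be generated from a device list. The following is a minimal Python sketch; the helper name `batch_lines` is hypothetical, and it assumes that each batch-file line holds the same options that would otherwise be typed on the command line, as the example suggests.

```python
from typing import Iterable, List, Tuple


def batch_lines(devices: Iterable[Tuple[str, bool]]) -> List[str]:
    """Return one batch-file line per (device, partial_scan) pair."""
    lines = []
    for device, partial in devices:
        line = f"-d {device}"
        if partial:
            # Full scan is the default, so only partial scans need --command.
            line += " --command partialscan"
        lines.append(line)
    return lines


lines = batch_lines([("/dev/sda", False), ("/dev/sdb", False), ("/dev/sdc", True)])
print("\n".join(lines))
```

The joined text can be written to a file and passed to the tool through the --batchfile parameter.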

Step 3: Merge the results from the independent scans together

Collect the output files of all scanned devices from step 2 and place them in the same directory where the Data-Reduction-Estimator tool is located.
To calculate the total data reduction saving, use the --command merge option. Separate the output file names with a comma ",":

For example:
Data-Reduction-Estimator --command merge --mergefiles scan_freebsd91_1024,scan_vendlist_1024,scan_win2008_1024,scan_Win7,scan_Linux_RHEL7
After the data reduction analysis is completed, the overall data reduction estimation for all devices is displayed.
- By default, only volumes with an overall data reduction ratio of 90% or lower (equivalent to 10% savings or higher) are merged.
- --mergeall overrides the data reduction threshold, so that all volumes are merged.
Note:
- Merge can be applied only to data files generated by the same binary build. Otherwise, a "Build mismatch" error is generated.
- Merge cannot be applied to *.rdb files. *.rdb files are intermediate files created by unfinished scans.
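
The merge invocation can be assembled in the same scripted way. The following is a minimal Python sketch; the helper name `merge_command` is hypothetical, and the checks simply mirror the rules stated in this document (no *.rdb files, and at least two result files, as noted in the Syntax section).

```python
from typing import List, Sequence


def merge_command(result_files: Sequence[str],
                  merge_all: bool = False,
                  binary: str = "Data-Reduction-Estimator") -> List[str]:
    """Build the argument list for merging completed scan results."""
    if any(name.lower().endswith(".rdb") for name in result_files):
        raise ValueError("*.rdb checkpoint files cannot be merged")
    if len(result_files) < 2:
        raise ValueError("merge needs at least two result files")
    # The file names are passed as a single comma-separated argument.
    args = [binary, "--command", "merge", "--mergefiles", ",".join(result_files)]
    if merge_all:
        args.append("--mergeall")
    return args


print(" ".join(merge_command(["scan_Win7", "scan_Linux_RHEL7"])))
# Data-Reduction-Estimator --command merge --mergefiles scan_Win7,scan_Linux_RHEL7
```

The sketch does not check for a build mismatch; that check is left to the tool itself.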

Syntax

Linux, ESX, AIX, Solaris, and HP-UX:
Data-Reduction-Estimator -d <device> [-x Max MBps] [-o result data filename] [-s Update interval] [--command scan|merge|load|partialscan] [--mergefiles Files to merge] [--mergeall] [--loglevel Log Level] [--batchfile batch file to process] [-h]

Windows:
Data-Reduction-Estimator.exe -d <device> [-x Max MBps] [-o result data filename] [-s Update interval] [--command scan|merge|load|partialscan] [--mergefiles Files to merge] [--mergeall] [--loglevel Log Level] [--batchfile batch file to process] [-h]

-d	The device name. Linux: the path of the device to analyze (for example, /dev/sda). Windows: the DeviceID, which can be obtained with the wmic Windows utility. See Step 1 for instructions on how to obtain the device list.
-x	Throughput limit, up to X MBps. The default is 0 (no limit).
-o	The name of the output file, that is, the data file that contains the information on the analyzed device. It can be used later for the "merge" option. If no name is provided, the output file is created with a default name.
-s	The progress update interval. The default is every 10 seconds.
--command	The operation mode:
	scan: Full scan (default). Estimates the total data reduction saving.
	merge: Can be used after the scan of all the devices is completed, in order to get the statistics average for all scanned devices. A minimum of two files (volumes) is required.
	load: Can be used after the scan of a device is completed, in order to load the device statistics from the .dat sketches.
	partialscan: Compression saving estimation only. It is used for a quick scan sample.
--mergefiles	The list of files to merge, used to obtain the total data saving for more than one scanned device. By default, devices that do not meet the 90% data reduction ratio threshold are ignored, and every such instance is reported.
--mergeall	Overrides the 90% threshold, so that all devices are merged.
--loglevel	Log level to run. Values are 3 to 7 (the default is 3).
--batchfile	Batch file to process. The batch file can contain several devices, with each line referring to a different device.

Examples:

Data Reduction Estimator output examples

root@swfc120:/tmp# ./Data-Reduction-Estimator -d /dev/dm-10
Result data filename not given, auto-generating: file_C8F50050.dat
200.00 GB | 55.60 MBps: 0% [####################] 100%
Estimated Dedupe Savings: 11.659%
Estimated Compression Savings: 65.797%
Data Reduction Savings: 69.784%
---------------------------------------
Zeroes Detected Savings: 0.227%
Total Data efficiency Savings: 69.853%
Time Consumed: 00:11:01

Analyzing the results of the above example:
Volume size: 200GB
Auto-generated output file: file_C8F50050.dat

To get the total data on disk after reduction, subtract the data reduction saving from 100% and apply the result to the disk size:
(100% - 69.784%) of 200GB = 60.432GB
Data on disk after reduction (dedup and compression) is 60.432GB

The Zeroes Detected saving refers to large sequences of zeros that were detected on the device. It is not an inherent part of the data reduction saving, as some systems consider this as thin provisioning. The total data efficiency is the total savings, including deduplication, compression, and the large zero sequences combined.

Data is deduplicated first, and then it is compressed.
Dedup saving:
0.11659 * 200GB = 23.318GB
Compression saving (after dedup):
0.65797 * (200GB - 23.318GB) = 116.251GB

Total saving: 23.318GB + 116.251GB = 139.569GB (~70% of 200GB)
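
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes, as stated above, that compression is applied only to the data that remains after deduplication; the function name `combined_saving` is illustrative and not part of the tool.

```python
def combined_saving(dedup: float, compression: float) -> float:
    """Fraction saved when compression is applied to what remains after dedup."""
    return 1.0 - (1.0 - dedup) * (1.0 - compression)


size_gb = 200.0                        # volume size from the example output
dedup, compression = 0.11659, 0.65797  # estimated savings from the example output

dedup_gb = dedup * size_gb
compression_gb = compression * (size_gb - dedup_gb)
total = combined_saving(dedup, compression)

print(f"dedup saving:       {dedup_gb:.3f} GB")
print(f"compression saving: {compression_gb:.3f} GB")
print(f"total saving:       {total:.1%} of {size_gb:.0f} GB")
```

The combined figure comes out at about 69.8%, which agrees with the Data Reduction Savings value reported by the tool in the example.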

The next example illustrates the total data saving on two volumes: RHEL7 and win2008.

Data-Reduction-Estimator --command merge --mergefiles scan_Linux_RHEL7,scan_win2008_1024
Result data filename not given, auto-generating: merge_out
Estimated Dedup Savings: 97.8%
Estimated Compression Savings: 16.3%
Data Reduction Savings: 98.2%
---------------------------------------
Zeroes Detected Savings: 4.11%
Total Data Efficiency Savings: 98.2%
Time Consumed: 00:00:00

Mergeall CLI syntax example:

Data-Reduction-Estimator --command merge --mergefiles scan_Linux_RHEL7,scan_win2008_1024 --mergeall

Load CLI syntax example:

Data-Reduction-Estimator --command load -o file_770F7F23.dat

Best practices

As a rule, ESXi operations require larger amounts of RAM. If the amount of RAM is not sufficient, the data reduction estimation tool performance might be degraded.

On Windows, Red Hat Linux, Ubuntu, AIX, and Solaris, the default number of concurrent threads performing the scan is 10.

If the scanning task is taking too long, decrease the number of threads. You can try to reduce the number of threads to 5, and then down to a single thread.

Thread reduction example:

DEDUPEL3=1 ./Data-Reduction-Estimator -p 5 -d XXXX

On HP-UX and ESXi, the default number of threads is 1. If more than 100MB of RAM is available for the tool, increase the number of threads by using the same parameter as in the example above.

In some cases, especially with ESXi, changing the number of threads might not result in a significant performance improvement. In this case, run a partial scan. It estimates compression savings only and requires a smaller amount of RAM.

Enhancements introduced in V1.03

New command parameter "load". Reloads the device statistics of previously scanned devices (see the Syntax section for details).
Resuming scans. Checkpoints are stored in *.rdb files. If scanning is stopped, the next scan continues from the last checkpoint.
Smart merges. Only volumes that meet the defined data reduction ratio threshold are merged (by default, a ratio of 90% or lower).

Prerequisites

Data Reduction Estimator can be used on the following client operating systems:

Windows 2008 Server, Windows 2012
Red Hat Enterprise Linux Version 5.x, 6.x, 7.x (64-bit)
Ubuntu 12.04
ESX 5.0, 5.5, 6.0
AIX 6.1, 7.1
Solaris 10
HP-UX 11.31

Minimum hardware requirements

HP-UX and ESXi – 100 MB of free RAM
Windows, Red Hat Linux, Ubuntu, AIX, Solaris – 500 MB of free RAM

Note: If there is not enough unique data in the device, the results might not be accurate. Therefore, the scan is limited to devices with >= 400M of unique data. If a device does not have enough unique data, the scan fails with a "Not enough unique data to estimate savings" error.

Installation Instructions

Data Reduction Estimator can be installed only on the supported Windows operating systems (see the list under Prerequisites). After installation, the binary files for the other supported operating systems become available in the Windows installation folder.

By default, the files are copied to:

Windows 64-bit: C:\Program Files (x86)\IBM\Data Reduction Estimation Tool
Windows 32-bit: C:\Program Files\IBM\Data Reduction Estimation Tool

SHA256 checksum = df1f6b8a9ee3f7a8cc71960ec5d26bfe5859ce8afad47efe7ca0e10b925b74bd
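
A downloaded file can be checked against the published checksum before it is run. The following is a minimal Python sketch using the standard hashlib module; the function names are illustrative, and it is an assumption that the checksum above applies to the downloaded installer file.

```python
import hashlib

# Checksum as published above.
EXPECTED_SHA256 = "df1f6b8a9ee3f7a8cc71960ec5d26bfe5859ce8afad47efe7ca0e10b925b74bd"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str) -> bool:
    """Compare the file's digest with the published value."""
    return sha256_of_file(path).lower() == EXPECTED_SHA256
```

Calling matches_published_checksum with the path of the downloaded file returns True only when the digests are identical.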

Downloadable file: IBM Data Reduction Estimator Tool v1.03 (Build 153)
Date: 02 Oct 2018
Language: English
Size: 5956000 B
Platform: Windows
URL: https://ftp.software.ibm.com/storage/san/sanvc/dret_v1.03_build153/data_reduction_estimator_153.exe
          

Document Location

Worldwide
