You can use data that is stored as CSV, TSV, and Excel files on remote data sources
to create virtual tables. Segment or combine data from one or more files to create a virtual
table.
Before you begin
To access data that is stored in CSV, TSV, and Excel files, you must install a remote connector
on the data source where the files are located. A remote connector enables Data Virtualization to
automatically access files located in a remote data source. For more information, see Installing connectors on remote data
sources.
Data Virtualization auto-detects a table in an Excel sheet by scanning each row
until it reaches a non-empty
cell. The final column is determined
from the first non-empty cell in the first row. The final row is the first one having empty cells in
all columns.
The Excel source wrapper in Data Virtualization was upgraded in Cloud Pak for Data 4.0.2 to
allow access to spreadsheets of unlimited size. In previous releases, Data Virtualization might
fail to access spreadsheets that contain numerous data, for example greater than 250,000 cells, or
have a large file size, for example greater than 3 MB, which easily happens when a workbook contains
macros. If you encounter any issues (for example, unexpected types or values), you can toggle
between new and old source wrappers with the following SQL API calls.
- To switch to the previous Excel source wrapper named GExcel, use the following
command:
CALL DVSYS.SETCONFIGPROPERTY('EXCEL_SOURCE_WRAPPER_CLASS', 'com.ibm.db2j.GExcel', '', ?, ?)
- To switch to the current Excel source wrapper named QExcel, use the following command:
CALL DVSYS.SETCONFIGPROPERTY('EXCEL_SOURCE_WRAPPER_CLASS', 'com.ibm.db2j.QExcel', '', ?, ?)
For more information, see Queries on virtualized flat files fail with incorrect results in Data Virtualization.
Procedure
- On the navigation menu, click .
The service menu opens to the Data
sources page by default.
- On the service menu, click
and then click the Files tab.
The list of available data sources appears. You can narrow down the displayed assets by
using the available filters.
- Select the remote data source where you want to browse for files.
The list of files and directories on your data source appears. You can enter a particular path
on the host system in the
File path field or click the files and directories
on the list to go through the host system.
To edit the remote data source, click Edit
connection in the action menu of the remote data source.
- Select the file that you want to virtualize and click Add to cart.
- To use the first row of your file as column headings in your virtual table, enable the
Use first row as column heading option. If this option is not enabled, you
are prompted to specify column names for your virtual file before you add it to your cart.
- If your file has multiple worksheets, you can select a worksheet to view additional
information and add it to your cart.
-
Click View cart to preview your file data selections as a virtual file.
From this window, you can edit the virtual table name to be created from file data.
Additionally, you can edit schema names, remove a selection from your cart, or edit the list of
remote files that participate in a merged table.
- Select the appropriate option to assign the virtual table to be created from file
data.
| Assign to |
When to use this option |
| Data request |
Select Data request if you created the virtual table in response to a
data request. Then, choose the appropriate request. The table also appears in Virtualized
data. |
| Project |
Select Project if you created the virtual table to use in a specific
project. Then, choose the appropriate project. The table also appears in Virtualized
data. |
| Virtualized data |
Select Virtualized data if the table was not created in response to a data request or to use in a specific project. This setting is
the default if no data requests or projects exist. |
- Select Publish to catalog if you also
want to publish to a selected catalog.
A list of available catalogs is shown in the
drop-down menu. Each catalog is tagged as
Governed or
Not
governed.
Note: You must have at least one catalog in
Watson Knowledge
Catalog.
You must have permission to publish to a
catalog. An administrator can enable whether all virtual objects are published to a selected
governed catalog, which prevents a user from publishing to a specified
catalog.
- Specify a schema in the Schema
field.
You can also create a schema by following these
steps.
- If you have the Data Virtualization
Engineer or User role, leave the
Schema field as default to create a schema with your user ID.
- If you have the Data Virtualization
Admin role, leave the
Schema field as default to create a schema with your user ID or enter the new
schema name in the Schema field.
-
Click Virtualize to complete the process.
When the status window appears, you can select to view your virtualized data
or virtualize more data. However, you must wait until virtualization is complete
before you navigate away from the page.
Results
If Data Virtualization and Watson Knowledge
Catalog are installed in the same OpenShift® project (namespace), your virtual object is
published to the primary catalog.