Importing File connector metadata

Before you use the File connector to read or write data, you can use the File connector in InfoSphere® Metadata Asset Manager to import metadata about files and folders. You can import metadata from Hadoop Distributed File System (HDFS) or from files on an engine tier computer.

Prerequisites

  • If metadata about the files and folders does not already exist, specify the column metadata and the file format metadata by using one of the metadata formatting options. You can import metadata that is specified in one of the following ways:
    • As the first row of the file.
    • In an .osh schema file that is in the same folder and is named file.osh or folder.osh, where file is the full name of a file in the folder, including its extension, and folder is the name of the folder. For example, if fileA.txt is in the sample directory, metadata can be specified in the fileA.txt.osh file or in the sample.osh file.
  • If you use Kerberos authentication or SSL encryption to access HDFS, see Defining a connection.
  • To import from the engine tier, the import file must be on an engine tier computer that is a metadata interchange server.
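
The .osh naming convention in the prerequisites can be sketched as a small Python helper. This is only an illustration of the naming rule: the function name candidate_schema_paths and the /data/sample path are hypothetical, and the sketch assumes POSIX-style paths and a data file that is not in the root directory. It does not call any InfoSphere API.

```python
from pathlib import PurePosixPath


def candidate_schema_paths(data_file: str) -> list[str]:
    """Return the two .osh schema locations that the naming rule allows.

    Hypothetical helper for illustration; it only builds path strings.
    """
    path = PurePosixPath(data_file)
    folder = path.parent
    return [
        # file.osh: the full file name, extension included, plus ".osh"
        str(folder / f"{path.name}.osh"),
        # folder.osh: the containing folder's name plus ".osh", in that folder
        str(folder / f"{folder.name}.osh"),
    ]


if __name__ == "__main__":
    for schema in candidate_schema_paths("/data/sample/fileA.txt"):
        print(schema)
```

For fileA.txt in the sample directory, the helper lists fileA.txt.osh and sample.osh, which matches the example in the prerequisites.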

Context

The File connector imports and stores the following types of metadata in the metadata repository of InfoSphere Information Server:
  • Data file folders
  • Data files
  • Data file structures
  • Data file fields
  • Data file definitions
  • Data file definition structures
  • Data file definition fields

You can choose to import only the data file folders and data files, without importing data file definitions and data file structures. When you select the Import file structure parameter, the connector imports the information in .osh schema files about the structure of the folders and files, along with the header information in the files that defines data file structures. If you do not select Import file structure, only data file folders and data files are imported.

You can use the data file structures to create table definitions in InfoSphere DataStage®. You can also create InfoSphere Data Click activities that move data files and their contents to the Hadoop directories. You can browse the assets and view data lineage in InfoSphere Information Governance Catalog.

For detailed instructions on importing metadata, see Importing metadata by using InfoSphere Metadata Asset Manager. Use the information in this topic as a reference while you complete that procedure.

Import from HDFS

To import metadata from HDFS, select File Connector - HDFS when you choose your connector in InfoSphere Metadata Asset Manager.

During the import, you select or create a data connection to the Hadoop system that contains the directories that you want to import. In a single import, you can import multiple HDFS directories to create multiple data file folders in the metadata repository.

The connector stores the imported metadata for each directory, including the full name and path of the directory, as a data file folder asset in the metadata repository.

Import from the engine tier

To import metadata from files on the engine tier, select File Connector - Engine Tier when you choose your connector in InfoSphere Metadata Asset Manager.

You specify the engine tier computer by selecting a metadata interchange server in the New Import Area window.

During the import, you select or create a data connection to the engine tier computer. Your credentials for InfoSphere Metadata Asset Manager are automatically used to connect to the engine tier computer and are not saved with the data connection.