You can always back up your Watson™ Explorer Engine
installation by adding its installation directory to the list of standard directories that you back
up.
The default location of the Watson Explorer Engine
installation directory is /opt/ibm/WEX/Engine on Linux systems, and is
C:\Program Files\ibm\WEX\Engine on Microsoft Windows systems, though the
installation directory can be changed when installing the software.
Warning: Do not back up the Watson Explorer Engine installation directory while any index updates are taking
place. This is primarily a scheduling issue: make sure that index updates are scheduled to
occur only after backups have completed. If indices are updated while they are being backed up,
they might not be usable after they are restored. Similarly, if your backup application locks files
before backing them up, Watson Explorer Engine might terminate
abnormally because it cannot write to the files and directories that it requires.
At a minimum, add the files data/users.xml and
data/repository.xml to your daily backups. The
users.xml file contains the user accounts for your Watson Explorer Engine installation. The repository.xml file
contains the configuration code for all of your search applications.
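As an illustration only, the following Python sketch copies the two files named above into a dated folder. The function name backup_critical_files, the backup_root location, and the dated-folder layout are assumptions made for this example and are not part of Watson Explorer Engine. Run a script like this only when no index updates or other writes are in progress.

```python
import shutil
from datetime import date
from pathlib import Path

# Files named in this section, relative to the installation directory.
CRITICAL_FILES = ("data/users.xml", "data/repository.xml")


def backup_critical_files(install_dir, backup_root, stamp=None):
    """Copy users.xml and repository.xml into backup_root/<stamp>/.

    The dated-folder layout is an assumption for this sketch.
    """
    stamp = stamp or date.today().isoformat()
    target = Path(backup_root) / stamp
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for relative in CRITICAL_FILES:
        source = Path(install_dir) / relative
        if not source.is_file():
            raise FileNotFoundError(f"expected file is missing: {source}")
        # copy2 preserves modification times along with the file contents.
        copied.append(Path(shutil.copy2(source, target / source.name)))
    return copied
```

For a default Linux installation, the call might look like backup_critical_files("/opt/ibm/WEX/Engine", "/backup/wex"), where /backup/wex is a hypothetical destination.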
Note: If you are developing collaborative search applications and have enabled
annotation backups as described in the Express Tagging section of the
Watson Explorer Engine User Manual, you will also want to add the
directory where these automatic backups are stored to your system backups. The backups for search
collection annotations are stored in the directory data/tag-backup, relative to
the directory where Watson Explorer Engine is installed on your system.
The backup file for search collection annotations is updated each time any annotation in that
collection is added or modified, so you might want to back up this directory more frequently than
you perform standard system backups. This is easily done by creating a shell script or batch file
that runs at scheduled intervals and updates remote copies of the files in this directory if they
have been modified. For information about restoring annotations from a backup file, see
Restoring Annotations from a Backup File.
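A scheduled script of the kind described above could be sketched in Python as follows. The function name sync_modified_files and the mirror directory are hypothetical. The sketch compares modification times only and assumes that the mirror is a locally mounted path. It does not delete mirror copies of files that have been removed from the source.

```python
import shutil
from pathlib import Path


def sync_modified_files(source_dir, mirror_dir):
    """Copy files from source_dir to mirror_dir when they are new or newer."""
    source_dir, mirror_dir = Path(source_dir), Path(mirror_dir)
    updated = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        target = mirror_dir / source.relative_to(source_dir)
        # Copy only if the mirror copy is missing or older than the source.
        if not target.exists() or source.stat().st_mtime > target.stat().st_mtime:
            target.parent.mkdir(parents=True, exist_ok=True)
            # copy2 preserves the modification time, so unchanged files
            # are skipped on the next run.
            shutil.copy2(source, target)
            updated.append(target)
    return updated
```

You could run such a function from cron or the Windows Task Scheduler, passing the data/tag-backup directory of your installation as source_dir.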
Beyond these critical files, deciding what to back up depends on how you are using Watson Explorer Engine and the types of search applications that you are
creating. For example, if your Watson Explorer Engine applications
are primarily meta-search applications and therefore do not themselves crawl or index other
sources of online information, you may not want to back up anything beyond the files mentioned
earlier in this section.
However, if you are crawling data repositories to create your own search collections, all of
the indices, log files, and other data from all sources that you are crawling will also be
stored under your Watson Explorer Engine installation directory by
default. The crawled and indexed data for a Watson Explorer Engine
search collection will be stored in the directory
data/search-collections/XYZ/name,
where name is the name of the search collection, and XYZ
is the first three bytes of an internal hash that was calculated from that name. Storing
search collections in distinct directories helps prevent directory size and performance
issues.
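Because the XYZ prefix comes from an internal hash, a backup script cannot easily compute it from the collection name. One possible approach, sketched below in Python, is to check every prefix directory for a subdirectory with the collection name. The function name find_collection_dirs is hypothetical, and the sketch assumes the default data/search-collections layout described above.

```python
from pathlib import Path


def find_collection_dirs(install_dir, collection_name):
    """Return directories matching data/search-collections/XYZ/<collection_name>."""
    root = Path(install_dir) / "data" / "search-collections"
    if not root.is_dir():
        return []
    matches = []
    # The XYZ prefix is computed internally, so check every prefix directory.
    for prefix_dir in sorted(root.iterdir()):
        candidate = prefix_dir / collection_name
        if candidate.is_dir():
            matches.append(candidate)
    return matches
```

Collections whose data location has been configured manually, or that remain in data/collections after an upgrade, are not found by this sketch.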
Important: Before Velocity 7.5, all search collection data was stored by
default in the data/collections directory of an installation. Any search data
for search collections that you created before Velocity 7.5 in an installation that you
have upgraded to Velocity 7.5 will still be located in the
data/collections directory of the installation. This directory should therefore
also be backed up if you want to back up those collections, taking into account the same
considerations discussed in this section for backing up any search collection.
Note: Search collection data location is configurable on a per-collection basis in
the Directories section of the tab for that search collection in the Watson Explorer Engine administration tool. If you manually configure the
location where the data for any search collections is stored, you also must back up those
directories if you are backing up your search collection data.
The files that make up a search collection can be very large, depending on the amount and
type of data that you are crawling. You must make the traditional system administrator's decision
regarding the trade-offs between increased backup time and backup storage requirements compared to
the time it would take for Watson Explorer Engine to re-crawl the data
and re-create the indices (or update them if you have restored them from backups).
Note: If you are using Watson Explorer Engine to crawl
and index data, the index and log files that it creates and uses are large binary files. Therefore,
incremental backups of these files cannot be made at any granularity finer than
the frequency with which new crawls or index updates change them. Most backup software uses
file creation and modification times to determine which files have changed and thus need to be
backed up. The timestamps of Watson Explorer Engine log and index files
are updated each time that index data is actually updated in response to a recrawl or refresh
request. (A refresh request updates an index only if data has changed on the resource
that is being indexed.) Some backup software supports saving only the changes between two versions
of a file, but binary files such as these usually appear to be entirely different from their previous
versions.
In general, if re-indexing your search collections does not take very long (or takes less
time than it takes to restore selected files from backups), you may not want to back up
anything other than the files mentioned earlier in this section.
If re-indexing your search collections takes a significant amount of time but is done
infrequently, you need to back up your Watson Explorer Engine installation directory only after each time that you
re-index your search collections. This fits well with standard incremental backup
mechanisms, which back up only the files that have changed since a given date or since backups
were last run.
If re-indexing your search collections takes a significant amount of time but is done daily,
you must determine, on average, how long it takes to re-crawl your information sources and update an
existing index. If daily updates take a relatively small amount of time, you might want to back up
your Watson Explorer Engine installation directory weekly and, whenever
necessary, update the indices after they have been restored. If the time that it takes to restore indices
from backups and update them approaches the amount of time it takes to create them, you may not want
to back them up at all.
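The guidance in the preceding paragraphs can be summarized as a small decision function. The Python sketch below is only an aid to reasoning. The function name, the returned labels, and the near threshold (which stands in for "approaches" and is set arbitrarily to 0.9) are assumptions, and the time estimates must come from your own measurements.

```python
def recommend_backup_strategy(reindex_hours, restore_hours, update_hours,
                              reindexed_daily, near=0.9):
    """Return a coarse backup recommendation from measured time estimates.

    near is an assumed threshold: restore-plus-update time is treated as
    approaching the re-index time once it reaches near * reindex_hours.
    """
    # Re-indexing is no slower than restoring: skip index backups.
    if reindex_hours <= restore_hours:
        return "critical files only"
    # Slow but infrequent re-indexing: back up after each re-index.
    if not reindexed_daily:
        return "installation directory after each re-index"
    # Daily re-indexing: restoring and then updating must still be a clear win.
    if restore_hours + update_hours >= near * reindex_hours:
        return "critical files only"
    return "installation directory weekly, update indices after restore"
```

For example, with a 10-hour re-index, a 2-hour restore, and a 30-minute daily update, the sketch recommends weekly backups of the installation directory.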
IBM strongly recommends installing your Watson Explorer Engine
software on RAID storage so that the failure of a single disk will not cause your Watson Explorer Engine applications to fail. If your applications require high
availability, you can take advantage of Watson Explorer Engine features
such as Distributed Indexing to help protect against system failures. You might also
want to consider collocation to protect against local or regional power or network failures.
Tip: For general information about hardware/software requirements and minimum system
configurations, see
Requirements.