You can configure multiple layers of security in Watson Explorer Content Analytics to protect data sources from
unauthorized searching and to restrict administrative functions to
specific users.
Users can query a wide range of data sources. To ensure that only
authorized users can query content and access the
Watson Explorer Content Analytics administration console,
the system coordinates and enforces security at several levels.
- Web application server
- The first level of security is the web application server, either
through the embedded web application server or through WebSphere® Application
Server global security settings.
You can configure the system to use an LDAP registry and allow only
registered users to log in to applications or the administration
console. You can also configure the system to use an LTPA key
file to provide single sign-on (SSO) authentication support to application
users.
The procedures for setting up security controls differ depending on
whether your applications run on the embedded web application server
or on WebSphere Application Server. If you use the embedded web
application server, you can use the administration console
to configure support for SSO authentication and for communication
through secure transport protocols.
- System-level security
- At the system level, you can assign users to administrative roles
and authenticate users who administer the system. When a user
logs in to the administration console, only the functions and
collections that the user is authorized to administer are available
to that user. You can also assign privileges to users and groups
to control application functions. For example, you can limit
the ability to export documents from an application to specific users.
You can also configure credentials that enable crawlers
to access the data sources that you include in collections.
Other system components also need these credentials. For example,
to verify that users are authorized to see documents in the
search results, the search servers can use the credentials
to connect to a data source and check the current access control
lists.
- Collection-level security
- When you create a collection, you can enable security at the collection
level. You cannot change this setting after the collection is
created. If you do not enable collection-level security, you
cannot later specify document-level security controls.
When collection-level
security is enabled:
- The global analysis processes apply different rules for indexing
duplicate documents.
- You can configure options to enforce document-level security.
- You can enforce security by mapping applications (not individual
users) to the collections that they can access. You then
use standard access control mechanisms to permit or deny
users access to applications.
- You can configure the system to use the identity management component,
which enables application users to be authenticated without
configuring an application profile.
Enabling collection-level security involves a trade-off with search
quality: less information is indexed for each document, so some
queries return fewer results.
- Document-level security
- When you configure crawlers for a collection, you can enable document-level
security. For example, you can specify options to associate
security tokens with data as the data is collected by crawlers.
Your applications can use these tokens, which are stored with documents
in the index, to pre-filter the results and ensure that only
users with the correct credentials are able to query the data
and view documents.
For certain types of data sources, you
can configure options to validate a user's login credentials against the
current access controls during query processing. This extra
layer of post-filtering security ensures that a user's privileges
are validated in real time with the data source. This check
protects against cases in which a user's credentials change
after a document and its security tokens are indexed.
The anchor text processing phase of global analysis normally associates
text that appears in one document (the source document) with another
document (the target document) in which that text does not
necessarily appear. When you configure a Web crawler, you can
specify whether you want to exclude the anchor text from the
index if the link connects to a document that the Web crawler is
not allowed to crawl.
- Encryption
- To protect sensitive data, encryption is used to encode the authentication
data portion of all messages that are transmitted through the
system. The password for the default Watson Explorer Content Analytics administrator is stored
in an encrypted format. Passwords that users specify in user
profiles and passwords that are stored by the system (in configuration
files, the internal databases, and so on) are also encrypted. Encryption
incurs little overhead because only the authentication IDs and
passwords are encrypted.
Security for your collections extends beyond the authentication
and access control mechanisms that the system can use to protect
indexed content. Safeguards also exist to prevent a malicious or
unauthorized user from gaining access to data while it is in transit.
For example, the search servers use protocols such as Transport
Layer Security (TLS), Secure Sockets Layer (SSL), Secure Shell
(SSH), and Hypertext Transfer Protocol Secure (HTTPS) to communicate
with the master server and your applications.
For increased security, ensure that the server hardware
is isolated and protected from unauthorized intrusion.
By installing a firewall, you can protect the servers from intrusion
through other parts of your network. Also, ensure that no unnecessary
ports are open on the servers. Configure the system so that it listens
for requests only on ports that are explicitly assigned to Watson Explorer Content Analytics activities and applications.
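The query-time layers described above can be sketched as follows. They are collection-level mapping of applications to collections, document-level pre-filtering with stored security tokens, and optional post-filtering against the data source. This is a minimal, hypothetical Python model, not the Watson Explorer Content Analytics API. The names `Document`, `User`, `secure_search`, `app_collections`, and `live_check`, the token strings, and the rule that a user needs at least one matching token are all assumptions made for illustration. A real search engine would typically apply the token constraint as part of the index query rather than to a list of candidate documents.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Mapping, Optional


@dataclass(frozen=True)
class Document:
    doc_id: str
    collection: str
    # Hypothetical security tokens that a crawler associated with the document
    # and that are stored with it in the index.
    acl_tokens: frozenset[str]


@dataclass(frozen=True)
class User:
    name: str
    # Hypothetical tokens that identify the user, such as user ID and groups.
    tokens: frozenset[str]


# Hypothetical callback that re-checks access with the data source at query time.
LiveCheck = Callable[[User, Document], bool]


def secure_search(
    app: str,
    user: User,
    candidates: Iterable[Document],
    app_collections: Mapping[str, set[str]],
    live_check: Optional[LiveCheck] = None,
) -> list[Document]:
    """Return the candidate documents that `user` may see through `app`."""
    allowed = app_collections.get(app, set())
    results = []
    for doc in candidates:
        # Collection level: an application searches only its mapped collections.
        if doc.collection not in allowed:
            continue
        # Pre-filter (assumed rule): the user must hold at least one of the
        # tokens stored with the document.
        if user.tokens.isdisjoint(doc.acl_tokens):
            continue
        # Post-filter: optionally confirm current access with the data source.
        if live_check is not None and not live_check(user, doc):
            continue
        results.append(doc)
    return results


if __name__ == "__main__":
    docs = [
        Document("d1", "hr", frozenset({"group:hr"})),
        Document("d2", "eng", frozenset({"group:eng"})),
        Document("d3", "eng", frozenset({"group:eng"})),
        Document("d4", "eng", frozenset({"user:carol"})),
    ]
    apps = {"eng-portal": {"eng"}}
    bob = User("bob", frozenset({"user:bob", "group:eng"}))

    print([d.doc_id for d in secure_search("eng-portal", bob, docs, apps)])
    # ['d2', 'd3']

    # Simulate access to d3 being revoked at the data source after indexing.
    revoked = lambda user, doc: doc.doc_id != "d3"
    print([d.doc_id for d in secure_search("eng-portal", bob, docs, apps, revoked)])
    # ['d2']
```

In the second call, `d3` is dropped even though its stored tokens still match Bob's group. That is the case the document-level post-filtering option is meant to catch: a user's access changes after the document and its security tokens are indexed.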