Installing IBM Document Processing Extension

You can install Document Processing Extension by using the dpedeploy tool on a Linux host that runs Docker in swarm mode.

Before you begin

Make sure that you have the necessary infrastructure and software before you install Document Processing Extension.

Hardware
  • x86_64 CPU architecture. Other architectures, such as ARM, Power (ppc64), and zLinux (s390x), are not supported.
  • Minimum of 4 CPU cores and 16 GB of memory. For production, 8 or more CPU cores and 16 GB or more of memory are recommended.
  • If you plan to install the optional Optical Character Recognition Engine 2 feature, a minimum of 8 CPU cores and 32 GB of memory is recommended.
Note: Install Document Processing Extension on dedicated systems. If you share the systems with other resource-intensive applications, the performance and stability can be unpredictable as the applications compete for resources.
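The hardware minimums above can be checked on a candidate host before you install. The following is a minimal sketch, not part of the dpedeploy tool: the function names meets_minimum and check_host are hypothetical, it assumes a Linux host that provides uname, nproc, and /proc/meminfo, and it treats the documented GB values as GiB.

```shell
# Illustrative helpers; these names are not part of dpedeploy.

# meets_minimum ARCH CORES MEM_GB MIN_CORES MIN_MEM_GB
# Succeeds only for x86_64 with at least the given cores and memory.
meets_minimum() {
  [ "$1" = "x86_64" ] || return 1
  [ "$2" -ge "$4" ] || return 1
  [ "$3" -ge "$5" ] || return 1
  return 0
}

# check_host [MIN_CORES] [MIN_MEM_GB]
# Reads the values of the local host. The defaults are the documented
# minimums (4 cores, 16 GB); pass 8 32 when you plan to install
# Optical Character Recognition Engine 2.
check_host() {
  min_cores="${1:-4}"
  min_mem_gb="${2:-16}"
  arch="$(uname -m)"
  cores="$(nproc)"
  mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  # MemTotal is slightly below the installed RAM, so round to the nearest GiB.
  mem_gb=$(( (mem_kb + 524288) / 1048576 ))
  echo "arch=$arch cores=$cores memory=${mem_gb}GB"
  if meets_minimum "$arch" "$cores" "$mem_gb" "$min_cores" "$min_mem_gb"; then
    echo "Hardware meets the minimum of $min_cores cores and $min_mem_gb GB."
  else
    echo "Hardware is below the minimum of $min_cores cores and $min_mem_gb GB."
    return 1
  fi
}
```

Running check_host with no arguments tests the base minimums, and check_host 8 32 tests the values recommended for Optical Character Recognition Engine 2. The comparison is kept in meets_minimum so that it can be exercised with fixed values, independent of the host.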
Software
  • Linux operating system. Ubuntu 20.04 LTS and 22.04 LTS are recommended. You can use other Linux distributions that Docker supports.
  • Docker 20.10 or later, with swarm mode enabled.
  • Python 3.8 or later.
  • OpenSSL 3.0 or later.
  • A Linux user that has permission to run docker commands (that is, the user is added to the docker group).
    Note: macOS and Windows are not officially supported.
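The software requirements above can be verified in a similar way. This is a sketch under stated assumptions, separate from the prerequisite check that the dpedeploy tool runs itself: the function names version_ge, report, and check_software are hypothetical, and the script assumes that docker, python3, and openssl are on the PATH of the user who will run the installation.

```shell
# Illustrative helpers; these names are not part of dpedeploy.

# version_ge ACTUAL REQUIRED
# Succeeds when the dotted version ACTUAL is greater than or equal to REQUIRED.
# Components are compared numerically, so 3.10 is newer than 3.8.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      xi = (i <= n) ? x[i] + 0 : 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# report NAME ACTUAL REQUIRED: print one result line.
report() {
  if [ -n "$2" ] && version_ge "$2" "$3"; then
    echo "OK    $1 $2 (requires $3 or later)"
  else
    echo "FAIL  $1 ${2:-not found} (requires $3 or later)"
  fi
}

# check_software: query the local host for each documented requirement.
check_software() {
  report "Docker" "$(docker version --format '{{.Server.Version}}' 2>/dev/null)" 20.10
  report "Python" "$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null)" 3.8
  report "OpenSSL" "$(openssl version 2>/dev/null | awk '{print $2}')" 3.0
  if [ "$(docker info --format '{{.Swarm.LocalNodeState}}' 2>/dev/null)" = "active" ]; then
    echo "OK    swarm mode is enabled"
  else
    echo "FAIL  swarm mode is not enabled"
  fi
  if id -nG | tr ' ' '\n' | grep -qx docker; then
    echo "OK    current user is in the docker group"
  else
    echo "FAIL  current user is not in the docker group"
  fi
}
```

Call check_software as the Linux user who will run the installation. A numeric comparison is used instead of a string comparison because a plain string comparison would rank Python 3.10 below 3.8.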
Networking
  • Db2 and PostgreSQL SSL connections are not supported.
  • Only IPv4 networking on the host machine is supported. IPv6 and dual stack are not supported.
Remote database server
If you plan to use the User-provided remote Db2 server or the User-provided remote PostgreSQL server with the Manual DB management option, you must create the base database before you install the Document Processing Extension stack. For more information, see Creating base database on a remote Db2 server or Creating base database on a PostgreSQL server.
Note: You can ignore this prerequisite if you plan to use the Built-in PostgreSQL container or the User-provided remote PostgreSQL server with the Automated DB management option.
FIPS compliance

FIPS is not supported. Document Processing Extension cannot run on a FIPS-enabled host machine. If you run Document Processing Extension in a FIPS-enabled environment, the TLSV1_ALERT_INSUFFICIENT_SECURITY error is reported in all celery pods, and the pods cannot connect to RabbitMQ.
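To find out whether a host runs in FIPS mode before you install, you can read the Linux kernel FIPS flag. This is a minimal sketch: the function name fips_enabled is hypothetical, and it checks only the kernel flag in /proc/sys/crypto/fips_enabled, so it does not detect FIPS settings that are applied in other ways.

```shell
# Illustrative helper; the name is not part of dpedeploy.

# fips_enabled [FLAG_FILE]
# Succeeds when the kernel FIPS flag is set to 1.
# The default path is the standard Linux kernel location of this flag.
fips_enabled() {
  flag_file="${1:-/proc/sys/crypto/fips_enabled}"
  [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

if fips_enabled; then
  echo "FIPS mode is enabled: Document Processing Extension cannot run on this host."
else
  echo "FIPS mode is not enabled."
fi
```

The optional file argument exists only so that the check can be tried against a sample file instead of the live kernel flag.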

dpedeploy tool
The dpedeploy tool does not support database HA/DR configuration.
Note: You can implement HADR, but it requires extensive manual configuration.
Deep learning object detection
The deep-learning object detection feature is not supported.

Procedure

  1. Install Docker Engine. For more information, see Install Docker Engine. Linux distributions have their own installation channels (for example, installing Docker by using snap on Ubuntu), but those channels can be out of date or impose restrictions on the Docker installation.
  2. Make sure that Docker swarm mode is enabled. If it is not, run docker swarm init to enable it.
    • If your system has multiple network interfaces, you can select the advertise address that Docker uses to communicate with other nodes. Select the address that provides the fastest communication with the other nodes, if you have any. It is usually the one with an internal IP address.
    • If you are not creating your own certificate, add --cert-expiry to set the expiration of the Docker-created certificate. The maximum value of 99999h sets the expiration date to about 11 years in the future.
      docker swarm init --cert-expiry 99999h --advertise-addr 1.2.3.4
  3. Download and extract the dpedeploy tool and copy the database scripts to your database server. For more information, see Downloading and extracting the dpedeploy tool.
  4. Run the tool by running the executable file ./dpedeploy. The tool checks the prerequisites at startup.
    You must be entitled to download Document Processing Extension. You are asked for the user name and password that are associated with your entitlement.
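
Step 2 can be scripted so that docker swarm init runs only when swarm mode is not yet active. The following sketch is an illustration, not part of the dpedeploy tool: the function names are hypothetical, list_ipv4_addresses assumes that the iproute2 ip command is installed, and the advertise address is a value that you supply.

```shell
# Illustrative helpers; these names are not part of dpedeploy.

# hours_to_days HOURS: whole days covered by a --cert-expiry value in hours.
hours_to_days() {
  echo $(( $1 / 24 ))
}

# swarm_init_needed STATE
# STATE is the output of: docker info --format '{{.Swarm.LocalNodeState}}'
# Succeeds when swarm mode still has to be enabled.
swarm_init_needed() {
  [ "$1" != "active" ]
}

# list_ipv4_addresses: print "interface address/prefix" for each global
# IPv4 address, to help you pick a value for --advertise-addr.
list_ipv4_addresses() {
  ip -4 -o addr show scope global | awk '{print $2, $4}'
}

# ensure_swarm ADVERTISE_ADDR: enable swarm mode unless it is already active.
ensure_swarm() {
  state="$(docker info --format '{{.Swarm.LocalNodeState}}' 2>/dev/null)"
  if swarm_init_needed "$state"; then
    docker swarm init --cert-expiry 99999h --advertise-addr "$1"
  else
    echo "Swarm mode is already enabled."
  fi
}

# 99999 hours is 4166 whole days, which is a little over 11 years.
echo "99999h is about $(hours_to_days 99999) days"
```

Run list_ipv4_addresses first to see the candidate addresses, then call ensure_swarm with the address that you chose. The sketch treats every state other than active as needing initialization, which is an assumption that might not suit a node whose swarm is locked or in an error state.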

What to do next

Configure Document Processing Extension. For more information, see Configuring IBM Document Processing Extension.