Troubleshooting
Problem
This document describes common configuration settings that are investigated when a performance issue is suspected during FileNet Image Services tuning.
Resolving The Problem
This performance tuning document describes the most common items to check regarding the current configuration:
- Performance Reports
- The FileNet Image Services performance reports can be created at any time after the software is started by running the command perf_report -a. The performance reports are written to the /fnsw/local/logs/perf directory on UNIX servers and the DRIVE:\fnsw_loc\logs\perf directory on Windows servers.
By default, the perf_mon process, which gathers performance data, stores it every 15 minutes. If a performance issue is suspected, IBM suggests changing the interval from 15 minutes to 5 minutes. The perf_mon.script file can exist in one or two locations. Modify both files if they exist.
UNIX: /fnsw/lib/perf/perf_mon.script
/fnsw/local/sd/perf_mon.script
Windows: DRIVE:\fnsw\lib\perf_mon.script
DRIVE:\fnsw_loc\sd\perf_mon.script
Change:
From:
schedule 0 0:00:00 2:00:00
schedule 0 6:00:00 0:15:00
schedule 0 19:00:00 2:00:00
To:
schedule 0 0:00:00 2:00:00
schedule 0 6:00:00 0:05:00
schedule 0 19:00:00 2:00:00
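The From/To change above can also be applied programmatically. The following is a minimal sketch rather than an IBM-supplied tool: the function name set_sampling_interval is hypothetical, and reading the fields as "schedule, id, start time, interval" is an assumption based only on the three lines shown. It works on the text of the file, so back up each perf_mon.script before writing any result back.

```python
import re

# Matches lines shaped like the ones shown above. The field meanings
# (id, start time, interval) are an assumption inferred from the example.
SCHEDULE_LINE = re.compile(
    r"^(\s*schedule\s+\d+\s+\d+:\d{2}:\d{2}\s+)(\d+:\d{2}:\d{2})\s*$"
)


def set_sampling_interval(script_text, old="0:15:00", new="0:05:00"):
    """Return script_text with each schedule interval equal to `old` set to `new`."""
    out = []
    for line in script_text.splitlines():
        match = SCHEDULE_LINE.match(line)
        if match and match.group(2) == old:
            # Keep everything up to the interval field, swap only the interval.
            line = match.group(1) + new
        out.append(line)
    return "\n".join(out)


before = """schedule 0 0:00:00 2:00:00
schedule 0 6:00:00 0:15:00
schedule 0 19:00:00 2:00:00"""

after = set_sampling_interval(before)
print(after)
```

Only lines whose last field equals the old interval are touched, so the two 2:00:00 entries stay as they are.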
- Verify Shared Memory Segment Size
- Set Document & Directory Buffers to Maximum Values
- The total number of CSMs, DOCs, and PSMs request handler processes configured in the /fnsw/etc/serverConfig file. If the /fnsw/etc/serverConfig.custom file exists, use the total number from this file instead.
- The number of optical and MSAR drives configured in the Storage Library tab in fn_edit. All MSAR libraries have 12 drives each. The number of optical drives can vary depending upon the model of the optical library.
- The total number of FileNet Image Services committal processes is in the /fnsw/local/sd/as_conf.g file. If the number of processes for a specific committal process is not shown, use 1, which is the default.
- Turn off MKF Verify Disk Writes
- Separate MKF databases and cache to different disk drives
- Separate MKF database RL partitions from their databases
- Set the MKF buffers to get as close as possible to a 100% cache hit ratio
- Tune cache for best performance
- Configure the locked threshold for BES
- The Minimum Allocation for BES cache is 20% and the total cache size is 1 GB, which gives 200 MB
- The largest batch size is five 1000-page documents
- The size of each page is 50 KB
- The largest batch size would be 250 MB (5 x 1000 x 50 KB)
- The free space that is required would be 275 MB (250 MB + 10%)
- Turn on Fast Batch Committal and Fast Batch Breakup
- Set the number of Integral SDS_worker processes
- Configure the TCP and Ephemeral Ports Network Settings
- Edit the perf_mon.script file or files. Then, restart FileNet Image Services to put the change into effect.
The preferred shared memory segment size is different for each hardware platform. It is important to configure the correct shared memory segment size to avoid running out of resources, which can cause critical errors. See the IBM technote below for detailed information about troubleshooting shared memory issues and setting the correct shared memory segment size.
- Generally, set the number and size of the document and directory buffers to the maximum values in the fn_edit -> Performance Tuning -> Server Memory tab.
The maximum sizes and counts are:
Document Buffer Count: 256
Document Buffer Size: 1024
Directory Buffer Count: 256
Directory Buffer Size: 256
Document Buffer Count - Each process that accesses cache has a document buffer associated with it. These processes include all of the dtp processes, the configured CSMs processes, and the committal processes (bes_commit, fbc_commit, rmt_commit, and so on). If a FileNet Image Services server does not have enough document buffers configured, overall system performance suffers because the processes that access cache must wait for a buffer to become available.
Document Buffer Size - A document buffer is used to transfer an object (document page) between cache and an optical/MSAR surface. Setting the document buffer size to the maximum means fewer transfers are needed, which results in better performance because less time is required for the transfer to complete.
To determine the number of processes currently configured that use document buffers, obtain the total of the following three items:
- processes {
notify ds_notify 2
scheduler dsched
dtp dtp
dtp_tran dtp_tran 1
rmt_commit rmt_commit 2
fbc_commit fbc_commit
del_commit del_commit
osi_migrate osi_migrate
}
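The rule stated earlier, that a process with no number shown counts as 1, can be sketched in a few lines. This is an illustration under assumptions, not an IBM utility: it assumes the processes block above is representative of the as_conf.g layout (entry name, program name, optional count), the function name process_counts is hypothetical, and treating names that end in _commit as the committal processes is a guess that should be checked against the actual file.

```python
def process_counts(block_text):
    """Map each entry name in a 'processes { ... }' block to its count (default 1)."""
    counts = {}
    for raw in block_text.splitlines():
        fields = raw.strip().lstrip("-").split()
        # Skip blank lines and the "processes {" / "}" delimiters.
        if not fields or fields[0] == "processes" or "{" in fields or "}" in fields:
            continue
        name = fields[0]
        # A missing third field means the default of one process.
        has_count = len(fields) > 2 and fields[2].isdigit()
        counts[name] = int(fields[2]) if has_count else 1
    return counts


sample_block = """processes {
notify ds_notify 2
scheduler dsched
dtp dtp
dtp_tran dtp_tran 1
rmt_commit rmt_commit 2
fbc_commit fbc_commit
del_commit del_commit
osi_migrate osi_migrate
}"""

counts = process_counts(sample_block)
# Assumption: entries whose name ends in "_commit" are the committal processes.
commit_total = sum(n for name, n in counts.items() if name.endswith("_commit"))
print(counts)
print(commit_total)
```

The committal total is only one of the three items. The request handler count from serverConfig and the drive count from fn_edit still have to be added to it.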
Run the dbp -s command to monitor the buffer configuration when the FileNet Image Services software is running at peak load or when there is a performance issue. See the FileNet Image Services System Tools manual for information about interpreting the dbp output.
- Verify Disk Writes is configured in the fn_edit -> MKF databases tab. Enable this feature only if the system administrator suspects some type of hardware or network error on the device where the MKF databases reside. When Verify Disk Writes is enabled, the FileNet Image Services software reads back and verifies everything it writes. This verification causes two transactions to occur for every write operation, which slows down performance. With the feature turned off, only the write transaction occurs, and server performance improves.
- Place each of the MKF databases (Permanent, Security, and Transient) and cache on different physical disk drives to prevent contention for the disk read/write heads on very active systems. Having cache and the transient database on the same physical disk drive slows down performance when objects are moving in and out of cache, because both the transient database and cache are updated at the same time as documents are processed. The same holds true between the transient and permanent databases.
Each of the three MKF databases has one or more database data sets (permanent_db0, permanent_db1, and so on) and redo logs (permanent_rl0, permanent_rl1). The database data sets are where data is stored, and the redo logs keep sequential records of database changes. These logs are flushed as information is written to the actual database.
The MKF buffers can be increased in the fn_edit -> Performance Tuning -> Server Memory tab. With larger buffers, more of the database is held in memory, which avoids reads from the database on the physical disk drive. The cache hit ratio for each of the databases can be obtained by creating the performance reports and looking at the MKF I/O reports.
- cmb1.permdb_io.Apr15.txt
cmb1.secdb_io.Apr15.txt
cmb1.transdb_io.Apr15.txt
Tune the MKF buffers gradually. MKF buffers are stored in shared memory. If the buffers are made too large, performance problems might result.
In the fn_edit -> System Application Services -> Cache tab, the system administrator can allocate the minimum percentage of cache to devote to the main types of cache.
The minimum percentages for all four caches should add up to 100%.
Allocate most of the cache percentage to page cache so more documents are kept ageable in cache before they are aged out by the CSM_daemon process. Performance improves by preventing the documents in cache from having to be retrieved from an optical surface, MSAR surface, or Integral SDS device.
The current cache hit ratio can be obtained by looking at the performance reports (Client Page Request Report).
Configure the locked threshold for BES cache to leave sufficient free space to hold the largest batch that might be created plus 10%.
For example, for an environment where:
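The sizing rule (largest batch plus 10%) can be worked through with the example figures given in this document: five 1000-page documents at 50 KB per page. The sketch below is illustrative only. The function name bes_free_space_kb is hypothetical, and sizes are kept in KB with integer arithmetic so that no rounding is introduced. With these inputs the product is 250,000 KB, which is about 250 MB in decimal units.

```python
def bes_free_space_kb(documents, pages_per_document, page_kb, headroom_percent=10):
    """Return (largest batch size, required free space), both in KB."""
    batch_kb = documents * pages_per_document * page_kb
    # Add the headroom on top of the largest batch using integer arithmetic.
    required_kb = batch_kb * (100 + headroom_percent) // 100
    return batch_kb, required_kb


batch_kb, required_kb = bes_free_space_kb(
    documents=5, pages_per_document=1000, page_kb=50
)
print(batch_kb, required_kb)  # prints: 250000 275000
```

The locked threshold would then be chosen so that at least the second figure stays free in BES cache.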
If a server uses Fast Batch Committal, enable Fast Batch Breakup in the fn_edit -> System Application Services -> Other tab. Typically, after a Fast Batch Object is migrated, the documents that are contained in the Fast Batch Object do not reside in cache. With Fast Batch Breakup turned on, the documents remain in page cache after they are migrated to the storage device. Most users access new documents immediately. Because the Fast Batch documents remain in cache, they do not have to be migrated back into cache if a user requests them immediately.
Applications that use Fast Batch Committal are COLD, HPII, and Capture. A custom application can also be written to use Fast Batch Committal.
The current cache hit ratio can be obtained by looking at the performance reports (Client Page Request Report).
If Integral SDS performance issues are suspected, there are several things that can be used to investigate problems.
1. DOC_tool - All ISDS statistics are kept in memory. When the Image Services software is recycled, all of the information since the software was started is lost. There are no performance reports that track ISDS committals or retrievals. DOC_tool has an “SDS” option that can be used to display the current statistics that are kept in memory. Use this command sparingly in a production environment because it causes all of the SDS_worker processes to pause while the performance information is collected and displayed.
Here is an example of the DOC_tool SDS output for an SDS Unit. It shows a high AVG requests queue wait time, which indicates that the number of SDS_worker processes should be increased.
The AVG requests queue wait time shows how long the requests are in the queue before they are processed by SDS_worker. Requests are put in the queue by the SDS migration background job.
- DOC_tool
DOC_tool> SDS
Summary information, Detailed, Worker information, All information, Find object, or list?
('s', 'd', 'w', 'a', 'f', 'l'): d
The current time is Fri Feb 5 10:25:31 2010
SDS info: ALL option
All SDS units mode (y/n) [y]: : n
SDS unit ID: 2
- ****** SDS unit = cen_kirin (2)
SYSTEM state = SYSTEM ENABLED (0x0)
USER state = USER ENABLED (0x0)
Worker = 'SDS_worker' Number Instances = 4
info = 'Centera2.usca.ibm.com?/fnsw/local/sd/1/QAImport.pri'
SDS priority = high
DEBUG Setting = MAX
dynamic repository lib = 'SDSw_centera'
retention default offset (1 days)
SDS content delete setting=YES
SDS supports: EBR=YES HOLDS=YES Retention Extension=YES
Total Accumulated counters from all workers(4):
** Configured workers =4 active workers=2
TOTAL WORKER COUNTERS (sds_id=2):
Read Requests processed: 0
Write Requests processed: 20
Copy Requests processed: 0
Errors: 0
Requests processed = 20
Successful requests processed = 20
Errors = 0
AVERAGE ACCUMULATED ELAPSE TIMES:
Up time: 274.098067 secs/workers (4.568301 mins)
Idle time: 263.780673 secs/workers (4.396345 mins) (96.24%)
Total processing time: 7.472040 secs/workers
(0.373602 secs/reqs)
(0.373602 secs/image page)
(0.013004 secs/KB)
AVG requests queue wait time: 3.466658 secs/reqs
*****Total READ REQUEST PERFORMANCE (sds_id=2)
Total retrieval requests = 0
Images retrieved from SDS = 0
Data retrieved = 0.000000MB
Number of read requests where the whole blob fits
into the internal image_buffer (1024K): 0
Number of read requests where the whole blob does not fits
into the internal image_buffer (1024 K): 0
Cache hits: 0
Number of redirection: 0
Number of redirection errors: 0
Total Time to process read requests: 0.000000 secs (0.000000 mins)
*****Total WRITE/COPY REQUEST PERFORMANCE (sds_id=2)
Total write requests = 20
Total copy requests = 0
Documents written = 20 (FBC=0, MSAR reads=0, Cache=20, Copy=0)
Images written = 20
Data written = 0.561123 MB
AVG Image Size = 28.729492 K
Cache hits in copy: 0
Total Time to process write requests: 14.943911 secs (0.249065 mins)
Total Time to process copy requests: 0.000000 secs (0.000000 mins)
Time in SDS device create and write object: 0.423189 secs (0.007053 mins)
(0.021159 secs/reqs)
(0.021159 secs/image page)
(0.000737 secs/KB)
Time in SDS device write only: 0.053464 secs (0.000891 mins)
(0.002673 secs/reqs)
(0.002673 secs/image page)
(0.000093 secs/KB)
Time in cache(CSM) to process write/copy: 0.025077 secs (0.000418 mins)
(0.001254 secs/reqs)
(0.001254 secs/page)
(0.000044 secs/KB)
Time in MSAR read to process write/copy: 0.000000 secs (0.000000 mins)
2. Number of SDS_worker processes – The default number of SDS_worker processes is 3 when the SDS unit is configured in fn_edit. Each of the three default worker processes takes on a different function.
SDS_worker # 1 - The first worker process performs only reads.
SDS_worker # 2 - The second worker process performs only writes.
SDS_worker # 3 - The third worker process performs only copies.
If the number of worker processes is not increased from the default of three, performance can be degraded. Only one process performs all of the ISDS writes and one process performs all of the read operations.
When more than three SDS_worker processes are created, SDS_worker #4 and higher perform all three operations (read, copy, and write). The maximum number of SDS_worker processes that can be configured is 99 for each SDS unit. Setting the number of SDS_worker processes too high can also have an adverse effect. When excessive SDS_worker processes are configured, they consume resources and might never be used.
Set the initial number of SDS_worker processes to around 20, and then monitor them by using the DOC_tool SDS option. The SDS_worker processes can be monitored to see whether they are active or idle (too many configured).
The DOC_tool SDS statistic, AVG requests queue wait time, can be monitored for the SDS unit to determine whether the number of SDS_worker processes needs to be increased for the SDS Unit.
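When DOC_tool SDS output is captured to a file for comparison over time, the few figures that matter for this decision can be pulled out with a short script. This is a minimal sketch, not part of DOC_tool: the function name sds_summary is hypothetical, and the patterns assume the output lines look exactly like the sample shown earlier. No threshold is applied, because what counts as a "high" wait time depends on the site.

```python
import re


def sds_summary(doc_tool_output):
    """Extract worker counts, idle percentage, and queue wait from DOC_tool SDS text."""
    patterns = {
        "configured_workers": r"Configured workers\s*=\s*(\d+)",
        "active_workers": r"active workers\s*=\s*(\d+)",
        "idle_percent": r"Idle time:.*\(([\d.]+)%\)",
        "avg_queue_wait_secs": r"AVG requests queue wait time:\s*([\d.]+)",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, doc_tool_output)
        # None marks a figure that was not found in the captured text.
        result[key] = float(match.group(1)) if match else None
    return result


# Three lines copied from the sample output in this document.
sample = """** Configured workers =4 active workers=2
Idle time: 263.780673 secs/workers (4.396345 mins) (96.24%)
AVG requests queue wait time: 3.466658 secs/reqs"""

summary = sds_summary(sample)
print(summary)
```

A rising avg_queue_wait_secs across captures points toward adding workers, while a consistently high idle_percent with few active workers points toward having too many configured.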
It is important to set the TCP and ephemeral port settings correctly to avoid ports being held in a time wait state or being blocked. When the ephemeral port settings are not correct, 15,16,17 errors might be written to the error log. This type of error can cause performance and network connection issues. The preferred settings for each hardware platform are provided below.
AIX
- Preferred settings:
- tcp_keepidle 80
tcp_keepintvl 20
tcp_ephemeral_high 65535
tcp_ephemeral_low 42767
udp_ephemeral_high 65535
udp_ephemeral_low 42767
- Verify current settings:
- The no -a command can be used to verify the current settings.
Resolution:
- /usr/sbin/no -p -o tcp_keepidle=80
/usr/sbin/no -p -o tcp_keepintvl=20
/usr/sbin/no -p -o tcp_ephemeral_high=65535
/usr/sbin/no -p -o tcp_ephemeral_low=42767
/usr/sbin/no -p -o udp_ephemeral_high=65535
/usr/sbin/no -p -o udp_ephemeral_low=42767
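Comparing saved no -a output against the preferred AIX values can be scripted. The sketch below is illustrative only: it assumes the output consists of "name = value" lines, the function name out_of_line_settings is hypothetical, and the sample values are invented for the example rather than taken from a real server.

```python
# Preferred AIX values listed in this document.
PREFERRED = {
    "tcp_keepidle": 80,
    "tcp_keepintvl": 20,
    "tcp_ephemeral_high": 65535,
    "tcp_ephemeral_low": 42767,
    "udp_ephemeral_high": 65535,
    "udp_ephemeral_low": 42767,
}


def out_of_line_settings(no_a_output, preferred=PREFERRED):
    """Return {name: (current, preferred)} for each tunable that differs."""
    current = {}
    for line in no_a_output.splitlines():
        name, sep, value = line.partition("=")
        # Assumption: each tunable appears as "name = value" on its own line.
        if sep and value.strip().isdigit():
            current[name.strip()] = int(value.strip())
    return {
        name: (current.get(name), want)
        for name, want in preferred.items()
        if current.get(name) != want
    }


# Hypothetical captured output, for illustration only.
sample_output = """
      tcp_ephemeral_high = 65535
       tcp_ephemeral_low = 32768
            tcp_keepidle = 14400
           tcp_keepintvl = 150
      udp_ephemeral_high = 65535
       udp_ephemeral_low = 32768
"""

mismatches = out_of_line_settings(sample_output)
print(mismatches)
```

A tunable that is missing from the captured text is reported with a current value of None, so an incomplete capture is not mistaken for a correct configuration.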
HP-UX
- Preferred settings:
- udp udp_smallest_anon_port 42767
udp udp_largest_anon_port 65535
tcp tcp_smallest_anon_port 42767
tcp tcp_largest_anon_port 65535
tcp tcp_time_wait_interval 30000
Verify current settings:
- ndd -get /dev/udp udp_smallest_anon_port
ndd -get /dev/udp udp_largest_anon_port
ndd -get /dev/tcp tcp_smallest_anon_port
ndd -get /dev/tcp tcp_largest_anon_port
ndd -get /dev/tcp tcp_time_wait_interval
Resolution:
- ndd -set /dev/udp udp_smallest_anon_port 42767
ndd -set /dev/udp udp_largest_anon_port 65535
ndd -set /dev/tcp tcp_smallest_anon_port 42767
ndd -set /dev/tcp tcp_largest_anon_port 65535
ndd -set /dev/tcp tcp_time_wait_interval 30000
Solaris
- Preferred settings:
- udp udp_smallest_anon_port 42767
udp udp_largest_anon_port 65535
tcp tcp_smallest_anon_port 42767
tcp tcp_largest_anon_port 65535
tcp tcp_close_wait_interval 30000 (for Solaris 2.x only)
tcp tcp_time_wait_interval 30000 (for Solaris 8 and above)
Verify settings:
- ndd -get /dev/udp udp_smallest_anon_port
ndd -get /dev/udp udp_largest_anon_port
ndd -get /dev/tcp tcp_smallest_anon_port
ndd -get /dev/tcp tcp_largest_anon_port
ndd -get /dev/tcp tcp_close_wait_interval (for Solaris 2.x only)
ndd -get /dev/tcp tcp_time_wait_interval (for Solaris 8 and above)
Resolution:
- Make a backup copy of the /etc/rc2.d/S69inet file before you modify it. If the file does not exist, create it.
- As root user, make sure that you have write permission on the file by entering:
- chmod 754 /etc/rc2.d/S69
- Use your preferred text editor (such as vi) to modify the /etc/rc2.d/S69inet file.
Add the following lines somewhere near the end of the file:
- ndd -set /dev/udp udp_smallest_anon_port 42767
ndd -set /dev/udp udp_largest_anon_port 65535
ndd -set /dev/tcp tcp_smallest_anon_port 42767
ndd -set /dev/tcp tcp_largest_anon_port 65535
ndd -set /dev/tcp tcp_close_wait_interval 30000 (for Solaris 2.x only)
ndd -set /dev/tcp tcp_time_wait_interval 30000 (for Solaris 8 and above)
- Save your change and exit from the file.
Restart the server or servers.
Windows
- Use the Registry Editor (regedt32.exe) to make the modifications.
- MaxUserPort
- Description: Determines the highest port number TCP can assign when an application requests an available user port from the system. Typically, ephemeral ports (those ports that are used briefly) are allocated to port numbers 1024 - 5000.
Note: Windows does not add this entry to the registry. You can add it by editing the registry or by using a program that edits the registry.
- Location: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
- Data type: REG_DWORD
- Range: 5,000-65,534 (port number)
- Default Value: 5000
- Recommended value: 65534 (65534 DEC)
- TcpMaxConnectRetransmissions
- Description: Determines how many times TCP retransmits an unanswered request for a new connection. TCP retransmits new connection requests until they are answered or until this value expires.
- Location: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
- Data type: REG_DWORD
- Range: 0–255 (retransmission attempts)
- Default Value: 2
- Recommended value: 5 (5 DEC)
- TcpTimedWaitDelay
- Description: Determines the time that must elapse before TCP/IP can release a closed connection and reuse its resources. This interval between closure and release is known as the TIME_WAIT state or twice the maximum segment lifetime (2MSL) state. During this time, reopening the connection to the client and server costs less than establishing a new connection. By reducing the value of this entry, TCP/IP can release closed connections faster and provide more resources for new connections. Adjust this parameter if the running application requires rapid release, the creation of new connections, or an adjustment because of a low throughput caused by multiple connections in the TIME_WAIT state.
- Location: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
- Data type: REG_DWORD
- Default Value: 0xF0 (240 DEC)
- Recommended value: 0x1E (30 DEC)
- FN_COR_QLEN
- Description: The 15,16,17 error indicates that a process is not able to connect to the COR_Listen process due to the unavailability of COR queue space.
To resolve this issue, create an environment variable named FN_COR_QLEN.
The default COR queue length is 5. The environment variable must be named FN_COR_QLEN and must be set for the user that starts the Image Services software. Set its value initially to 20 - 25.
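The recommended Windows registry values can be written out as a .reg fragment for review before anything is imported. This is a sketch under assumptions, not an IBM-provided file: the function name reg_fragment is hypothetical, and the TIME_WAIT setting is assumed to be the TcpTimedWaitDelay value because that is what the description of the 240-second default matches. FN_COR_QLEN is not included, since it is an environment variable and not a registry value.

```python
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Recommended decimal values from this document.
# "TcpTimedWaitDelay" is an assumed name for the TIME_WAIT setting.
RECOMMENDED = {
    "MaxUserPort": 65534,
    "TcpMaxConnectRetransmissions": 5,
    "TcpTimedWaitDelay": 30,
}


def reg_fragment(key, values):
    """Render REG_DWORD values as text in .reg file format."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, number in values.items():
        # .reg files store a DWORD as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{number:08x}')
    return "\n".join(lines)


reg_text = reg_fragment(KEY, RECOMMENDED)
print(reg_text)
```

Generating the text first makes the hexadecimal form of each value visible (30 decimal is 1e, 65534 is fffe), which helps catch conversion slips before the registry is changed. Back up the registry before importing any such file.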
Document Information
Modified date:
17 June 2018
UID
swg21634425