Troubleshooting
Problem
This technote describes some symptoms that will occur when the workload on an IBM Rational ClearCase UNIX server exceeds the capacity of the server, and some measures that can be taken to relieve them.
Symptom
Intermittently, and particularly during peak usage periods, users of both the ClearCase Remote Client (CCRC) and the full ClearCase client (ClearCase Explorer, for example) experience "hangs" that usually end with the pop-up message "timed out trying to communicate with ClearCase remote server".
Checking the ClearCase logs on the server shows messages like the following examples:
>cleartool getlog -around now 10 albd db vobrpc ccfs
=============================================================================
Log Name: albd Hostname: vobhost Date: 2011-10-09T10:23:16+08:00
Selection: Lines between 2011-10-09T10:08:16+08:00 and 2011-10-09T10:38:16+08:00 displayed
-----------------------------------------------------------------------------
2011-10-09T10:18:04+08 albd_server(1319118): Error: Server vobrpc_server (pid=2809890) on "/vob_store/VOBs/aaa.vbs" died on startup; marking it as "down".
=============================================================================
Log Name: db Hostname: vobhost Date: 2011-10-09T10:23:16+08:00
Selection: Lines between 2011-10-09T10:08:16+08:00 and 2011-10-09T10:38:16+08:00 displayed
-----------------------------------------------------------------------------
2011-10-09T10:11:35+08 db_server(880886): Error: albd_rgy_findbyuuid_entry call failed: RPC: Timed out
2011-10-09T10:11:35+08 db_server(880886): Error: Trouble contacting registry on host "vobhost": timed out trying to communicate with ClearCase remote server.
2011-10-09T10:11:35+08 db_server(880886): Error: Error searching for replica e49551ac.d21b11dc.a041.00:02:c3:0d:60:4c in registry: error detected by ClearCase subsystem
2011-10-09T10:11:46+08 db_server(1642540): Error: albd_server_idle call failed: RPC: Timed out
2011-10-09T10:11:46+08 db_server(1642540): Error: Error sending idle message to albd server: timed out trying to communicate with ClearCase remote server
=============================================================================
Log Name: vobrpc Hostname: vobhost Date: 2011-10-09T10:23:16+08:00
Selection: Lines between 2011-10-09T10:08:16+08:00 and 2011-10-09T10:38:16+08:00 displayed
-----------------------------------------------------------------------------
2011-10-09T10:13:09+08 vobrpc_server(1868278): Error: albd_sched_info call failed: RPC: Timed out
2011-10-09T10:14:19+08 vobrpc_server(2650278): Error: albd_sched_info call failed: RPC: Timed out
2011-10-09T10:14:41+08 vobrpc_server(1167414): Error: albd_server_busy call failed: RPC: Timed out
2011-10-09T10:15:17+08 vobrpc_server(3293642): Error: albd_sched_info call failed: RPC: Timed out
2011-10-09T10:15:26+08 vobrpc_server(1167414): Error: Unable to contact albd_server on host 'vobhost'
2011-10-09T10:15:26+08 vobrpc_server(1167414): Error: Operation "rgy_findbyuuid_entry" failed: timed out trying to communicate with ClearCase remote server.
2011-10-09T10:15:26+08 vobrpc_server(1167414): Error: Unable to get VOB object registry information for replica uuid "9d2d6700.862011dd.a055.00:02:c3:0d:60:4c" (vobhost:/vob_store/VOBs/aaa.vbs): error detected by ClearCase subsystem
=============================================================================
Log Name: ccfs Hostname: vobhost Date: 2011-10-09T10:23:16+08:00
Selection: Lines between 2011-10-09T10:08:16+08:00 and 2011-10-09T10:38:16+08:00 displayed
-----------------------------------------------------------------------------
2011-10-09T10:21:16+08 albd_server(1319118): Error: ccfs_server(1236996): Error: Unable to contact albd_server on host 'vobhost'
2011-10-09T10:21:16+08 albd_server(1319118): Error: ccfs_server(1236996): Error: Operation "rgy_findbyuuid_entry" failed: timed out trying to communicate with ClearCase remote server.
2011-10-09T10:21:16+08 albd_server(1319118): Error: ccfs_server(1236996): Error: Unable to get VOB tag registry information for replica uuid "faab7d68.bae411df.8043.00:02:c3:0d:60:4c": timed out trying to communicate with ClearCase remote server
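When the log output is long, it can help to summarize which server processes are reporting RPC timeouts. The following is a minimal sketch, not part of ClearCase: the function name count_rpc_timeouts is hypothetical, and it assumes every log line follows the "timestamp process(pid): Error: ..." layout shown in the examples above.

```shell
# Count "RPC: Timed out" errors per server process in getlog output read
# from stdin. Assumes the line layout shown in the log examples:
#   <timestamp> <process>(<pid>): Error: ... RPC: Timed out
count_rpc_timeouts() {
    awk '/RPC: Timed out/ {
             proc = $2
             sub(/\(.*/, "", proc)   # drop the "(pid):" suffix
             count[proc]++
         }
         END { for (p in count) print count[p], p }' | sort -rn
}

# Usage on the VOB server (not executed here):
#   cleartool getlog -around now 10 albd db vobrpc ccfs | count_rpc_timeouts
```

A high count concentrated in vobrpc_server and db_server during peak hours is consistent with the symptom described here, although it does not by itself prove the cause.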
Cause
The problem can be caused by a UDP buffer overrun, which results from too many ClearCase roles on one machine and/or a flood of UDP packets arriving simultaneously or within a short interval, at a rate the machine cannot handle.
In general, ClearCase scales best horizontally across multiple machines rather than vertically on a single machine with massive resources. The albd server (registry server), VOB server, view server, and credmap server all communicate using UDP packets. If the machine is not tuned appropriately, or lacks the resources to process incoming UDP packets before its UDP receive buffer fills, the packets are dropped.
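One way to check whether UDP packets are being dropped is to look at the operating system's UDP statistics and see whether an overflow counter grows during a peak period. The sketch below is illustrative only: the function name udp_overflow_lines is hypothetical, and because the exact counter label varies by platform and release, it simply matches any line that mentions "overflow".

```shell
# Filter UDP statistics (read from stdin) down to lines that mention overflows.
# The counter label differs between platforms, so the match is deliberately
# loose and case-insensitive.
udp_overflow_lines() {
    grep -i 'overflow'
}

# Usage (not executed here; the protocol option differs by platform):
#   AIX:     netstat -s -p udp | udp_overflow_lines
#   Solaris: netstat -s -P udp | udp_overflow_lines
# Run it twice a few minutes apart during peak load and compare the values.
```

A counter that keeps increasing between the two samples suggests the receive buffer is too small for the incoming load.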
Environment
The VOB server, view server, registry server, and license server are all configured on a single machine.
Resolving The Problem
To relieve the issue, apply either of the following options:
Option 1: Increase the UDP buffer size on the affected server machine
Solaris 9 example:
ndd -set /dev/udp udp_max_buf 8388608
ndd -set /dev/udp udp_xmit_hiwat 65535
ndd -set /dev/udp udp_recv_hiwat 65535
Solaris 10 example:
ndd -set /dev/udp udp_max_buf 8388608
ndd -set /dev/udp udp_xmit_hiwat 8388608
ndd -set /dev/udp udp_recv_hiwat 8388608
Solaris 11 example:
ipadm set-prop -p max_buf=8388608 udp
ipadm set-prop -p send_buf=8388608 udp
ipadm set-prop -p recv_buf=8388608 udp
AIX example:
no -p -o udp_recvspace=655360
no -p -o sb_max=1310720
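Before and after changing these values, it is useful to read back the current settings. The following sketch prints the query commands that correspond to the examples above for a given platform; the function name print_udp_tunable_queries is hypothetical, and the mapping from `uname` output to platform (SunOS 5.11 for Solaris 11, other SunOS releases for Solaris 9 and 10) is an assumption.

```shell
# Print the commands that display the UDP tunables used in the examples above.
# $1 = output of `uname -s`, $2 = output of `uname -r`.
print_udp_tunable_queries() {
    q_os=$1
    q_rel=$2
    case "$q_os/$q_rel" in
        AIX/*)
            echo "no -o udp_recvspace"
            echo "no -o sb_max"
            ;;
        SunOS/5.11*)
            # Solaris 11 manages protocol properties through ipadm
            echo "ipadm show-prop -p max_buf,send_buf,recv_buf udp"
            ;;
        SunOS/*)
            # Solaris 9 and 10 use ndd; without -set it reads the value
            for q_param in udp_max_buf udp_xmit_hiwat udp_recv_hiwat; do
                echo "ndd /dev/udp $q_param"
            done
            ;;
        *)
            echo "unsupported platform: $q_os $q_rel" >&2
            return 1
            ;;
    esac
}

# Usage (not executed here): review the commands, then run them as root.
#   print_udp_tunable_queries "$(uname -s)" "$(uname -r)"
```

Printing the commands rather than running them keeps the sketch safe to try on any machine; pipe the output to `sh` only after reviewing it.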
Option 2: Split ClearCase roles across multiple machines
- Use a separate, dedicated license and/or registry server
- Use a separate, dedicated VOB and/or view server
Document Information
Modified date:
16 June 2018
UID
swg21577649