IBM Support

Why do I receive a timeout message when querying VIOS resources from the HMC?

Question & Answer


Question

When querying the virtual resource configuration of a managed system from the HMC, the query can take a very
long time to complete, or fail with a timeout error message.

Cause

Each time a query is performed from the HMC, a call is made to every VIOS on the managed system to get details
on the configuration. On the VIOS, the vio_daemon handles this request by querying the CMDB and returning the
result to the HMC.
Several different issues can lead to a timeout, or at least a long delay, for this query. The most common
error message is:
 -> The system is currently too busy to complete the specified request.
    Please retry the operation at a later time. If the operation continues to fail, check the error log to see
    if the filesystem is full.

Answer

The error above suggests that the VIOS is suffering from a performance issue. The VIOS has to manage all the
resources shared with client LPARs (including disk I/O and network communication), but it also has to handle
the resource management requests from every connected HMC (and in some cases NovaLink, PowerVC or other
management products).

It is always a good idea to check whether the VIOS has enough resources (CPU and memory) to handle its workload.
A good tool for this is the "part" command. For more information on this tool, refer to the following web page:
  https://www.ibm.com/developerworks/community/blogs/cgaix/entry/the_new_vios_part_command?lang=en_us

The second part of the message also indicates that the problem might be related to a full filesystem, which
can easily be checked with the "df" command.
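
As an illustration, the filesystem check can be scripted. This is only a sketch and not part of the original procedure: the function name full_filesystems and the 90% default threshold are assumptions, and it expects the POSIX output format of "df -kP" (capacity in the fifth column, mount point in the sixth).

```shell
# Print the mount point and usage of every filesystem at or above a threshold.
# Reads "df -kP" output on stdin; the threshold defaults to 90 (an assumed value).
full_filesystems() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        NR > 1 {                      # skip the header line
            pct = $5
            sub(/%/, "", pct)         # "96%" becomes "96"; "-" evaluates to 0
            if (pct + 0 >= limit) print $6 " " $5
        }'
}

# Usage (from oem_setup_env):
#   df -kP | full_filesystems 90
```

An empty result means no filesystem reached the threshold.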

If the VIOS does not suffer from a resource shortage or a full filesystem, the problem could be related to the
content of the database itself. We have seen cases in the past where a "corrupted" database caused these long
delays.
Luckily, it is quite safe and easy to rebuild the CMDB. This database is only used for resource management, and
if it is unavailable for a few seconds, it should not affect other components in your environment.
A script is available to rebuild the CMDB. You can find it here:
   ftp://ftp.software.ibm.com/systems/virtualization/vio/ztools/CMDB/refresh_cmdb_adv.sh
Run it under oem_setup_env. It stops the vio_daemon, removes the old database, restarts the vio_daemon and
rebuilds the database from scratch.

If the problem still exists, another issue is possible. First, check whether the problem can be reproduced
locally on the VIOS, without the HMC, by manually running a query from the OEM privileged environment
(oem_setup_env).
First, create the XML query file:
# vi /home/padmin/query.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<VIO xmlns="http://ausgsa.austin.ibm.com/projects/v/vios/schema/vio2.00" version="2.00">
   <Request action_str="QUERY_INVENTORY">
       <InventoryRequest inventoryType="base">
           <VioTypeFilter type="SEA"/>
           <VioTypeFilter type="VEA"/>
           <VioTypeFilter type="NIC"/>
           <VioTypeFilter type="LNAGG"/>
           <VioTypeFilter type="IPIF"/>
       </InventoryRequest>
   </Request>
</VIO>

To run the query and check how long it takes to complete:
# cat query.xml | /usr/ios/sbin/vioservice lib/libvio/query
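
The command above does not report the elapsed time by itself. One possible way to measure it is sketched below. The helper name timed_query and the output path are hypothetical, and the sketch assumes that "date +%s" (seconds since the epoch) is supported by the local date command.

```shell
# Run a command with stdin read from a file, save its output, and report
# the exit code and the elapsed wall-clock time in whole seconds.
# Assumption: "date +%s" is available on this system.
timed_query() {
    input="$1"
    output="$2"
    shift 2
    start=$(date +%s)
    "$@" < "$input" > "$output" 2>&1
    rc=$?
    end=$(date +%s)
    echo "exit=$rc elapsed=$((end - start))s"
    return "$rc"
}

# Usage (from oem_setup_env; /tmp/query.out is an arbitrary output file):
#   timed_query /home/padmin/query.xml /tmp/query.out \
#       /usr/ios/sbin/vioservice lib/libvio/query
```

The response is kept in the output file so that it can still be inspected after the timing is printed.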

The response should look like:
<VIO version="2.00" xmlns="http://ausgsa.austin.ibm.com/projects/v/vios/schema/vio2.00">
    <Response>
        <InventoryResponse viosId="8233-E8B021010E1P" sequence="1" inventoryType="base" eventLogOn="true">
            <NIC udid="13U78A0.001.DNWHP7R-P1-C4-T1" name="ent0" description="2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)" locationCode="U78A0.001.DNWHP7R-P1-C4-T1">
                <NIC_base macaddress="00215E89CD28" alt_addr="0x000000000000" jumbo_frames="no" media_speed="Auto_Negotiation" use_alt_addr="no">
                </NIC_base>
            </NIC>
<--- snip --->
        </InventoryResponse>
    </Response>
</VIO>
<VIO version="2.00" xmlns="http://ausgsa.austin.ibm.com/projects/v/vios/schema/vio2.00">
    <Response>
        <InventoryResponse viosId="8233-E8B021010E1P" sequence="1" inventoryType="base" eventLogOn="true">
            <VEA udid="14U8233.E8B.1010E1P-V1-C2-T1" name="ent2" description="Virtual I/O Ethernet Adapter (l-lan)" locationCode="U8233.E8B.1010E1P-V1-C2-T1">
                <VEA_base macaddress="DEBD6DABBE02" alt_addr="0x000000000000" use_alt_addr="no">
                </VEA_base>
            </VEA>
<--- snip --->
        </InventoryResponse>
    </Response>
</VIO>
<VIO version="2.00" xmlns="http://ausgsa.austin.ibm.com/projects/v/vios/schema/vio2.00">
    <Response>
        <InventoryResponse viosId="8233-E8B021010E1P" sequence="1" inventoryType="base" eventLogOn="true">
            <SEA udid="1039aad0a5ce33f6b5" name="ent5" description="Shared Ethernet Adapter" configuration_state="Configured">
                <SEA_base ha_mode="auto" jumbo_frames="no" pvid="1" queue_size="8192" thread="1" netaddr="0" pvid_adapter="14U8233.E8B.1010E1P-V1-C2-T1">
                    <southbound udid="14U8233.E8B.1010E1P-V1-C2-T1"/>
                    <southbound udid="13U78A0.001.DNWHP7R-P1-C4-T1"/>
                </SEA_base>
            </SEA>
        </InventoryResponse>
    </Response>
</VIO>
<VIO version="2.00" xmlns="http://ausgsa.austin.ibm.com/projects/v/vios/schema/vio2.00">
    <Response>
        <InventoryResponse viosId="8233-E8B021010E1P" sequence="1" inventoryType="base" eventLogOn="true">
            <IPIF udid="1530407b36f698a3d" name="en0" description="Standard Ethernet Network Interface">
                <IPIF_base state="down">
                    <southbound udid="13U78A0.001.DNWHP7R-P1-C4-T1"/>
                </IPIF_base>
            </IPIF>
<--- snip --->
        </InventoryResponse>
    </Response>
</VIO>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<VIO xmlns="http://ausgsa.austin.ibm.com/projects/v/vios/schema/vio2.00" version="2.00">
   
<Response/>
</VIO>

(Note: The above output was formatted for readability; on the VIOS, the output appears on a single line.)
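
To read the single-line output more easily, each element can be placed on its own line. The helper below is a sketch only (the name split_tags is hypothetical). It does not indent the elements, and very long lines may exceed the record limit of some awk implementations.

```shell
# Insert a line break between adjacent XML tags ("><" becomes ">", newline, "<").
split_tags() {
    awk '{ gsub(/></, ">\n<"); print }'
}

# Usage (from oem_setup_env):
#   cat query.xml | /usr/ios/sbin/vioservice lib/libvio/query | split_tags
```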

We would expect the query to be answered within a couple of seconds.

If it fails to answer, then this is a different issue, and you should open a case with IBM support to have
it investigated further.
If the answer looks correct but took much longer than a few seconds, you might be suffering from a name
resolution issue.

When vioservice attempts to call the database listener, it tries to resolve the IP address of the VIOS. If
DNS is configured and there is a problem reaching the name server, the name resolution lasts until it reaches
the timeout value.

To avoid this issue, we recommend that all IP addresses configured on the VIOS be resolved locally:
 - Add every IP address with a name to the "/etc/hosts" file.
 - Make sure the "hostname" of the VIOS is resolvable through the "/etc/hosts" file.
 - Change the file "/etc/netsvc.conf" to contain this:
   hosts=local4,bind4
 - Make sure "/etc/resolv.conf" is configured with a name server that can be reached.
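
The lookup order can be verified with a small script. The sketch below rests on assumptions: the function name local_first is hypothetical, and it only checks that the first source on the "hosts" line of the given file starts with "local". It does not validate the rest of the file.

```shell
# Report whether the hosts= line of a netsvc.conf file lists a local
# source (local, local4, local6) before any other resolver.
local_first() {
    conf="${1:-/etc/netsvc.conf}"
    awk '
        /^[ \t]*#/ { next }                          # skip comment lines
        /^[ \t]*hosts[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")                 # keep only the value
            split($0, src, /[ \t]*,[ \t]*/)
            found = 1
            if (src[1] ~ /^local/) print "ok: " $0
            else                   print "check: first source is " src[1]
        }
        END { if (!found) print "check: no hosts= line" }
    ' "$conf"
}

# Usage:
#   local_first /etc/netsvc.conf
```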

Then check that name resolution is correct in both directions for every IP address with:
# host <ip address>
# host <name from previous command>
--> Both commands must return the same output.
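
When several IP addresses have to be verified, the two lookups can be wrapped in a helper. This is a sketch under assumptions: the function name check_resolution is hypothetical, and it assumes that the name is the first word of the "host <ip address>" output (as in "name is address"). Adapt the parsing if the output on your system differs.

```shell
# Resolve an IP address, resolve the returned name back, and compare
# the two answers, as in the manual check.
check_resolution() {
    ip="$1"
    by_ip=$(host "$ip" 2>/dev/null)
    if [ -z "$by_ip" ]; then
        echo "FAIL $ip: no answer for the address"
        return 1
    fi
    name=${by_ip%% *}                 # first word of the answer (assumed to be the name)
    by_name=$(host "$name" 2>/dev/null)
    if [ "$by_ip" = "$by_name" ]; then
        echo "OK $ip: $by_ip"
    else
        echo "FAIL $ip: '$by_ip' differs from '$by_name'"
        return 1
    fi
}

# Usage (replace the placeholder with each IP address configured on the VIOS):
#   check_resolution <ip address>
```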

If the name resolution is correct, retry the query:
# cat query.xml | /usr/ios/sbin/vioservice lib/libvio/query
--> It should now reply almost instantly.

If it still takes a very long time to complete, collect a snap from the VIOS and open a case with IBM support
for further investigation.

Product: PowerVM Virtual I/O Server (component: VIOS; platform: AIX); versions: 2.2, 3.1

Document Information

Modified date:
20 October 2021

UID

ibm10744525