Troubleshooting the Developer Portal

This page describes how to troubleshoot common problems that can occur when using the Developer Portal. The steps outlined apply to IBM® API Connect version 5.0.8.4 and later; some of the troubleshooting commands might be missing from earlier releases. It is recommended that you regularly upgrade to the latest fix pack or interim fix.

Basic Checks

Many problems are caused by incorrect basic deployment settings. Make the following basic checks:
  • On each Portal node, run the command status, and confirm that the following properties are correct:
    • APIC Hostname: must match the API Connect Management cluster address, as configured in the Cloud Manager user interface, and if the Developer Portal API's port has been changed in the Cloud Manager, then the same port must be specified here. This value is configured with the set_apim_host command.

      For information on configuring the Management cluster in the Cloud Manager, see Configuring the Management service. For information on setting the Developer Portal API's port, see Specifying the cloud settings.

    • APIC IP: the IP address of the host specified by APIC Hostname.
    • Devportal Hostname: the host name of this Portal server. It must be unique, and is set with the set_hostname command.
    • Devportal IP: the IP address of the host specified by Devportal Hostname.
    • APIC Certificate Status: indicates whether the certificate that the Portal has for communicating with the Management server is valid. The commands download_apim_cert and set_apim_cert can be used to set this.
    If you identify any problems with these properties, rerun the following Portal configuration commands as necessary:
    • set_hostname
    • set_apim_host
    • download_apim_cert and set_apim_cert
  • Load balancers between the Management server cluster and the Portal cluster must not modify the traffic, and if the Management cluster address port was changed in the Cloud Manager, then the load balancer must use the same port to listen for traffic from the Portal nodes.
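
The relationship between APIC Hostname and APIC IP (and between Devportal Hostname and Devportal IP) can also be cross-checked independently of the status command. The following Python sketch is illustrative only: the function names are hypothetical, the host name and IP address are assumed to be copied by hand from the status output, and any port suffix on the host name is assumed to have been removed first.

```python
import socket
from typing import Callable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses that the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return {info[4][0] for info in infos}


def hostname_matches_ip(hostname: str, reported_ip: str,
                        resolver: Callable[[str], Set[str]] = resolve_ipv4) -> bool:
    """True if reported_ip is one of the addresses that hostname resolves to."""
    try:
        return reported_ip in resolver(hostname)
    except socket.gaierror:
        # The name does not resolve at all, which is itself a configuration problem.
        return False
```

A False result suggests that DNS (or /etc/hosts) on the machine where the check is run disagrees with the value that the Portal reports, which is worth resolving before rerunning set_hostname or set_apim_host.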

Sites are unavailable

If users are unable to access some or all Portal sites, then make the following checks:
  1. Can the site be accessed locally? You can confirm local access by running the check_site command (run list_sites first to get a list of sites); if the site returns status code 200, then there is likely to be a network issue between the user's browser and the Portal server. The log file /var/log/nginx/access.log logs all site access attempts, and if nothing is logged here when the user attempts to access the site then this is also evidence of a network problem between the browser and the Portal server. Check your network routing and load balancing configuration.
  2. Does the site URL that the user enters in their browser match the site URL returned from the list_sites command? These URLs must match.
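
The URL comparison in check 2 can be scripted. The following Python sketch assumes that list_sites prints lines in the shape shown in the sample in the Upgrade section of this page (site ID, then =>, then the URL, then the state in parentheses); the helper names are hypothetical, and the normalization rules (ignore the scheme, a trailing slash, and host name case) are an assumption about what counts as a match.

```python
import re
from typing import List, NamedTuple, Optional


class Site(NamedTuple):
    site_id: str
    url: str
    state: str


# Line shape taken from the list_sites sample on this page:
#   <site_id> => <url> (<STATE>)
_SITE_LINE = re.compile(r"^(\S+)\s*=>\s*(\S+)\s*\((\w+)\)\s*$")


def parse_list_sites(output: str) -> List[Site]:
    """Extract every site line from pasted list_sites output."""
    sites = []
    for line in output.splitlines():
        match = _SITE_LINE.match(line.strip())
        if match:
            sites.append(Site(*match.groups()))
    return sites


def normalize_url(url: str) -> str:
    """Drop the scheme and any trailing slash, and lower-case the host part."""
    url = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url.strip()).rstrip("/")
    host, sep, path = url.partition("/")
    return host.lower() + sep + path


def find_site(user_url: str, sites: List[Site]) -> Optional[Site]:
    """Return the site whose URL matches what the user typed, or None."""
    wanted = normalize_url(user_url)
    for site in sites:
        if normalize_url(site.url) == wanted:
            return site
    return None
```

If find_site returns None for the URL that the user entered, the mismatch described in check 2 is the likely cause.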

API Product updates are not appearing in the Portal

If a publish, delete, deprecate, or retire operation on a Product is not reflected in the Portal within a few seconds, then make the following checks:
  1. Does the update appear within 15 minutes? If it does, then try running the command resubscribe_webhooks. If that doesn't fix the problem then collect the logs and open a support request.
  2. If the update still doesn't appear after 15 minutes, check that the configuration for the communication between the Portal and the Management server is correct - see Basic Checks.
  3. Confirm that the site is configured as expected for the Catalog in the Management server API Manager user interface. It should show the same site URL as that listed from the list_sites command on the Portal server.

User registration

When a new user is registered, the Management server sends an activation link and welcome email to the new user. When the user accesses this link, further validation is done between the Portal, the Management server, and any third-party authentication site that might have been configured. Network routing and configuration issues between any of these components can cause user registration problems.

Make the following checks:
  1. Make the checks detailed in Basic Checks.
  2. When attempting a registration step, tail the log files /var/log/cmc.out on the Management server, and /var/log/syslog on the Portal. The cause of the failure might be clear from the messages in these logs; the absence of any log messages related to registration suggests a network issue between the Portal and the Management server.

If the Admin user registration link has been lost, or the email was never received, the command site_login_link can be used to regenerate and display the link.

User login

The same checks apply here as with User registration. Additionally, users might have been blocked with login security or flood control, in which case see: Blocking and unblocking specific users and reset_locked_user. Note that resetting passwords does not unblock users.

Clustering and database problems

Common problems are as follows:
Timezone and time synchronization
The operating system timezone (not the PHP timezone) of all cluster members must be UTC, and they must all use the same NTP server so that their times are synchronized to sub-second accuracy.
SSH Settings
All cluster members should be able to make SSH connections to each other without requiring a password. The SSH configuration is handled by the clustering scripts. Users should not attempt to modify SSH settings themselves, and if any modifications have been made these should be reverted.
Problems with database clustering
These problems are indicated by the output of the status command showing the cluster members database status as anything other than Active (Primary), unless a transitory operation is running.

For a transitory operation, the status command might show various statuses other than Active (Primary) during clustering operations and upgrades; these statuses are usually Starting, Stopping, SST-Running-X% and sometimes, for a few minutes, Preparing. These states are not a cause for concern unless the operation completed more than 15 minutes earlier, or it returned an error message.
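
The rule described above can be written down as a small decision function. The following Python sketch is illustrative only: the function name and return values are hypothetical, and it assumes that the X in SST-Running-X% is a whole-number percentage.

```python
import re

HEALTHY = "Active (Primary)"
# Transitory statuses named on this page; SST-Running-X% carries a percentage.
_TRANSITORY = re.compile(r"^(Starting|Stopping|Preparing|SST-Running-\d+%)$")


def classify_db_status(status: str, operation_running: bool,
                       minutes_since_operation_ended: float = 0.0,
                       operation_failed: bool = False) -> str:
    """Return "ok", "wait", or "investigate" for one node's database status."""
    status = status.strip()
    if status == HEALTHY:
        return "ok"
    if _TRANSITORY.match(status) and not operation_failed:
        # Transitory states are expected while an operation runs, and for
        # up to 15 minutes after it completes.
        if operation_running or minutes_since_operation_ended <= 15:
            return "wait"
    return "investigate"
```

A result of "investigate" corresponds to the cases where the checks that follow should be made.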

Make the following checks:
  1. Check that all members of the cluster are listed in the output of the status command when run on every node. If there is a mismatch between nodes, this can be corrected by explicitly setting the cluster members on each node with the set_cluster_members <member_1_IP> <member_2_IP> <member_3_IP> ... command.
  2. On one of the nodes, run the command bootstrap_cluster -bf; this will try to restart the database on all the nodes, forcing it to stop and restart.
  3. You can use the stop_db and start_db commands to attempt to restart the database on an individual node.
  4. Check the log file /var/log/syslog for MySQL errors.

Do not attempt to modify the MySQL configuration; if problems persist after checking the above then gather logs with the generate_logs command and open a support ticket.
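
For check 1, the member lists reported by each node can be compared mechanically. The following Python sketch assumes that you have already copied the member IP addresses from the status output of each node into a dictionary by hand; the function name and the example addresses are hypothetical, and the sketch does not parse status output itself.

```python
from typing import Dict, FrozenSet, List


def nodes_with_wrong_members(reported: Dict[str, List[str]],
                             expected: List[str]) -> Dict[str, FrozenSet[str]]:
    """Map each node whose member list differs from expected to the differing IPs."""
    expected_set = frozenset(expected)
    mismatches = {}
    for node, members in reported.items():
        # Symmetric difference catches both missing and unexpected members.
        diff = expected_set.symmetric_difference(members)
        if diff:
            mismatches[node] = frozenset(diff)
    return mismatches
```

Any node that appears in the result is a candidate for the set_cluster_members correction described in check 1.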

Problems with file synchronization
When a new cluster member is added, the status command will report that file synchronization is taking place. On a simple deployment with just a few sites this might take about 10 minutes; on larger deployments with many sites and higher network transfer times between nodes, this process can take hours.

It is usual for the reported number of files being synchronized to change (increasing and decreasing) during the process, but if it appears to be taking too long for the size of deployment, or regularly restarting or cycling, then gather the logs with the generate_logs command and open a support ticket.

Site creation

If, when attempting to create a site in the Management server API Manager user interface, an error is returned or nothing appears to happen, and no site creation email is received, then make the following checks:
  1. Refer to Basic Checks to confirm that the Management server and Portal are able to communicate with each other.
  2. Run the list_sites command on all Portal nodes and see if the new site is listed. If the site is not listed, or it is reported to be in an error state, or is stuck in the INSTALLING state for more than 30 minutes, then gather the logs with the generate_logs command and open a support request.

Backup and restore

Common problems with these operations are due to unexpected files and modules in the site directories; see Developer Portal best practices for administrators.

The log files /var/log/devportal/command_line.log and /var/log/devportal/site_action.log might provide clues as to where the operation is failing. Otherwise gather the logs with the generate_logs command and open a support request.

Upgrade

After an upgrade failure, the portal could be in one of several possible states; run the status command to confirm which, taking note of the version numbers from the output, for example:
  • System version: 7.x-5.0.8.4-iFix-20180828-2355: this refers to the version of the portal executables and libraries.
  • Distribution version: 7.x-5.0.8.4-iFix-20180828-2206: this refers to the version of the portal site template.
Pay attention to the datestamps in these version numbers. The two versions usually have the same date, although occasionally the Distribution version might be a day earlier; the times are not expected to match. In a successful upgrade, both should show the same datestamp as that which is shown in the file name of the fix pack that was just installed.
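
The date comparison can be scripted once the two version strings have been copied from the status output. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that every version string ends in a -YYYYMMDD-HHMM datestamp, as the two examples above do.

```python
import re
from datetime import datetime, timedelta

# Trailing "-YYYYMMDD-HHMM" datestamp, as in "7.x-5.0.8.4-iFix-20180828-2355".
_STAMP = re.compile(r"-(\d{8})-(\d{4})$")


def build_stamp(version: str) -> datetime:
    """Parse the trailing datestamp of a System or Distribution version string."""
    match = _STAMP.search(version.strip())
    if match is None:
        raise ValueError(f"no datestamp found in {version!r}")
    return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M")


def datestamps_look_consistent(system_version: str, distribution_version: str) -> bool:
    """True if both share a date, or the Distribution date is one day earlier."""
    gap = build_stamp(system_version).date() - build_stamp(distribution_version).date()
    return timedelta(0) <= gap <= timedelta(days=1)
```

A False result only indicates that the two versions are further apart than this page describes as normal; it does not by itself identify which of them failed to update.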

If the System version has updated successfully but the Distribution version hasn't, or if neither has updated, then try re-running the fix pack using the -f (force) option.

If both the System version and the Distribution version are shown to have been updated, and the error reported was due to sites failing to upgrade, then you can attempt the site upgrade again with the following command: upgrade_devportal -p <platform> -s <site>, where <platform> is taken from the output of the list_platforms command, for example:
list_platforms
platform_devportal_7_x_5_0_7_2_20170628_2109 => devportal-7.x-5.0.7.2-20170628-2109 : Template Exists
platform_devportal_7_x_5_0_8_2_20180121_2206 => devportal-7.x-5.0.8.2-20180121-2206 : Template Exists
platform_devportal_7_x_5_0_8_3_20180508_2206 => devportal-7.x-5.0.8.3-20180508-2206 : Template Exists (default)
and <site> is taken from the output of the list_sites command, for example:
list_sites
5b97c411e4b014572ebe29e1.5b97c411e4b014572ebe29ed => apimdev0030.hursley.ibm.com/iainsorg/sb (INSTALLED)
5b97c411e4b014572ebe29e1.5b97c453e4b014572ebe2a01 => apimdev0030.hursley.ibm.com/iainsorg/new-catalog-1 (INSTALLED)
The following example shows a sample upgrade_devportal command:
upgrade_devportal -p devportal-7.x-5.0.8.3-20180508-2206 -s 5b97c411e4b014572ebe29e1.5b97c411e4b014572ebe29ed
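
When several sites need to be retried, the command lines can be assembled from pasted list_platforms and list_sites output. The following Python sketch is illustrative only: the helper names are hypothetical, the line shapes are assumed to match the samples above, and selecting the platform marked (default) is an assumption based on the sample command. The sketch only builds strings; it does not run anything on the Portal.

```python
import re
from typing import List, Optional

# Shapes taken from the list_platforms and list_sites samples above.
_PLATFORM = re.compile(r"^\S+\s*=>\s*(\S+)\s*:\s*(.*)$")
_SITE = re.compile(r"^(\S+)\s*=>\s*\S+\s*\(\w+\)\s*$")


def default_platform(list_platforms_output: str) -> Optional[str]:
    """Return the platform name on the line that is marked (default)."""
    for line in list_platforms_output.splitlines():
        match = _PLATFORM.match(line.strip())
        if match and match.group(2).strip().endswith("(default)"):
            return match.group(1)
    return None


def site_ids(list_sites_output: str) -> List[str]:
    """Return the site IDs (the text before =>) from list_sites output."""
    ids = []
    for line in list_sites_output.splitlines():
        match = _SITE.match(line.strip())
        if match:
            ids.append(match.group(1))
    return ids


def upgrade_commands(platforms_output: str, sites_output: str) -> List[str]:
    """Build one upgrade_devportal command line per listed site."""
    platform = default_platform(platforms_output)
    if platform is None:
        raise ValueError("no platform is marked (default)")
    return [f"upgrade_devportal -p {platform} -s {site_id}"
            for site_id in site_ids(sites_output)]
```

Review the generated lines and run only those for the sites that failed to upgrade.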

Emails

The Developer Portal sends emails for forum subscriptions, site creations, and site contact forms, by using the SMTP server that was configured with the set_smtp command. Use this command to configure SMTP and to send a test email. Check the file /var/log/mail.log for any errors in sending emails.

User registration and password reset emails are not sent by the Developer Portal; they are sent by the Management server and are logged in /var/log/cmc.out on the Management server.

Performance

Make the following checks:
  • Ensure that the minimum hardware requirements have been met; see Deploying the Developer Portal OVA file.
  • Confirm that there is sufficient disk space by running the command df -h.
  • If no major operations are in progress, such as new site creations, upgrades, or clustering, and you have at least 3 GB of spare disk space, run the command hardware_performance_test to test the CPU and disk I/O speeds. The results vary with each run, so take an average over several runs. The Developer Portal is particularly sensitive to disk I/O speed; anything below 50 MB per second is likely to cause serious performance problems.
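
The two numeric thresholds in the last check can be applied with a short script. The following Python sketch is illustrative only: the function names are hypothetical, the disk speed samples are assumed to be typed in by hand from several hardware_performance_test runs (the sketch does not parse that command's output), and 3 GB is interpreted here as 3 x 1024^3 bytes.

```python
import shutil
from statistics import mean
from typing import Sequence

MIN_SPARE_BYTES = 3 * 1024 ** 3   # "at least 3 GB of spare disk space"
MIN_DISK_MB_PER_S = 50.0          # below this, serious problems are likely


def has_spare_space(path: str = "/") -> bool:
    """True if the file system that holds path has at least 3 GB free."""
    return shutil.disk_usage(path).free >= MIN_SPARE_BYTES


def disk_speed_ok(samples_mb_per_s: Sequence[float]) -> bool:
    """True if the average of several measured disk speeds is 50 MB/s or more."""
    if not samples_mb_per_s:
        raise ValueError("need at least one measurement")
    return mean(samples_mb_per_s) >= MIN_DISK_MB_PER_S
```

For example, three runs that report 40, 60, and 80 MB per second average 60 MB per second, which is above the threshold even though one run fell below it.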

Removing a problem node from a cluster

If you have a node that you suspect is broken, and that is causing the cluster to be inoperable, then you can complete the following steps to shut down and remove the broken node:
  1. On the broken node, run the command sudo halt -p to shut it down.
  2. Remove the node from the cluster by running the following command on the other nodes to explicitly specify all the nodes that should be in the cluster, excluding the one being removed:
    set_cluster_members <member_1_IP> <member_2_IP> <member_3_IP> ...
  3. If the databases are down on the other cluster members, they should restart themselves. If they do not restart, run the following command to start any of the databases that are down:
    bootstrap_cluster -b
  4. If the databases are up but not functioning or clustering together, run the following command to stop all of the databases and restart them:
    bootstrap_cluster -bf
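
The member list for step 2 is easy to get wrong on a larger cluster. The following Python sketch builds the command line from the full member list and the address of the node being removed; the function name and the example addresses are hypothetical, and the sketch only produces the text of the command.

```python
from typing import List


def set_cluster_members_command(all_members: List[str], removed: str) -> str:
    """Build the set_cluster_members line that excludes the removed node."""
    if removed not in all_members:
        raise ValueError(f"{removed} is not in the member list")
    remaining = [ip for ip in all_members if ip != removed]
    if not remaining:
        raise ValueError("no members would remain in the cluster")
    return "set_cluster_members " + " ".join(remaining)
```

The resulting line is the one to run on each of the remaining nodes in step 2.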

Analytics

Common reasons for analytics problems in the Developer Portal are as follows:
  • Portal Delegated User Registry is being used. Analytics are not supported if this user registry is selected.
  • The API Manager port that is specified in Cloud Manager user interface Settings > TLS Profiles page is not 443 or does not match the custom port specified in the APIC Hostname output of the status command. This port must match the Developer Portal APIs port, also specified in Cloud Manager user interface Settings > TLS Profiles page, and must be the port specified when configuring the Developer Portal with the set_apim_host command; see Basic Checks.

Specific problems

When I attempt to deploy the Developer Portal OVA template I receive the following error: The following manifest file entry (line 1) is invalid

You must deploy the Developer Portal OVA template by using a version of the VMware vSphere Client that supports the SHA-256 Cryptographic Hash Algorithm.

I see the following message in the Developer Portal user interface: The system is currently experiencing problems. Please try again later.
Check the log files /var/log/devportal/background_sync.log and /var/log/syslog for errors. You can force re-synchronization to determine whether the problem was temporary, by completing the following steps:
  1. Log in to your Developer Portal as the administrator.
  2. On the administrator dashboard, click Configuration > System > Cron.
  3. From the Operations list for the job that is titled Background sync, select Run.

If you see the message Allowed memory size of number_of_bytes exhausted in the log file /var/log/syslog, run the command php_max_memory 1024 to increase the maximum memory to 1024 MB, then run the background synchronization task again.

In the log files, I see warnings that contain the string using password: NO
You can ignore these warnings; the following example shows such a message:
Sep 27 17:52:46 myservername mysqld: 2017-09-27 17:52:46 5085 [Warning] Access denied for user 'root'@'localhost' (using password: NO)
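
When reading a large log file, these warnings can be filtered out so that other messages stand out. The following Python sketch is illustrative only: the function names are hypothetical, and the match is based solely on the two substrings visible in the example message above.

```python
from typing import List


def is_ignorable_mysql_warning(line: str) -> bool:
    """True for the harmless warning that contains "using password: NO"."""
    return "[Warning]" in line and "(using password: NO)" in line


def lines_worth_reading(log_text: str) -> List[str]:
    """Return the log lines that remain after the harmless warnings are dropped."""
    return [line for line in log_text.splitlines()
            if line.strip() and not is_ignorable_mysql_warning(line)]
```

Messages that report a password being used, such as the Access denied error described in the next problem, are deliberately not filtered.
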
I see the error Access Denied when updating a cluster
If, when adding a new Developer Portal node to a cluster, you see the following error message:
Access denied for user 'myuser@myhost' (using password: password)
complete the following steps:
  1. Log in to the Developer Portal CLI as the root user by entering the command: sudo -i
  2. Check the file /etc/mysql/debian.cnf on the new node and on the existing node or nodes to see whether the passwords match. If they do not match, copy the password from the existing node into the new node, and attempt the update process again.
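
The comparison in step 2 can be made without displaying the passwords on screen. The following Python sketch assumes that /etc/mysql/debian.cnf uses the usual Debian and Ubuntu layout, an INI-style file with a [client] section that contains a password entry; the function names are hypothetical, and the file contents are assumed to have been read as root on each node and passed in as text.

```python
import configparser


def client_password(debian_cnf_text: str) -> str:
    """Return the password entry from the [client] section of debian.cnf text."""
    # Interpolation is disabled so that a "%" in the password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(debian_cnf_text)
    return parser.get("client", "password")


def passwords_match(existing_node_text: str, new_node_text: str) -> bool:
    """True if both files carry the same [client] password."""
    return client_password(existing_node_text) == client_password(new_node_text)
```

If passwords_match returns False, continue with the copy described in step 2.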

Restoring a standalone Portal server by using a backup file

To recover a standalone Portal server onto a new OVA deployment by using a backup file, complete the following steps:
  1. Ensure that the Portal backup file is available on a separate FTP/SFTP server.
    Note: Do not confuse a Portal backup file with a Portal site backup file. By default, a Portal backup file has the following filename format: apim-portal-<hostname>-<datestamp>.tgz.
  2. Ensure that you have the Developer Portal OVA file that matches the Portal server that you are restoring (it must have the same fix pack and build number, for example 5.0.8.3-APIConnect-Portal-Ubuntu16-20180508-1349.ova).
  3. Shut down and delete your corrupted Portal server.
  4. Deploy the Developer Portal OVA file in place of the deleted server, and set the same IP address and host name as the deleted server had. For detailed instructions, see Deploying the Developer Portal OVA file.
  5. Configure the Developer Portal to connect to the Management server. For detailed steps, see Installing the Developer Portal.
  6. Copy the backup file to the newly deployed Portal server by using FTP, SFTP, or SCP.
  7. Then, restore the Portal by using the restore_devportal -scn command. For example:
    restore_devportal -scn backup_file_absolute_path
    where backup_file_absolute_path is the location of your backup file. For more information, see restore_devportal.
  8. Run the status command. Check that the server is marked as SUCCESS.
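
Two easy mistakes in this procedure are picking up a site backup instead of a Portal backup, and passing a relative path in step 7. The following Python sketch is illustrative only: the function names and the example file name are hypothetical, and the name check covers only the default apim-portal-<hostname>-<datestamp>.tgz pattern, so a renamed backup file fails the check even if it is valid.

```python
import posixpath

_PREFIX = "apim-portal-"
_SUFFIX = ".tgz"


def looks_like_portal_backup(path: str) -> bool:
    """True if the file name follows the default Portal backup naming pattern."""
    name = posixpath.basename(path)
    return (name.startswith(_PREFIX) and name.endswith(_SUFFIX)
            and len(name) > len(_PREFIX) + len(_SUFFIX))


def standalone_restore_command(path: str) -> str:
    """Build the restore command for a standalone server from an absolute path."""
    if not posixpath.isabs(path):
        raise ValueError("the backup file path must be absolute")
    return f"restore_devportal -scn {path}"
```

For example, standalone_restore_command("/tmp/apim-portal-myhost-20180508.tgz") produces the command line for step 7, where the file name is a made-up illustration of the default format.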

Restoring a cluster of Portal servers by using a backup file

To recover a cluster of Portal servers onto a new OVA deployment by using a backup file, complete the following steps:
  1. Ensure that you have a backup file of one of the Portal servers. If the Portal Cluster Address (as configured in the Cloud Manager) was set to one of the Portal servers (instead of a load balancer), then use the backup from that server.
  2. Shut down and delete all of the Portal cluster servers.
  3. Deploy and configure a single Portal server with the same hostname and IP address as that of the backup, and connect it to the Management server (by using set_apim_host and download_apim_cert/set_apim_cert). For detailed steps, see Installing the Developer Portal.
  4. Run the status command to confirm that the Portal server was set up with no errors.
  5. Run the following command:
    sudo mkdir /etc/mysql/certs ; sudo chmod 750 /etc/mysql/certs ; sudo chgrp mysql /etc/mysql/certs
  6. Copy the backup file to the newly deployed Portal server by using FTP, SFTP, or SCP.
  7. Then, restore the Portal on the new server by using the restore_devportal -scpn command. For example:
    restore_devportal -scpn backup_file_absolute_path
    where backup_file_absolute_path is the location of your backup file. For more information, see restore_devportal.
  8. Run the status and list_sites commands, and confirm that the Portal and its sites have been restored and that there are no errors.
  9. Run the following command to create a cluster:
    set_cluster_members -c
  10. Deploy the remaining Portal cluster servers, and set them to the same IP address and hostname that they had before (by using the set_hostname command). Then run the following command:
    set_cluster_members hostname/IP_of_existing_cluster_member
    where hostname/IP_of_existing_cluster_member is the hostname or IP address of the server where steps 3-9 were run.
  11. You can monitor the progress of the clustering by running the status command.
Your cluster of Portal servers is now restored onto a new OVA deployment.

Restoring a Portal site by re-creating

As an alternative to restoring a Portal site from a backup file, a site can be re-created from the Management server. The Portal users, and their applications and subscriptions, are all stored on the Management server, and so this data is saved. However, the following points must be noted:
  • It is not possible to re-create Portal Delegated User Registry (PDUR) sites; this type of site can be restored only from a backup file.
  • Any customizations that were made to the site, for example modifications to forms, uploaded images, custom themes and modules, are lost when the site is re-created.
To re-create a Portal site by using the Management server, complete the following steps:
  1. In the API Manager UI, select the Catalog > Settings > Portal page that corresponds to the Portal site that you want to re-create.
  2. Set the Select Portal field to None, and click Save.
  3. After a few minutes the site is deleted from the Portal; run the list_sites command on the Portal CLI to confirm that the site is removed.
  4. Return to the same Catalog > Settings > Portal page in the API Manager UI, set the Select Portal field back to IBM Developer Portal, and set the URL field to what it was set to previously.
After a few minutes the Portal site is re-created, and an admin user invitation email is sent.