Edit a TensorFlow training model for deep learning insights

Before uploading a TensorFlow training model, edit the model to work with the deep learning insights feature in IBM Spectrum Conductor Deep Learning Impact.

About this task

For the deep learning insights feature to work with your single-node or distributed TensorFlow model, the model must call additional IBM Spectrum Conductor Deep Learning Impact APIs to collect training metrics. These training metrics are the values displayed in the training insight charts.

Note: For IBM Fabric distributed TensorFlow models and IBM Fabric distributed Caffe models, only the loss value and accuracy metrics are supported.
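Each collected training metric can be thought of as a named series of values indexed by training step, which is what a chart plots. The following plain-Python sketch shows that idea with a hypothetical MetricRecorder class. It is only an illustration and is not part of the monitor_cb API. The recorded values are made up.

```python
from collections import defaultdict


class MetricRecorder:
    """Hypothetical stand-in for a metric collector (not the monitor_cb API).

    Stores scalar values per tag, indexed by training step, so that each
    tag can be plotted as one series in a chart.
    """

    def __init__(self):
        # tag -> list of (step, value) pairs, in the order they were recorded
        self._series = defaultdict(list)

    def record(self, tag, step, value):
        self._series[tag].append((step, float(value)))

    def series(self, tag):
        # Points for one chart series, sorted by step; empty if tag is unknown
        return sorted(self._series.get(tag, []))

    def tags(self):
        return sorted(self._series)


recorder = MetricRecorder()
for step, loss in [(0, 1.0), (100, 0.8), (200, 0.6)]:  # illustrative values
    recorder.record("train loss", step, loss)
recorder.record("test accuracy", 200, 0.5)

print(recorder.tags())
print(recorder.series("train loss"))
```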

Procedure

  1. The model must import the monitor_cb module, a Python module for metric collection.
    import monitor_cb
    
  2. The model must create an instance of the monitor_cb.CMonitor class, which handles the collection of many types of training metrics.
    import tf_parameter_mgr
    
    # Arguments: log directory, test interval, and maximum number of training steps
    monitor = monitor_cb.CMonitor(log_dir, tf_parameter_mgr.getTestInterval(), tf_parameter_mgr.getMaxSteps())
    
  3. Add summaries for the training metrics that you want to collect.
    import tensorflow as tf
    
    loss = cifar10.loss(logits, labels)
    accuracy = cifar10.accuracy(logits, labels)
    
    # Summary metrics for each layer
    # (layer names of the CIFAR-10 example model; use the layer names of your model)
    graph = tf.get_default_graph()
    for layer in ['conv1', 'conv2', 'local3', 'local4']:
    
      # Summary weight and activation histograms of the layer
      monitor.SummaryHist("weight", graph.get_tensor_by_name(layer+'/weights:0'), layer)
      monitor.SummaryHist("activation", graph.get_tensor_by_name(layer+'/'+layer+':0'), layer)
    
      # Summary 2-norm of the layer weights
      monitor.SummaryNorm2("weight", graph.get_tensor_by_name(layer+'/weights:0'), layer)
    
    # Summary histogram of gradient with respect to weight
    monitor.SummaryGradient("weight", loss)
    
    # Summary gradient/weight ratio
    monitor.SummaryGWRatio()
    
    # Summary loss value and accuracy of train/test phase
    monitor.SummaryScalar("train loss", loss)
    monitor.SummaryScalar("train accuracy", accuracy)
    monitor.SummaryScalar("test loss", loss)
    monitor.SummaryScalar("test accuracy", accuracy)
    
  4. Merge the train summaries and the test summaries into one summary operation for each phase.
    train_summaries = tf.summary.merge_all(monitor_cb.DLMAO_TRAIN_SUMMARIES)
    test_summaries = tf.summary.merge_all(monitor_cb.DLMAO_TEST_SUMMARIES)
    
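  5. Run the summary operations and write the summary values to disk so that IBM Spectrum Conductor Deep Learning Impact can display training insights.
    # summaryWriter is a tf.summary.FileWriter, for example tf.summary.FileWriter(log_dir, session.graph)
    
    # Run one training step in training mode and collect the train summaries
    _, train_summary, gs = session.run([train_op, train_summaries, global_step],
                                       feed_dict={is_training: True})
    summaryWriter.add_summary(train_summary, gs)
    
    # Evaluate the test summaries in inference mode at the same global step
    test_summary = session.run(test_summaries, feed_dict={is_training: False})
    summaryWriter.add_summary(test_summary, gs)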
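The layer loop in step 3 looks up tensors by name. This assumes that each layer was built inside a name scope that matches the layer name, as in the CIFAR-10 example, so that its weights are named <layer>/weights:0 and its output is named <layer>/<layer>:0. The following sketch uses a hypothetical helper, layer_tensor_names, to show the names that the loop resolves without building a TensorFlow graph. If your model uses different scopes or variable names, adjust the pattern.

```python
def layer_tensor_names(layer):
    """Return the (weights, activation) tensor names that the step 3 loop
    passes to graph.get_tensor_by_name() for one layer.

    Hypothetical helper: it only reproduces the naming pattern used in the
    example; it does not query a TensorFlow graph.
    """
    # ':0' selects the first output of the named operation
    weights = layer + '/weights:0'            # variable "weights" in scope <layer>
    activation = layer + '/' + layer + ':0'   # op named <layer> in scope <layer>
    return weights, activation


for layer in ['conv1', 'conv2', 'local3', 'local4']:
    print(layer, *layer_tensor_names(layer))
```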
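Step 3 also records a gradient/weight ratio. The exact definition that SummaryGWRatio() uses is not stated here. A common formulation, assumed in the following sketch, is the L2 norm of a layer's gradient divided by the L2 norm of its weights, which indicates how large an update is relative to the weights themselves. The gw_ratio and l2_norm functions are hypothetical and operate on plain Python lists.

```python
import math


def l2_norm(values):
    # Euclidean (L2) norm of a flat list of numbers
    return math.sqrt(sum(v * v for v in values))


def gw_ratio(gradients, weights, eps=1e-12):
    """Assumed definition: ||gradient||_2 / ||weights||_2.

    eps guards against division by zero when all weights are zero.
    """
    return l2_norm(gradients) / (l2_norm(weights) + eps)


weights = [3.0, 4.0]        # ||w|| = 5
gradients = [0.03, 0.04]    # ||g|| = 0.05
print(gw_ratio(gradients, weights))  # approximately 0.01
```

A very small ratio can suggest that a layer is barely learning, and a very large one that updates are unstable; how the charts interpret the value is determined by the product, not by this sketch.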
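The step 5 snippet evaluates the test summaries in the same iteration as each training step. If you want test metrics only at the test interval that is passed to CMonitor, one possible schedule is sketched below. It assumes that tf_parameter_mgr.getTestInterval() returns the number of steps between test evaluations and that getMaxSteps() returns the total number of training steps. The run_training function and the stub callables are hypothetical stand-ins for session.run() and summaryWriter.add_summary(), so the schedule runs without TensorFlow.

```python
def run_training(max_steps, test_interval, run_train_step, run_test, write):
    """Hypothetical schedule: write train summaries at every step and test
    summaries every `test_interval` steps.

    run_train_step(): runs one training step (is_training=True) and returns
                      (train_summary, global_step)
    run_test():       evaluates the test summaries (is_training=False)
    write(summary, step): stands in for summaryWriter.add_summary()
    """
    for _ in range(max_steps):
        train_summary, gs = run_train_step()
        write(train_summary, gs)
        if gs % test_interval == 0:
            write(run_test(), gs)


# Stubs that record what would be written instead of calling TensorFlow
state = {"step": 0}
written = []


def fake_train_step():
    state["step"] += 1
    return "train", state["step"]


def fake_test():
    return "test"


run_training(max_steps=5, test_interval=2,
             run_train_step=fake_train_step, run_test=fake_test,
             write=lambda summary, step: written.append((summary, step)))
print(written)
```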
    

Results

The edited TensorFlow model is ready to use with the deep learning insights feature.