TensorBoard

Learn how to use TensorBoard to monitor and compare distributed training tasks in the AI Computing Platform.

Introduction

TensorBoard is a visualization tool for monitoring and comparing distributed training tasks in the AI Computing Platform.

Prerequisites

  • The training task has been created.
  • The training task is in the Running or Completed state.
  • Your training code logs the relevant data to TENSORBOARD_LOG_PATH.
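
The last prerequisite can be sketched as follows. This is a minimal example, not the platform's official method: it assumes that TENSORBOARD_LOG_PATH is exposed to the training process as an environment variable and that the training code uses PyTorch's SummaryWriter. The helper names resolve_log_dir and log_losses, the fallback directory ./tb_logs, and the tag train/loss are hypothetical.

```python
import os
from pathlib import Path


def resolve_log_dir(default="./tb_logs"):
    """Return the directory that TensorBoard event files are written to."""
    # Assumption: the platform sets TENSORBOARD_LOG_PATH as an environment
    # variable. The fallback directory is only for running the code locally.
    log_dir = Path(os.environ.get("TENSORBOARD_LOG_PATH", default))
    log_dir.mkdir(parents=True, exist_ok=True)
    return log_dir


def log_losses(losses, log_dir):
    """Write one scalar per training step so TensorBoard can plot a loss curve."""
    # Imported here so resolve_log_dir can be used without PyTorch installed.
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir=str(log_dir))
    for step, loss in enumerate(losses):
        writer.add_scalar("train/loss", loss, global_step=step)
    writer.close()  # flush pending events to disk
```

In a training script, calling log_losses(losses, resolve_log_dir()) with the per-step loss values writes event files that the TensorBoard page can then display. Other frameworks have their own writers; the only point here is that the output directory comes from TENSORBOARD_LOG_PATH.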

Single Task TensorBoard

  1. Log in to the management console.
  2. In the top navigation bar, click Products and Services > AI Computing Platform > AI Computing Platform to go to its overview page.
  3. In the left navigation bar, select Distributed Training. The distributed training task list page is displayed by default.
  4. In the distributed training task list, find the row of the task you want to view and click TensorBoard in its Operation column.

Notice

  • The task to be viewed must be in the Running or Completed state.
  • If the TensorBoard page does not open, check that your browser's pop-up blocker is turned off.

TensorBoard Comparison of Multiple Tasks

  1. On the distributed training list page, select multiple training tasks.
  2. Click Start TensorBoard Comparison above the list, then view the comparison on the TensorBoard page that opens in a pop-up window.

For more information, see the TensorBoard official tutorial.

