TensorBoard
Learn how to use TensorBoard to monitor and compare distributed training tasks in the AI Computing Platform.
Introduction
TensorBoard is a visualization tool for monitoring and comparing distributed training tasks in the AI Computing Platform.
Prerequisites
- The training task has been created.
- The status of the training task is Running or Completed.
- The training code logs the relevant data to TENSORBOARD_LOG_PATH.
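The last prerequisite can be sketched as follows. This is a minimal example, not the platform's prescribed method: it assumes TENSORBOARD_LOG_PATH is exposed to the training process as an environment variable (this page does not say how it is provided), and it uses PyTorch's SummaryWriter as one of several libraries that can write TensorBoard event files. The helper names resolve_log_dir and log_demo_scalars, the fallback directory, and the tag train/loss are hypothetical.

```python
import os
from pathlib import Path


def resolve_log_dir(run_name: str = "run", default: str = "./tb_logs") -> Path:
    """Return (and create) the directory that event files are written to.

    Assumption: the platform exposes TENSORBOARD_LOG_PATH as an environment
    variable. If it is not set, fall back to a local directory.
    """
    base = os.environ.get("TENSORBOARD_LOG_PATH", default)
    log_dir = Path(base) / run_name
    log_dir.mkdir(parents=True, exist_ok=True)
    return log_dir


def log_demo_scalars(log_dir: Path, steps: int = 5) -> None:
    """Write a few placeholder scalar values as TensorBoard event files."""
    # Imported lazily so the path helper above works without PyTorch installed.
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir=str(log_dir))
    for step in range(steps):
        # Placeholder value; a real training loop would log its own loss here.
        writer.add_scalar("train/loss", 1.0 / (step + 1), step)
    writer.close()
```

In a training script, calling log_demo_scalars(resolve_log_dir("task-a")) would write the event files under a task-a subdirectory. Using the same tag names in every task lets TensorBoard overlay the curves when several tasks are compared.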
Single Task TensorBoard
- Log in to the management console.
- In the top navigation bar, click Products and Services > AI Computing Platform > AI Computing Platform to go to its overview page.
- In the left navigation bar, select Distributed Training. The distributed training task list page is displayed by default.
- On the distributed training list page, click TensorBoard in the Operation column of the row for the target task.
Notice
- The task to be viewed must be in the Running or Completed state.
- If the TensorBoard page does not open, check that the browser's pop-up blocker is turned off.
TensorBoard Comparison of Multiple Tasks
- On the distributed training list page, select multiple training tasks.
- Click Start TensorBoard Comparison above the list, then view the comparison on the TensorBoard page that opens in a pop-up.
For more information, see the official TensorBoard tutorial.