
Online Inference Service Expansion/Reduction

This page describes how to scale an online inference service up or down in the management console.


Prerequisites

  • The account and password for the management console have been obtained.
  • The online inference service has been created and its status is either Running or Closed.

Scaling

Expanding Capacity

  1. Log in to the management console.
  2. In the top navigation bar, click Products and Services > AI Computing Platform > AI Computing Platform to go to its overview page.
  3. In the left navigation bar, select Inference Service > Online Inference Service to enter the Online Inference Service List page.
  4. Click Service Details in the Operation column of the specified inference service to view its detailed information page.
  5. On the Inference Service Details page, click More Actions in the upper right corner and select Expand Capacity.
  6. In the expansion window that appears, set the number of instances to add, click OK, and wait for the service update to complete.

Notice: You can only expand capacity using resources of the same specifications as the original online inference service.
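The expansion rules above can be sketched as a small validation helper. This is an illustrative sketch only; the function and parameter names below are assumptions for demonstration and are not part of the platform's documented API.

```python
def expand_capacity(current_instances: int, added_instances: int,
                    current_spec: str, requested_spec: str) -> int:
    """Return the new instance count after expansion.

    Enforces the same-specification rule from the Notice above.
    All names here are hypothetical, chosen only to illustrate the rules.
    """
    if requested_spec != current_spec:
        # Expansion must use the same resource specification
        # as the original online inference service.
        raise ValueError("expansion spec must match the original service spec")
    if added_instances < 1:
        raise ValueError("at least one instance must be added")
    return current_instances + added_instances
```

For example, expanding a two-instance service by three instances of the same specification yields five instances, while requesting a different specification raises an error.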

Reducing Capacity

Important: Only inference services with multiple instances support scaling down.

  1. Go to the details page of the specified inference service, click More Actions in the upper right corner, and select Scale Down.
  2. In the pop-up scaling window, set the number of instances to remove, click OK, and wait for the service update to complete. After scaling down, the instance fees for the service decrease accordingly.

Notice: An online inference service must contain at least one instance.
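The scale-down constraints (multiple instances required before scaling down, at least one instance remaining afterward) can be sketched the same way. As above, this is a hypothetical helper for illustration, not a documented platform API.

```python
def scale_down(current_instances: int, removed_instances: int) -> int:
    """Return the new instance count after scale-down.

    Enforces the two rules noted above: only services with multiple
    instances support scaling down, and at least one instance must remain.
    Names are illustrative assumptions.
    """
    if current_instances <= 1:
        # Important: only multi-instance services support scaling down.
        raise ValueError("only services with multiple instances can scale down")
    if removed_instances < 1 or current_instances - removed_instances < 1:
        # Notice: the service must retain at least one instance.
        raise ValueError("service must retain at least one instance")
    return current_instances - removed_instances
```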

