Online Inference Service Expansion and Reduction
This topic describes how to scale an online inference service up or down in the management console.
Prerequisites
- You have obtained an account and password for the management console.
- The online inference service has been created and its status is either Running or Closed.
Scaling
Expanding Capacity
- Log in to the management console.
- In the top navigation bar, click Products and Services > AI Computing Platform > AI Computing Platform to go to its overview page.
- In the left navigation bar, select Inference Service > Online Inference Service to enter the Online Inference Service List page.
- Click Service Details in the Operation column of the target inference service to open its details page.
- On the Inference Service Details page, click More Actions in the upper right corner and select Expand Capacity.
- In the expansion window that appears, set the number of instances to add, click OK, and wait for the service update to complete.
Notice: Expansion can only add resources with the same specifications as the original online inference service.
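If you prefer to automate the expansion step instead of clicking through the console, the following is a minimal sketch of what such a request might look like. The base URL, endpoint path, field names, service ID, and token shown here are assumptions for illustration only, not the platform's documented API; substitute the values for your environment.

```python
# Hypothetical sketch: the endpoint, payload fields, and auth scheme are
# assumptions, not the platform's documented API.
import requests

API_BASE = "https://console.example.com/api/v1"   # assumed base URL
SERVICE_ID = "svc-12345"                          # assumed inference service ID
TOKEN = "YOUR_ACCESS_TOKEN"                       # credential obtained from the console


def scale_out(service_id: str, add_instances: int) -> None:
    """Request additional instances with the same specifications (assumed endpoint)."""
    resp = requests.post(
        f"{API_BASE}/inference-services/{service_id}/scale-out",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"instanceCount": add_instances},  # number of instances to add
        timeout=30,
    )
    resp.raise_for_status()
    print("Scale-out request accepted; wait for the service update to complete.")


if __name__ == "__main__":
    scale_out(SERVICE_ID, add_instances=2)
```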
Reducing Capacity
Important: Only inference services with multiple instances support scaling down.
- Go to the details page of the specified inference service, click More Actions in the upper right corner, and select Scale Down.
- In the scale-down window that appears, set the number of instances to remove, click OK, and wait for the service update to complete. After scaling down, the instance fees for the service decrease accordingly.
Notice: An online inference service must contain at least one instance.
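The scale-down step can be sketched the same way. The snippet below is illustrative only: the endpoints and field names are assumptions, not the platform's documented API. It simply checks the two documented constraints before sending the request, namely that only multi-instance services can scale down and that at least one instance must remain.

```python
# Hypothetical sketch: endpoints and payload fields are assumptions.
import requests

API_BASE = "https://console.example.com/api/v1"   # assumed base URL
TOKEN = "YOUR_ACCESS_TOKEN"                       # credential obtained from the console


def scale_in(service_id: str, remove_instances: int) -> None:
    """Remove instances from a service while honoring the documented constraints."""
    headers = {"Authorization": f"Bearer {TOKEN}"}

    # Read the current instance count first (assumed detail endpoint).
    detail = requests.get(
        f"{API_BASE}/inference-services/{service_id}",
        headers=headers,
        timeout=30,
    )
    detail.raise_for_status()
    current = detail.json()["instanceCount"]

    if current < 2:
        raise ValueError("Only inference services with multiple instances support scaling down.")
    if current - remove_instances < 1:
        raise ValueError("An online inference service must contain at least one instance.")

    resp = requests.post(
        f"{API_BASE}/inference-services/{service_id}/scale-in",
        headers=headers,
        json={"instanceCount": remove_instances},  # number of instances to remove
        timeout=30,
    )
    resp.raise_for_status()
    print("Scale-in request accepted; fees decrease once the update completes.")


if __name__ == "__main__":
    scale_in("svc-12345", remove_instances=1)
```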