
Creating an Online Inference Service

Learn how to create and manage an online inference service on the AI computing platform.


An online inference service is an online service that synchronously returns an inference result for each request. The AI computing platform provides users with full lifecycle management of online inference service instances: instances can be scaled out and in online, monitored in real time, and their inference logs can be retrieved and queried, making it easy for users to manage the reliability of their AI services.

Users can deploy preset models from the Model Catalog with one click, or upload private models and deploy them with custom inference images, thereby obtaining high-concurrency, stable online model services at a lower resource cost.

Creating an Inference Service from a Model Catalog

Prerequisites

  • The management console account and password have been obtained.

Procedure

  1. Log in to the management console.
  2. In the top navigation bar, click Products and Services > AI Computing Platform > AI Computing Platform to go to its overview page.
  3. In the left navigation bar, select Model Catalog to enter the Model Catalog page.
  4. On the Model Catalog page, enter a keyword in the search box to filter for the model to be used, hover over the model, and click Online Inference.
  5. In the Create Online Inference Service window that pops up, configure various parameters.
    • Model Name: Automatically filled by the system based on the model selected in the Model Catalog.
    • Service Name: Optional item, a user-defined name for the current online inference service.
    • Resource Configuration: Required field, the resources needed for online inference with the current model. Users can select the resource type and specification according to their actual needs.
  6. Click Create to enter the service information page and wait for the service to be created.
  7. When the service is created successfully, its status changes to Running, and users can view the corresponding model inference results, for example by sending a test request as sketched below.
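
A quick way to view the inference results is to send a test request to the deployed service over HTTP. The sketch below is a minimal example assuming a JSON request/response interface; the endpoint URL, token, and request schema are placeholders to be replaced with the values shown on the service details page.

```python
import requests

# Hypothetical values -- copy the real endpoint and token from the
# details page of the deployed online inference service.
ENDPOINT = "https://<inference-service-endpoint>/predict"
API_TOKEN = "<your-api-token>"

# Each request returns its inference result synchronously.
response = requests.post(
    ENDPOINT,
    json={"inputs": ["example input"]},
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```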

Building a Private Model Inference Service

Prerequisites

  • The management console account and password have been obtained.
  • To build an inference service for a private model, create a file storage in advance and upload the model files to the corresponding directory.

Procedure

  1. Log in to the management console.

  2. In the left navigation bar, select Inference Service > Online Inference Service, enter the Online Inference Service List page, and click + Online Inference Service.

  3. On the Create Online Inference Service page that appears, configure the parameters and click Create.

    • Service Name: Optional item, a user-defined name for the current online inference service.
    • Image Selection: Required field. You can select a public image, a custom image, or a private image address.
    • Model Configuration: The model file and mounting address need to be configured.
    • Environment Variables: Optional item, environment variables customized by the user for the current inference service.
    • Third-party Dependencies: Optional item, used to load environment dependencies not included in the image.
    • Startup Command: Optional item. Fill in the startup command for model inference according to the path of the inference entry point in the uploaded model files (a minimal example is sketched after this procedure).
    • Network Ports: Optional item, the network port on which the Pod instance listens.
    • Resource Configuration: Required field, resource configuration required for online inference of the current model.
  4. Wait for the inference service to be created; a status of Running indicates successful creation. Users can then view the corresponding model inference results.
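
For reference, a minimal inference entry point is sketched below. It assumes a Flask-based HTTP server, a pickled scikit-learn style model named model.pkl mounted under /mnt/models, and port 8080; the file name, mount path, port, and environment variables are all assumptions and must match the values configured under Model Configuration, Environment Variables, and Network Ports. With this layout, the startup command could be, for example, python /mnt/models/serve.py. If Flask is not included in the selected image, it can be loaded via Third-party Dependencies.

```python
# serve.py -- a minimal sketch of an inference entry point (not a required
# platform interface); adapt the loader and predict call to your own model.
import os
import pickle

from flask import Flask, jsonify, request

# Mount path and port are assumptions; match them to the Model Configuration
# and Network Ports settings of the inference service.
MODEL_DIR = os.environ.get("MODEL_DIR", "/mnt/models")
PORT = int(os.environ.get("PORT", "8080"))

app = Flask(__name__)

# Load the serialized model from the mounted file storage directory.
# The file name and pickle format are assumptions; use your framework's loader.
with open(os.path.join(MODEL_DIR, "model.pkl"), "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    # Assumes a scikit-learn style model exposing predict(); adapt as needed.
    outputs = model.predict(payload["inputs"])
    return jsonify({"outputs": outputs.tolist()})

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the port declared under Network Ports is reachable
    # from outside the container.
    app.run(host="0.0.0.0", port=PORT)
```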

