Deploying tuned foundation models
You can tune a model to adapt it to a specific task, dataset, or use case. The tuning process adjusts the parameters, or weights, of a pre-trained model to improve the model's performance and accuracy. Deploy a tuned model so that you can add it to a business workflow and start using foundation models in a meaningful way.
Ways to work
Whichever method you use to tune your model, you must wait for the tuning experiment to finish running before you deploy the tuned model.
Depending on the method that you use to tune your model, you can deploy the tuned model in the following ways:
- From the Projects UI: A graphical user interface to deploy tuned models that are stored as tuning experiment assets in your project. For details, see Deploying a tuned model from a project.
- Programmatic methods to deploy tuned models: Use these methods for parameter-efficient fine-tuned (PEFT) models.
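For example, the following minimal Python sketch shows one way to create an online deployment for a tuned model by calling the watsonx.ai REST API with the requests library. The base URL, API version date, payload fields, and environment variable names are assumptions for illustration; confirm the exact request format in the watsonx.ai API documentation for your instance.

```python
import os

import requests

# Minimal sketch of creating an online deployment for a tuned model asset.
# The base URL, API version date, and payload fields below are assumptions
# for illustration; confirm them against the watsonx.ai API documentation.
API_BASE = "https://us-south.ml.cloud.ibm.com"   # assumed regional endpoint
IAM_TOKEN = os.environ["IAM_TOKEN"]              # bearer token generated from your API key
SPACE_ID = os.environ["SPACE_ID"]                # deployment space that hosts the deployment
MODEL_ASSET_ID = os.environ["MODEL_ASSET_ID"]    # ID of the tuned model asset to deploy

payload = {
    "name": "my-tuned-model-deployment",
    "space_id": SPACE_ID,
    "asset": {"id": MODEL_ASSET_ID},
    "online": {},                                # request a real-time (online) deployment
}

response = requests.post(
    f"{API_BASE}/ml/v4/deployments",
    params={"version": "2024-05-01"},            # assumed API version date
    headers={"Authorization": f"Bearer {IAM_TOKEN}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print("Deployment ID:", response.json()["metadata"]["id"])
```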
After deploying a tuned model, you can inference the model by providing text data as input to generate predictions in real time.
Deploying a tuned model from a project
When you use the Tuning Studio to create your tuning experiment, you can deploy the resulting tuned model directly.
Before you begin
You must set up your task credentials by generating an API key. For more information, see Managing task credentials.
Procedure
To deploy a tuned model, complete the following steps:
- From the project's Assets tab, click the Experiments asset type.
- Click to open the tuning experiment for the model you want to deploy.
- From the Tuned models list, find the completed tuning experiment, and then click New deployment.
- Name the tuned model.
The name of the tuning experiment is used as the tuned model name if you don't change it. The name has a number after it in parentheses, which counts the deployments. The number starts at one and is incremented by one each time you deploy this tuning experiment.
- Optional: Add a description and tags.
- For the Deployment container, choose one of the following options:
- This project: Deploys the tuned model and adds it to your project where you can test the tuned model. You can promote the tuned model deployment to a deployment space at any time. Choose this option if you want to do more testing of the tuned model before the model is used in production.
- Deployment space: Promotes the tuned model to a deployment space and deploys the tuned model. A deployment space is separate from the project where you create the asset. This separation enables you to promote assets from multiple projects to a space, and deploy assets to more than one space. Choose this option when the tuned model is ready to be promoted for production use.
For more information about this option, see Using a deployment space.
Tip: Select the option to view the deployment after it is created so that you can easily find your tuned model after the deployment process completes.
- Click Deploy.
After the tuned model is deployed, a copy of the tuned model is stored in your project as a model asset.
Using a deployment space
When you choose a deployment space as the container for your tuned model, the tuned model is promoted to a deployment space, and then deployed. A deployment space is associated with the following services that it uses to deploy assets:
- watsonx.ai Runtime: A product with tools and services that you can use to build, train, and deploy machine learning models. This service hosts your tuned model.
- IBM Cloud Object Storage: A secure platform for storing structured and unstructured data. Your deployed model asset is stored in a Cloud Object Storage bucket that is associated with your project.
For more information, see Deployment spaces.
To use a deployment space, complete the following steps:
- After you choose Deployment space as the deployment container, in the Target deployment space field, choose a deployment space.
The deployment space must be associated with a machine learning instance that is in the same account as the project where the tuned model was created.
If you don't have a deployment space, choose Create a new deployment space, and then follow the steps in Creating deployment spaces.
- In the Deployment serving name field, add a label for the deployment.
The serving name is used in the URL for the API endpoint that identifies your deployment. Adding a serving name is helpful because the human-readable name replaces the long, system-generated ID that is otherwise assigned to the deployment.
The serving name also abstracts the deployment from its service instance details. Applications can refer to this name, which allows the underlying service instance to be changed without affecting users.
The name can have up to 36 characters. The supported characters are [a-z,0-9,_].
The name must be unique across the IBM Cloud region. You might be prompted to change the serving name if the name you choose is already in use.
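To illustrate the difference, the following sketch contrasts an endpoint URL that contains the system-generated deployment ID with one that uses a serving name. The base URL pattern and the serving name insurance_claims_tuned are hypothetical examples; copy the exact endpoint from the deployment's API Reference tab.

```python
# Hypothetical base URL; copy the exact endpoint from the API Reference tab.
base = "https://us-south.ml.cloud.ibm.com/ml/v1/deployments"

# Without a serving name, the system-generated deployment ID appears in the URL.
url_by_id = f"{base}/<deployment-id>/text/generation"

# With a serving name, a readable label takes the place of the ID.
url_by_serving_name = f"{base}/insurance_claims_tuned/text/generation"
```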
Retrieving the model deployment endpoint
Follow these steps to retrieve the endpoint URL for your tuned model deployment:
- From the Deployments tab of your project or deployment space, click the deployment name.
- In the API Reference tab, find the private and public endpoint links and code snippets that you can use to include the endpoint details in an application.
You need the model endpoint URL to access the deployment from your applications.
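For example, a minimal Python sketch that sends text to the deployment endpoint for real-time inference might look like the following. The payload fields, version date, and response structure are assumptions based on the watsonx.ai text generation API; use the code snippets from the API Reference tab as the authoritative starting point.

```python
import os

import requests

# Minimal sketch of sending text to a tuned-model deployment for real-time
# inference. The payload fields, version date, and response structure are
# assumptions; copy the exact values from the API Reference tab.
ENDPOINT = os.environ["DEPLOYMENT_ENDPOINT"]  # endpoint URL copied from the API Reference tab
IAM_TOKEN = os.environ["IAM_TOKEN"]           # bearer token generated from your API key

payload = {
    "input": "Summarize the following customer note:\n<your text here>",
    "parameters": {"max_new_tokens": 100},    # assumed generation parameter
}

response = requests.post(
    ENDPOINT,
    params={"version": "2024-05-01"},         # assumed API version date
    headers={"Authorization": f"Bearer {IAM_TOKEN}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json()["results"][0]["generated_text"])
```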
Next steps
After you deploy a tuned model, you can test the model by inferencing it. You can manage your model deployment by updating the deployment details, scaling the deployment, or deleting it.
Parent topic: Deploying foundation model assets