
Commit 397c0ce

Updated README with TGI and vLLM end points
1 parent 42fe816 commit 397c0ce

File tree

1 file changed: +26 −37 lines changed
  • model-deployment/containers/llm/mistral


model-deployment/containers/llm/mistral/README.md

Lines changed: 26 additions & 37 deletions
@@ -45,6 +45,9 @@ Public [documentation](https://docs.oracle.com/en-us/iaas/data-science/using/pol
### Policy to check Data Science work requests
`allow group <group_name> to manage data-science-work-requests in compartment <compartment_name>`

+### Policy to access the Model Deployment endpoint from a Container Instance
+`allow dynamic-group <group_name> to manage {DATA_SCIENCE_MODEL_DEPLOYMENT_PREDICT} in compartment <compartment_name>`
+
For all other Data Science policies, please refer to these [details](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/distributed_training/README.md#3-oci-policies).

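As a rough illustration of what the new Container Instance policy enables (not part of this commit), a process running in a Container Instance covered by the dynamic group could call the model deployment predict endpoint with resource principal auth; the `<MD_OCID>` URI and request body below are placeholders:

```bash
# Hypothetical sketch: call the Model Deployment predict endpoint from a
# Container Instance that is a member of the dynamic group named in the policy.
# <MD_OCID> is a placeholder for the model deployment endpoint identifier.
oci raw-request \
  --http-method POST \
  --target-uri "https://<MD_OCID>/predict" \
  --request-body '{"inputs": "Tell me about Data Science"}' \
  --auth resource_principal
```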
## Methods for model weight downloads
@@ -89,7 +92,7 @@ The model will be downloaded at container startup time, we just need to provide
* Download/Clone the model repository that we are targeting to deploy from the Hugging Face Hub. This can be done in a notebook session for faster download and upload to the bucket.
```bash
git lfs install
-git clone https://huggingface.co/meta-llama/Llama-2-13b-hf
+git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
```
* Zip all items of the folder using a zip/tar utility, preferably using the command below, to avoid creating another level of folder hierarchy inside the zipped file.
```bash
```bash
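# The exact command is outside this diff hunk; the lines below are a hypothetical
# sketch, assuming the model was cloned into ./Mistral-7B-Instruct-v0.1.
# Zipping from inside the folder keeps the archive free of an extra top-level directory.
cd Mistral-7B-Instruct-v0.1
zip -r ../Mistral-7B-Instruct-v0.1.zip .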
@@ -179,13 +182,13 @@ Container creation process is going to be same as TGI. All associated files are
* Set custom environment variable key `STORAGE_SIZE_IN_GB` with value `950` for the 7b model. This is required because the model will be downloaded at runtime, so we need extra storage to accommodate various model sizes.
* Since we have already changed the prediction endpoint to /predict in the API server file, we don't need any other overrides.
* Under `Models` click on the `Select` button and select the Model Catalog entry we created earlier
-* Under `Compute` and then `Specialty and previous generation` select the `VM.GPU3.2` instance
+* Under `Compute` and then `Specialty and previous generation` select the `VM.GPU.A10.2` instance
* Under `Networking` leave the Default option
* Under `Logging` select the Log Group where you've created your predict and access logs, and select those accordingly
* Click on `Show advanced options` at the bottom
* Select the checkbox `Use a custom container image`
* Select the OCIR repository and image we pushed earlier
-* To use vLLM as Openai-copmatibel server we need to mention the healthcheck port.
+* To use vLLM as an OpenAI-compatible server we need to specify the healthcheck port.
* Key: `healthCheckPort`, Value: 5002
There is no need to change the port when running vLLM as the default server, as the default port is 8080. It is available as an ENV variable in the Dockerfile, so feel free to change it as needed.
* Leave CMD and Entrypoint blank
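Not part of this commit, but it can help to smoke-test the pushed image locally before creating the deployment; the image reference, GPU flag, and port mapping below are hypothetical placeholders (only `STORAGE_SIZE_IN_GB` and the default port 8080 come from the steps above):

```bash
# Hypothetical local smoke test of the container; replace the OCIR image
# reference with the one pushed earlier. Assumes a GPU host with the
# NVIDIA container toolkit installed.
docker run --rm --gpus all \
  -e STORAGE_SIZE_IN_GB=950 \
  -p 8080:8080 \
  <region>.ocir.io/<tenancy-namespace>/<repo>:<tag>
```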
@@ -194,6 +197,14 @@ Container creation process is going to be same as TGI. All associated files are
* Once the model is deployed and shown as `Active`, you can execute inference against it.
```bash
oci raw-request --http-method POST --target-uri https://<MD_OCID>/predict --request-body '{"inputs": "Tell me about Data Science"}'
+```
+To execute OpenAI-compatible inference, you can run the command below.
+```bash
+oci raw-request --http-method POST --target-uri https://<MD_OCID>/predict --request-body '{"model": "/opt/ds/model/deployed_model",
+"prompt":"what are some good skills deep learning expert. Give us some tips on how to structure interview with some coding example?",
+"max_tokens":250,
+"temperature": 0.7,
+"top_p":0.8}'
```
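As a side note (not from this commit), if the deployment exposes vLLM's OpenAI-compatible completions schema, the generated text could be extracted from the `oci raw-request` output with `jq`; the `.data.choices[0].text` path below is an assumption about the response shape:

```bash
# Assumption: the response body follows the OpenAI completions schema and
# oci raw-request wraps it under a top-level "data" field.
oci raw-request --http-method POST --target-uri https://<MD_OCID>/predict \
  --request-body '{"model": "/opt/ds/model/deployed_model", "prompt": "Tell me about Data Science", "max_tokens": 100}' \
  | jq -r '.data.choices[0].text'
```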
## Inference
@@ -203,7 +214,7 @@ oci raw-request --http-method POST --target-uri https://<MD_OCID>/predict --requ
* On the left side under `Resources` select `Invoking your model`
* You will see the model endpoint under `Your model HTTP endpoint`; copy it
* Open the `config.yaml` file
-* Depending on which model you decided to deploy the 7b or 14b change the endpoint URL with the one you've just copied
+* Change the endpoint URL for the model to the one you've just copied
* Install the dependencies

```bash
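# The install command itself is outside this diff hunk; as an assumption, it
# would typically be something like installing the app's requirements file:
pip install -r requirements.txt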
@@ -225,7 +236,7 @@ oci raw-request --http-method POST --target-uri https://<MD_OCID>/predict --requ
```bash
oci raw-request \
--http-method POST \
---target-uri "https://modeldeployment.eu-frankfurt-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.eu-frankfurt-1.amaaaaaanif7xwiahljboucy47byny5xffyc3zbkpfk4jtcdrtycjb6p2tsa/predict" \
+--target-uri "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.iad.amaaaaaav66vvniam45ujbnig43wiltlf6h2p4ohrauk7kq5tspnn427pkra/predict" \
--request-body '{
"inputs": "Write a python program to randomly select item from a predefined list?",
"parameters": {
@@ -236,18 +247,17 @@ oci raw-request --http-method POST --target-uri https://<MD_OCID>/predict --requ
```

* vLLM Inference
-
+
+* OpenAI-compatible server
```bash
-oci raw-request \
---http-method POST \
---target-uri "https://modeldeployment.eu-frankfurt-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.eu-frankfurt-1.amaaaaaanif7xwiaje3uc4c5igep2ppcefnyzuab3afufefgepicpl5whm6q/predict" \
---request-body '{
-"inputs": "are you smart?",
-"use_beam_search": true,
-"n": 4,
-"temperature": 0
-}' \
---auth resource_principal
+oci raw-request \
+--http-method POST \
+--target-uri "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.iad.amaaaaaav66vvniabq7ahm2h2pbvh6ti37svti5n5fk7jirucxdtdfcuo22q/predict" \
+--request-body '{"model": "/opt/ds/model/deployed_model",
+"prompt":"what are some good skills deep learning expert. Give us some tips on how to structure interview with some coding example?",
+"max_tokens":250,
+"temperature": 0.7,
+"top_p":0.8}'
```
## Deploying using ADS
@@ -324,24 +334,3 @@ Customer should check the predict and health check endpoints, if defined through
### Advanced debugging options: Code debugging inside the container using job
For a more detailed level of debugging, users can refer to [README-DEBUG.md](./README-DEBUG.md).

-## Additional Make Commands
-
-### TGI containers
-
-`make build.tgi` to build the container
-
-`make run.tgi` to run the container
-
-`make shell.tgi` to launch container with shell prompt
-
-`make stop.tgi` to stop the running container
-
-### vLLM containers
-
-`make build.vllm` to build the container
-
-`make run.vllm` to run the container
-
-`make shell.vllm` to launch container with shell prompt
-
-`make stop.vllm` to stop the running container
