`model-deployment/containers/llm/mistral/README.md`
### Policy to check Data Science work requests
`allow group <group_name> to manage data-science-work-requests in compartment <compartment_name>`
### Policy to access the Model Deployment endpoint from a Container Instance
`allow dynamic-group <group_name> to manage {DATA_SCIENCE_MODEL_DEPLOYMENT_PREDICT} in compartment <compartment_name>`
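
This policy targets a dynamic group, so the Container Instance must first be matched by a dynamic-group rule. The rule below is only an illustrative sketch; the resource type string and the compartment placeholder are assumptions, so verify them against the Container Instances documentation before use:

`ALL {resource.type='computecontainerinstance', resource.compartment.id='<compartment_ocid>'}`
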
For all other Data Science policies, please refer to these [details](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/distributed_training/README.md#3-oci-policies).
## Methods for model weight downloads
One method is to let the model be downloaded at container startup time. Alternatively, the model weights can be zipped and uploaded to an Object Storage bucket, as follows:

* Download/clone the model repository that we are targeting to deploy from the Hugging Face repository. This can be done in a Notebook session for faster downloads and uploads to the bucket.
* Zip all items of the folder using the zip/tar utility, preferably using the below command, to avoid creating another level of folder hierarchy inside the zipped file.

```bash
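# Illustrative example (the exact command may differ in the original guide):
# run it from inside the downloaded model folder so the model files sit at the
# root of the archive instead of under an extra nested directory.
cd <downloaded-model-folder>
zip -r ../<model-name>.zip .
```
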
The container creation process is going to be the same as for TGI.

* Set the custom environment variable key `STORAGE_SIZE_IN_GB` with value `950` for the 7b model. This is required because the model will be downloaded at runtime, so we need to keep extra storage to accommodate various model sizes.
* Since we have already changed the prediction endpoint to `/predict` in the API server file, we don't need any other overrides.
* Under `Models` click on the `Select` button and select the Model Catalog entry we created earlier
* Under `Compute` and then `Specialty and previous generation` select the `VM.GPU.A10.2` instance
* Under `Networking` leave the Default option
* Under `Logging` select the Log Group where you've created your predict and access logs, and select those correspondingly
* Click on `Show advanced options` at the bottom
* Select the checkbox `Use a custom container image`
* Select the OCIR repository and image we pushed earlier
* To use vLLM as an OpenAI-compatible server, we need to specify the health check port.
* Key: `healthCheckPort`, Value: 5002
There is no need to change the port for running vLLM as the default server, as the default port is set to 8080. It is available as an ENV variable in the Dockerfile, so feel free to change it as needed.
* Leave CMD and Entrypoint blank
* Once the model is deployed and shown as `Active`, you can execute inference against it.
```bash
oci raw-request --http-method POST --target-uri https://<MD_OCID>/predict --request-body '{"inputs": "Tell me about Data Science"}'
```
To execute OpenAI-compatible inference, you can run the below command.
```bash
oci raw-request --http-method POST --target-uri https://<MD_OCID>/predict --request-body '{"model": "/opt/ds/model/deployed_model",
"prompt": "What are some good skills for a deep learning expert? Give us some tips on how to structure an interview with some coding examples."}'
```
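
While the deployment is still provisioning, you can also check its lifecycle state from the CLI instead of the console. This is a minimal sketch, assuming `<MD_OCID>` is replaced with your model deployment OCID:

```bash
# Prints the model deployment's lifecycle state (e.g. CREATING, ACTIVE)
oci data-science model-deployment get --model-deployment-id <MD_OCID> \
  --query 'data."lifecycle-state"' --raw-output
```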