
Describe the bug
I have a model that takes fairly long to load (~40 s). When I run constant traffic against the endpoint and then scale out by adding another instance, I see a short spike of errors. From the log timestamps I could conclude that these errors occurred before model loading had completed on the new instance.
I found that on startup of the model_server there is a fixed 1 s wait period (https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/model_server.py#L266), after which the code only checks whether a matching process exists and returns it, without verifying that the model has actually finished loading.
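A minimal sketch of what an explicit readiness check could look like instead of the fixed sleep, assuming the default MMS management port (8081) and the DescribeModel response format in which each worker reports a "READY" status (both are assumptions about the deployment, not something the toolkit exposes today):

```python
import time
import urllib.error
import urllib.request

# Assumption: MMS management API on its default port; the port and the
# per-worker '"status": "READY"' response format may differ in your setup.
MANAGEMENT_URL = "http://localhost:8081"

def wait_for_model_ready(model_name, timeout=300.0, poll_interval=2.0):
    """Block until the model's workers report READY, or raise on timeout."""
    deadline = time.monotonic() + timeout
    url = f"{MANAGEMENT_URL}/models/{model_name}"
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                body = resp.read().decode("utf-8")
                if '"status": "READY"' in body:
                    return
        except (urllib.error.URLError, OSError):
            pass  # server not accepting connections yet; keep polling
        time.sleep(poll_interval)
    raise TimeoutError(f"model {model_name!r} not ready after {timeout} s")
```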
To reproduce
- Have a running endpoint with a model that takes some time to initialize.
- Run constant traffic against the endpoint (see the load-generation sketch below).
- Scale out by adding another instance.
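For reference, a minimal load-generation sketch that surfaces the error spike during the scaling event (assuming boto3; the endpoint name and payload below are placeholders for your setup):

```python
import time
import boto3
from botocore.exceptions import ClientError

# Assumptions: placeholder endpoint name and a JSON-accepting model.
ENDPOINT_NAME = "my-endpoint"
PAYLOAD = b'{"inputs": "test"}'

runtime = boto3.client("sagemaker-runtime")
errors = 0
for _ in range(1000):
    try:
        runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",
            Body=PAYLOAD,
        )
    except ClientError as e:
        errors += 1
        # Timestamped errors let you correlate the spike with the scaling event.
        print(time.strftime("%H:%M:%S"), e.response["Error"]["Code"])
    time.sleep(0.1)
print(f"total errors: {errors}")
```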
Expected behavior
No error spikes on scaling events; the new instance should not receive traffic until the model is fully loaded.
System information
- I am using a custom Docker image (CPU) and have noticed this issue across multiple frameworks.
Additional context
Is there a parameter to control this initial model-loading wait time that I might have missed?