
Empty inference cache #41

@malte-aws


If the inference runner hits an error, it returns an empty list as its result. The evaluator then inserts that empty list into the inference cache. This leads to downstream errors that are hard to trace back to the original failure.

The evaluator should not insert empty inference results into the cache.

This occurs if the AWS credentials expire.
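A minimal sketch of the proposed guard follows. It assumes the cache behaves like a key-value store that is checked before inference runs. `InferenceCache` and `get_or_run` are hypothetical names used only for illustration, not the `amzn_nova_prompt_optimizer` API. An empty result is still returned to the caller, but it is never stored.

```python
from typing import Callable, Dict, Hashable, List


# Hypothetical sketch: these names are illustrative and are NOT the
# amzn_nova_prompt_optimizer API.
class InferenceCache:
    """Caches inference results, but never stores an empty result."""

    def __init__(self) -> None:
        self._store: Dict[Hashable, List[str]] = {}

    def __contains__(self, key: Hashable) -> bool:
        return key in self._store

    def get_or_run(self, key: Hashable, run: Callable[[], List[str]]) -> List[str]:
        # Serve from the cache only if a non-empty result was stored earlier.
        if key in self._store:
            return self._store[key]
        results = run()
        # An empty list usually means the runner swallowed an error
        # (for example, expired AWS credentials), so it is not cached and
        # the next call with the same key runs inference again.
        if results:
            self._store[key] = results
        return results


# Usage: a failed run leaves no cache entry, so a later run retries inference.
cache = InferenceCache()
key = ("us.amazon.nova-lite-v1:0", "prompt-v1")
cache.get_or_run(key, lambda: [])          # simulated failed inference
assert key not in cache
cache.get_or_run(key, lambda: ["answer"])  # retry succeeds and is cached
assert key in cache
```

Returning the empty list instead of raising keeps the caller's current contract unchanged; the only difference is that a failure is no longer remembered.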

Code to replicate this issue:

```python
import os

from amzn_nova_prompt_optimizer.core.evaluation import Evaluator
from amzn_nova_prompt_optimizer.core.inference.adapter import BedrockInferenceAdapter

prompt_adapter = ...  # set up as in the example notebook
test_set = ...  # set up as in the example notebook
metric_adapter = ...  # set up as in the example notebook

# Nova model to evaluate
NOVA_MODEL_ID = "us.amazon.nova-lite-v1:0"

# Step 1: set expired or invalid AWS credentials as environment variables.
os.environ["AWS_ACCESS_KEY_ID"] = "EXPIRED/INVALID CREDENTIALS"
os.environ["AWS_SECRET_ACCESS_KEY"] = "EXPIRED/INVALID CREDENTIALS"

inference_adapter = BedrockInferenceAdapter(region_name="us-east-1")
evaluator = Evaluator(prompt_adapter, test_set, metric_adapter, inference_adapter)

evaluation_score = evaluator.aggregate_score(model_id=NOVA_MODEL_ID)
# Inference fails, and at this point the cache contains an empty list.

# Step 2: fix the credentials and run the evaluation again.
os.environ["AWS_ACCESS_KEY_ID"] = "VALID CREDENTIALS"
os.environ["AWS_SECRET_ACCESS_KEY"] = "VALID CREDENTIALS"

inference_adapter = BedrockInferenceAdapter(region_name="us-east-1")
evaluator = Evaluator(prompt_adapter, test_set, metric_adapter, inference_adapter)

evaluation_score = evaluator.aggregate_score(model_id=NOVA_MODEL_ID)
# This run fails downstream because the empty results are read back from the cache.
# Expected: empty results are never cached, so this run executes inference again.
```
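The sequence above can be reduced to a self-contained sketch with stub components, which may be useful as a regression test that does not need AWS access. Everything here is hypothetical: `make_runner` stands in for an inference runner that returns `[]` on failure, `evaluate` stands in for the evaluator's cache lookup, and the `cache_empty` flag switches between the reported behavior and the expected one.

```python
from typing import Callable, Dict, Hashable, List


# Hypothetical stand-ins for the scenario in this issue; none of these names
# come from amzn_nova_prompt_optimizer.
def make_runner(credentials_valid: bool) -> Callable[[], List[str]]:
    """Stub runner that, like the one described above, returns an empty
    list instead of raising when inference fails."""
    def run() -> List[str]:
        return ["model output"] if credentials_valid else []
    return run


def evaluate(cache: Dict[Hashable, List[str]], key: Hashable,
             run: Callable[[], List[str]], cache_empty: bool) -> List[str]:
    """Look up the cache first; otherwise run inference and maybe cache it."""
    if key in cache:
        return cache[key]
    results = run()
    # cache_empty=True mimics the reported behavior; False is the proposed fix.
    if results or cache_empty:
        cache[key] = results
    return results


key = ("us.amazon.nova-lite-v1:0", "test-set")

# Reported behavior: the failed run's empty list is cached and reused.
buggy: Dict[Hashable, List[str]] = {}
evaluate(buggy, key, make_runner(credentials_valid=False), cache_empty=True)
reused_result = evaluate(buggy, key, make_runner(credentials_valid=True), cache_empty=True)

# Expected behavior: nothing is cached for the failed run, so inference reruns.
fixed: Dict[Hashable, List[str]] = {}
evaluate(fixed, key, make_runner(credentials_valid=False), cache_empty=False)
rerun_result = evaluate(fixed, key, make_runner(credentials_valid=True), cache_empty=False)

print(reused_result)  # []
print(rerun_result)   # ['model output']
```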
