
Some questions about the final reward score #25

@wh0x

I have a few questions. When I use single_inference, the returned rewards is a tensor such as tensor([[ 1.4358, -1.9688]], device='cuda:0'), and the inference code takes reward[0] as the final result. What do the two values in the tensor mean, and why is reward[0] chosen? The paper mentions that the model predicts a mean and a variance. Is 1.4358 the mean and -1.9688 the variance? If so, why is the mean used as the final result? For context, I am trying this model as a scorer for evaluating the quality of image-editing data.
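To make the question concrete, here is a minimal sketch of how I am reading the output. It uses a plain nested list with the values quoted above (what rewards.tolist() would give for that tensor). The names split_reward, mean_like and var_like are my own, hypothetical labels, and the mean/variance interpretation is only my assumption from the paper, not something I found confirmed in the code.

```python
# Values copied from the output quoted above; with the real model they would
# come from the returned tensor, e.g. via rewards.tolist().
rewards = [[1.4358, -1.9688]]  # one row (batch of 1), two values per row


def split_reward(rewards_row):
    """Split one row of two values into (first, second).

    ASSUMPTION: the first value is the predicted mean and the second is some
    variance-related parameter. This is a guess based on the paper.
    """
    first, second = rewards_row
    return first, second


mean_like, var_like = split_reward(rewards[0])

# Use the first entry as the scalar score, as described in the question.
score = mean_like

# A plain variance cannot be negative, so a negative second entry would have
# to be a transformed quantity (for example a log-variance) if the assumption
# above holds at all.
second_is_plain_variance = var_like >= 0

print(score, var_like, second_is_plain_variance)
```

Since the second value is negative here, it cannot be a variance as-is, which is part of why I am unsure how to interpret it.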
