Some questions about the final reward score #25

I have a few questions. When I use `single_inference`, the reward is a tensor like `tensor([[ 1.4358, -1.9688]], device='cuda:0')`, and the inference code takes `reward[0]` as the final result. What do the values in this tensor mean, and why is `reward[0]` chosen?

The paper mentions that the model predicts a mean and a variance. Is 1.4358 the mean and -1.9688 the variance? And why is the mean used as the final score?

For context, I am trying this model as a scorer for evaluating the quality of image-editing data.
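To make the question concrete, here is a small PyTorch sketch of how I am currently reading the output. On a tensor of shape `[1, 2]`, `reward[0]` returns the whole first row (both values), not a single number, so the sketch picks the scalars out explicitly. Treating the second value as a log-variance is only my assumption, since a negative number cannot be a variance by itself; I have not confirmed this in the code or the paper.

```python
import math

import torch

# Example output copied from single_inference; shape is [1, 2].
reward = torch.tensor([[1.4358, -1.9688]])

print(reward.shape)  # torch.Size([1, 2])

# Indexing only the first dimension returns the whole first row,
# i.e. a 1-D tensor that still holds both values.
row = reward[0]

# Pick out each scalar explicitly.
first_value = reward[0, 0].item()
second_value = reward[0, 1].item()

# ASSUMPTION (unconfirmed): if the head predicts (mean, log-variance),
# the second value must be exponentiated to get a non-negative variance.
mean = first_value
variance = math.exp(second_value)
std = math.sqrt(variance)

print(f"mean={mean:.4f}, variance={variance:.4f}, std={std:.4f}")
```

If the log-variance reading is right, the mean would be the natural score and the standard deviation would indicate how confident the model is about it, but I would like the authors to confirm the actual parameterization.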
