I used the BA40 model with the parameters specified in the paper and ran a generalization test on BA500. With the soft policy, however, I got an approximation ratio of about 0.589, which differs from the results reported in the paper. Why did you use the deterministic version for that test?