Description
There are various situations where we don't want AL to go through the full feedback loop, because we don't want AL to perform every part of it (producing actions, receiving feedback, etc.).
From the perspective of AL_Train, typically:
1(A). An act() request is sent to AL for an action <- (i.e. AL needs to self-explain the next step)
2(F). AL's next action(s) are received and feedback is sent back to AL
or
3(D). AL has no next actions and an example (i.e. a demonstration) is sent back to AL
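A minimal sketch of one pass through this loop, to make the A/F/D labels concrete. The callables `act`, `check`, `demonstrate` and `train`, the +1/-1 reward values, and `run_step` itself are hypothetical stand-ins, not AL_Train's actual API; the point is only that F follows A when AL proposes something, and D follows A when it proposes nothing.

```python
from typing import Any, Callable, List

State = Any   # hypothetical: whatever AL_Train uses to describe a problem state
Action = Any  # hypothetical: whatever AL returns from act()


def run_step(state: State,
             act: Callable[[State], List[Action]],
             check: Callable[[State, Action], bool],
             demonstrate: Callable[[State], Action],
             train: Callable[[State, Action, int], None]) -> List[str]:
    """One pass of the A -> F / D loop (hypothetical); returns the steps that ran."""
    # A: request AL's next action(s) for this state (AL self-explains the step).
    actions = act(state)
    ran = ["A"]
    if actions:
        # F: grade each proposed action and send the reward back to AL.
        for action in actions:
            reward = 1 if check(state, action) else -1
            train(state, action, reward)
        ran.append("F")
    else:
        # D: AL proposed nothing, so send a correct example instead.
        train(state, demonstrate(state), 1)
        ran.append("D")
    return ran
```

In this sketch a single call never runs both F and D; over a whole problem the default loop still mixes them, depending on whether AL has something to propose at each step.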
Right now:
-examples_only: causes A to happen, and D to happen regardless of AL's response
-test_mode: causes A to happen, but not F or D
But we would also like a way for D to happen without A.
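A small sketch of why the current flags can't express "D without A": it maps each flag setting (and whether AL proposed anything) to the steps that run, then enumerates every case. `phases_for_flags` is a hypothetical helper, and letting `examples_only` win when both flags are set is an assumption; the issue doesn't say how that case behaves.

```python
from itertools import product
from typing import FrozenSet


def phases_for_flags(examples_only: bool, test_mode: bool,
                     agent_has_actions: bool) -> FrozenSet[str]:
    """Which of A, F, D run for one step under the current flags (hypothetical)."""
    if examples_only:
        # A always happens, then D happens regardless of AL's response.
        # (Assumption: examples_only takes precedence if both flags are set.)
        return frozenset({"A", "D"})
    if test_mode:
        # A happens, but neither F nor D.
        return frozenset({"A"})
    # Default loop: F if AL proposed something, otherwise D.
    return frozenset({"A", "F"}) if agent_has_actions else frozenset({"A", "D"})


# Enumerate every flag setting and every possible AL response.
reachable = {phases_for_flags(e, t, h) for e, t, h in product([False, True], repeat=3)}
```

Every reachable set contains A, so no combination of the existing flags yields an observe-only step.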
These cases (at least) should be possible. Let's call them "feedback_modes". Potentially, the user could just choose among these mutually exclusive options instead of setting independent flags and having to guard against the three illegal combinations:
-full/default: A, F, D <- Normal ITS training loop
-no_hints: A, F, _ <- Warning: possible infinite loop (really only works if AL has a finite action space and tries random things)
-predict_observe: A, _, D <- Demonstrations are always given
-observe_only: _, _, D <- Demonstrations are always given
-test: A, _, _ <- Moves to the next item on the first incorrect action
-stepwise_test: A, _, _ <- Moves to the next step (without sending a demonstration) on an incorrect action
(_, _, _), (_, F, D), and (_, F, _) are impossible: F needs an action from A to give feedback on, and (_, _, _) does nothing at all.
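One way the proposed modes could be encoded, as a sketch rather than a settled design: each mode is a single enum member carrying which of A/F/D it enables, so the three illegal combinations simply have no member. `Phases`, `FeedbackMode`, `is_legal`, and the `on_incorrect` values `"next_item"`/`"next_step"` are hypothetical names; the last field is there only to tell `test` and `stepwise_test` apart, since both run A alone.

```python
from enum import Enum
from itertools import product
from typing import NamedTuple, Optional


class Phases(NamedTuple):
    """Which parts of the loop a mode runs (hypothetical structure)."""
    act: bool          # A: send an act() request to AL
    feedback: bool     # F: grade AL's action(s) and send feedback
    demonstrate: bool  # D: send an example (demonstration) to AL


def is_legal(p: Phases) -> bool:
    # F grades an action, so it cannot happen without A.
    if p.feedback and not p.act:
        return False
    # A mode in which nothing happens at all is also ruled out.
    return p.act or p.demonstrate


class FeedbackMode(Enum):
    # value = (label, phases, what happens after an incorrect action in test modes)
    FULL = ("full", Phases(True, True, True), None)
    NO_HINTS = ("no_hints", Phases(True, True, False), None)
    PREDICT_OBSERVE = ("predict_observe", Phases(True, False, True), None)
    OBSERVE_ONLY = ("observe_only", Phases(False, False, True), None)
    TEST = ("test", Phases(True, False, False), "next_item")
    STEPWISE_TEST = ("stepwise_test", Phases(True, False, False), "next_step")

    def __init__(self, label: str, phases: Phases, on_incorrect: Optional[str]):
        self.label = label
        self.phases = phases
        self.on_incorrect = on_incorrect


# Every combination of the three steps that no mode may use.
ILLEGAL = [Phases(*bits) for bits in product([False, True], repeat=3)
           if not is_legal(Phases(*bits))]
```

A user would pick a mode by name, e.g. `FeedbackMode["PREDICT_OBSERVE"].phases`. With this encoding the illegal combinations are unrepresentable rather than checked at runtime; `is_legal` and `ILLEGAL` exist only to show that exactly the three combinations listed above are excluded.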
There is added complexity if we incorporate levels of hints beyond the bottom-out hint. Additionally, no_hints would probably require some kind of empty hint response to be given to AL to prompt the agent to guess.