Hotel Receptionist Scenario Expansion #6186
Adds 11 new scenarios (check-out early, dinner move/cancel, wake-up move, red-eye hold, valuables/liability, local-area, callback-to-finish, hostile free-night, can't-verify change, room/floor confirm) plus the example logic they exercise: view-based room moves (agent.py, hotel_db.py, modify_booking.py) and restaurant-reservation modification (hotel_db.py), and two new policy docs (local_area.md, safe_deposit.md). Ported on top of the benchmark PR branch so the gradeable expected_state versions of shared scenarios are preserved.
    try:
        report_dict = report.to_dict()
        report_dict["tags"] = sorted(ctx.tagger.tags)
        report_dict["evaluations"] = ctx.tagger.evaluations
        report_dict["outcome"] = ctx.tagger.outcome
        report_dict["outcome_reason"] = ctx.tagger.outcome_reason
        with open(os.path.join(report_dir, f"session_report-{room}.json"), "w") as f:
            json.dump(report_dict, f, indent=2)
    except Exception:
        logger.exception("error dumping session report")
🚩 run_artifacts.py references potentially new SessionReport/Tagger API surface
dump_run_artifacts calls report.to_dict(), ctx.tagger.evaluations, ctx.tagger.outcome, and ctx.tagger.outcome_reason (run_artifacts.py:40-44). These may be newer SDK APIs that older versions lack. The entire block is wrapped in a try/except, so a missing attribute would not crash the session, but the artifact dump would fail with nothing more than a logged exception and no report file would be written. Worth verifying that these APIs exist in the targeted SDK version.
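One way to make the dump degrade gracefully instead of failing as a whole is to probe each attribute separately. The following is only a sketch under assumptions: `build_report_dict` and `OPTIONAL_TAGGER_FIELDS` are hypothetical names, the `SimpleNamespace` objects stand in for the real SDK report and tagger, and writing `None` for a missing field is an illustrative choice rather than anything the PR does.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("run_artifacts")

# Attribute names come from the review comment; whether a given SDK version
# provides them is exactly what is being probed here.
OPTIONAL_TAGGER_FIELDS = ("evaluations", "outcome", "outcome_reason")


def build_report_dict(report, tagger) -> dict:
    """Build the artifact dict, tolerating objects that lack newer fields."""
    to_dict = getattr(report, "to_dict", None)
    if callable(to_dict):
        report_dict = dict(to_dict())
    else:
        logger.warning("report has no to_dict(); writing tagger fields only")
        report_dict = {}

    report_dict["tags"] = sorted(getattr(tagger, "tags", ()))

    missing = []
    for name in OPTIONAL_TAGGER_FIELDS:
        if hasattr(tagger, name):
            report_dict[name] = getattr(tagger, name)
        else:
            report_dict[name] = None  # placeholder keeps the JSON shape stable
            missing.append(name)
    if missing:
        logger.warning("tagger lacks %s; the SDK may predate these fields", missing)
    return report_dict


# Hypothetical stand-ins for an older and a newer SDK object.
old_tagger = SimpleNamespace(tags={"b", "a"})
new_tagger = SimpleNamespace(
    tags={"a"}, evaluations=[], outcome="success", outcome_reason="done"
)
report = SimpleNamespace(to_dict=lambda: {"room": "demo"})

old_result = build_report_dict(report, old_tagger)
new_result = build_report_dict(report, new_tagger)
```

With this shape, an older SDK would still produce a report file, and the warning names the fields that were unavailable.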
    async def start_restaurant_booking(self, ctx: RunContext[Userdata]) -> str | None:
        """Start the restaurant-reservation flow. Call it the moment the caller wants a table - the flow collects date, party size, time, name, and phone itself. Its return is the FINAL result of the reservation: relay it and move on - nothing further to confirm or call afterwards."""
        reservation = await BookRestaurantTask(
            db=ctx.userdata.db, chat_ctx=speech_only(self.chat_ctx)
        )
        return (
            f"You're set for {speak_time(reservation.time)} on "
            f"{reservation.date.strftime('%A, %B %-d')} for "
            f"{reservation.party_size} guest{'s' if reservation.party_size != 1 else ''}. "
            f"Confirmation code: {_speak_code(reservation.code)}. "
            "| reservation complete - relay this to the caller; no further tool call is needed."
        )
🚩 Asymmetric duplicate-prevention: room bookings guarded but restaurant bookings are not
The PR adds a duplicate-prevention guard for start_room_booking (tools_rooms.py:183-195) using last_room_booking and caller_turns_at_last_booking in Userdata. No equivalent guard exists for start_restaurant_booking (tools_restaurant.py:50-61). The Userdata class in common.py has no last_restaurant_booking field. This is presumably intentional (the room booking flow is longer and more prone to model re-entry than the restaurant flow), but it creates an asymmetry. If the same re-entry problem occurs with restaurant bookings, it would silently double-book a table.
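If a matching guard were wanted, it could look roughly like the sketch below. Everything here is an assumption: the actual room-booking guard in tools_rooms.py is not shown in this thread, so the turn-count heuristic is inferred only from the field name caller_turns_at_last_booking, and `BookingGuardState`, `duplicate_booking_message`, `record_booking`, and the two restaurant fields are hypothetical names standing in for additions to Userdata.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BookingGuardState:
    """Hypothetical stand-in for the relevant Userdata fields."""

    caller_turns: int = 0
    last_restaurant_booking: str | None = None
    caller_turns_at_last_restaurant_booking: int | None = None


def duplicate_booking_message(state: BookingGuardState) -> str | None:
    """Return the earlier confirmation if the caller has not spoken since it.

    A re-entry with no new caller turn is treated as the model calling the
    tool again on its own, not as a fresh request from the caller.
    """
    if state.last_restaurant_booking is None:
        return None
    if state.caller_turns == state.caller_turns_at_last_restaurant_booking:
        return state.last_restaurant_booking
    return None


def record_booking(state: BookingGuardState, confirmation: str) -> None:
    """Remember the confirmation text and the caller turn it was made on."""
    state.last_restaurant_booking = confirmation
    state.caller_turns_at_last_restaurant_booking = state.caller_turns
```

In this sketch, start_restaurant_booking would call duplicate_booking_message first and return its result when it is not None, and call record_booking after a successful reservation. A caller who genuinely wants a second table has necessarily taken another turn, so that request would pass the guard.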