
Discussion: What should SessionReport.duration mean, and should it depend on audio recording? #6259

Description

@TimCares

Feature Type

Nice to have

Feature Description

Hello there! This is a follow-up design question to PR #6258, which fixes
the formatting gap (started_at/duration not being emitted
by to_dict()) and includes a small cleanup. This issue covers the part I left
out of that PR, because it changes the meaning of an existing field and should
be agreed on first.

Context: JobContext.make_session_report, SessionReport.

duration is gated on audio recording

In make_session_report, duration is only computed when
audio_recording_started_at is set:

if sr.audio_recording_started_at:
    sr.duration = sr.timestamp - sr.audio_recording_started_at

(Snippet taken from the new code in PR #6258.)

This gating exists both in the original code and in the adjusted code of
PR #6258, and it boils down to whether audio is being recorded in the session.

So a session with no audio recording gets duration = None, even though
started_at (session start) and timestamp (report build time) are both
available and would yield a perfectly valid duration. A session's duration
arguably shouldn't depend on whether audio happened to be recorded.
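To make the gating concrete, here is a minimal sketch that contrasts the gated computation quoted above with an ungated one based on started_at. ReportTimes, duration_current and duration_ungated are hypothetical names, not part of the SessionReport API, and all timestamps are assumed to be epoch seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReportTimes:
    # Hypothetical stand-in for the SessionReport timing fields (epoch seconds).
    started_at: float  # session start
    timestamp: float  # report build time
    audio_recording_started_at: Optional[float] = None  # first audio frame, if recording


def duration_current(r: ReportTimes) -> Optional[float]:
    # Mirrors the gated logic quoted above: no recording means no duration.
    if r.audio_recording_started_at:
        return r.timestamp - r.audio_recording_started_at
    return None


def duration_ungated(r: ReportTimes) -> float:
    # Session-lifecycle alternative: always computable from started_at.
    return r.timestamp - r.started_at
```

For a session that started at 100.0 and was reported at 160.0 with no recording, duration_current returns None while duration_ungated still yields 60.0.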

Additionally, the start point used here is the audio start
(recording_started_at, set on the first audio frame) while the end point is
the report build time (timestamp). So the two ends of duration come from
two different clocks/concepts.

What should duration mean semantically?

There seem to be two distinct concepts being conflated:

  • Session duration: started_at (session start) -> session end. Probably
    what a thing called SessionReport should report.
  • Audio/call duration: first audio frame -> last audio frame. A property of
    the recording, not the session.

The current duration mixes the two (audio start + report end). It's also worth
noting that timestamp is when the report is built, not when the last audio
frame arrived, so it isn't a precise "end of audio" either.
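A worked timeline with illustrative values (not measurements) shows how far apart these quantities can drift. last_audio_frame_at and session_ended_at are hypothetical values chosen for illustration only.

```python
# Illustrative timeline in seconds; all values are made up for the example.
session_started_at = 100.0          # started_at
audio_recording_started_at = 103.5  # first audio frame
last_audio_frame_at = 158.0         # hypothetical: last audio frame written
session_ended_at = 160.0            # hypothetical: explicit session end
report_timestamp = 161.25           # timestamp (report build time)

# Current semantics: audio start paired with report build time.
current_duration = report_timestamp - audio_recording_started_at

# Session duration: session start to session end.
session_duration = session_ended_at - session_started_at

# Audio span: first audio frame to last audio frame.
audio_span = last_audio_frame_at - audio_recording_started_at

print(current_duration, session_duration, audio_span)
```

The current value (57.75 s) matches neither the session length (60.0 s) nor the audio span (54.5 s), because it pairs the audio start with the report build time.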

Open questions

  • Should duration be the session duration (based on started_at and an
    explicit session end), and audio timing be exposed separately?
  • If audio/call timing is wanted, "recording length" and "session
    duration" are different metrics, not both answers to one
    question:
    • Recording length could be derived from the encoded audio.ogg itself
      (total samples / sample rate, i.e. the sum of written frame durations). This
      is the true length of the artifact and cannot diverge from the file.
    • A wall-clock audio_recording_ended_at - audio_recording_started_at is a
      separate measurement that can differ from the file's real length, depending
      on how gaps are handled and on exactly when the end timestamp is sampled.
  • Is there a case for an explicit session-end timestamp (e.g. set in
    AgentSession.aclose)
    instead of relying on report.timestamp?
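The difference between recording length and a wall-clock span can be sketched by deriving both from the same frames. Frame and its fields are a hypothetical minimal model rather than the actual audio frame type; recording_length assumes a constant sample rate, and wall_clock_span assumes the end timestamp is sampled when the last frame arrives.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    # Hypothetical minimal audio frame: only the fields needed for timing.
    samples_per_channel: int
    sample_rate: int
    received_at: float  # wall-clock arrival time in seconds


def recording_length(frames: list[Frame]) -> float:
    # Length of the encoded artifact: total samples / sample rate.
    if not frames:
        return 0.0
    rate = frames[0].sample_rate
    if any(f.sample_rate != rate for f in frames):
        raise ValueError("this sketch assumes a constant sample rate")
    return sum(f.samples_per_channel for f in frames) / rate


def wall_clock_span(frames: list[Frame]) -> float:
    # First-arrival to last-arrival time: counts any gaps between frames
    # but omits the duration of the last frame itself.
    if not frames:
        return 0.0
    return frames[-1].received_at - frames[0].received_at
```

For 100 frames of 10 ms (480 samples at 48 kHz) with a 2 s stall halfway through, recording_length returns 1.0 while wall_clock_span returns about 2.99: the artifact holds one second of audio, but the wall clock counts the stall and leaves out the last frame's own 10 ms.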

Backward-compatibility note

SessionReport.to_dict() feeds observability, so redefining the meaning of
duration (audio-based -> session-based) is a behavior change that existing
consumers may rely on. It might be safer to keep a clearly named audio duration
and add the session duration as its own clearly named field rather than
silently redefining duration.
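One additive option is sketched below, assuming session_duration is an acceptable key name (it is hypothetical, as is ReportSketch): to_dict() keeps duration with its current audio-based meaning and adds a separately named session field, so existing consumers see no change.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ReportSketch:
    # Hypothetical subset of SessionReport timing fields (epoch seconds).
    started_at: float
    timestamp: float
    audio_recording_started_at: Optional[float] = None

    def to_dict(self) -> dict[str, Any]:
        # Existing key keeps its current, audio-gated meaning.
        duration = (
            self.timestamp - self.audio_recording_started_at
            if self.audio_recording_started_at
            else None
        )
        return {
            "started_at": self.started_at,
            "duration": duration,
            # New, separately named key (hypothetical name); never gated on audio.
            "session_duration": self.timestamp - self.started_at,
        }
```

Adding a key is generally safer for dashboards and log pipelines than changing what an existing key means, which is the trade-off described above.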

Proposed direction (open to feedback)

  1. Stop gating duration on recording; compute it from session lifecycle
    timestamps (started_at -> session end).
  2. Decide whether to introduce clearly-named audio/session timing fields rather
    than overloading duration.
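For the explicit session end raised in the open questions, a minimal sketch could record the end time once when the session closes and fall back to the report build time while the session is still open. SessionSketch and ended_at are hypothetical; the real AgentSession.aclose may be structured differently.

```python
import asyncio
import time
from typing import Optional


class SessionSketch:
    # Hypothetical stand-in for AgentSession; only lifecycle timing is modeled.
    def __init__(self) -> None:
        self.started_at: float = time.time()
        self.ended_at: Optional[float] = None

    async def aclose(self) -> None:
        # Record the end only on the first close so the value stays stable.
        if self.ended_at is None:
            self.ended_at = time.time()

    def session_duration(self, report_timestamp: float) -> float:
        # Prefer the explicit end; fall back to the report build time if still open.
        end = self.ended_at if self.ended_at is not None else report_timestamp
        return end - self.started_at


if __name__ == "__main__":
    async def demo() -> None:
        s = SessionSketch()
        await s.aclose()
        print(f"session lasted {s.session_duration(time.time()):.3f} s")

    asyncio.run(demo())
```

Keeping only the first close timestamp means a report built later, or a repeated aclose, does not stretch the reported session length.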

Happy to PR once the semantics here are agreed.

Workarounds / Alternatives

No response

Additional Context

No response

Metadata


Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
