Skip to content

add(observation): db pool flushes observation#4668

Merged
steve-chavez merged 2 commits intoPostgREST:mainfrom
mkleczek:pool-flush-obs
Apr 9, 2026
Merged

add(observation): db pool flushes observation#4668
steve-chavez merged 2 commits intoPostgREST:mainfrom
mkleczek:pool-flush-obs

Conversation

@mkleczek
Copy link
Copy Markdown
Collaborator

@mkleczek mkleczek commented Feb 24, 2026

DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.

Emit a dedicated PoolFlushed observation when the DB pool is released during schema cache reload.

Comment thread test/io/test_io.py
Comment thread test/spec/Main.hs Outdated
Comment on lines +100 to +101
obsChan <- newChan
appState <- AppState.initWithPool sockets pool config loggerState metricsState (Metrics.observationMetrics metricsState <> writeChan obsChan)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting, so we use the newChan instead of logging to stdout/stderr right?

What would be the advantage of this over the python tests?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting, so we use the newChan instead of logging to stdout/stderr right?

Haskell makes it so easy to add new observers (Observation -> IO () is a Semigroup) that it is too tempting not to use it :)

What would be the advantage of this over the python tests?

More seriously (this is a mix of more or less subjective opinions):

It is not really clear to me what is the project policy on "what tests should go where". My take on this is that I really think it is quite a bad idea to base your business logic tests on debug output.

I spent a considerable amount of time chasing failing io tests after adding this new observation and was really annoyed with the experience. So decided to move to HSpec tests instead.

I think HSpec tests are really well suited for logic testing. They are type safe and make refactoring easy. Also, reading observations as a stream of typed messages in tests (instead of untyped string debug output) makes the tests less brittle and more readable IMHO.

Chan is a simple but powerful concurrency primitive and, IMHO, perfect fit for testing asynchronous messaging scenarios. See https://github.com/PostgREST/postgrest/pull/4668/changes#diff-0f4d50d33174e92e7c2b8121202cbcb7923360111854c13c2f4baa9eadb80602R44 in this PR. Thanks to some higher level functions based on Chan and dupChan we have a simple DSL to validate observations that also provides nice error messages that simplify debugging.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent a considerable amount of time chasing failing io tests after adding this new observation and was really annoyed with the experience. So decided to move to HSpec tests instead.

Agree, I've had the same experience before.

My take on this is that I really think it is quite a bad idea to base your business logic tests on debug output.
Also, reading observations as a stream of typed messages in tests (instead of untyped string debug output) makes the tests less brittle and more readable IMHO.

Agree too. I like the direction overall.


Sometimes we've had cases where the ordering of the log messages change and that was reflected in tests. Say if we decided to move all logging python tests to Haskell, would a change in the order be detected too? (order does matter in some cases, for clarity)

Copy link
Copy Markdown
Collaborator Author

@mkleczek mkleczek Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes we've had cases where the ordering of the log messages change and that was reflected in tests. Say if we decided to move all logging python tests to Haskell, would a change in the order be detected too? (order does matter in some cases, for clarity)

That depends on how observations order is related to debug log entries order. They should be the same but we don't test it in HSpec tests - we only read observations from a Chan.
There is a simple helper function waitFor which reads from the Chan skipping messages until it reads the message meeting provided criteria (or hangs waiting for an observation (or timeout) if none available). I think it is sufficient primitive for most cases.
You can test if required observations happen in right order (but possibly with other observations in between), eg:

        waitFor (2 * sec) "PoolFlushed 2" $ \x -> [ o | o@PoolFlushed <- pure x ]
        waitFor (1 * sec) "SchemaCacheLoadedObs" $ \x -> [ o | o@SchemaCacheLoadedObs{} <- pure x ]

This will skip SchemaCacheQueriedObs in between.

If necessary we can implement non-lenient validation.

Let's move this discussion to #4671 (comment)

@steve-chavez
Copy link
Copy Markdown
Member

add(observation): db pool flushes observation

  • Should be called add: log pool flushes?
  • This should only be done on log-level=debug it seems?
  • Also needs an entry on the CHANGELOG

@taimoorzaeem
Copy link
Copy Markdown
Member

Friendly reminder - The commit descriptions in this PR also contain the DISCLAIMER: .... So, just in case you forgot, we don't prefer these as mentioned in #4640 (comment).

@mkleczek mkleczek force-pushed the pool-flush-obs branch 2 times, most recently from a8b1293 to 4f8b622 Compare April 8, 2026 12:45
@mkleczek
Copy link
Copy Markdown
Collaborator Author

mkleczek commented Apr 8, 2026

add(observation): db pool flushes observation

  • Should be called add: log pool flushes?

Done

  • This should only be done on log-level=debug it seems?

Yes, the observation is logged at debug log level.

  • Also needs an entry on the CHANGELOG

Done

@mkleczek
Copy link
Copy Markdown
Collaborator Author

mkleczek commented Apr 8, 2026

Friendly reminder - The commit descriptions in this PR also contain the DISCLAIMER: .... So, just in case you forgot, we don't prefer these as mentioned in #4640 (comment).

Fixed

Comment thread CHANGELOG.md Outdated
AppState.putConfig appState cfg

-- Wait for 2 seconds so that retry can happen
waitFor (2 * sec) "PoolFlushed 2" $ \x -> [ o | o@PoolFlushed <- pure x ]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, what does the 2 in PoolFlushed 2 mean?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text in quotes is just description and used for easier tracking the exact spot of failure in case we miss a specific observation.

So in this case it is just to make sure that it is easy to identify which specific PoolFlushed observation failed (the second one).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so the msg in waitForObs can be anything really?

I might have misunderstood the function, I was thinking the msg needed to match the constructor name of the observation.

waitForObs :: HasCallStack => ObsChan -> Int -> Text -> (Observation -> Maybe a) -> IO ()
waitForObs (ObsChan orig copy) t msg f =

Copy link
Copy Markdown
Collaborator Author

@mkleczek mkleczek Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so the msg in waitForObs can be anything really?

Yes.

I might have misunderstood the function, I was thinking the msg needed to match the constructor name of the observation.

Constructor names and msg have different purpose:

msg is to distinguish between multiple expectations, eg:

waitFor (2 * sec) "PoolFlushed 1" $ \x -> [ o | o@PoolFlushed <- pure x ]
waitFor (2 * sec) "PoolFlushed 2" $ \x -> [ o | o@PoolFlushed <- pure x ]
waitFor (2 * sec) "PoolFlushed 3" $ \x -> [ o | o@PoolFlushed <- pure x ]

Upon failure, without msg, you would get a message saying "Waiting for an observation failed" and you wouldn't be able to tell where exactly the failure happened (line numbers from HasCallstack might help but they are not always precise enough in Haskell).

Constructor names are used to produce output with all observations that were produced until failure (so that it is easier to diagnose the test failures), along the lines of:

Failed waiting for "PoolFlushed 2" observation.
Received observations were:
SchemaCacheQueried
OtherObservationConstructorName
SchemaCacheLoaded

Looking at this and the test code above, you know one "PoolFlushed" was received (msg is "PoolFlushed 2") and after that 3 observations were received but none of them was the one we expected.

Copy link
Copy Markdown
Member

@steve-chavez steve-chavez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍. Just needs to correct the CHANGELOG entry as per the above feedback.

@steve-chavez
Copy link
Copy Markdown
Member

Github has been so buggy lately, I cannot merge through UI:

image

@laurenceisla
Copy link
Copy Markdown
Member

laurenceisla commented Apr 8, 2026

Github has been so buggy lately, I cannot merge through UI:

Neither can I. Got the same error:

Base branch was modified. Review and try the merge again.

Edit: Tried approving and merge again. Didn't work.

@mkleczek
Copy link
Copy Markdown
Collaborator Author

mkleczek commented Apr 8, 2026

Github has been so buggy lately, I cannot merge through UI:

Neither can I. Got the same error:

Base branch was modified. Review and try the merge again.

Edit: Tried approving and merge again. Didn't work.

Hmm... I've rebased it on top of current main - can you try again?

@laurenceisla
Copy link
Copy Markdown
Member

Hmm... I've rebased it on top of current main - can you try again?

Just tried it and getting the same error again.

@steve-chavez
Copy link
Copy Markdown
Member

steve-chavez commented Apr 9, 2026

Same for me too 😞

Maybe try opening another PR?

@taimoorzaeem
Copy link
Copy Markdown
Member

Manually checking the log for this PR:

commit adf6802c85c3f31bf882ee7a5a232c79dcb520f7 (HEAD -> pool-flush-obs)
Merge: db998f3c 8d5b1018
Author: Michał Kłeczek 
Date:   Wed Feb 25 18:36:03 2026 +0100

    add: log pool flushes
    
    Emit a dedicated PoolFlushed observation when the DB pool is released during schema cache reload.

commit db998f3c4ff0fe3669ff22450fd4791f07206615
Author: Michał Kłeczek 
Date:   Wed Feb 25 06:23:18 2026 +0100

    test(refactor): make debug output testing more robust
    
    This change introduces a match_log function, that implements lenient matching of log output lines. Tests of log output are retrofited to use the function.

The top commit is a merge commit (it has two parent commits). I think that might be the reason that the rebase is getting rejected.

@mkleczek
Copy link
Copy Markdown
Collaborator Author

mkleczek commented Apr 9, 2026

The top commit is a merge commit (it has two parent commits). I think that might be the reason that the rebase is getting rejected.

Ahh, good catch... That seems an artifact of my use of https://github.com/jj-vcs/jj to manage stacked PRs.

Fixed.

@steve-chavez @laurenceisla - sorry for confusion, will pay more attention to this in the future.

mkleczek added 2 commits April 9, 2026 14:25
This change introduces a match_log function, that implements lenient matching of log output lines. Tests of log output are retrofited to use the function.
Emit a dedicated PoolFlushed observation when the DB pool is released during schema cache reload.
@steve-chavez steve-chavez merged commit 5d4f82d into PostgREST:main Apr 9, 2026
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants