add(observation): db pool flushes observation#4668
add(observation): db pool flushes observation#4668steve-chavez merged 2 commits intoPostgREST:mainfrom
Conversation
| obsChan <- newChan | ||
| appState <- AppState.initWithPool sockets pool config loggerState metricsState (Metrics.observationMetrics metricsState <> writeChan obsChan) |
There was a problem hiding this comment.
This is interesting, so we use the newChan instead of logging to stdout/stderr right?
What would be the advantage of this over the python tests?
There was a problem hiding this comment.
This is interesting, so we use the newChan instead of logging to stdout/stderr right?
Haskell makes it so easy to add new observers (Observation -> IO () is a Semigroup) that it is too tempting not to use it :)
What would be the advantage of this over the python tests?
More seriously (this is a mix of more or less subjective opinions):
It is not really clear to me what is the project policy on "what tests should go where". My take on this is that I really think it is quite a bad idea to base your business logic tests on debug output.
I spent a considerable amount of time chasing failing io tests after adding this new observation and was really annoyed with the experience. So decided to move to HSpec tests instead.
I think HSpec tests are really well suited for logic testing. They are type safe and make refactoring easy. Also, reading observations as a stream of typed messages in tests (instead of untyped string debug output) makes the tests less brittle and more readable IMHO.
Chan is a simple but powerful concurrency primitive and, IMHO, perfect fit for testing asynchronous messaging scenarios. See https://github.com/PostgREST/postgrest/pull/4668/changes#diff-0f4d50d33174e92e7c2b8121202cbcb7923360111854c13c2f4baa9eadb80602R44 in this PR. Thanks to some higher level functions based on Chan and dupChan we have a simple DSL to validate observations that also provides nice error messages that simplify debugging.
There was a problem hiding this comment.
I spent a considerable amount of time chasing failing io tests after adding this new observation and was really annoyed with the experience. So decided to move to HSpec tests instead.
Agree, I've had the same experience before.
My take on this is that I really think it is quite a bad idea to base your business logic tests on debug output.
Also, reading observations as a stream of typed messages in tests (instead of untyped string debug output) makes the tests less brittle and more readable IMHO.
Agree too. I like the direction overall.
Sometimes we've had cases where the ordering of the log messages change and that was reflected in tests. Say if we decided to move all logging python tests to Haskell, would a change in the order be detected too? (order does matter in some cases, for clarity)
There was a problem hiding this comment.
Sometimes we've had cases where the ordering of the log messages change and that was reflected in tests. Say if we decided to move all logging python tests to Haskell, would a change in the order be detected too? (order does matter in some cases, for clarity)
That depends on how observations order is related to debug log entries order. They should be the same but we don't test it in HSpec tests - we only read observations from a Chan.
There is a simple helper function waitFor which reads from the Chan skipping messages until it reads the message meeting provided criteria (or hangs waiting for an observation (or timeout) if none available). I think it is sufficient primitive for most cases.
You can test if required observations happen in right order (but possibly with other observations in between), eg:
waitFor (2 * sec) "PoolFlushed 2" $ \x -> [ o | o@PoolFlushed <- pure x ]
waitFor (1 * sec) "SchemaCacheLoadedObs" $ \x -> [ o | o@SchemaCacheLoadedObs{} <- pure x ]
This will skip SchemaCacheQueriedObs in between.
If necessary we can implement non-lenient validation.
Let's move this discussion to #4671 (comment)
cefc5d3 to
b9e24c5
Compare
b9e24c5 to
86113ec
Compare
75f4490 to
279ae2d
Compare
8b15180 to
144c6e5
Compare
e986d2e to
e0afbba
Compare
|
|
Friendly reminder - The commit descriptions in this PR also contain the |
a8b1293 to
4f8b622
Compare
Done
Yes, the observation is logged at debug log level.
Done |
Fixed |
| AppState.putConfig appState cfg | ||
|
|
||
| -- Wait for 2 seconds so that retry can happen | ||
| waitFor (2 * sec) "PoolFlushed 2" $ \x -> [ o | o@PoolFlushed <- pure x ] |
There was a problem hiding this comment.
Curious, what does the 2 in PoolFlushed 2 mean?
There was a problem hiding this comment.
The text in quotes is just description and used for easier tracking the exact spot of failure in case we miss a specific observation.
So in this case it is just to make sure that it is easy to identify which specific PoolFlushed observation failed (the second one).
There was a problem hiding this comment.
Ah, so the msg in waitForObs can be anything really?
I might have misunderstood the function, I was thinking the msg needed to match the constructor name of the observation.
postgrest/test/observability/ObsHelper.hs
Lines 194 to 195 in 0e820b9
There was a problem hiding this comment.
Ah, so the
msginwaitForObscan be anything really?
Yes.
I might have misunderstood the function, I was thinking the
msgneeded to match the constructor name of the observation.
Constructor names and msg have different purpose:
msg is to distinguish between multiple expectations, eg:
waitFor (2 * sec) "PoolFlushed 1" $ \x -> [ o | o@PoolFlushed <- pure x ]
waitFor (2 * sec) "PoolFlushed 2" $ \x -> [ o | o@PoolFlushed <- pure x ]
waitFor (2 * sec) "PoolFlushed 3" $ \x -> [ o | o@PoolFlushed <- pure x ]
Upon failure, without msg, you would get a message saying "Waiting for an observation failed" and you wouldn't be able to tell where exactly the failure happened (line numbers from HasCallstack might help but they are not always precise enough in Haskell).
Constructor names are used to produce output with all observations that were produced until failure (so that it is easier to diagnose the test failures), along the lines of:
Failed waiting for "PoolFlushed 2" observation.
Received observations were:
SchemaCacheQueried
OtherObservationConstructorName
SchemaCacheLoaded
Looking at this and the test code above, you know one "PoolFlushed" was received (msg is "PoolFlushed 2") and after that 3 observations were received but none of them was the one we expected.
steve-chavez
left a comment
There was a problem hiding this comment.
LGTM 👍. Just needs to correct the CHANGELOG entry as per the above feedback.
Neither can I. Got the same error: Edit: Tried approving and merge again. Didn't work. |
Hmm... I've rebased it on top of current |
Just tried it and getting the same error again. |
|
Same for me too 😞 Maybe try opening another PR? |
|
Manually checking the log for this PR: commit adf6802c85c3f31bf882ee7a5a232c79dcb520f7 (HEAD -> pool-flush-obs)
Merge: db998f3c 8d5b1018
Author: Michał Kłeczek
Date: Wed Feb 25 18:36:03 2026 +0100
add: log pool flushes
Emit a dedicated PoolFlushed observation when the DB pool is released during schema cache reload.
commit db998f3c4ff0fe3669ff22450fd4791f07206615
Author: Michał Kłeczek
Date: Wed Feb 25 06:23:18 2026 +0100
test(refactor): make debug output testing more robust
This change introduces a match_log function, that implements lenient matching of log output lines. Tests of log output are retrofited to use the function.
The top commit is a merge commit (it has two parent commits). I think that might be the reason that the rebase is getting rejected. |
Ahh, good catch... That seems an artifact of my use of https://github.com/jj-vcs/jj to manage stacked PRs. Fixed. @steve-chavez @laurenceisla - sorry for confusion, will pay more attention to this in the future. |
This change introduces a match_log function, that implements lenient matching of log output lines. Tests of log output are retrofited to use the function.
Emit a dedicated PoolFlushed observation when the DB pool is released during schema cache reload.

DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.Emit a dedicated PoolFlushed observation when the DB pool is released during schema cache reload.