fix: remove package fuzzyset in favor of fuzzystrmatch-pg by taimoorzaeem · Pull Request #4837 · PostgREST/postgrest

taimoorzaeem · 2026-04-22T09:05:13Z

WIP. Remove package fuzzyset in favor of fuzzystrmatch-pg. Closes #4686.

Remove fuzzyset and replace with fuzzystrmatch-pg
Update/add tests
Compare loadtest results
Update docs (if any)
Add changelog (add details if breaking)

taimoorzaeem · 2026-04-23T07:50:08Z

The slight performance drop in loadtest(errors) happened due to this request, which previously didn't generate hint, but now it does.

postgrest/test/load/errors.http

Lines 9 to 11 in de4ad62

    
           # Misspelled function names 
        
           GET http://postgrest/rpc/call_em_x?name=John 
        
           Prefer: tx=commit

WIP Signed-off-by: Taimoor Zaeem <taimoorzaeem@gmail.com>

steve-chavez · 2026-04-23T17:18:31Z

rate 3966.4539316152445 2903.213981532442 3839.1671719650526
https://github.com/PostgREST/postgrest/actions/runs/24824073137?pr=4837#summary-72655504410

Hm, looks like perf loss is ~26% while we also use more CPU and MEM.

which previously didn't generate hint, but now it does.

🤔 Could you summarize why? Would it be possible to generate a hint with the current implementation to compare more accurately?

steve-chavez · 2026-04-23T17:21:24Z

+    -- |
+    -- Check Levenshtein Distance and return hint lower than max distance
+    checkLevenshteinDistance :: Text -> [Text] -> Int -> Maybe Text
+    checkLevenshteinDistance _ [] _ = Nothing
+    checkLevenshteinDistance identName (suggest:suggests) dist =
+      case Fuzzy.levenshteinLessEqual identName suggest dist of
+        Just _  -> Just suggest
+        Nothing -> checkLevenshteinDistance identName suggests dist
+
+    -- |
+    -- Check Levenshtein Distance and return hint with minimum distance
+    checkMinimumLevenshteinDistance :: Text -> [Text] -> Maybe Text -> Int -> Maybe Text
+    checkMinimumLevenshteinDistance _ [] Nothing _ = Nothing
+    checkMinimumLevenshteinDistance _ [] (Just suggest) _ = Just suggest
+    checkMinimumLevenshteinDistance identName (suggest:suggests) currentSuggest minDist =
+      let dist = Fuzzy.levenshtein identName suggest
+      in if dist < minDist
+          then
+            if dist == 0 && hintType == HintRelChildren -- Do not give suggestion if the child is found in the relations (dist = 0)
+              then checkMinimumLevenshteinDistance identName suggests currentSuggest minDist -- Go with current suggestion
+              else checkMinimumLevenshteinDistance identName suggests (Just suggest) dist -- Update suggestion
+          else checkMinimumLevenshteinDistance identName suggests currentSuggest minDist -- Go with current suggestion


Could the library somehow include this logic? It seems a bit long and related to the algorithm.

taimoorzaeem · 2026-04-23T18:05:52Z

🤔 Could you summarize why? Would it be possible to generate a hint with the current implementation to compare more accurately?

Yeah, I think first off all we should modify this target request to generate error with hint, the comparison would be more clear after that.

In hindsight, I think it used to generate the hint, but after #4684, it doesn't and went unnoticed.

wolfgangwalther · 2026-04-24T10:40:18Z

I'm not sure I understand why this is created as a separate library, if its only use is for PostgREST and it will have to be maintained by the team anyway. Why not have the code in this repo directly, if we're not using an existing library?

mkleczek · 2026-04-24T16:07:22Z

  | NoRpc Text Text [Text] MediaType Bool [QualifiedIdentifier] [Routine]
  | ColumnNotFound Text Text
-  | TableNotFound Text Text SchemaCache
+  | TableNotFound Text Text [Table]


See our discussion here:
https://github.com/PostgREST/postgrest/pull/4472/changes#r2596118601

We decided to have SchemaCache here to avoid constant back-and-forth API changes every time the hint calculation algorithm changes (which this edit confirms is a problem :) ).

I'd leave SchemaCache here. There would be no changes in Response and Plan needed - less code churn.

mkleczek · 2026-04-24T16:31:57Z

rate 3966.4539316152445 2903.213981532442 3839.1671719650526
https://github.com/PostgREST/postgrest/actions/runs/24824073137?pr=4837#summary-72655504410

Hm, looks like perf loss is ~26% while we also use more CPU and MEM.

which previously didn't generate hint, but now it does.

🤔 Could you summarize why? Would it be possible to generate a hint with the current implementation to compare more accurately?

Code in main does not calculate any hint for schemas with more than 500 tables (we don't populate the FuzzySet in these cases). Code in errors.sql creates 20000 tables. Code that doesn't do anything cannot be slower than code that linearly processes 20000 entries...

@taimoorzaeem - I understand the algorithm to find a hint is linear in the size of the tables list?

I suspect there is something off with our tests right now and IMHO we need to make sure we test this functionality correctly before doing any changes to the implementation.

mkleczek · 2026-04-24T16:37:30Z

I'm not sure I understand why this is created as a separate library, if its only use is for PostgREST and it will have to be maintained by the team anyway. Why not have the code in this repo directly, if we're not using an existing library?

Well... @taimoorzaeem decided to implement a library that he wishes to maintain as an OSS project (which he's a sole copyright owner from what I read in the LICENSE file).
He also thinks this library is a better fit for PostgREST than the one we're using right now.

I don't think that's any different from any other third-party dependency we have (other than @taimoorzaeem is also a contributor to PostgREST).

taimoorzaeem · 2026-04-24T17:10:38Z

I opened this PR to compare with the loadtest results. Obviously it's in no shape for merging as yet. I recently found out about more ways to optimize the algorithm using BK-trees which I might try. I usually try to work on this library slowly in separate time.

I'm not sure I understand why this is created as a separate library, [..] Why not have the code in this repo directly, if we're not using an existing library?

I think this much code would be very invasive in PostgREST source and that it should be maintained in a separate repo. Maybe in the future we can add more algorithms from https://www.postgresql.org/docs/current/fuzzystrmatch.html. I also think that as a package it might be useful for others in the open source community (although I am skeptical that happens anytime soon).

if its only use is for PostgREST and it will have to be maintained by the team anyway.

How about we move this repo under PostgREST/ organization later after the implementation gets merged? That way the whole team can access it.

wolfgangwalther · 2026-04-24T19:48:30Z

Well... @taimoorzaeem decided to implement a library that he wishes to maintain as an OSS project (which he's a sole copyright owner from what I read in the LICENSE file). He also thinks this library is a better fit for PostgREST than the one we're using right now.

I don't think that's any different from any other third-party dependency we have (other than @taimoorzaeem is also a contributor to PostgREST).

Well, moving from an unmaintained library to a just released new library with a single contributor and no track record of ongoing maintenance (no offense, Taimoor!) does not look like the right move from the project's perspective.

But that's not my primary concern. My primary concern is, that we have other such libraries already in place:

configurator-pg, which was eventually moved to the PostgREST org
hasql-notifications, which has taken a tiny bit of traction I believe, because it is used by one other project (ihp) and not just PostgREST, but was originally developed for PostgREST, too (I might be wrong, I'm judging by the author of the lib actually :D)

We already have overhead and friction, because we need to keep these packages building alongside our own, but can't move as freely as we could otherwise.

From practical experience, I don't see any advantage by having configurator-pg as a separate package, it actually just limits our ability to iterate on config-related work faster.

How about we move this repo under PostgREST/ organization later after the implementation gets merged? That way the whole team can access it.

To be clear, my primary concern is not who is maintaining it, but whether it's mono-repo or a separate package.

mkleczek · 2026-04-25T06:29:56Z

I don't think that's any different from any other third-party dependency we have (other than @taimoorzaeem is also a contributor to PostgREST).

Well, moving from an unmaintained library to a just released new library with a single contributor and no track record of ongoing maintenance (no offense, Taimoor!) does not look like the right move from the project's perspective.

I agree.

From practical experience, I don't see any advantage by having configurator-pg as a separate package, it actually just limits our ability to iterate on config-related work faster.

I would go in the opposite direction - reuse established configuration language and libraries from the market instead of re-inventing something PostgREST specific (see also below).

To be clear, my primary concern is not who is maintaining it, but whether it's mono-repo or a separate package.

The flip side being the growing amount of non PostgREST specific code needed to maintain in PostgREST. Including some generic libraries (like Levenshtein distance implementation or sieve cache) in PostgREST incurs long term costs and risks.

I'm afraid it is trade-offs all the way down - there is a natural tension here, especially in case of Haskell with its weak library ecosystem comparing to, let's say, Java. It needs good judgment on a case by case basis. PostgREST being one of the most prolific Haskell software is a good driver for new libraries and it would be a shame if all the code we implement is PostgREST specific.

wolfgangwalther · 2026-04-27T09:24:39Z

The flip side being the growing amount of non PostgREST specific code needed to maintain in PostgREST.

Sure, but this argument only works when other, well maintained options are available. Which was clearly not the case for fuzzyset. And was not the case for configurator-ng, so now we have configurator-pg.

The point is: The amount of code to maintain for us is the same, if we're effectively the only users of these libraries. So we might as well make it easier for us to maintain them.

But yes, I agree: If we can use a well maintained and well used library from hackage, that's even better.

I would go in the opposite direction - reuse established configuration language and libraries from the market instead of re-inventing something PostgREST specific (see also below).

This has been suggested before, but we were reluctant to make that breaking change, so far.

steve-chavez · 2026-04-27T15:09:46Z

I would go in the opposite direction - reuse established configuration language and libraries from the market instead of re-inventing something PostgREST specific (see also below).

This has been suggested before, but we were reluctant to make that breaking change, so far.

Discussed on #3101 (comment).

taimoorzaeem marked this pull request as draft April 22, 2026 09:11

fix: remove package fuzzyset in favor of fuzzystrmatch-pg

0ff70ab

WIP Signed-off-by: Taimoor Zaeem <taimoorzaeem@gmail.com>

taimoorzaeem force-pushed the remove/fuzzyset branch from f4ca00a to 0ff70ab Compare April 23, 2026 08:03

steve-chavez reviewed Apr 23, 2026

View reviewed changes

taimoorzaeem mentioned this pull request Apr 23, 2026

test(load): update error loadtest endpoint to generate hint #4839

Merged

mkleczek reviewed Apr 24, 2026

View reviewed changes

Uh oh!

Conversation

taimoorzaeem commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

taimoorzaeem commented Apr 23, 2026

Uh oh!

steve-chavez commented Apr 23, 2026

Uh oh!

steve-chavez Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

taimoorzaeem commented Apr 23, 2026

Uh oh!

wolfgangwalther commented Apr 24, 2026

Uh oh!

mkleczek Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mkleczek commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mkleczek commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

taimoorzaeem commented Apr 24, 2026

Uh oh!

wolfgangwalther commented Apr 24, 2026

Uh oh!

mkleczek commented Apr 25, 2026

Uh oh!

wolfgangwalther commented Apr 27, 2026

Uh oh!

steve-chavez commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

4 participants

taimoorzaeem commented Apr 22, 2026 •

edited

Loading

steve-chavez Apr 23, 2026 •

edited

Loading

mkleczek Apr 24, 2026 •

edited

Loading

mkleczek commented Apr 24, 2026 •

edited

Loading

mkleczek commented Apr 24, 2026 •

edited

Loading