Skip to content

C API for parsing, deparsing and fingerprinting queries#321

Open
levkk wants to merge 5 commits intopganalyze:17-latestfrom
pgdogdev:17-latest
Open

C API for parsing, deparsing and fingerprinting queries#321
levkk wants to merge 5 commits intopganalyze:17-latestfrom
pgdogdev:17-latest

Conversation

@levkk
Copy link

@levkk levkk commented Jan 2, 2026

Bypasses Protobuf (de)ser and allows bindings to access the libpg_query API via the C ABI. Used in a fork of pg_query.rs to accelerate parse and deparse.

More context: pganalyze/pg_query.rs#72

Note: there is a bit of copy/paste going on here because I wanted to leave the protobuf paths untouched. Let me know if you'd prefer I merge them to re-use more code.

result.query = strdup(str.data);
}
PG_CATCH();
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we have MemoryContextSwitchTo here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree, that seems like an oversight - CopyErrorData will fail when the context isn't switched.

result.tokens = malloc(sizeof(PgQueryRawScanToken) * token_count);
result.n_tokens = token_count;

// Second pass: fill in token data
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't it be done with just a single pass? (using a growable array - eg, start with 64, then realloc 2x when needed)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a copy/paste of the existing implementation, sans protobuf conversion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the existing implementation here is lazy, since its cheap enough to just run twice. We could totally modify this to use realloc, but I don't think we need to do it in this PR.

Copy link
Member

@lfittl lfittl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: there is a bit of copy/paste going on here because I wanted to leave the protobuf paths untouched. Let me know if you'd prefer I merge them to re-use more code.

Yeah, the copy/paste isn't great, it would be good to reduce duplication. I think we should be good to call a shared "raw" implementation in many of these cases.

pg_query_exit_memory_context((MemoryContext) ctx);
}

/* Node allocation helper */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I see why you added this and the other methods, but I do wonder if this is the right approach (e.g. why not access the include header that has palloc0 directly?)

result.query = strdup(str.data);
}
PG_CATCH();
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree, that seems like an oversight - CopyErrorData will fail when the context isn't switched.

#define PG_VERSION "17.7"
#define PG_VERSION_NUM 170007

// Raw parse tree access (bypasses protobuf serialization)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be better if we put all of this into pg_query_raw.h, so authors of bindings that do not use the direct access to the AST structs don't get confused which methods to use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants