Skip to content

Conversation

@ArnavBalyan
Copy link
Member

  • Added --conf parameter that accepts user's key=value configuration pairs
  • users can now configure any Avro/Parquet setting via CLI.
  • The conf can be passed any number of times with the user defined values and will be appended to the internally constructed config.
  • Fixes: Providing parquet-avro configuration to parquet-cli #2967

@ArnavBalyan
Copy link
Member Author

cc @gszadovszky @shangxinli could you please review thanks!

Copy link
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great improvement. Thanks a lot for working on it!

I have two ideas to improve it:

  • What do you think about adding the conf support centrally for every commands by setting it in Main?
  • Maybe adding an additional option that supports reading a config file (and the config pairs at command line can override these values)?

@ArnavBalyan
Copy link
Member Author

This is a great improvement. Thanks a lot for working on it!

I have two ideas to improve it:

  • What do you think about adding the conf support centrally for every commands by setting it in Main?
  • Maybe adding an additional option that supports reading a config file (and the config pairs at command line can override these values)?

Thanks for the review and great suggestions! I'll address them shortly.
Supporting it centrally for every command is a great point, let me check how we can wire it for all commands.
Will add support for the config file also thanks!

@ArnavBalyan
Copy link
Member Author

Added global config support through Main, can I take the file based config feature in a follow up PR or should we add it here @gszadovszky? thanks!

@gszadovszky
Copy link
Contributor

Added global config support through Main, can I take the file based config feature in a follow up PR or should we add it here @gszadovszky? thanks!

@ArnavBalyan, it is up to you. It was just an improvement idea. I'm fine having it in a separate PR.

@ArnavBalyan
Copy link
Member Author

Added global config support through Main, can I take the file based config feature in a follow up PR or should we add it here @gszadovszky? thanks!

@ArnavBalyan, it is up to you. It was just an improvement idea. I'm fine having it in a separate PR.

The file based config would probably be very useful, I can take it up in a follow up PR, will go through more code to understand where it can be best added thanks

Copy link
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArnavBalyan, unfortunately, we do not have end-to-end tests for parquet-cli. Were you able to validate this manually?

  • Are the conf key-value pairs really reach the related code path at different commands?
  • Is the --conf argument properly described in the help?

It looks good to me otherwise.

@ArnavBalyan
Copy link
Member Author

@ArnavBalyan, unfortunately, we do not have end-to-end tests for parquet-cli. Were you able to validate this manually?

  • Are the conf key-value pairs really reach the related code path at different commands?
  • Is the --conf argument properly described in the help?

It looks good to me otherwise.

Thank you so much for the review, I'll check if we can add some more end to end tests for the CLI. I verified manually that the configs are correctly reaching the other side to the command, example output from one of the manual logs:

log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
=== CONVERT COMMAND CONFIGURATION RECEIVED ===
  CONFIG: parquet.avro.write-parquet-uuid = true
  CONFIG: parquet.avro.write-old-list-structure = false
  CONFIG: test.property = test.value
  Total configuration properties available: 925
=== END CONVERT COMMAND CONFIGURATION ===
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.413 s -- in org.apache.parquet

The above configs were passed through the --config option.

It seems to be well described in help, pls let me know if we should add anything else thanks!

@ArnavBalyan
Copy link
Member Author

cc @gszadovszky can this be merged if all looks good? Have raised #3296 to introduce e2e testing thanks!

@gszadovszky
Copy link
Contributor

Thanks for the ping, @ArnavBalyan. I usually don't like merging just when I'm approving to give some time for other comments. But 3 days are more than enough...

@gszadovszky gszadovszky merged commit 8be0dad into apache:master Sep 1, 2025
7 checks passed
@ArnavBalyan
Copy link
Member Author

Thanks for the ping, @ArnavBalyan. I usually don't like merging just when I'm approving to give some time for other comments. But 3 days are more than enough...

Oh got it thanks! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Providing parquet-avro configuration to parquet-cli

2 participants