Add a tool to merge several podio files into a single one #681
Merged
14 commits
0e32711 Add a tool to merge several podio files into a single one (jmcarcell)
1037351 Fix format (jmcarcell)
6af93d1 Fix format (jmcarcell)
0e0fe5c Improve message (jmcarcell)
61fca90 Generate a metadata frame if it doesn't exist (jmcarcell)
7efcdee Format with black (jmcarcell)
02c0ebb Add configuration for the metadata parameter name (jmcarcell)
aec7f7f Fix pre-commit (jmcarcell)
926812f Fix pre-commit (jmcarcell)
74e55d6 Disable pylint check (jmcarcell)
5bd511b Hardcode the metadata parameters (jmcarcell)
5fddc2a Add a comment (jmcarcell)
2ffa8c8 Fix f-string (jmcarcell)
977c75b Fix pre-commit (jmcarcell)
New file, 67 lines (`@@ -0,0 +1,67 @@`):

```python
#!/usr/bin/env python3
"""podio-merge-files tool to merge any number of podio files into one"""

import argparse
import sys
import podio
import podio.root_io
from podio import reading

parser = argparse.ArgumentParser(
    description="Merge any number of podio files into one, can merge TTree and RNTuple files"
)

parser.add_argument("--output-file", help="name of the output file", required=True)
parser.add_argument("files", nargs="+", help="which files to merge")
parser.add_argument(
    "--metadata",
    choices=["none", "all", "first"],
    default="first",
    help="metadata to include in the output file, default: "
    "only the one from the first file, other options: all files, none",
)
args = parser.parse_args()

all_files = set()
for f in args.files:
    if f in all_files:
        raise ValueError(f"File {f} is present more than once in the input list")
    all_files.add(f)

ROOT_FORMAT = reading._determine_root_format(args.files[0])  # pylint: disable=protected-access
if ROOT_FORMAT == reading.RootFileFormat.TTREE:
    reader = podio.root_io.Reader(args.files)
    writer = podio.root_io.Writer(args.output_file)
elif ROOT_FORMAT == reading.RootFileFormat.RNTUPLE:
    reader = podio.root_io.RNTupleReader(args.files)
    writer = podio.root_io.RNTupleWriter(args.output_file)
else:
    raise ValueError(f"Input file {args.files[0]} is not a TTree or RNTuple file")

categories = list(reader.categories)
is_metadata_available = True  # pylint: disable=invalid-name
try:
    # All frames will be copied as they are except the metadata ones
    categories.remove("metadata")
except ValueError:
    is_metadata_available = False  # pylint: disable=invalid-name

for category in categories:
    all_frames = reader.get(category)
    for frame in all_frames:
        writer.write_frame(frame, category)

if args.metadata == "none":
    sys.exit(0)

if not is_metadata_available:
    print("Warning: metadata category 'metadata' not found in the input files, it will be created")
    all_frames = [podio.Frame()]
else:
    if args.metadata == "first":
        all_frames = [reader.get("metadata")[0]]
    else:
        all_frames = reader.get("metadata")
for frame in all_frames:
    frame.put_parameter("MergedInputFiles", args.files)
    writer.write_frame(frame, "metadata")
```
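The metadata handling at the end of the script amounts to a small selection policy driven by `--metadata`. The sketch below restates that policy so it can be run without podio or ROOT. It is only an illustration: `select_metadata` is a hypothetical helper that is not part of podio or of this PR, and plain dictionaries stand in for `podio.Frame` objects.

```python
from typing import Dict, List


def select_metadata(
    mode: str, metadata_frames: List[Dict], input_files: List[str]
) -> List[Dict]:
    """Return the metadata frames the merged file would receive.

    Hypothetical helper: plain dicts stand in for podio.Frame objects.
    """
    if mode == "none":
        # The script exits before writing any metadata in this case
        return []
    if not metadata_frames:
        # No "metadata" category in the inputs: start from one empty frame
        selected = [{}]
    elif mode == "first":
        # Only the first metadata frame is kept
        selected = [dict(metadata_frames[0])]
    elif mode == "all":
        # Every metadata frame from every input is kept
        selected = [dict(frame) for frame in metadata_frames]
    else:
        # The script relies on argparse `choices` instead of this check
        raise ValueError(f"Unknown metadata mode: {mode}")
    for frame in selected:
        # Mirrors frame.put_parameter("MergedInputFiles", args.files)
        frame["MergedInputFiles"] = list(input_files)
    return selected


if __name__ == "__main__":
    frames = [{"source": "a.root"}, {"source": "b.root"}]
    files = ["a.root", "b.root"]
    for mode in ("none", "first", "all"):
        print(mode, select_metadata(mode, frames, files))
```

Note that every metadata frame written, including one created from scratch, is tagged with the full list of input files under `MergedInputFiles`, so the provenance of a merged file survives even when the inputs carried no metadata.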