feat(sdmx): Add utilities to discover SDMX data #1679
Adds a data discovery feature to the SDMX client and CLI, allowing users to list, search, and view details of dataflows. Includes unit tests and sample scripts for OECD and Eurostat.
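The search step described above can be sketched as a case-insensitive substring match over the output of a listing call. This is a minimal sketch, not the PR's code: `search_dataflows` is a hypothetical name, and it assumes each dataflow is a dict with `id` and `name` keys.

```python
from typing import Any, Dict, List


def search_dataflows(
    dataflows: List[Dict[str, Any]], query: str
) -> List[Dict[str, Any]]:
    """Return dataflows whose id or name contains `query` (case-insensitive).

    Assumes (hypothetically) that each dataflow dict has "id" and "name" keys.
    """
    needle = query.casefold()
    return [
        flow
        for flow in dataflows
        if needle in str(flow.get("id", "")).casefold()
        or needle in str(flow.get("name", "")).casefold()
    ]
```

Matching on both the identifier and the human-readable name lets a user find a dataflow whether they remember its code or only its topic; a call such as `search_dataflows(flows, "prices")` filters an in-memory list without another request to the endpoint.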
Summary of Changes

Hello @SandeepTuniki, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces significant enhancements to the SDMX client and command-line interface by adding data discovery capabilities. Users can now explore available dataflows from various SDMX endpoints, search for specific datasets, and view their detailed metadata, streamlining the process of identifying and understanding statistical data sources. This feature aims to make the SDMX client more user-friendly and efficient for data exploration.
Code Review
This pull request introduces a data discovery feature to the SDMX client and CLI, enabling users to list, search, and view details of dataflows. It includes new sample scripts for OECD and Eurostat and adds a new command to the CLI. The changes involve adding new files for the sample scripts and tests, and modifying `sdmx_cli.py` and `sdmx_client.py` to implement the new functionality. The code appears well-structured and includes error handling and logging.
Addresses review comments by adding a null check in the SDMX client and replacing print statements with logging in the CLI.
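A minimal sketch of the two changes this commit describes: a null check before using a client result, and logging in place of `print`. The handler name, the shape of the `get_details` callable, the logger name, and the exit codes are all assumptions for illustration, not the PR's actual CLI code.

```python
import logging
from typing import Any, Callable, Dict, Optional

# Hypothetical logger name; the PR's CLI may configure logging differently.
logger = logging.getLogger("sdmx_cli")


def handle_dataflow_details(
    get_details: Callable[[str], Optional[Dict[str, Any]]],
    dataflow_id: str,
) -> int:
    """Hypothetical CLI handler: guard against a missing result and log it.

    Returns a process exit code (0 on success, 1 when nothing was found).
    """
    details = get_details(dataflow_id)
    if details is None:
        # Null check: report through logging rather than print().
        logger.error("Dataflow %s was not found", dataflow_id)
        return 1
    logger.info("Fetched %d detail fields for %s", len(details), dataflow_id)
    return 0
```

Routing diagnostics through `logging` keeps them separate from the command's actual output and lets callers control verbosity with standard log levels.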
/gemini review
Warning: Gemini encountered an error creating the review. You can try again by commenting
/gemini review |
Code Review
This pull request adds a useful data discovery feature to the SDMX client and CLI. The implementation is mostly solid, with new methods for listing, searching, and getting details for dataflows, along with corresponding unit tests and sample scripts.
I've identified a bug in get_dataflow_details where it incorrectly handles the response from the pandaSDMX library, causing it to fail for valid dataflows. The associated unit tests are also a bit misleading and don't catch this bug, so I've suggested improvements for them as well. Additionally, I've recommended a change to how the discovered dataflows are displayed in the CLI for better user experience.
Overall, these are great additions. Once the feedback is addressed, this will be a strong feature.
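On the display point raised in the review above, one option is to render discovered dataflows as an aligned two-column table rather than raw dicts. This is a stdlib-only sketch; `format_dataflows` and the `id`/`name` keys are hypothetical and not taken from the PR.

```python
from typing import Any, Dict, List


def format_dataflows(dataflows: List[Dict[str, Any]]) -> str:
    """Render dataflows as a left-aligned ID/NAME table (hypothetical helper)."""
    rows = [(str(f.get("id", "")), str(f.get("name", ""))) for f in dataflows]
    # Column width is the longest id, but never narrower than the header.
    width = max([len("ID")] + [len(flow_id) for flow_id, _ in rows])
    lines = [f"{'ID':<{width}}  NAME"]
    lines += [f"{flow_id:<{width}}  {name}" for flow_id, name in rows]
    return "\n".join(lines)
```

A CLI handler would then write the result to stdout with `print(format_dataflows(flows))`, keeping the listing readable when an endpoint returns hundreds of dataflows.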
```python
        sdmx.add_source(custom_source, override=True)
        return sdmx.Client(source_id)

    def list_dataflows(self) -> List[Dict[str, Any]]:
```
We also need a util to search/dump across series within a dataflow.
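A series search utility could filter series keys using the partial-key convention of SDMX REST data queries, where dimension values are separated by `.`, an empty position matches any value, and `+` separates alternatives. This sketch applies that convention locally to keys that have already been fetched; `key_matches` and `filter_series_keys` are hypothetical names, and it assumes keys are plain dot-separated strings.

```python
from typing import Iterable, List


def key_matches(series_key: str, pattern: str) -> bool:
    """True if `series_key` matches an SDMX-style partial key `pattern`.

    Positions are dot-separated; an empty position is a wildcard and
    "+" separates alternative values, e.g. "A..GDP+CPI".
    """
    key_parts = series_key.split(".")
    pattern_parts = pattern.split(".")
    if len(key_parts) != len(pattern_parts):
        return False
    return all(
        not wanted or value in wanted.split("+")
        for value, wanted in zip(key_parts, pattern_parts)
    )


def filter_series_keys(keys: Iterable[str], pattern: str) -> List[str]:
    """Return the keys that match `pattern`, preserving input order."""
    return [key for key in keys if key_matches(key, pattern)]
```

Reusing the query-key syntax means the same pattern a user types for local filtering could also be sent to an endpoint as a data query, though whether the PR's client does so is not shown here.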
…actored `get_dataflow_details` to robustly handle both DataFrame and Series return types from `sdmx.to_pandas` and to gracefully catch `HTTPError` for non-existent dataflows. Updated `handle_discover_dataflows` to print DataFrame output to stdout. Adjusted `sdmx_client_test.py` to reflect these changes with accurate mocks and imports.
Re-created this PR in #1698. Closing this.