-
Notifications
You must be signed in to change notification settings - Fork 32
Use fetch_indicators from python library for search #150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @keyurva, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the indicator search mechanism within the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively introduces a new method for fetching indicators using the datacommons-client library, controlled by a feature flag. This is a solid strategy for migrating to a new API. The changes are well-structured and include a good test case for the new flag. My review includes a few suggestions for improvement: ensuring API response consistency by adding a potentially missing description field, refining exception handling to be more specific for better error diagnostics and to prevent sensitive data exposure, and clarifying an apparent removal of Python 3.12 support in the project's metadata.
| results.append( | ||
| { | ||
| "SV": candidate.dcid, | ||
| "CosineScore": score, | ||
| "alternate_descriptions": [sentence] | ||
| if sentence | ||
| else [], | ||
| } | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dictionary being created here for the new _call_fetch_indicators path is missing the "description" key, which was present in the results from the old _transform_search_indicators_to_svs_format method. This will cause result.get("description") in _search_vector to return None, and the final API response will be missing descriptions for variables.
If the sentence from the metadata is intended to be the main description, it should be added here to maintain consistency with the old API path. If not, this change in the response contract should be confirmed as intentional.
| results.append( | |
| { | |
| "SV": candidate.dcid, | |
| "CosineScore": score, | |
| "alternate_descriptions": [sentence] | |
| if sentence | |
| else [], | |
| } | |
| ) | |
| results.append( | |
| { | |
| "SV": candidate.dcid, | |
| "CosineScore": score, | |
| "description": sentence, | |
| "alternate_descriptions": [sentence] | |
| if sentence | |
| else [], | |
| } | |
| ) |
| ) | ||
| results_map[query] = results | ||
|
|
||
| except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catching a broad Exception can hide bugs and make debugging difficult. If the datacommons-client library defines more specific exceptions for API or network errors, it would be better to catch those explicitly. A general except Exception: could then be used as a fallback for unexpected errors. This would improve error handling and observability. Additionally, when logging exceptions, avoid logging the entire exception object directly as it may contain sensitive information. Instead, log only the exception type or a sanitized message.
For example:
try:
# ...
except datacommons.client.DCException as e:
logger.error("API error calling fetch_indicators: %s", type(e).__name__)
except requests.exceptions.RequestException as e:
logger.error("Network error calling fetch_indicators: %s", type(e).__name__)
except Exception as e:
logger.error(
"Unexpected error calling fetch_indicators for queries '%s': %s", queries, type(e).__name__
)(Assuming datacommons.client.DCException and requests.exceptions.RequestException are relevant exceptions)
References
- Do not log the entire exception object directly, as it may contain sensitive information like API keys. Instead, log only the exception type or a sanitized message to prevent sensitive data exposure.
| "License :: OSI Approved :: Apache Software License", | ||
| "Programming Language :: Python :: 3", | ||
| "Programming Language :: Python :: 3.11", | ||
| "Programming Language :: Python :: 3.12", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
clincoln8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Keyur! A couple nonblocking nits/suggestions.
|
|
||
| use_search_indicators: bool = Field( | ||
| default=False, | ||
| alias="USE_SEARCH_INDICATORS", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we maintain the DC_ prefix for this envvar?
| alias="USE_SEARCH_INDICATORS", | |
| alias="DC_USE_SEARCH_INDICATORS", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Done.
|
|
||
| def _call_fetch_indicators(self, queries: list[str]) -> dict: | ||
| """ | ||
| Helper method to call the new datacommons-client fetch_indicators method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| Helper method to call the new datacommons-client fetch_indicators method. | |
| Helper method to call the datacommons-client fetch_indicators and transform the response. | |
| Returns: | |
| dict[str, list[dict]]: A mapping where each key is the requested query and the | |
| value is a list of candidate matches. Each candidate dict contains: | |
| - 'SV' (str): The identifier (DCID) of the matched entity. | |
| - 'CosineScore' (float): The similarity score for the match. | |
| - 'alternate_descriptions' (list[str]): A list containing the | |
| contextual sentence if available, otherwise an empty list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| "License :: OSI Approved :: Apache Software License", | ||
| "Programming Language :: Python :: 3", | ||
| "Programming Language :: Python :: 3.11", | ||
| "Programming Language :: Python :: 3.12", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
| requires = ["setuptools"] | ||
| build-backend = "setuptools.build_meta" | ||
|
|
||
| [dependency-groups] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this used? should we just put that in the project.optional-dependencies.test group?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should. Done.
| use_search_indicators: bool = Field( | ||
| default=False, | ||
| alias="USE_SEARCH_INDICATORS", | ||
| description="Whether to use the legacy search-indicators endpoint logic (True) or the new client library logic (False).", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| description="Whether to use the legacy search-indicators endpoint logic (True) or the new client library logic (False).", | |
| description="Whether to use the legacy search-indicators endpoint (True) or the client library (False) for fetching indicators.", |
keyurva
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
|
|
||
| def _call_fetch_indicators(self, queries: list[str]) -> dict: | ||
| """ | ||
| Helper method to call the new datacommons-client fetch_indicators method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
|
||
| use_search_indicators: bool = Field( | ||
| default=False, | ||
| alias="USE_SEARCH_INDICATORS", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Done.
| "License :: OSI Approved :: Apache Software License", | ||
| "Programming Language :: Python :: 3", | ||
| "Programming Language :: Python :: 3.11", | ||
| "Programming Language :: Python :: 3.12", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| requires = ["setuptools"] | ||
| build-backend = "setuptools.build_meta" | ||
|
|
||
| [dependency-groups] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should. Done.
DCClientto use the newfetch_indicatorsmethod from the python library.USE_SEARCH_INDICATORSflag (default:False) to toggle between the new client method and thesearch-indicatorsendpoint.search-indicatorscompletely in the next few weeks.