Merged
Conversation
This reverts commit 2e71414.
…ix-duckdb_engine Made-with: Cursor
…ularExtractParams
- Replace connection_string field with connector: Optional[SQLDatabase] in TabularExtractParams
- Add arbitrary_types_allowed=True to TabularExtractParams model_config for Pydantic v2
- Update extract_tabular_db_data to read connector from params; return {} when absent
- Guard store_relational_db_in_neo4j against empty data dict
- Update connectors/__init__.py to remove DuckDB export after move to tabular-dev-tools
- Update debug_ingest.py to instantiate DuckDB connector and pass via TabularExtractParams
- Fix test_tabular_pipeline.py: _DummyDuckDB now subclasses SQLDatabase; tests updated
- Update setup_spider2.py and duckdb.py docstring import examples
- Add langchain-nvidia-ai-endpoints import and pyproject.toml platform env for macOS ARM
Made-with: Cursor
… folder - Move connectors/sql_database.py → tabular_data/sql_database.py - Delete connectors/__init__.py and the now-empty connectors/ package - Update all imports to nemo_retriever.tabular_data.sql_database Made-with: Cursor
Collaborator
|
seems like this has conflicts now and needs to be rebased. |
…uckdb package The file was named duckdb.py which caused Python to import itself instead of the installed duckdb package, resulting in AttributeError on duckdb.connect(). Also changed default connection mode to read_only=True to prevent exclusive file lock conflicts when multiple processes access the same .duckdb file. Made-with: Cursor
…cture
- Accept deletion of ingest_modes/batch.py (removed in main as part of the
architecture migration away from the legacy ingestor pattern).
- Update TabularIngestOperator:
* _BatchEmbedActor import moved from ingest_modes.batch → text_embed.operators
* _LanceDBWriteActor (gone in main) replaced with handle_lancedb() from
vector_store.lancedb_store, materialising the Ray Dataset before writing.
Made-with: Cursor
_build_lancedb_rows_from_df expects a list[dict]; iterating a raw DataFrame yields column names (strings), causing AttributeError on row.get(). Made-with: Cursor
jperez999
requested changes
Apr 6, 2026
nemo_retriever/src/nemo_retriever/graph/tabular_ingest_operator.py
Outdated
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/graph/tabular_ingest_operator.py
Outdated
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/graph/tabular_ingest_operator.py
Outdated
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/graph/tabular_ingest_operator.py
Outdated
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/graph/tabular_ingest_operator.py
Outdated
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/graph/tabular_ingest_operator.py
Outdated
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/tabular_data/ingestion/graph/model/node.py
Outdated
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/tabular_data/neo4j/neo4j_connection.py
Show resolved
Hide resolved
Remove TabularDescriptionParams, TabularFetchParams, TabularSemanticLayerParams, and TabularUsageWeightsParams; inline hardcoded keys in fetch_tabular_embedding_dataframe. Made-with: Cursor
Split into _TabularIngestBase, TabularIngestCPUActor, and a graph-facing TabularIngestOperator(ArchetypeOperator) that always resolves to the CPU variant, since GPU embedding is handled internally via Ray actor pools. Made-with: Cursor
Made-with: Cursor
…erators Remove the monolithic TabularIngestOperator and replace it with two focused operators each in their own file: - TabularExtractOp: extracts DB schema and writes to Neo4j, returns DataFrame - FetchTabularEmbeddingDfOp: fetches entity descriptions from Neo4j Users now compose these with the existing _BatchEmbedActor to build their own graph; LanceDB write is handled outside the graph (matching the unstructured pipeline pattern). Made-with: Cursor
TabularExtractOp -> TabularSchemaExtractOp (tabular_schema_extract_operator.py) FetchTabularEmbeddingDfOp -> TabularFetchEmbeddingsOp (tabular_fetch_embeddings_operator.py) Made-with: Cursor
Made-with: Cursor
…nature Made-with: Cursor
Made-with: Cursor
Made-with: Cursor
Remove the graph/ wrapper folder and promote its sub-packages (model/, dal/, parsers/, services/, utils.py, indexes.py) directly under ingestion/. Update all import paths accordingly. Made-with: Cursor
The class was already renamed to Neo4jNode; align the filename to match. Update all four import sites accordingly. Made-with: Cursor
jperez999
approved these changes
Apr 7, 2026
Resolved conflicts in ingestor.py and params/__init__.py by keeping both StoreParams (from main) and TabularExtractParams/TextChunkParams (from staging). Made-with: Cursor
nemo_retriever/src/nemo_retriever/tabular_data/retrieval/generate_sql.py
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/tabular_data/ingestion/dal/db_dal.py
Outdated
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/tabular_data/ingestion/write_to_graph.py
Outdated
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/tabular_data/neo4j/neo4j_connection.py
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/tabular_data/neo4j/neo4j_connection.py
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/tabular_data/ingestion/embeddings.py
Outdated
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/tabular_data/ingestion/extract_data.py
Show resolved
Hide resolved
- Add SPDX license headers to all new tabular_data Python files - Fix tuple literals in JSON schema description fields (generate_sql.py) - Replace naive datetime.now() with datetime.now(timezone.utc) in populate_fks() - Make num_workers and dialect configurable params in store_relational_db_in_neo4j() - Switch read-only MATCH query from query_write() to query_read() in embeddings.py - Replace print() with logger.error() and re-raise in neo4j_connection.py - Make Neo4j connection lazy (call get_neo4j_conn() per function, not at import time) - Fix unterminated string literals caused by literal newlines in single-quoted strings - Fix empty DataFrame column handling in normalize_tables/normalize_columns - Add early-exit guard in populate_tabular_data() for empty tables_df - Change DuckDB connector default read_only from True to False Made-with: Cursor
nemo_retriever/src/nemo_retriever/tabular_data/ingestion/dal/schemas_dal.py
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/tabular_data/ingestion/parsers/schemas_parser.py
Show resolved
Hide resolved
nemo_retriever/src/nemo_retriever/tabular_data/ingestion/services/schema.py
Show resolved
Hide resolved
- Use datetime.now(timezone.utc) for db_node pulled property in schemas_parser.py - Re-raise exception and use logger.error in add_schema() failure handler Made-with: Cursor
Remove NEO4J_HEAP_INITIAL, NEO4J_HEAP_MAX, and NEO4J_PAGECACHE which have no counterpart in helm/values.yaml, causing Helm/Docker parity gap. Retain NEO4J_PLUGINS for APOC which is required by the ingestion pipeline. Made-with: Cursor
nemo_retriever/src/nemo_retriever/tabular_data/ingestion/dal/utils_dal.py
Show resolved
Hide resolved
Cypher does not permit trailing commas in function arguments; remove them from both apoc.merge.node.eager() calls in add_edges() to prevent a syntax error on every invocation. Made-with: Cursor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Checklist