Migration Script to Add Glific NGO to AI Platform: Onboarding, Credential and Assistant Sync#291
Conversation
WalkthroughA set of new modules and scripts were introduced to support CSV-based migration and onboarding processes, including base and specialized CSV processors, API client utilities, and validators. Entry-point scripts for running onboarding, assistant sync, and credential sync workflows were added. Logging, validation, and error handling are consistently integrated throughout the new code. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Script
participant Processor
participant APIClient
participant API
User->>Script: Run entry-point script (e.g., run_onboarding.py)
Script->>Processor: Instantiate with input/output files, API params
Script->>Processor: Call run()
Processor->>Processor: load_csv()
Processor->>Processor: validate_csv(rows)
alt Validation passes
Processor->>Processor: process_rows(rows)
loop For each row
Processor->>APIClient: post(url, data)
APIClient->>API: POST request
API-->>APIClient: Response
APIClient-->>Processor: (success, response)
Processor->>Processor: append_to_csv(result)
end
else Validation fails
Processor->>Processor: Log errors, abort
end
Processor-->>Script: Done
Estimated code review effort4 (~90 minutes) Possibly related issues
Poem
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (11)
backend/glific_migration/base_processor.py (2)
5-5: Use built-in types instead of deprecated typing module importsThe code uses deprecated typing imports. Python 3.9+ supports using built-in types directly.
Replace the import and update type annotations:
-from typing import List, Dict +from typing import List, Dict # Can be removed after updating annotations class BaseCSVProcessor(ABC): """Base class for CSV processing with common functionality.""" - def __init__(self, input_file: str, output_file: str, headers: List[str]): + def __init__(self, input_file: str, output_file: str, headers: list[str]): - def load_csv(self) -> List[Dict[str, str]]: + def load_csv(self) -> list[dict[str, str]]: - def append_to_csv(self, row: Dict[str, str]) -> None: + def append_to_csv(self, row: dict[str, str]) -> None: @abstractmethod - def validate_csv(self, rows: List[Dict[str, str]]) -> bool: + def validate_csv(self, rows: list[dict[str, str]]) -> bool: @abstractmethod - def process_rows(self, rows: List[Dict[str, str]]) -> None: + def process_rows(self, rows: list[dict[str, str]]) -> None:Also applies to: 13-13, 46-46, 58-58, 69-69, 74-74
46-56: Consider memory efficiency for large CSV filesLoading the entire CSV into memory might cause issues with large files. Consider implementing a streaming approach for better scalability.
For large files, consider yielding rows instead of loading all at once:
def load_csv_streaming(self) -> Generator[dict[str, str], None, None]: """Stream CSV file row by row.""" try: with open(self.input_file, newline="", encoding="utf-8") as f: reader = csv.DictReader(f) for row in reader: yield row except FileNotFoundError: self.logger.error(f"Input file not found: {self.input_file}") raise except Exception as e: self.logger.error(f"Error reading CSV file {self.input_file}: {str(e)}") raisebackend/glific_migration/client.py (2)
5-5: Update deprecated type annotationsReplace deprecated
typingimports with built-in types for Python 3.9+ compatibility.-from typing import Tuple, Dict, Optional +from typing import Optional
27-27: Update method signature to use modern type annotationsUpdate the type hints to use built-in types instead of deprecated typing module types.
- def post(self, url: str, data: Optional[Dict] = None) -> Tuple[bool, Dict]: + def post(self, url: str, data: Optional[dict] = None) -> tuple[bool, dict]:backend/glific_migration/sync_credentials/processor.py (1)
2-2: Remove unused import and update deprecated type annotationsThe
Settype is imported but never used. Also update deprecated type annotations.-from typing import List, Dict, Set +from typing import List, DictThen update the type annotations in method signatures:
- def validate_csv(self, rows: List[Dict[str, str]]) -> bool: + def validate_csv(self, rows: list[dict[str, str]]) -> bool:- def process_rows(self, rows: List[Dict[str, str]]) -> None: + def process_rows(self, rows: list[dict[str, str]]) -> None:backend/glific_migration/organization_onboarding/processor.py (1)
2-2: Remove unused import and update deprecated type annotationsThe
Settype is imported but never used. Also update deprecated type annotations.-from typing import List, Dict, Set +from typing import List, DictThen update the type annotations in method signatures to use built-in types.
backend/glific_migration/validator.py (2)
2-2: Update deprecated type annotationsReplace deprecated typing imports with built-in types for better compatibility.
-from typing import List, Dict, Tuple, Set +from typing import List, Dict, Tuple, SetThen update the function signatures:
-def validate_required_fields(row: Dict[str, str], fields: Set[str]) -> List[str]: +def validate_required_fields(row: dict[str, str], fields: set[str]) -> list[str]:-def validate_email_format(email: str) -> Tuple[bool, str]: +def validate_email_format(email: str) -> tuple[bool, str]:Also applies to: 6-6, 11-11
20-22: Consider additional password validation rulesThe current validation only checks minimum length. Consider if additional rules are needed such as complexity requirements.
If the API requires specific password complexity rules, I can help implement additional validation checks.
backend/glific_migration/sync_assistant/processor.py (3)
2-2: Remove unused import and update deprecated type annotationsThe
Settype is imported but never used. Also update deprecated type annotations.-from typing import List, Dict, Set +from typing import List, Dict
36-40: Remove redundant empty value checksThe
validate_required_fieldsfunction already checks for empty values after stripping whitespace, making these explicit checks redundant.- if not row.get("assistant_id", "").strip(): - row_errors.append("Empty assistant_id") - - if not row.get("api_key", "").strip(): - row_errors.append("Empty api_key")
71-71: Consider URL-encoding the assistant_id parameterWhile the assistant_id is validated to contain only alphanumeric characters, it's a best practice to URL-encode path parameters.
+from urllib.parse import quote + url = f"{self.base_url}/assistant/{assistant_id}/ingest" + # Or use: url = f"{self.base_url}/assistant/{quote(assistant_id, safe='')}/ingest"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
backend/glific_migration/organization_onboarding/sample_input.csvis excluded by!**/*.csvbackend/glific_migration/sync_assistant/sample_input.csvis excluded by!**/*.csvbackend/glific_migration/sync_credentials/sample_input.csvis excluded by!**/*.csv
📒 Files selected for processing (10)
backend/.gitignore(1 hunks)backend/glific_migration/base_processor.py(1 hunks)backend/glific_migration/client.py(1 hunks)backend/glific_migration/organization_onboarding/processor.py(1 hunks)backend/glific_migration/organization_onboarding/run_onboarding.py(1 hunks)backend/glific_migration/sync_assistant/processor.py(1 hunks)backend/glific_migration/sync_assistant/sync_assistant.py(1 hunks)backend/glific_migration/sync_credentials/processor.py(1 hunks)backend/glific_migration/sync_credentials/run_credentials.py(1 hunks)backend/glific_migration/validator.py(1 hunks)
🧬 Code Graph Analysis (1)
backend/glific_migration/sync_credentials/processor.py (3)
backend/glific_migration/base_processor.py (4)
BaseCSVProcessor(10-90)validate_csv(69-71)process_rows(74-76)append_to_csv(58-66)backend/glific_migration/client.py (2)
APIClient(10-50)post(27-50)backend/glific_migration/validator.py (2)
validate_required_fields(6-8)is_valid_api_key(25-31)
🪛 Ruff (0.12.2)
backend/glific_migration/client.py
5-5: typing.Tuple is deprecated, use tuple instead
(UP035)
5-5: typing.Dict is deprecated, use dict instead
(UP035)
27-27: Use X | None for type annotations
Convert to X | None
(UP045)
27-27: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
27-27: Use tuple instead of Tuple for type annotation
Replace with tuple
(UP006)
27-27: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
backend/glific_migration/organization_onboarding/processor.py
2-2: typing.List is deprecated, use list instead
(UP035)
2-2: typing.Dict is deprecated, use dict instead
(UP035)
2-2: typing.Set is deprecated, use set instead
(UP035)
2-2: typing.Set imported but unused
Remove unused import: typing.Set
(F401)
41-41: Use list instead of List for type annotation
Replace with list
(UP006)
41-41: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
78-78: Use list instead of List for type annotation
Replace with list
(UP006)
78-78: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
backend/glific_migration/validator.py
2-2: typing.List is deprecated, use list instead
(UP035)
2-2: typing.Dict is deprecated, use dict instead
(UP035)
2-2: typing.Tuple is deprecated, use tuple instead
(UP035)
2-2: typing.Set is deprecated, use set instead
(UP035)
6-6: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
6-6: Use set instead of Set for type annotation
Replace with set
(UP006)
6-6: Use list instead of List for type annotation
Replace with list
(UP006)
11-11: Use tuple instead of Tuple for type annotation
Replace with tuple
(UP006)
backend/glific_migration/sync_credentials/processor.py
2-2: typing.List is deprecated, use list instead
(UP035)
2-2: typing.Dict is deprecated, use dict instead
(UP035)
2-2: typing.Set is deprecated, use set instead
(UP035)
2-2: typing.Set imported but unused
Remove unused import: typing.Set
(F401)
29-29: Use list instead of List for type annotation
Replace with list
(UP006)
29-29: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
63-63: Use list instead of List for type annotation
Replace with list
(UP006)
63-63: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
backend/glific_migration/sync_assistant/processor.py
2-2: typing.List is deprecated, use list instead
(UP035)
2-2: typing.Dict is deprecated, use dict instead
(UP035)
2-2: typing.Set is deprecated, use set instead
(UP035)
2-2: typing.Set imported but unused
Remove unused import: typing.Set
(F401)
25-25: Use list instead of List for type annotation
Replace with list
(UP006)
25-25: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
62-62: Use list instead of List for type annotation
Replace with list
(UP006)
62-62: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
backend/glific_migration/base_processor.py
5-5: typing.List is deprecated, use list instead
(UP035)
5-5: typing.Dict is deprecated, use dict instead
(UP035)
13-13: Use list instead of List for type annotation
Replace with list
(UP006)
46-46: Use list instead of List for type annotation
Replace with list
(UP006)
46-46: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
58-58: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
69-69: Use list instead of List for type annotation
Replace with list
(UP006)
69-69: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
74-74: Use list instead of List for type annotation
Replace with list
(UP006)
74-74: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/glific_migration/sync_credentials/processor.py (3)
backend/glific_migration/base_processor.py (4)
BaseCSVProcessor(10-90)validate_csv(69-71)process_rows(74-76)append_to_csv(58-66)backend/glific_migration/client.py (2)
APIClient(10-50)post(27-50)backend/glific_migration/validator.py (2)
validate_required_fields(6-8)is_valid_api_key(25-31)
🪛 Ruff (0.12.2)
backend/glific_migration/client.py
5-5: typing.Tuple is deprecated, use tuple instead
(UP035)
5-5: typing.Dict is deprecated, use dict instead
(UP035)
27-27: Use X | None for type annotations
Convert to X | None
(UP045)
27-27: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
27-27: Use tuple instead of Tuple for type annotation
Replace with tuple
(UP006)
27-27: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
backend/glific_migration/organization_onboarding/processor.py
2-2: typing.List is deprecated, use list instead
(UP035)
2-2: typing.Dict is deprecated, use dict instead
(UP035)
2-2: typing.Set is deprecated, use set instead
(UP035)
2-2: typing.Set imported but unused
Remove unused import: typing.Set
(F401)
41-41: Use list instead of List for type annotation
Replace with list
(UP006)
41-41: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
78-78: Use list instead of List for type annotation
Replace with list
(UP006)
78-78: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
backend/glific_migration/validator.py
2-2: typing.List is deprecated, use list instead
(UP035)
2-2: typing.Dict is deprecated, use dict instead
(UP035)
2-2: typing.Tuple is deprecated, use tuple instead
(UP035)
2-2: typing.Set is deprecated, use set instead
(UP035)
6-6: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
6-6: Use set instead of Set for type annotation
Replace with set
(UP006)
6-6: Use list instead of List for type annotation
Replace with list
(UP006)
11-11: Use tuple instead of Tuple for type annotation
Replace with tuple
(UP006)
backend/glific_migration/sync_credentials/processor.py
2-2: typing.List is deprecated, use list instead
(UP035)
2-2: typing.Dict is deprecated, use dict instead
(UP035)
2-2: typing.Set is deprecated, use set instead
(UP035)
2-2: typing.Set imported but unused
Remove unused import: typing.Set
(F401)
29-29: Use list instead of List for type annotation
Replace with list
(UP006)
29-29: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
63-63: Use list instead of List for type annotation
Replace with list
(UP006)
63-63: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
backend/glific_migration/sync_assistant/processor.py
2-2: typing.List is deprecated, use list instead
(UP035)
2-2: typing.Dict is deprecated, use dict instead
(UP035)
2-2: typing.Set is deprecated, use set instead
(UP035)
2-2: typing.Set imported but unused
Remove unused import: typing.Set
(F401)
25-25: Use list instead of List for type annotation
Replace with list
(UP006)
25-25: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
62-62: Use list instead of List for type annotation
Replace with list
(UP006)
62-62: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
backend/glific_migration/base_processor.py
5-5: typing.List is deprecated, use list instead
(UP035)
5-5: typing.Dict is deprecated, use dict instead
(UP035)
13-13: Use list instead of List for type annotation
Replace with list
(UP006)
46-46: Use list instead of List for type annotation
Replace with list
(UP006)
46-46: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
58-58: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
69-69: Use list instead of List for type annotation
Replace with list
(UP006)
69-69: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
74-74: Use list instead of List for type annotation
Replace with list
(UP006)
74-74: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
🔇 Additional comments (3)
backend/.gitignore (1)
9-10: LGTM!The additions properly exclude log files from version control, which aligns with the logging functionality introduced in the migration modules.
backend/glific_migration/client.py (1)
35-47: Fix potential UnboundLocalError in error handlingThe
responsevariable may not be defined when the HTTPError exception is caught, leading to an UnboundLocalError when trying to accessresponse.json().except requests.exceptions.HTTPError as http_err: + error_detail = "No error detail provided." try: - error_detail = response.json().get("error", "No error detail provided.") + if 'response' in locals(): + error_detail = response.json().get("error", "No error detail provided.") except Exception: error_detail = "Unable to parse error response."Likely an incorrect or invalid review comment.
backend/glific_migration/organization_onboarding/processor.py (1)
84-84: Consider filtering sensitive data before sending to APIThe entire row data, including the password field, is being sent to the API endpoint. Ensure that the API endpoint expects and properly handles sensitive fields like passwords.
Consider filtering the data to only include expected fields:
payload = { "organization_name": row.get("organization_name"), "project_name": row.get("project_name"), "email": row.get("email"), "password": row.get("password"), "user_name": row.get("user_name") } success, resp = self.client.post(self.api_url, data=payload)
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
backend/glific_migration/organization_onboarding/run_onboarding.py (1)
9-10: Remove unnecessary mode argument and add error handling.The static analysis tool correctly identifies that the
"r"mode argument is unnecessary since it's the default foropen(). Additionally, the file operations lack proper error handling.Apply this diff:
- with open(base_dir / "../config.json", "r") as file: - config = json.load(file) + try: + with open(base_dir / "../config.json") as file: + config = json.load(file) + except FileNotFoundError: + raise FileNotFoundError("Configuration file 'config.json' not found in parent directory") + except json.JSONDecodeError as e: + raise ValueError(f"Invalid JSON in configuration file: {e}")backend/glific_migration/sync_assistant/sync_assistant.py (1)
9-10: Apply static analysis suggestion to remove unnecessary mode argumentThe
"r"mode argument is unnecessary when opening files for reading as it's the default behavior.- with open(base_dir / "../config.json", "r") as file: + with open(base_dir / "../config.json") as file:backend/glific_migration/sync_credentials/run_credentials.py (1)
9-9: Remove unnecessary file mode argument.The "r" mode is the default when opening files, making this argument redundant.
- with open(base_dir / "../config.json", "r") as file: + with open(base_dir / "../config.json") as file:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
backend/glific_migration/config.json(1 hunks)backend/glific_migration/organization_onboarding/run_onboarding.py(1 hunks)backend/glific_migration/sync_assistant/sync_assistant.py(1 hunks)backend/glific_migration/sync_credentials/run_credentials.py(1 hunks)
🪛 Ruff (0.12.2)
backend/glific_migration/organization_onboarding/run_onboarding.py
9-9: Unnecessary mode argument
Remove mode argument
(UP015)
backend/glific_migration/sync_assistant/sync_assistant.py
9-9: Unnecessary mode argument
Remove mode argument
(UP015)
backend/glific_migration/sync_credentials/run_credentials.py
9-9: Unnecessary mode argument
Remove mode argument
(UP015)
✅ Files skipped from review due to trivial changes (1)
- backend/glific_migration/config.json
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/glific_migration/organization_onboarding/run_onboarding.py
9-9: Unnecessary mode argument
Remove mode argument
(UP015)
backend/glific_migration/sync_assistant/sync_assistant.py
9-9: Unnecessary mode argument
Remove mode argument
(UP015)
backend/glific_migration/sync_credentials/run_credentials.py
9-9: Unnecessary mode argument
Remove mode argument
(UP015)
🔇 Additional comments (6)
backend/glific_migration/organization_onboarding/run_onboarding.py (1)
8-23: Good improvement from hardcoded values to centralized configuration.This implementation addresses the security concerns raised in previous reviews by moving from hardcoded API keys and URLs to a centralized configuration file approach. This enhances both security and maintainability.
backend/glific_migration/sync_assistant/sync_assistant.py (3)
1-3: LGTM: Clean imports structureThe imports are well-organized with standard library imports first, followed by the project-specific import.
8-21: Past review comment has been addressedGood improvement! The hardcoded API URL issue from the previous review has been resolved by reading the configuration from
config.json. This makes the script much more flexible for different environments.
24-25: LGTM: Standard entry point guardThe entry point guard follows Python best practices for making the script executable.
backend/glific_migration/sync_credentials/run_credentials.py (2)
1-6: LGTM!The imports and path setup are well-structured, using appropriate standard library modules and following good practices for relative path resolution.
28-29: LGTM!Standard and correct implementation of the script entry point.
kartpop
left a comment
There was a problem hiding this comment.
Should the directory structure be backend/scripts/glific_migration instead of backend/glific_migration?
That way we can keep all stand-alone progarms/scripts as part of the scripts/ dir
| with open(base_dir / "../config.json", "r") as file: | ||
| config = json.load(file) | ||
|
|
||
| input_file = base_dir / config["organization_onboarding"]["input_csv"] |
There was a problem hiding this comment.
Would be good to avoid magic strings like "organization_onboarding", "input_csv".
Libraries like dynaconf can help; they support key traversing like settings.organization_onboarding.input_csv
|
|
||
| input_file = base_dir / config["organization_onboarding"]["input_csv"] | ||
| output_file = base_dir / config["organization_onboarding"]["output_csv"] | ||
| api_url = config["base_url"] + "/onboard" |
There was a problem hiding this comment.
API endpoints like "/onboard" may also be kept in a string constant somewhere in a common file.
The routes in backend/app/api/routes/onboarding.py can also use the same string constant.
@router.post(
"/onboard",
dependencies=[Depends(get_current_active_superuser)],
response_model=OnboardingResponse,
)
Not an urgent change, can be picked up as part of a separate PR where all such routes, other constant strings are refactored.
| with open(base_dir / "../config.json", "r") as file: | ||
| config = json.load(file) | ||
|
|
||
| input_csv = base_dir / config["assistant_ingest"]["input_csv"] |
There was a problem hiding this comment.
same as for organization_onboarding, can benefit from use of a config management library
This PR introduces a complete, migration utility to onboard Glific NGOs and their bots into the AI Platform. The migration consists of three distinct processors, each responsible for a specific phase:
Target issue #289 and #290
Organization and Project Onboarding
Calls the /onboard endpoint to create the organization, project, and user
Generates and stores API key per project
Credential Synchronization
Assistant Ingestion
Calls /assistant/{assistant_id}/ingest with the corresponding API key
Ingests the assistant into the internal database
Structure & Utilities
Summary by CodeRabbit
New Features
Chores