Running the code conversion on a 125 MB DataStage XML file fails. In the run logged here, the transpiler stalls for about an hour and then aborts with OSError: [Errno 24] Too many open files. According to Databricks, files larger than 100 MB are known to cause problems.
This is a blocker for large migrations. Would it be possible to add support for files larger than 100 MB?
databricks labs lakebridge transpile datastage --input-source /Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml --output-folder /Users/sergio.ballesteros/Code/lakebridge/output/ --debug
sergio.ballesteros@VKMGW54Q6V lakebridge % databricks labs lakebridge transpile datastage --input-source /Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml --output-folder /Users/sergio.ballesteros/Code/lakebridge/output/ --debug
10:30:13 Info: start pid=5143 version=0.271.0 args="databricks, labs, lakebridge, transpile, datastage, --input-source, /Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml, --output-folder, /Users/sergio.ballesteros/Code/lakebridge/output/, --debug"
10:30:13 Debug: Fetching latest releases for databrickslabs/lakebridge from GitHub API pid=5143
10:30:13 Debug: Loading installed version info from: /Users/sergio.ballesteros/.databricks/labs/lakebridge/state/version.json pid=5143
10:30:13 Debug: Loading login configuration from: /Users/sergio.ballesteros/.databricks/labs/lakebridge/config/login.json pid=5143
10:30:13 Debug: Using workspace-level login profile: DEFAULT pid=5143
10:30:13 Debug: Loading DEFAULT profile from /Users/sergio.ballesteros/.databrickscfg pid=5143 sdk=true
10:30:13 Debug: Resolved login: Config: host=https://adb-2690017451936431.11.azuredatabricks.net, token=***, profile=DEFAULT, config_file=/Users/sergio.ballesteros/.databrickscfg pid=5143 sdk=true
10:30:13 Debug: Passing down environment variables: DATABRICKS_HOST, DATABRICKS_TOKEN pid=5143
10:30:13 Debug: Forwarding subprocess: /Users/sergio.ballesteros/.databricks/labs/lakebridge/state/venv/bin/python3 /Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/cli.py {"command":"transpile","flags":{"catalog-name":"","error-file-path":"","input-source":"/Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml","log_level":"debug","output-folder":"/Users/sergio.ballesteros/Code/lakebridge/output/","schema-name":"","skip-validation":"true","source-dialect":"","transpiler-config-path":""},"output_type":""} pid=5143
10:30:13 Debug: starting: /Users/sergio.ballesteros/.databricks/labs/lakebridge/state/venv/bin/python3 /Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/cli.py {"command":"transpile","flags":{"catalog-name":"","error-file-path":"","input-source":"/Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml","log_level":"debug","output-folder":"/Users/sergio.ballesteros/Code/lakebridge/output/","schema-name":"","skip-validation":"true","source-dialect":"","transpiler-config-path":""},"output_type":""} pid=5143
10:30:16 DEBUG [d.labs.lakebridge] Leaving DATABRICKS_HOST as-is: https://adb-2690017451936431.11.azuredatabricks.net
10:30:16 DEBUG [databricks.sdk] Loaded from environment
10:30:16 DEBUG [databricks.sdk] Attempting to configure auth: pat
10:30:17 DEBUG [databricks.sdk] GET /api/2.0/preview/scim/v2/Me
< 200 OK
< {
< "active": true,
< "displayName": "Sergio Ballesteros Solanas",
< "emails": [
< {
< "primary": true,
< "type": "work",
< "value": "**REDACTED**"
< }
< ],
< "entitlements": [
< {
< "value": "**REDACTED**"
< },
< "... (1 additional elements)"
< ],
< "externalId": "4601b55a-daa0-4daa-aabc-ab7580d6a240",
< "groups": [
< {
< "$ref": "Groups/1025253049853392",
< "display": "admins",
< "type": "direct",
< "value": "**REDACTED**"
< }
< ],
< "id": "7978958767567807",
< "name": {
< "familyName": "Solanas",
< "givenName": "Sergio Ballesteros"
< },
< "schemas": [
< "urn:ietf:params:scim:schemas:core:2.0:User",
< "... (1 additional elements)"
< ],
< "userName": "sergio.ballesteros@databricks.com"
< }
10:30:17 DEBUG [d.l.blueprint.installation] Loading TranspileConfig from config.yml
10:30:18 DEBUG [databricks.sdk] GET /api/2.0/workspace/export?path=/Users/sergio.ballesteros@databricks.com/.lakebridge/config.yml&direct_download=true
< 200 OK
< [raw stream]
10:30:18 DEBUG [d.labs.lakebridge] Preconfigured transpiler config: TranspileConfig(transpiler_config_path='/Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib/config.yml', source_dialect='datastage', input_source=None, output_folder='/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/transpiled', error_file_path='/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/errors.log', sdk_config=None, skip_validation=True, catalog_name='remorph', schema_name='transpiler', transpiler_options={'overrides-file': None, 'target-tech': 'SPARKSQL'})
10:30:18 DEBUG [d.l.l.contexts.application] Added User-Agent extra cmd=execute-transpile
10:30:18 DEBUG [d.labs.lakebridge] Setting input_source to: '/Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml'
10:30:18 DEBUG [d.labs.lakebridge] Setting output_folder to: '/Users/sergio.ballesteros/Code/lakebridge/output/'
10:30:18 DEBUG [d.labs.lakebridge] Setting skip_validation to: True
10:30:18 DEBUG [d.labs.lakebridge] Checking config: TranspileConfig(transpiler_config_path='/Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib/config.yml', source_dialect='datastage', input_source='/Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml', output_folder='/Users/sergio.ballesteros/Code/lakebridge/output/', error_file_path='/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/errors.log', sdk_config=None, skip_validation=True, catalog_name='remorph', schema_name='transpiler', transpiler_options={'overrides-file': None, 'target-tech': 'SPARKSQL'})
10:30:18 DEBUG [d.labs.lakebridge] Using configured source_dialect: 'datastage'
10:30:18 DEBUG [d.labs.lakebridge] Validated config: TranspileConfig(transpiler_config_path='/Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib/config.yml', source_dialect='datastage', input_source='/Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml', output_folder='/Users/sergio.ballesteros/Code/lakebridge/output/', error_file_path='/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/errors.log', sdk_config=None, skip_validation=True, catalog_name='remorph', schema_name='transpiler', transpiler_options={'overrides-file': None, 'target-tech': 'SPARKSQL'})
10:30:18 DEBUG [d.labs.lakebridge] Final configuration for transpilation: TranspileConfig(transpiler_config_path='/Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib/config.yml', source_dialect='datastage', input_source='/Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml', output_folder='/Users/sergio.ballesteros/Code/lakebridge/output/', error_file_path='/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/errors.log', sdk_config=None, skip_validation=True, catalog_name='remorph', schema_name='transpiler', transpiler_options={'overrides-file': None, 'target-tech': 'SPARKSQL'})
10:30:18 DEBUG [d.l.l.contexts.application] Added User-Agent extra transpiler_source_tech=datastage
10:30:18 DEBUG [d.l.l.contexts.application] Added User-Agent extra transpiler_plugin_name=Bladebridge
10:30:19 DEBUG [databricks.sdk] GET /api/2.0/preview/scim/v2/Me
< 200 OK
< {
< "active": true,
< "displayName": "Sergio Ballesteros Solanas",
< "emails": [
< {
< "primary": true,
< "type": "work",
< "value": "**REDACTED**"
< }
< ],
< "entitlements": [
< {
< "value": "**REDACTED**"
< },
< "... (1 additional elements)"
< ],
< "externalId": "4601b55a-daa0-4daa-aabc-ab7580d6a240",
< "groups": [
< {
< "$ref": "Groups/1025253049853392",
< "display": "admins",
< "type": "direct",
< "value": "**REDACTED**"
< }
< ],
< "id": "7978958767567807",
< "name": {
< "familyName": "Solanas",
< "givenName": "Sergio Ballesteros"
< },
< "schemas": [
< "urn:ietf:params:scim:schemas:core:2.0:User",
< "... (1 additional elements)"
< ],
< "userName": "sergio.ballesteros@databricks.com"
< }
10:30:19 DEBUG [d.labs.lakebridge] User: User(active=True, display_name='Sergio Ballesteros Solanas', emails=[ComplexValue(display=None, primary=True, ref=None, type='work', value='sergio.ballesteros@databricks.com')], entitlements=[ComplexValue(display=None, primary=None, ref=None, type=None, value='allow-cluster-create'), ComplexValue(display=None, primary=None, ref=None, type=None, value='allow-instance-pool-create')], external_id='4601b55a-daa0-4daa-aabc-ab7580d6a240', groups=[ComplexValue(display='admins', primary=None, ref='Groups/1025253049853392', type='direct', value='1025253049853392')], id='7978958767567807', name=Name(family_name='Solanas', given_name='Sergio Ballesteros'), roles=[], schemas=[<UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_USER: 'urn:ietf:params:scim:schemas:core:2.0:User'>, <UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_EXTENSION_WORKSPACE_2_0_USER: 'urn:ietf:params:scim:schemas:extension:workspace:2.0:User'>], user_name='sergio.ballesteros@databricks.com')
10:30:19 DEBUG [d.l.l.contexts.application] Added User-Agent extra cmd=execute-transpile
10:30:19 DEBUG [d.labs.lakebridge] User: User(active=True, display_name='Sergio Ballesteros Solanas', emails=[ComplexValue(display=None, primary=True, ref=None, type='work', value='sergio.ballesteros@databricks.com')], entitlements=[ComplexValue(display=None, primary=None, ref=None, type=None, value='allow-cluster-create'), ComplexValue(display=None, primary=None, ref=None, type=None, value='allow-instance-pool-create')], external_id='4601b55a-daa0-4daa-aabc-ab7580d6a240', groups=[ComplexValue(display='admins', primary=None, ref='Groups/1025253049853392', type='direct', value='1025253049853392')], id='7978958767567807', name=Name(family_name='Solanas', given_name='Sergio Ballesteros'), roles=[], schemas=[<UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_USER: 'urn:ietf:params:scim:schemas:core:2.0:User'>, <UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_EXTENSION_WORKSPACE_2_0_USER: 'urn:ietf:params:scim:schemas:extension:workspace:2.0:User'>], user_name='sergio.ballesteros@databricks.com')
10:30:19 DEBUG [d.l.l.t.lsp.lsp_engine] Detected virtual environment to use at: /Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib/.venv
10:30:19 DEBUG [d.l.l.t.lsp.lsp_engine] Using PATH for launching LSP server: /Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib/.venv/bin:/Users/sergio.ballesteros/.databricks/labs/lakebridge/state/venv/bin:/opt/homebrew/opt/openjdk@17/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Applications/VMware Fusion.app/Contents/Public:/Applications/iTerm.app/Contents/Resources/utilities
10:30:19 DEBUG [d.l.l.t.lsp.lsp_engine] Starting LSP engine: /Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib/.venv/bin/python3 ['-m', 'databricks.labs.bladebridge.server', '--log_level=DEBUG'] (cwd=/Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib)
10:30:19 DEBUG [d.l.l.t.lsp.lsp_engine] LSP init params: InitializeParams(capabilities=ClientCapabilities(workspace=None, text_document=None, notebook_document=None, window=None, general=None, experimental=None), process_id=5145, client_info=ClientInfo(name='lakebridge', version='0.10.11'), locale=None, root_path=None, root_uri='file:///Users/sergio.ballesteros/Code/lakebridge/input', initialization_options={'remorph': {'source-dialect': 'datastage'}, 'options': {'overrides-file': None, 'target-tech': 'SPARKSQL'}, 'custom': {}}, trace=None, work_done_token=None, workspace_folders=None)
10:30:20 DEBUG [d.l.l.t.lsp.lsp_engine] Registered capability: document/transpileToDatabricks
10:30:20 DEBUG [d.l.l.transpiler.execute] Starting to process input file: /Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml
10:30:20 INFO [d.l.l.transpiler.execute] Transpiling file: /Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml
10:30:20 DEBUG [d.l.l.transpiler.execute] Started processing file: /Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml
10:30:20 DEBUG [d.l.blueprint.paths] XML declaration detected, sniffing further with encoding: us-ascii
10:30:20 DEBUG [d.l.blueprint.paths] XML declaration encoding detected: UTF-8
^C
sergio.ballesteros@VKMGW54Q6V lakebridge % Traceback (most recent call last):
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/cli.py", line 719, in <module>
lakebridge()
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/state/venv/lib/python3.10/site-packages/databricks/labs/blueprint/cli.py", line 187, in __call__
run_main(self._route)
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/state/venv/lib/python3.10/site-packages/databricks/labs/blueprint/entrypoint.py", line 35, in run_main
main(*sys.argv[1:])
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/state/venv/lib/python3.10/site-packages/databricks/labs/blueprint/cli.py", line 118, in _route
cmd.fn(**kwargs)
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/cli.py", line 126, in transpile
result = asyncio.run(_transpile(ctx, config, engine))
File "/opt/homebrew/Cellar/python@3.10/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/homebrew/Cellar/python@3.10/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/homebrew/Cellar/python@3.10/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/homebrew/Cellar/python@3.10/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 1871, in _run_once
event_list = self._selector.select(timeout)
File "/opt/homebrew/Cellar/python@3.10/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/selectors.py", line 562, in select
kev_list = self._selector.control(None, max_ev, timeout)
KeyboardInterrupt
sergio.ballesteros@VKMGW54Q6V lakebridge %
sergio.ballesteros@VKMGW54Q6V lakebridge %
sergio.ballesteros@VKMGW54Q6V lakebridge % databricks labs lakebridge transpile datastage --input-source /Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml --output-folder /Users/sergio.ballesteros/Code/lakebridge/output/ --debug > logs.txt
10:30:36 Info: start pid=5174 version=0.271.0 args="databricks, labs, lakebridge, transpile, datastage, --input-source, /Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml, --output-folder, /Users/sergio.ballesteros/Code/lakebridge/output/, --debug"
10:30:36 Debug: Loading installed version info from: /Users/sergio.ballesteros/.databricks/labs/lakebridge/state/version.json pid=5174
10:30:36 Debug: Loading login configuration from: /Users/sergio.ballesteros/.databricks/labs/lakebridge/config/login.json pid=5174
10:30:36 Debug: Using workspace-level login profile: DEFAULT pid=5174
10:30:36 Debug: Loading DEFAULT profile from /Users/sergio.ballesteros/.databrickscfg pid=5174 sdk=true
10:30:36 Debug: Resolved login: Config: host=https://adb-2690017451936431.11.azuredatabricks.net, token=***, profile=DEFAULT, config_file=/Users/sergio.ballesteros/.databrickscfg pid=5174 sdk=true
10:30:36 Debug: Passing down environment variables: DATABRICKS_HOST, DATABRICKS_TOKEN pid=5174
10:30:36 Debug: Forwarding subprocess: /Users/sergio.ballesteros/.databricks/labs/lakebridge/state/venv/bin/python3 /Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/cli.py {"command":"transpile","flags":{"catalog-name":"","error-file-path":"","input-source":"/Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml","log_level":"debug","output-folder":"/Users/sergio.ballesteros/Code/lakebridge/output/","schema-name":"","skip-validation":"true","source-dialect":"","transpiler-config-path":""},"output_type":""} pid=5174
10:30:36 Debug: starting: /Users/sergio.ballesteros/.databricks/labs/lakebridge/state/venv/bin/python3 /Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/cli.py {"command":"transpile","flags":{"catalog-name":"","error-file-path":"","input-source":"/Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml","log_level":"debug","output-folder":"/Users/sergio.ballesteros/Code/lakebridge/output/","schema-name":"","skip-validation":"true","source-dialect":"","transpiler-config-path":""},"output_type":""} pid=5174
10:30:37 DEBUG [d.labs.lakebridge] Leaving DATABRICKS_HOST as-is: https://adb-2690017451936431.11.azuredatabricks.net
10:30:37 DEBUG [databricks.sdk] Loaded from environment
10:30:37 DEBUG [databricks.sdk] Attempting to configure auth: pat
10:30:38 DEBUG [databricks.sdk] GET /api/2.0/preview/scim/v2/Me
< 200 OK
< {
< "active": true,
< "displayName": "Sergio Ballesteros Solanas",
< "emails": [
< {
< "primary": true,
< "type": "work",
< "value": "**REDACTED**"
< }
< ],
< "entitlements": [
< {
< "value": "**REDACTED**"
< },
< "... (1 additional elements)"
< ],
< "externalId": "4601b55a-daa0-4daa-aabc-ab7580d6a240",
< "groups": [
< {
< "$ref": "Groups/1025253049853392",
< "display": "admins",
< "type": "direct",
< "value": "**REDACTED**"
< }
< ],
< "id": "7978958767567807",
< "name": {
< "familyName": "Solanas",
< "givenName": "Sergio Ballesteros"
< },
< "schemas": [
< "urn:ietf:params:scim:schemas:core:2.0:User",
< "... (1 additional elements)"
< ],
< "userName": "sergio.ballesteros@databricks.com"
< }
10:30:38 DEBUG [d.l.blueprint.installation] Loading TranspileConfig from config.yml
10:30:39 DEBUG [databricks.sdk] GET /api/2.0/workspace/export?path=/Users/sergio.ballesteros@databricks.com/.lakebridge/config.yml&direct_download=true
< 200 OK
< [raw stream]
10:30:39 DEBUG [d.labs.lakebridge] Preconfigured transpiler config: TranspileConfig(transpiler_config_path='/Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib/config.yml', source_dialect='datastage', input_source=None, output_folder='/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/transpiled', error_file_path='/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/errors.log', sdk_config=None, skip_validation=True, catalog_name='remorph', schema_name='transpiler', transpiler_options={'overrides-file': None, 'target-tech': 'SPARKSQL'})
10:30:39 DEBUG [d.l.l.contexts.application] Added User-Agent extra cmd=execute-transpile
10:30:39 DEBUG [d.labs.lakebridge] Setting input_source to: '/Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml'
10:30:39 DEBUG [d.labs.lakebridge] Setting output_folder to: '/Users/sergio.ballesteros/Code/lakebridge/output/'
10:30:39 DEBUG [d.labs.lakebridge] Setting skip_validation to: True
10:30:39 DEBUG [d.labs.lakebridge] Checking config: TranspileConfig(transpiler_config_path='/Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib/config.yml', source_dialect='datastage', input_source='/Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml', output_folder='/Users/sergio.ballesteros/Code/lakebridge/output/', error_file_path='/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/errors.log', sdk_config=None, skip_validation=True, catalog_name='remorph', schema_name='transpiler', transpiler_options={'overrides-file': None, 'target-tech': 'SPARKSQL'})
10:30:39 DEBUG [d.labs.lakebridge] Using configured source_dialect: 'datastage'
10:30:39 DEBUG [d.labs.lakebridge] Validated config: TranspileConfig(transpiler_config_path='/Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib/config.yml', source_dialect='datastage', input_source='/Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml', output_folder='/Users/sergio.ballesteros/Code/lakebridge/output/', error_file_path='/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/errors.log', sdk_config=None, skip_validation=True, catalog_name='remorph', schema_name='transpiler', transpiler_options={'overrides-file': None, 'target-tech': 'SPARKSQL'})
10:30:39 DEBUG [d.labs.lakebridge] Final configuration for transpilation: TranspileConfig(transpiler_config_path='/Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib/config.yml', source_dialect='datastage', input_source='/Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml', output_folder='/Users/sergio.ballesteros/Code/lakebridge/output/', error_file_path='/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/errors.log', sdk_config=None, skip_validation=True, catalog_name='remorph', schema_name='transpiler', transpiler_options={'overrides-file': None, 'target-tech': 'SPARKSQL'})
10:30:39 DEBUG [d.l.l.contexts.application] Added User-Agent extra transpiler_source_tech=datastage
10:30:39 DEBUG [d.l.l.contexts.application] Added User-Agent extra transpiler_plugin_name=Bladebridge
10:30:40 DEBUG [databricks.sdk] GET /api/2.0/preview/scim/v2/Me
< 200 OK
< {
< "active": true,
< "displayName": "Sergio Ballesteros Solanas",
< "emails": [
< {
< "primary": true,
< "type": "work",
< "value": "**REDACTED**"
< }
< ],
< "entitlements": [
< {
< "value": "**REDACTED**"
< },
< "... (1 additional elements)"
< ],
< "externalId": "4601b55a-daa0-4daa-aabc-ab7580d6a240",
< "groups": [
< {
< "$ref": "Groups/1025253049853392",
< "display": "admins",
< "type": "direct",
< "value": "**REDACTED**"
< }
< ],
< "id": "7978958767567807",
< "name": {
< "familyName": "Solanas",
< "givenName": "Sergio Ballesteros"
< },
< "schemas": [
< "urn:ietf:params:scim:schemas:core:2.0:User",
< "... (1 additional elements)"
< ],
< "userName": "sergio.ballesteros@databricks.com"
< }
10:30:40 DEBUG [d.labs.lakebridge] User: User(active=True, display_name='Sergio Ballesteros Solanas', emails=[ComplexValue(display=None, primary=True, ref=None, type='work', value='sergio.ballesteros@databricks.com')], entitlements=[ComplexValue(display=None, primary=None, ref=None, type=None, value='allow-cluster-create'), ComplexValue(display=None, primary=None, ref=None, type=None, value='allow-instance-pool-create')], external_id='4601b55a-daa0-4daa-aabc-ab7580d6a240', groups=[ComplexValue(display='admins', primary=None, ref='Groups/1025253049853392', type='direct', value='1025253049853392')], id='7978958767567807', name=Name(family_name='Solanas', given_name='Sergio Ballesteros'), roles=[], schemas=[<UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_USER: 'urn:ietf:params:scim:schemas:core:2.0:User'>, <UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_EXTENSION_WORKSPACE_2_0_USER: 'urn:ietf:params:scim:schemas:extension:workspace:2.0:User'>], user_name='sergio.ballesteros@databricks.com')
10:30:40 DEBUG [d.l.l.contexts.application] Added User-Agent extra cmd=execute-transpile
10:30:40 DEBUG [d.labs.lakebridge] User: User(active=True, display_name='Sergio Ballesteros Solanas', emails=[ComplexValue(display=None, primary=True, ref=None, type='work', value='sergio.ballesteros@databricks.com')], entitlements=[ComplexValue(display=None, primary=None, ref=None, type=None, value='allow-cluster-create'), ComplexValue(display=None, primary=None, ref=None, type=None, value='allow-instance-pool-create')], external_id='4601b55a-daa0-4daa-aabc-ab7580d6a240', groups=[ComplexValue(display='admins', primary=None, ref='Groups/1025253049853392', type='direct', value='1025253049853392')], id='7978958767567807', name=Name(family_name='Solanas', given_name='Sergio Ballesteros'), roles=[], schemas=[<UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_CORE_2_0_USER: 'urn:ietf:params:scim:schemas:core:2.0:User'>, <UserSchema.URN_IETF_PARAMS_SCIM_SCHEMAS_EXTENSION_WORKSPACE_2_0_USER: 'urn:ietf:params:scim:schemas:extension:workspace:2.0:User'>], user_name='sergio.ballesteros@databricks.com')
10:30:40 DEBUG [d.l.l.t.lsp.lsp_engine] Detected virtual environment to use at: /Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib/.venv
10:30:40 DEBUG [d.l.l.t.lsp.lsp_engine] Using PATH for launching LSP server: /Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib/.venv/bin:/Users/sergio.ballesteros/.databricks/labs/lakebridge/state/venv/bin:/opt/homebrew/opt/openjdk@17/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Applications/VMware Fusion.app/Contents/Public:/Applications/iTerm.app/Contents/Resources/utilities
10:30:40 DEBUG [d.l.l.t.lsp.lsp_engine] Starting LSP engine: /Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib/.venv/bin/python3 ['-m', 'databricks.labs.bladebridge.server', '--log_level=DEBUG'] (cwd=/Users/sergio.ballesteros/.databricks/labs/remorph-transpilers/bladebridge/lib)
10:30:40 DEBUG [d.l.l.t.lsp.lsp_engine] LSP init params: InitializeParams(capabilities=ClientCapabilities(workspace=None, text_document=None, notebook_document=None, window=None, general=None, experimental=None), process_id=5175, client_info=ClientInfo(name='lakebridge', version='0.10.11'), locale=None, root_path=None, root_uri='file:///Users/sergio.ballesteros/Code/lakebridge/input', initialization_options={'remorph': {'source-dialect': 'datastage'}, 'options': {'overrides-file': None, 'target-tech': 'SPARKSQL'}, 'custom': {}}, trace=None, work_done_token=None, workspace_folders=None)
10:30:40 DEBUG [d.l.l.t.lsp.lsp_engine] Registered capability: document/transpileToDatabricks
10:30:40 DEBUG [d.l.l.transpiler.execute] Starting to process input file: /Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml
10:30:40 INFO [d.l.l.transpiler.execute] Transpiling file: /Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml
10:30:40 DEBUG [d.l.l.transpiler.execute] Started processing file: /Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml
10:30:40 DEBUG [d.l.blueprint.paths] XML declaration detected, sniffing further with encoding: us-ascii
10:30:40 DEBUG [d.l.blueprint.paths] XML declaration encoding detected: UTF-8
11:30:03 ERROR [d.l.lakebridge.transpile] Failed to call transpile
Traceback (most recent call last):
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/state/venv/lib/python3.10/site-packages/databricks/labs/blueprint/cli.py", line 118, in _route
cmd.fn(**kwargs)
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/cli.py", line 126, in transpile
result = asyncio.run(_transpile(ctx, config, engine))
File "/opt/homebrew/Cellar/python@3.10/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/homebrew/Cellar/python@3.10/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/cli.py", line 492, in _transpile
status, errors = await do_transpile(ctx.workspace_client, engine, config)
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/transpiler/execute.py", line 336, in transpile
status, errors = await _do_transpile(workspace_client, engine, config)
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/transpiler/execute.py", line 368, in _do_transpile
result = await _process_input_file(config, validator, engine)
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/transpiler/execute.py", line 328, in _process_input_file
no_of_sqls, error_list = await _process_one_file(context)
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/transpiler/execute.py", line 107, in _process_one_file
transpile_result = await _transpile(
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/transpiler/execute.py", line 417, in _transpile
return await engine.transpile(from_dialect, to_dialect, source_code, input_path)
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/transpiler/lsp/lsp_engine.py", line 590, in transpile
response = await self.transpile_document(file_path)
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/transpiler/lsp/lsp_engine.py", line 608, in transpile_document
result = await self._client.transpile_document_async(params)
File "/Users/sergio.ballesteros/.databricks/labs/lakebridge/lib/src/databricks/labs/lakebridge/transpiler/lsp/lsp_engine.py", line 298, in transpile_document_async
return await self.protocol.send_request_async(TRANSPILE_TO_DATABRICKS_METHOD, params)
pygls.exceptions.JsonRpcInternalError: OSError: [Errno 24] Too many open files: '/var/folders/km/xwj6s6cn3cb_j60j0slv82xh0000gp/T/bladerunner__j0hdtqf/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/t
ranspiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled/transpiled'
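The error above combines two symptoms: the open-file limit was hit (Errno 24), and the reported path repeats the `transpiled` component many times. The following is a minimal diagnostic sketch, not part of Lakebridge or Bladebridge, for quantifying both. The helper names `longest_repeated_run` and `open_file_limits` are hypothetical, and the sample path is a shortened stand-in for the real one. It only measures the symptoms and makes no claim about the root cause.

```python
import resource  # Unix-only standard library module (available on macOS and Linux)
from pathlib import PurePosixPath


def longest_repeated_run(path: str) -> tuple[str, int]:
    """Return the path component with the longest consecutive run, and that run's length."""
    best_name, best_len = "", 0
    prev, run = None, 0
    for part in PurePosixPath(path).parts:
        # Extend the current run if this component equals the previous one.
        run = run + 1 if part == prev else 1
        prev = part
        if run > best_len:
            best_name, best_len = part, run
    return best_name, best_len


def open_file_limits() -> tuple[int, int]:
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # Hypothetical shortened path, shaped like the one in the traceback.
    sample = "/tmp/bladerunner_example/" + "/".join(["transpiled"] * 5)
    print(longest_repeated_run(sample))
    print(open_file_limits())
```

Pasting the full path from the exception message into `longest_repeated_run` gives the nesting depth that was reached before the failure, which may be useful to include alongside the log.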
Is there an existing issue for this?
Category of Bug / Issue
Converter bug
Current Behavior
Hi,
Running the code conversion on a 125 MB DataStage XML file fails. In the run logged here, the transpiler stalls for about an hour and then aborts with OSError: [Errno 24] Too many open files. According to Databricks, files larger than 100 MB are known to cause problems.
This is a blocker for large migrations. Would it be possible to add support for files larger than 100 MB?
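Until larger files are supported, it may help to know up front which inputs exceed the reported size. The sketch below is a standalone pre-flight check and not a Lakebridge feature. The function name `oversized_inputs` is hypothetical, and the threshold constant assumes the ">100MB" figure means decimal megabytes, which the report does not specify.

```python
from pathlib import Path

# Assumption: the ">100MB" figure from the report, read as decimal megabytes.
SIZE_THRESHOLD_BYTES = 100 * 1000 * 1000


def oversized_inputs(root: Path, threshold: int = SIZE_THRESHOLD_BYTES) -> list[tuple[Path, int]]:
    """List (path, size in bytes) for XML inputs larger than the threshold.

    `root` may be a single file or a folder that is searched recursively.
    """
    candidates = [root] if root.is_file() else sorted(root.rglob("*.xml"))
    result = []
    for path in candidates:
        if path.is_file():
            size = path.stat().st_size
            if size > threshold:
                result.append((path, size))
    return result


if __name__ == "__main__":
    # Hypothetical input folder; replace with the real one.
    folder = Path("input")
    if folder.exists():
        for path, size in oversized_inputs(folder):
            print(f"{path}: {size / 1_000_000:.1f} MB")
```

Run against the input folder before transpiling, this would flag a file such as the 125 MB export described here.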
Expected Behavior
The code is converted correctly.
Steps To Reproduce
databricks labs lakebridge transpile datastage --input-source /Users/sergio.ballesteros/Code/lakebridge/input/FinancialCredit.xml --output-folder /Users/sergio.ballesteros/Code/lakebridge/output/ --debug
Relevant log output or Exception details
Logs Confirmation
--debug
lsp-server.log under USER_HOME/.databricks/labs/remorph-transpilers/<converter_name>/lib/lsp-server.log
Sample Query
Operating System
macOS
Version
latest via Databricks CLI