dbt-databricks tries to establish connection to Databricks when running dbt parse #940

Description

@ghjklw

Describe the bug

According to the dbt parse documentation, running dbt parse should work in an isolated environment with no Databricks workspace available:

Starting in v1.5, dbt parse will write or return a manifest, enabling you to introspect dbt's understanding of all the resources in your project. Since dbt parse doesn't connect to your warehouse, this manifest will not contain any compiled code.

This is especially useful when building a CI/CD pipeline in which you want to generate a manifest.json file.

This used to work as expected with dbt-databricks, but it seems to have been broken in version 1.9.0.

Steps To Reproduce

Define a dummy http_path (or other connection value) in the dbt profile and run dbt parse. With dbt-databricks 1.8.7 this works as expected, whereas any version since 1.9.0 raises an unhandled exception.
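A minimal sketch of the reproduction setup, for anyone who wants to script it. The profile name `my_project`, the target name `dev`, and every host, path, token, and schema value are placeholders I made up; the profile name has to match the `profile:` entry in your own dbt_project.yml.

```python
from pathlib import Path

# Dummy profile: none of these values point at a real workspace.
# "my_project" and "dev" are hypothetical names used only for illustration.
DUMMY_PROFILE = """\
my_project:
  target: dev
  outputs:
    dev:
      type: databricks
      host: dummy.cloud.databricks.com
      http_path: /sql/1.0/warehouses/dummy
      token: dummy-token
      schema: dummy_schema
"""


def write_dummy_profile(directory: Path) -> Path:
    """Write the dummy profile to <directory>/profiles.yml and return its path."""
    path = directory / "profiles.yml"
    path.write_text(DUMMY_PROFILE)
    return path


def parse_command(profiles_dir: Path) -> list[str]:
    """Build the dbt parse invocation that reads profiles.yml from profiles_dir."""
    return ["dbt", "parse", "--profiles-dir", str(profiles_dir)]
```

From the project root, passing the result of `parse_command(...)` to `subprocess.run` should succeed with dbt-databricks 1.8.7 and fail with the traceback below on 1.9.x.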

Expected behavior

Generate a manifest.json without trying to connect to Databricks.

Screenshots and log output

Here is part of the traceback you get when running dbt parse:

13:43:50  Traceback (most recent call last):
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/cli/requires.py", line 138, in wrapper
    result, success = func(*args, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/cli/requires.py", line 101, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/cli/requires.py", line 218, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/cli/requires.py", line 247, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/cli/requires.py", line 294, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/cli/requires.py", line 320, in wrapper
    ctx.obj["manifest"] = parse_manifest(
                          ^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/parser/manifest.py", line 1895, in parse_manifest
    register_adapter(runtime_config, get_mp_context())
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/factory.py", line 203, in register_adapter
    FACTORY.register_adapter(config, mp_context, adapter_registered_log_level)
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/factory.py", line 118, in register_adapter
    adapter: Adapter = adapter_type(config, mp_context)  # type: ignore
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/databricks/impl.py", line 176, in __init__
    super().__init__(config, mp_context)
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/base/impl.py", line 271, in __init__
    self.connections = self.ConnectionManager(config, mp_context)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/databricks/connections.py", line 712, in __init__
    super().__init__(profile, mp_context)
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/databricks/connections.py", line 385, in __init__
    self.api_client = DatabricksApiClient.create(creds, 15 * 60)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/databricks/api_client.py", line 560, in create
    credentials_provider = credentials.authenticate(None)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/databricks/credentials.py", line 262, in authenticate
    self._credentials_provider = provider.as_dict()

System information

The output of dbt --version:

Core:
  - installed: 1.9.2
  - latest:    1.9.2 - Up to date!

Plugins:
  - spark:      1.9.1 - Up to date!
  - databricks: 1.9.4 - Up to date!

Debian Bookworm
Python 3.12.8

Additional context

While trying to identify the origin of the issue, I installed dbt-core==1.8.8 with dbt-databricks==1.9.4, which gives the error above, whereas dbt-core==1.8.8 with dbt-databricks==1.8.7 does not. This seems to confirm that the root cause is a change in dbt-databricks, not in dbt-core.

I have not been able to pinpoint a specific change, since there was a significant refactor between these two versions. I suspect that the issue might be here:

self.api_client = DatabricksApiClient.create(creds, 15 * 60)

This seems to have been introduced by #849.

It looks like DatabricksConnectionManager calls DatabricksApiClient.create in its initializer, which in turn establishes a connection to Databricks. Maybe that is a bit early in the process and should only happen later, when open is called?

According to the documentation of ConnectionManager:

open() is a classmethod that gets a connection object (which could be in any state, but will have a Credentials object with the attributes you defined above) and moves it to the 'open' state.

I would therefore not expect the connection to be opened before this function is called.
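To illustrate the kind of deferral I have in mind, here is a rough sketch of lazy client creation. It is not the actual dbt-databricks code: `SketchConnectionManager` and the `create_client` callable are hypothetical stand-ins for DatabricksConnectionManager and the call to DatabricksApiClient.create.

```python
from typing import Any, Callable, Optional


class SketchConnectionManager:
    """Hypothetical stand-in showing deferred API client creation."""

    def __init__(self, create_client: Callable[[], Any]) -> None:
        # Store only the factory; nothing authenticates or connects here,
        # so constructing the manager (as dbt parse does) stays offline.
        self._create_client = create_client
        self._api_client: Optional[Any] = None

    @property
    def api_client(self) -> Any:
        # Build the client on first access and reuse it afterwards.
        if self._api_client is None:
            self._api_client = self._create_client()
        return self._api_client
```

With something like this, the client would only be built when a code path that really needs Databricks (for example open) first touches `api_client`. The sketch is not thread-safe; a real fix would probably need a lock around the first access.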
