Qualytics/qualytics-self-hosted

What is Qualytics?

Qualytics is a closed-source, container-native platform for assessing, monitoring, and facilitating enterprise data quality. Learn more about our product and capabilities here.

What is in this chart?

This chart deploys a single-tenant instance of the Qualytics platform to a CNCF-compliant Kubernetes control plane.

Architecture

flowchart LR
  user([User Browser])

  datastores[("Datastores (external)<br/>JDBC: Snowflake, BigQuery, Redshift, Databricks, …<br/>DFS: Amazon S3, Google Cloud Storage, Azure Data Lake Storage")]

  subgraph k8s["Kubernetes Cluster"]
    direction TB
    subgraph ns["Qualytics Namespace"]
      direction TB

      nginx["nginx-ingress<br/>(optional)"]

      subgraph appPool["Application Nodes — appNodes=true"]
        direction LR
        fe["Frontend"]
        api["Controlplane API<br/>(8 replicas)"]
        cmd["Controlplane CMD"]
        pg[("PostgreSQL<br/>StatefulSet")]
        rmq[("RabbitMQ<br/>StatefulSet, emptyDir")]
      end

      subgraph driverPool["Spark Driver Node — driverNodes=true"]
        driver["Dataplane (Spark Driver)<br/>spark-submit --deploy-mode client"]
      end

      subgraph execPool["Spark Executor Nodes — executorNodes=true"]
        executors["Spark Executors<br/>(dynamic allocation, 1..12 pods)"]
      end
    end
  end

  user --> nginx
  nginx -->|/| fe
  nginx -->|/api/...| api
  api <--> pg
  api <--> rmq
  cmd <--> pg
  cmd <--> rmq
  driver <-->|dataplane queue| rmq
  driver -.->|launches and deletes<br/>executor pods| executors

  driver -->|metadata| datastores
  executors -->|reads data| datastores

A Qualytics deployment is split into a Controlplane, a Dataplane, and the Datastores it monitors:

  • Controlplane — the API and CMD services plus the Frontend UI, all running on Application Nodes (appNodes=true). The API serves user requests and orchestrates work; CMD is the background processor that schedules and tracks operations. PostgreSQL holds platform state and RabbitMQ is the message broker between the Controlplane and the Dataplane.
  • Dataplane — a Spark application: a single driver pod (driverNodes=true) that runs spark-submit in client mode, plus executor pods (executorNodes=true) the driver creates and reaps dynamically based on workload (dataplane.dynamicAllocation.minExecutors..maxExecutors, default 1..12).
  • Datastores — the external systems Qualytics is profiling and scanning. Two connector families are supported: JDBC (Snowflake, BigQuery, Redshift, Databricks, PostgreSQL, Oracle, Microsoft SQL Server, …) and DFS (Amazon S3, Google Cloud Storage, Azure Data Lake Storage). The driver opens metadata connections; the executors do the parallel data reads.

Datastores are configured in the Qualytics UI after deployment. See the user guide's Source Datastores Overview for the full connector list.

Prerequisites

Before deploying Qualytics, ensure you have:

  • A Kubernetes cluster (recommended version 1.30+)
  • kubectl configured to access your cluster
  • helm CLI installed (recommended version 3.12+)
  • Docker registry credentials from your Qualytics account manager
  • Authentication configuration — either OIDC credentials from your IdP (recommended) or Auth0 credentials from your Qualytics account manager
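
As a quick sanity check of the tooling prerequisites above, the following is a minimal sketch, not part of the chart. The helper names `version_ge` and `report` are made up for this example, and it only checks that `helm` meets the recommended 3.12 minimum and that `kubectl` can reach a cluster with the current context.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A is at least version B.
# Relies on `sort -V` (GNU coreutils; also available in recent BSD/macOS sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# report <tool> <found-version> <recommended-minimum> (hypothetical helper)
report() {
  if version_ge "$2" "$3"; then
    echo "ok: $1 $2"
  else
    echo "warn: $1 $2 is older than the recommended $3"
  fi
}

if command -v helm >/dev/null 2>&1; then
  # `helm version --short` prints e.g. v3.14.2+gc309b6f; strip the "v" and build suffix.
  report helm "$(helm version --short | sed -e 's/^v//' -e 's/+.*//')" 3.12.0
else
  echo "missing: helm"
fi

if command -v kubectl >/dev/null 2>&1; then
  if kubectl cluster-info --request-timeout=5s >/dev/null 2>&1; then
    echo "ok: kubectl can reach the cluster"
  else
    echo "warn: kubectl cannot reach a cluster with the current context"
  fi
else
  echo "missing: kubectl"
fi
```

The script only reports; it does not install or change anything.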

How should I use this chart?

Please work with your account manager at Qualytics to secure the right values for your licensed deployment. If you don't yet have an account manager, please write us here to say hello!

1. Create a CNCF-compliant cluster

Qualytics fully supports Kubernetes clusters hosted in AWS, GCP, and Azure as well as any CNCF-compliant control plane.

Terraform Templates Available: We provide ready-to-use Terraform templates for provisioning Kubernetes clusters on AWS (EKS), GCP (GKE), and Azure (AKS). See the /terraform directory for details.

Infrastructure Flexibility

Qualytics is designed to be flexible and can run on virtually any Kubernetes infrastructure. The platform automatically adapts to available resources, making it compatible with a wide range of cluster configurations. The infrastructure requirements scale based on the volume of data to be processed—smaller datasets can run on minimal resources, while larger data volumes benefit from more powerful configurations.

Node Configuration

The architecture above assumes three dedicated node groups (appNodes, driverNodes, executorNodes); for production data volumes we recommend keeping them separate with autoscaling enabled. The setup is flexible if that's overkill for your environment:

  • Combined Spark nodes: Merge driver and executor labels into a single sparkNodes=true label if your node group has sufficient resources for both.
  • No node selectors: Run on any available cluster nodes without targeting specific groups (disable node selectors in values.yaml).
  • Single node: For development or small workloads, the entire platform can run on a single appropriately-sized node.
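
If your node groups are not labeled by your provisioning tooling, the labels can be applied by hand. The sketch below is one possible way to do that. The node names are hypothetical placeholders, and the `run` and `label_pool` helpers are made up for this example. It runs in dry-run mode by default and only prints the `kubectl` commands it would execute.

```shell
#!/usr/bin/env bash
# Dry-run by default: print each command instead of running it.
# Set DRY_RUN=0 to actually apply the labels.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# label_pool <label-key> <node>...: apply <label-key>=true to each node.
label_pool() {
  key="$1"; shift
  for node in "$@"; do
    run kubectl label node "$node" "${key}=true" --overwrite
  done
}

# Hypothetical node names; list the real ones with: kubectl get nodes
label_pool appNodes      node-app-1
label_pool driverNodes   node-driver-1
label_pool executorNodes node-exec-1 node-exec-2
```

For the combined layout described above, the same helper could be called once with `sparkNodes` as the key.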

Suggested Instance Types

The table below shows suggested instance types for a standard Medium-tier production deployment, suitable for most workloads up to 10 TB of data under management.

| | Application Nodes | Spark Driver Nodes | Spark Executor Nodes |
|---|---|---|---|
| Label | appNodes=true | driverNodes=true | executorNodes=true |
| Scaling | Autoscaling (1 node on-demand) | Autoscaling (1 node on-demand) | Autoscaling (1 - 12 nodes spot) |
| EKS | m8g.2xlarge (8 vCPUs, 32 GB) | r8g.2xlarge (8 vCPUs, 64 GB) | r8gd.2xlarge (8 vCPUs, 64 GB, 474 GB SSD) |
| GKE | n4-standard-8 (8 vCPUs, 32 GB) | n4-highmem-8 (8 vCPUs, 64 GB) | n2-highmem-8 + Local SSD (8 vCPUs, 64 GB) |
| AKS | Standard_D8s_v6 (8 vCPUs, 32 GB) | Standard_E8s_v6 (8 vCPUs, 64 GB) | Standard_E8ds_v5 (8 vCPUs, 64 GB, 300 GB SSD) |

For deployments with different data volumes, the Cluster Sizing Guide covers all six tiers (Small through 4X-Large), on-premises bare-metal specifications, cloud instance types for EKS/GKE/AKS, and Helm configurations. Contact your Qualytics account manager for sizing guidance.

Docker Registry Secrets

Run the commands below, replacing "<token>" with the credentials supplied by your account manager. The resulting secret gives your cluster access to the Qualytics private registry on Docker Hub and the images hosted there.

kubectl create namespace qualytics
kubectl create secret docker-registry regcred -n qualytics --docker-username=qualyticsai --docker-password=<token>

Important

The above configuration connects your cluster directly to our private Docker Hub repositories to pull our images. If you cannot connect your cluster directly to our image repository for technical or compliance reasons, you can instead import our images into your preferred registry using the same credentials (docker login -u qualyticsai -p <token>). You will then need to update the image URLs in the values.yaml file in the next step to point to your repository instead of ours.
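
Importing images into a private registry usually comes down to pull, retag, and push. The following is a rough sketch under stated assumptions: `registry.example.com/qualytics` is a placeholder target, the image names are deliberately left as placeholders because the actual list comes from the chart's values, and `mirror_ref` and `mirror_image` are helper names invented for this example.

```shell
#!/usr/bin/env bash
# mirror_ref <target-registry> <source-image>: keep the image name and tag,
# swap everything before the last "/" for the target registry.
mirror_ref() {
  printf '%s/%s\n' "${1%/}" "${2##*/}"
}

# mirror_image <target-registry> <source-image>: pull, retag, push.
mirror_image() {
  target="$(mirror_ref "$1" "$2")"
  docker pull "$2" && docker tag "$2" "$target" && docker push "$target"
}

# Example usage (placeholders, not real image names):
#   printf '%s' "$QUALYTICS_TOKEN" | docker login -u qualyticsai --password-stdin
#   mirror_image registry.example.com/qualytics qualyticsai/<image>:<tag>
```

Reading the token from stdin with `--password-stdin` keeps it out of the shell history and the process list, which is why the usage comment prefers it over `-p`.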

2. Create your configuration file

For a quick start, copy the simplified template configuration:

cp template.values.yaml values.yaml

The template.values.yaml file contains essential configurations with sensible defaults. You'll need to update these required settings:

  1. DNS Record (provided by Qualytics or managed by customer):

    global:
      dnsRecord: "your-company.qualytics.io"  # or your custom domain
  2. Authentication — choose one of the following:

    Option A: OIDC — Direct IdP Integration (Recommended)

    Set global.authType to OIDC and configure your Identity Provider credentials. Register Qualytics as a Web Application in your IdP with https://<your-domain>/api/callback as the redirect URI, Authorization Code grant type, and at minimum openid scope.

    global:
      authType: "OIDC"
    
    secrets:
      oidc:
        oidc_scopes: "openid,email,profile"
        oidc_authorization_endpoint: "https://your-idp.example.com/oauth2/authorize"
        oidc_token_endpoint: "https://your-idp.example.com/oauth2/token"
        oidc_userinfo_endpoint: "https://your-idp.example.com/oauth2/userinfo"
        oidc_client_id: "your-client-id"
        oidc_client_secret: "your-client-secret"
        oidc_user_id_key: "sub"
        oidc_user_email_key: "email"
        oidc_user_name_key: "name"
        oidc_user_fname_key: "given_name"
        oidc_user_lname_key: "family_name"
        oidc_user_picture_key: "picture"
        oidc_user_provider_key: "auth_provider"
        oidc_allow_insecure_transport: false

    See the OIDC Configuration Guide for detailed instructions including IdP-specific examples for Okta, Azure AD (Entra ID), Keycloak, and Google Workspace.

    Option B: Auth0 — Managed by Qualytics

    Contact your Qualytics account manager to request Auth0 resources, then configure the provided values:

    global:
      authType: "AUTH0"
    
    secrets:
      auth0:
        auth0_audience: your-api-audience
        auth0_organization: org_your-org-id
        auth0_spa_client_id: your-spa-client-id

    See the Auth0 Setup Guide for details on how to request Auth0 resources from Qualytics.

  3. Security Secrets (generate secure random values):

    secrets:
      auth:
        jwt_signing_secret: your-secure-jwt-secret     # min 32 chars, generate with: openssl rand -base64 32
      postgres:
        secrets_passphrase: your-secure-passphrase
      rabbitmq:
        rabbitmq_password: your-secure-password

Optional configurations:

  • Enable nginx if you need an ingress controller
  • Provide a TLS Secret for the ingress (see docs/ingress-tls.md). Recommended: a single shared qualytics-tls-cert Secret referenced via ingress.tls.secretName. Existing deployments with api-tls-cert + frontend-tls-cert keep working unchanged.
  • Configure controlplane.smtp settings for email notifications

For advanced configuration, refer to the full charts/qualytics/values.yaml file which contains all available options.

Contact your Qualytics account manager for assistance.

3. Deploy Qualytics to your cluster

Add the Qualytics Helm repository and deploy the platform:

# Add the Qualytics Helm repository
helm repo add qualytics https://qualytics.github.io/qualytics-self-hosted
helm repo update

# Deploy Qualytics
helm upgrade --install qualytics qualytics/qualytics \
  --namespace qualytics \
  --create-namespace \
  -f values.yaml \
  --wait \
  --timeout=5m

Monitor the deployment:

# Check deployment status
kubectl get pods -n qualytics
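
To narrow the pod list down to what still needs attention, a small filter can help. This is a sketch only: `not_ready` is a helper name invented here, and it assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# not_ready: read `kubectl get pods --no-headers` output on stdin and print
# "<pod> <ready> <status>" for pods that are not fully ready.
# Completed pods (finished jobs) are skipped.
not_ready() {
  awk '$3 == "Completed" { next }
       { split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3 }'
}

if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -n qualytics --no-headers | not_ready
fi
```

An empty result means every pod is Running with all of its containers ready.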

Get the ingress IP address:

# If using nginx ingress
kubectl get svc -n qualytics qualytics-nginx-controller

# Or check ingress resources
kubectl get ingress -n qualytics

Note this IP address as it's needed for the next step!

4. Configure DNS and TLS for your deployment

Run Qualytics under a domain you control:

  1. Create an A record pointing your domain to the ingress IP address.
  2. Set global.dnsRecord in values.yaml to that hostname.
  3. Mint a TLS certificate for that hostname (corporate CA, Let's Encrypt, cloud-provider managed cert, etc.) and create a Kubernetes tls Secret from it — see docs/ingress-tls.md for the recommended single-Secret pattern and the per-ingress Secret option.
  4. Update any firewall rules to allow traffic to your domain.
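
Before creating the TLS Secret, it can be worth confirming that the certificate actually covers the hostname. The sketch below assumes the single shared `qualytics-tls-cert` Secret mentioned earlier. The file paths and hostname are placeholders, and `cert_matches_host` is a helper invented for this example. docs/ingress-tls.md remains the reference for the supported patterns.

```shell
#!/usr/bin/env bash
# cert_matches_host <cert.pem> <hostname>: succeeds when the certificate is
# valid for the hostname, according to `openssl x509 -checkhost`.
cert_matches_host() {
  openssl x509 -in "$1" -noout -checkhost "$2" | grep -q 'does match certificate'
}

# Placeholder paths and hostname; substitute your own.
CERT="${CERT:-tls.crt}"
KEY="${KEY:-tls.key}"
HOST="${HOST:-your-company.qualytics.io}"

if [ -f "$CERT" ] && [ -f "$KEY" ] && command -v kubectl >/dev/null 2>&1; then
  if cert_matches_host "$CERT" "$HOST"; then
    kubectl create secret tls qualytics-tls-cert -n qualytics \
      --cert="$CERT" --key="$KEY"
  else
    echo "certificate $CERT is not valid for $HOST" >&2
  fi
fi
```

The script does nothing unless both files exist, so it is safe to run from a directory without certificates.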

Contact your account manager if you need assistance.

Can I run a fully "air-gapped" deployment?

Yes. The only egress requirement for a standard self-hosted Qualytics deployment is access to https://auth.qualytics.io, which provides Auth0-powered federated authentication. This is recommended for ease of installation and support, but it is not a strict requirement. If you require a fully private deployment with no access to the public internet, you can instead configure an OpenID Connect (OIDC) integration with your enterprise identity provider (IdP).

To set up OIDC for an air-gapped deployment:

  1. Set global.authType: "OIDC" in your values.yaml
  2. Configure your enterprise IdP credentials under secrets.oidc
  3. Import Qualytics container images into your private registry

See the OIDC Configuration Guide for step-by-step instructions.

Troubleshooting

Common Issues

Pods stuck in Pending state:

  • Check node resources: kubectl describe nodes
  • Verify node selectors match your cluster labels
  • Ensure storage classes are available
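
The checks above can be scripted roughly as follows. This is a sketch that assumes the three default node labels from the architecture section are in use. `missing_labels` is a helper name invented here, and it parses the output of `kubectl get nodes --show-labels --no-headers`.

```shell
#!/usr/bin/env bash
# missing_labels: read `kubectl get nodes --show-labels --no-headers` on stdin
# and report each expected node label that no node carries.
missing_labels() {
  nodes="$(cat)"
  for label in appNodes driverNodes executorNodes; do
    if ! printf '%s\n' "$nodes" |
        grep -Eq "(^|[[:space:],])${label}=true(,|[[:space:]]|\$)"; then
      echo "no node carries ${label}=true"
    fi
  done
}

if command -v kubectl >/dev/null 2>&1; then
  kubectl get nodes --show-labels --no-headers | missing_labels
  # Pods the scheduler has not placed yet, and the storage classes on offer.
  kubectl get pods -n qualytics --field-selector=status.phase=Pending
  kubectl get storageclass
fi
```

If you merged the Spark labels or disabled node selectors, adjust the label list in `missing_labels` accordingly.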

Image pull errors:

  • Verify Docker registry secret: kubectl get secret regcred -n qualytics -o yaml
  • Check if images are accessible from your cluster

Ingress not working:

  • Ensure an ingress controller is installed and running
  • Check ingress resources: kubectl describe ingress -n qualytics

Useful Commands

# Check all resources
kubectl get all -n qualytics

# Restart a deployment
kubectl rollout restart deployment/qualytics-api -n qualytics
kubectl rollout restart deployment/qualytics-cmd -n qualytics

# View detailed pod information
kubectl describe pod <pod-name> -n qualytics

# Get spark driver logs (Deployment-managed, random pod suffix — use the deployment selector)
kubectl logs -f deployment/qualytics-spark -n qualytics
# Or by label
kubectl logs -l spark-role=driver -n qualytics --tail=200 -f

