Qualytics is a closed-source container-native platform for assessing, monitoring, and facilitating enterprise data quality. Learn more about our product and capabilities here.
This chart will deploy a single-tenant instance of the Qualytics platform to a CNCF-compliant Kubernetes control plane.
flowchart LR
user([User Browser])
datastores[("Datastores (external)<br/>JDBC: Snowflake, BigQuery, Redshift, Databricks, …<br/>DFS: Amazon S3, Google Cloud Storage, Azure Data Lake Storage")]
subgraph k8s["Kubernetes Cluster"]
direction TB
subgraph ns["Qualytics Namespace"]
direction TB
nginx["nginx-ingress<br/>(optional)"]
subgraph appPool["Application Nodes — appNodes=true"]
direction LR
fe["Frontend"]
api["Controlplane API<br/>(8 replicas)"]
cmd["Controlplane CMD"]
pg[("PostgreSQL<br/>StatefulSet")]
rmq[("RabbitMQ<br/>StatefulSet, emptyDir")]
end
subgraph driverPool["Spark Driver Node — driverNodes=true"]
driver["Dataplane (Spark Driver)<br/>spark-submit --deploy-mode client"]
end
subgraph execPool["Spark Executor Nodes — executorNodes=true"]
executors["Spark Executors<br/>(dynamic allocation, 1..12 pods)"]
end
end
end
user --> nginx
nginx -->|/| fe
nginx -->|/api/...| api
api <--> pg
api <--> rmq
cmd <--> pg
cmd <--> rmq
driver <-->|dataplane queue| rmq
driver -.->|launches and deletes<br/>executor pods| executors
driver -->|metadata| datastores
executors -->|reads data| datastores
A Qualytics deployment is split into a Controlplane, a Dataplane, and the Datastores it monitors:
- Controlplane: the API and CMD services plus the Frontend UI, all running on Application Nodes (appNodes=true). The API serves user requests and orchestrates work; CMD is the background processor that schedules and tracks operations. PostgreSQL holds platform state and RabbitMQ is the message broker between the Controlplane and the Dataplane.
- Dataplane: a Spark application consisting of a single driver pod (driverNodes=true) that runs spark-submit in client mode, plus executor pods (executorNodes=true) that the driver creates and reaps dynamically based on workload (dataplane.dynamicAllocation.minExecutors..maxExecutors, default 1..12).
- Datastores: the external systems Qualytics is profiling and scanning. Two connector families are supported: JDBC (Snowflake, BigQuery, Redshift, Databricks, PostgreSQL, Oracle, Microsoft SQL Server, …) and DFS (Amazon S3, Google Cloud Storage, Azure Data Lake Storage). The driver opens metadata connections; the executors do the parallel data reads.
Datastores are configured in the Qualytics UI after deployment. See the user guide's Source Datastores Overview for the full connector list.
Before deploying Qualytics, ensure you have:
- A Kubernetes cluster (recommended version 1.30+)
- kubectl configured to access your cluster
- helm CLI installed (recommended version 3.12+)
- Docker registry credentials from your Qualytics account manager
- Authentication configuration — either OIDC credentials from your IdP (recommended) or Auth0 credentials from your Qualytics account manager
Please work with your account manager at Qualytics to secure the right values for your licensed deployment. If you don't yet have an account manager, please write us here to say hello!
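As a rough preflight for the prerequisites above, the sketch below checks that the two CLIs are on your PATH. The helper names check_tool and preflight are illustrative and not part of the chart.

```shell
#!/bin/sh
# check_tool NAME: succeed if NAME is found on PATH (hypothetical helper).
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# preflight: check every CLI this guide relies on; non-zero if any is missing.
preflight() {
  rc=0
  for tool in kubectl helm; do
    check_tool "$tool" || rc=1
  done
  return "$rc"
}

# Usage:
#   preflight && kubectl version && helm version
```

Comparing the reported versions against the recommended 1.30+ (Kubernetes) and 3.12+ (Helm) is left as a manual step here.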
Qualytics fully supports Kubernetes clusters hosted in AWS, GCP, and Azure as well as any CNCF-compliant control plane.
Terraform Templates Available: We provide ready-to-use Terraform templates for provisioning Kubernetes clusters on AWS (EKS), GCP (GKE), and Azure (AKS). See the /terraform directory for details.
Qualytics is designed to be flexible and can run on virtually any Kubernetes infrastructure. The platform automatically adapts to available resources, making it compatible with a wide range of cluster configurations. The infrastructure requirements scale based on the volume of data to be processed—smaller datasets can run on minimal resources, while larger data volumes benefit from more powerful configurations.
The architecture above assumes three dedicated node groups (appNodes, driverNodes, executorNodes); for production data volumes we recommend keeping them separate with autoscaling enabled. The setup is flexible if that's overkill for your environment:
- Combined Spark nodes: Merge driver and executor labels into a single sparkNodes=true label if your node group has sufficient resources for both.
- No node selectors: Run on any available cluster nodes without targeting specific groups (disable node selectors in values.yaml).
- Single node: For development or small workloads, the entire platform can run on a single appropriately-sized node.
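If your node groups are not labeled at provisioning time, the labels can be applied by hand. This is a minimal sketch, assuming kubectl access to the cluster. The label_pool helper and the node names are placeholders, and on managed clusters you would normally set labels on the node group itself so that autoscaled nodes inherit them.

```shell
#!/bin/sh
# label_pool LABEL NODE...: apply LABEL=true to each named node.
label_pool() {
  label="$1"
  shift
  for node in "$@"; do
    kubectl label node "$node" "${label}=true" --overwrite
  done
}

# Usage (node names are placeholders):
#   label_pool appNodes node-a
#   label_pool driverNodes node-b
#   label_pool executorNodes node-c node-d
```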
The table below shows suggested instance types for a standard Medium-tier production deployment, suitable for most workloads up to 10 TB of data under management.
| | Application Nodes | Spark Driver Nodes | Spark Executor Nodes |
|---|---|---|---|
| Label | appNodes=true | driverNodes=true | executorNodes=true |
| Scaling | Autoscaling (1 node on-demand) | Autoscaling (1 node on-demand) | Autoscaling (1 - 12 nodes spot) |
| EKS | m8g.2xlarge (8 vCPUs, 32 GB) | r8g.2xlarge (8 vCPUs, 64 GB) | r8gd.2xlarge (8 vCPUs, 64 GB, 474 GB SSD) |
| GKE | n4-standard-8 (8 vCPUs, 32 GB) | n4-highmem-8 (8 vCPUs, 64 GB) | n2-highmem-8 + Local SSD (8 vCPUs, 64 GB) |
| AKS | Standard_D8s_v6 (8 vCPUs, 32 GB) | Standard_E8s_v6 (8 vCPUs, 64 GB) | Standard_E8ds_v5 (8 vCPUs, 64 GB, 300 GB SSD) |
For deployments with different data volumes, the Cluster Sizing Guide covers all six tiers (Small through 4X-Large), on-premises bare-metal specifications, cloud instance types for EKS/GKE/AKS, and Helm configurations. Contact your Qualytics account manager for sizing guidance.
Execute the commands below, replacing "<token>" with the credentials supplied by your account manager. The resulting secret grants access to the Qualytics private registry on Docker Hub and the required images hosted there.
kubectl create namespace qualytics
kubectl create secret docker-registry regcred -n qualytics --docker-username=qualyticsai --docker-password=<token>

Important: The above configuration will connect your cluster directly to our private Docker Hub repositories for pulling our images. If you are unable to connect your cluster directly to our image repository for technical or compliance reasons, you can instead import our images into your preferred registry using these same credentials (docker login -u qualyticsai -p <token>). You'll need to update the image URLs in the values.yaml file in the next step to point to your repository instead of ours.
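Importing images into your own registry usually comes down to a pull, tag, and push per image. The sketch below shows one way to do that, assuming you have already run docker login against both registries. The image name and target registry are hypothetical placeholders; take the real image references from values.yaml.

```shell
#!/bin/sh
# target_ref SOURCE_IMAGE TARGET_REPO: keep the image name and tag,
# replace everything before the last "/" with TARGET_REPO.
target_ref() {
  printf '%s/%s\n' "$2" "${1##*/}"
}

# mirror_image SOURCE_IMAGE TARGET_REPO: pull, retag, and push one image.
mirror_image() {
  src="$1"
  dst="$(target_ref "$1" "$2")"
  docker pull "$src" && docker tag "$src" "$dst" && docker push "$dst"
}

# Usage (placeholder names, not real Qualytics image references):
#   mirror_image qualyticsai/example-image:1.2.3 registry.example.com/qualytics
```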
For a quick start, copy the simplified template configuration:
cp template.values.yaml values.yaml

The template.values.yaml file contains essential configurations with sensible defaults. You'll need to update these required settings:
- DNS Record (provided by Qualytics or managed by customer):

      global:
        dnsRecord: "your-company.qualytics.io"  # or your custom domain
- Authentication (choose one of the following):
Option A: OIDC — Direct IdP Integration (Recommended)
Set global.authType to OIDC and configure your Identity Provider credentials. Register Qualytics as a Web Application in your IdP with https://<your-domain>/api/callback as the redirect URI, the Authorization Code grant type, and at minimum the openid scope.

      global:
        authType: "OIDC"
      secrets:
        oidc:
          oidc_scopes: "openid,email,profile"
          oidc_authorization_endpoint: "https://your-idp.example.com/oauth2/authorize"
          oidc_token_endpoint: "https://your-idp.example.com/oauth2/token"
          oidc_userinfo_endpoint: "https://your-idp.example.com/oauth2/userinfo"
          oidc_client_id: "your-client-id"
          oidc_client_secret: "your-client-secret"
          oidc_user_id_key: "sub"
          oidc_user_email_key: "email"
          oidc_user_name_key: "name"
          oidc_user_fname_key: "given_name"
          oidc_user_lname_key: "family_name"
          oidc_user_picture_key: "picture"
          oidc_user_provider_key: "auth_provider"
          oidc_allow_insecure_transport: false
See the OIDC Configuration Guide for detailed instructions including IdP-specific examples for Okta, Azure AD (Entra ID), Keycloak, and Google Workspace.
Option B: Auth0 — Managed by Qualytics
Contact your Qualytics account manager to request Auth0 resources, then configure the provided values:
      global:
        authType: "AUTH0"
      secrets:
        auth0:
          auth0_audience: your-api-audience
          auth0_organization: org_your-org-id
          auth0_spa_client_id: your-spa-client-id
See the Auth0 Setup Guide for details on how to request Auth0 resources from Qualytics.
- Security Secrets (generate secure random values):

      secrets:
        auth:
          jwt_signing_secret: your-secure-jwt-secret  # min 32 chars, generate with: openssl rand -base64 32
        postgres:
          secrets_passphrase: your-secure-passphrase
        rabbitmq:
          rabbitmq_password: your-secure-password
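One way to produce the three secret values is sketched below. It uses the openssl rand -base64 32 command suggested for the JWT secret and applies it to the other two values as well, which is an assumption rather than a documented requirement. The gen_secret helper name is illustrative.

```shell
#!/bin/sh
# gen_secret: print 32 random bytes, base64-encoded (44 characters).
gen_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -base64 32
  else
    # Fallback when openssl is not installed.
    head -c 32 /dev/urandom | base64
  fi
}

# Print one candidate value per secret key named in values.yaml.
for key in jwt_signing_secret secrets_passphrase rabbitmq_password; do
  printf '%s: "%s"\n' "$key" "$(gen_secret)"
done
```

Base64 output can contain +, /, and = characters, so keep the values quoted when you paste them into values.yaml.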
Optional configurations:
- Enable nginx if you need an ingress controller
- Provide a TLS Secret for the ingress (see docs/ingress-tls.md). Recommended: a single shared qualytics-tls-cert Secret referenced via ingress.tls.secretName. Existing deployments with api-tls-cert + frontend-tls-cert keep working unchanged.
- Configure controlplane.smtp settings for email notifications
For advanced configuration, refer to the full charts/qualytics/values.yaml file which contains all available options.
Contact your Qualytics account manager for assistance.
Add the Qualytics Helm repository and deploy the platform:
# Add the Qualytics Helm repository
helm repo add qualytics https://qualytics.github.io/qualytics-self-hosted
helm repo update
# Deploy Qualytics
helm upgrade --install qualytics qualytics/qualytics \
  --namespace qualytics \
  --create-namespace \
  -f values.yaml \
  --wait \
  --timeout=5m

Monitor the deployment:
# Check deployment status
kubectl get pods -n qualytics

Get the ingress IP address:
# If using nginx ingress
kubectl get svc -n qualytics qualytics-nginx-controller
# Or check ingress resources
kubectl get ingress -n qualytics

Note this IP address as it's needed for the next step!
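To capture the address in a script rather than reading it off the table output, a JSONPath query along these lines may help. It is a sketch that assumes the bundled nginx controller Service shown above. Some cloud load balancers report a hostname instead of an IP, so the query prints whichever field is populated. The ingress_address helper name is illustrative.

```shell
#!/bin/sh
# ingress_address: print the external IP (or hostname) of the nginx Service.
ingress_address() {
  kubectl get svc qualytics-nginx-controller -n qualytics \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}'
}

# Usage:
#   addr="$(ingress_address)" && echo "Ingress address: $addr"
```

If the result is a hostname rather than an IP, a CNAME record is typically used in place of an A record.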
Run Qualytics under a domain you control:
- Create an A record pointing your domain to the ingress IP address.
- Set global.dnsRecord in values.yaml to that hostname.
- Mint a TLS certificate for that hostname (corporate CA, Let's Encrypt, cloud-provider managed cert, etc.) and create a Kubernetes tls Secret from it; see docs/ingress-tls.md for the recommended single-Secret pattern and the per-ingress Secret option.
- Update any firewall rules to allow traffic to your domain.
Contact your account manager if you need assistance.
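For the TLS step, creating the Secret from an existing certificate and key could look like the sketch below. It assumes the recommended single shared Secret name qualytics-tls-cert. The file paths are placeholders and create_tls_secret is an illustrative helper; docs/ingress-tls.md remains the authoritative reference.

```shell
#!/bin/sh
# create_tls_secret CERT_FILE KEY_FILE: create the shared TLS Secret
# in the qualytics namespace from PEM-encoded files.
create_tls_secret() {
  cert_file="$1"
  key_file="$2"
  kubectl create secret tls qualytics-tls-cert \
    --cert="$cert_file" --key="$key_file" -n qualytics
}

# Usage (paths are placeholders):
#   create_tls_secret ./fullchain.pem ./privkey.pem
```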
Qualytics can be deployed in an air-gapped environment. The only egress requirement for a standard self-hosted Qualytics deployment is https://auth.qualytics.io, which provides Auth0-powered federated authentication. This is recommended for ease of installation and support, but it is not a strict requirement. If you require a fully private deployment with no access to the public internet, you can instead configure an OpenID Connect (OIDC) integration with your enterprise identity provider (IdP).
To set up OIDC for an air-gapped deployment:
- Set global.authType: "OIDC" in your values.yaml
- Configure your enterprise IdP credentials under secrets.oidc
- Import Qualytics container images into your private registry
See the OIDC Configuration Guide for step-by-step instructions.
Pods stuck in Pending state:
- Check node resources: kubectl describe nodes
- Verify node selectors match your cluster labels
- Ensure storage classes are available
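The three checks above can be gathered into one pass. This is a rough sketch using standard kubectl queries; the triage_pending helper name is illustrative, and the label columns assume the three-node-group layout described earlier.

```shell
#!/bin/sh
# triage_pending: list Pending pods, show which nodes carry the expected
# labels, and list the storage classes available in the cluster.
triage_pending() {
  kubectl get pods -n qualytics --field-selector=status.phase=Pending
  kubectl get nodes -L appNodes,driverNodes,executorNodes
  kubectl get storageclass
}

# Usage:
#   triage_pending
```

For a specific pod, the Events section of kubectl describe pod <pod-name> -n qualytics usually states why scheduling failed.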
Image pull errors:
- Verify Docker registry secret: kubectl get secret regcred -n qualytics -o yaml
- Check if images are accessible from your cluster
Ingress not working:
- Ensure an ingress controller is installed and running
- Check ingress resources: kubectl describe ingress -n qualytics
# Check all resources
kubectl get all -n qualytics
# Restart a deployment
kubectl rollout restart deployment/qualytics-api -n qualytics
kubectl rollout restart deployment/qualytics-cmd -n qualytics
# View detailed pod information
kubectl describe pod <pod-name> -n qualytics
# Get spark driver logs (Deployment-managed, random pod suffix — use the deployment selector)
kubectl logs -f deployment/qualytics-spark -n qualytics
# Or by label
kubectl logs -l spark-role=driver -n qualytics --tail=200 -f

- Authentication Configuration: Detailed OIDC and Auth0 configuration reference with Helm values mapping
- License Management: Activate and renew your deployment license (31-day grace period)
- Cluster Sizing Guide: Choose the right cluster size based on your data volume
- Self-Hosted Deployment Guide: End-to-end deployment walkthrough
- OIDC Configuration Guide: Configure OIDC authentication with your enterprise IdP
- Auth0 Setup Guide: Configure Auth0 authentication (managed by Qualytics)
- Qualytics UserGuide: Full platform documentation