2 changes: 1 addition & 1 deletion Gemfile.lock
@@ -1,7 +1,7 @@
PATH
remote: .
specs:
etlify (0.9.2)
etlify (0.10.0)
rails (>= 7.0)

GEM
87 changes: 75 additions & 12 deletions README.md
@@ -114,12 +114,77 @@ class User < ApplicationRecord
id_property: :id,
# Only sync when an email exists
sync_if: ->(user) { user.email.present? },
# useful if your object serialization includes dependencies
dependencies: [:investments]
# Required: defines which records need to be synced
stale_scope: Users::EtlifyStaleScopeQuery
)
end
```

### Writing a stale_scope

The `stale_scope` option is required and defines which records need to be synced. It must be a callable (a Proc or a query object) that responds to `call(model_class, crm_name)` and returns an `ActiveRecord::Relation`.

```ruby
# app/queries/users/etlify_stale_scope_query.rb
module Users
class EtlifyStaleScopeQuery
STALE_SQL = <<-SQL.squish
crm_synchronisations.id IS NULL
OR crm_synchronisations.crm_name != ?
OR crm_synchronisations.last_synced_at < users.updated_at
SQL

def self.call(model, crm_name)
model
.left_joins(:crm_synchronisations)
.where(STALE_SQL, crm_name.to_s)
end
end
end
```

You can also use an inline Proc:

```ruby
hubspot_etlified_with(
serializer: UserSerializer,
crm_object_type: "contacts",
id_property: :id,
stale_scope: ->(model, crm_name) do
model
.left_joins(:crm_synchronisations)
.where(<<-SQL.squish, crm_name.to_s)
crm_synchronisations.id IS NULL
OR crm_synchronisations.crm_name != ?
OR crm_synchronisations.last_synced_at < users.updated_at
SQL
end
)
```

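For models whose serialized output depends on associated records (e.g., resync a user when its investments change), join those associations and compare their timestamps as well: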

```ruby
# app/queries/users/etlify_stale_scope_query.rb
module Users
class EtlifyStaleScopeQuery
STALE_SQL = <<-SQL.squish
crm_synchronisations.id IS NULL
OR crm_synchronisations.crm_name != ?
OR crm_synchronisations.last_synced_at < users.updated_at
OR crm_synchronisations.last_synced_at < investments.updated_at
SQL

def self.call(model, crm_name)
model
.left_joins(:crm_synchronisations, :investments)
.where(STALE_SQL, crm_name.to_s)
.distinct
end
end
end
```
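
To make the `WHERE` clause above easier to reason about, here is a minimal plain-Ruby sketch that applies the same three-plus-one conditions to in-memory data. It is an illustration only, not part of Etlify: `SyncRow`, `UserRow`, `stale_row?` and `stale?` are hypothetical names, and `nil` stands in for the `NULL` columns a `LEFT JOIN` produces when there is no matching row.

```ruby
# Hypothetical in-memory stand-ins for the joined tables (not Etlify classes).
SyncRow = Struct.new(:crm_name, :last_synced_at, keyword_init: true)
UserRow = Struct.new(:updated_at, :syncs, :investment_times, keyword_init: true)

# Mirrors STALE_SQL for a single joined row.
# `sync` or `investment_at` is nil when the LEFT JOIN found no match.
def stale_row?(user, sync, investment_at, crm_name)
  return true if sync.nil?                                # crm_synchronisations.id IS NULL
  return true if sync.crm_name != crm_name.to_s           # crm_name != ?
  return true if sync.last_synced_at < user.updated_at    # older than the user row
  # A comparison against NULL is never true in SQL, so a missing investment is skipped.
  !investment_at.nil? && sync.last_synced_at < investment_at
end

# A user is selected if at least one joined row matches, which is also
# why the query ends with `.distinct`: a has_many join can match many times.
def stale?(user, crm_name)
  syncs = user.syncs.empty? ? [nil] : user.syncs
  investments = user.investment_times.empty? ? [nil] : user.investment_times
  syncs.product(investments).any? do |sync, investment_at|
    stale_row?(user, sync, investment_at, crm_name)
  end
end
```

Under these assumptions, a user with no sync row is always stale, and a user synced after its own `updated_at` becomes stale again as soon as one of its investments is updated later than `last_synced_at`.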

### Writing a serializer

```ruby
@@ -254,15 +319,13 @@ Etlify::CRM.register(
### How it works

- `Etlify::StaleRecords::Finder` scans all **etlified models**
(those that called `#{crm_name}_etlified_with`) and builds, for each,
a **SQL relation selecting only the PKs** of stale records.
- A record is considered stale if:
(those that called `#{crm_name}_etlified_with`) and calls the `stale_scope`
to get a **SQL relation selecting only the PKs** of stale records.
- The `stale_scope` you define determines which records are considered stale.
Typically, a record is stale if:
- it **has no** `crm_synchronisation` row, **or**
- its `last_synced_at` is **older** than the **max** `updated_at` among:
- its own row,
- and its declared dependencies (via `dependencies:` in `etlified_with`,
supporting `belongs_to`, `has_one`, `has_many`, `has_* :through`,
and polymorphic `belongs_to`).
- its `last_synced_at` is **older** than the record's `updated_at`
(or than the `updated_at` of any related model you include in your scope).
- `Etlify::StaleRecords::BatchSync` then iterates **by ID batches**:
- in **async: true** mode (default): **enqueue** one job per ID without loading
full records into memory;
@@ -276,8 +339,8 @@ Etlify::CRM.register(
via ActiveJob.
- **Stable payloads**: ensure your serializers produce deterministic Hashes to
benefit from **idempotence**.
- **Dependencies**: declare `dependencies:` accurately in `etlified_with` so
indirect changes trigger resyncs.
- **Stale scope**: write your `stale_scope` to include all related models that
affect your serializer output, so indirect changes trigger resyncs.
- **Batch size**: adjust `batch_size` to your DB to balance throughput and memory.
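
As a rough illustration of the ID-batching idea described above (one enqueued job per ID, without loading full records), here is a minimal plain-Ruby sketch. It is not Etlify's `BatchSync` implementation: `enqueue_stale_ids` and the `enqueue` callable are hypothetical names, and a real run would walk a relation of primary keys rather than an in-memory array.

```ruby
# Hypothetical helper: walks IDs in slices of `batch_size` and hands each ID
# to `enqueue`, a stand-in for something like an ActiveJob `perform_later` call.
# Only IDs are handled here; no record is ever instantiated.
def enqueue_stale_ids(ids, batch_size:, enqueue:)
  enqueued = 0
  ids.each_slice(batch_size) do |batch|
    batch.each do |id|
      enqueue.call(id)
      enqueued += 1
    end
  end
  enqueued
end

# Usage: collect the IDs in an array instead of a real job queue.
queue = []
count = enqueue_stale_ids((1..5).to_a, batch_size: 2, enqueue: ->(id) { queue << id })
```

A smaller `batch_size` keeps each slice cheap to hold in memory, while a larger one means fewer round trips when the IDs come from the database.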

---
59 changes: 58 additions & 1 deletion UPGRADE-GUIDE.md
@@ -1,3 +1,60 @@
# UPGRADING FROM 0.9.x -> 0.10.0

⚠️ **Breaking changes ahead.**

## Overview

Etlify 0.10.0 replaces the `dependencies:` option with a required `stale_scope:` parameter.
This gives you full control over which records are considered stale.

## Migration

Replace `dependencies:` with `stale_scope:` in all your model configurations:

**Before (0.9.x):**

```ruby
hubspot_etlified_with(
serializer: UserSerializer,
crm_object_type: "contacts",
id_property: :id,
dependencies: [:company, :investments]
)
```

**After (0.10.0):**

```ruby
hubspot_etlified_with(
serializer: UserSerializer,
crm_object_type: "contacts",
id_property: :id,
stale_scope: Users::EtlifyStaleScopeQuery
)

# app/queries/users/etlify_stale_scope_query.rb
module Users
class EtlifyStaleScopeQuery
STALE_SQL = <<-SQL.squish
crm_synchronisations.id IS NULL
OR crm_synchronisations.crm_name != ?
OR crm_synchronisations.last_synced_at < users.updated_at
OR crm_synchronisations.last_synced_at < companies.updated_at
OR crm_synchronisations.last_synced_at < investments.updated_at
SQL

def self.call(model, crm_name)
model
.left_joins(:crm_synchronisations, :company, :investments)
.where(STALE_SQL, crm_name.to_s)
.distinct
end
end
end
```

---

# UPGRADING FROM 0.9.1 -> 0.9.2

- Nothing to do (bugfix)
@@ -121,7 +178,7 @@ class User < ApplicationRecord
serializer: Etlify::Serializers::UserSerializer,
crm_object_type: "contacts",
id_property: "email",
dependencies: [:company],
stale_scope: Users::EtlifyStaleScopeQuery,
sync_if: ->(record) { record.email.present? }
)
end
8 changes: 6 additions & 2 deletions lib/etlify/model.rb
@@ -52,18 +52,22 @@ def define_crm_dsl_on(klass, crm_name)
serializer:,
crm_object_type:,
id_property:,
dependencies: [],
stale_scope:,
sync_if: ->(_r) { true },
job_class: nil
|
unless stale_scope.respond_to?(:call)
raise ArgumentError, "stale_scope must respond to #call"
end

reg = Etlify::CRM.fetch(crm_name)

conf = {
serializer: serializer,
guard: sync_if,
crm_object_type: crm_object_type,
id_property: id_property,
dependencies: Array(dependencies).map(&:to_sym),
stale_scope: stale_scope,
adapter: reg.adapter,
job_class: job_class || reg.options[:job_class],
}
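
The `respond_to?(:call)` guard added in this hunk accepts both forms of `stale_scope` shown in the README. The sketch below isolates that check in a standalone method to show which values pass. `validate_stale_scope!` is a hypothetical helper written for this example, and the `Users::EtlifyStaleScopeQuery` here is a stub that returns its arguments instead of an `ActiveRecord::Relation`.

```ruby
# Hypothetical standalone version of the guard from the DSL method.
def validate_stale_scope!(stale_scope)
  unless stale_scope.respond_to?(:call)
    raise ArgumentError, "stale_scope must respond to #call"
  end

  stale_scope
end

# Stub query object: a class-level `call` makes the class itself callable.
module Users
  class EtlifyStaleScopeQuery
    def self.call(model, crm_name)
      [model, crm_name] # a real implementation would return a relation
    end
  end
end

inline_scope = ->(model, crm_name) { [model, crm_name] }

validate_stale_scope!(inline_scope)                 # a lambda responds to #call
validate_stale_scope!(Users::EtlifyStaleScopeQuery) # so does a class defining self.call
```

Note that the guard only checks that the object is callable. It does not check the arity of `call` or the type of the returned value, so a scope that returns something other than a relation would only fail later, when it is used.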