198 changes: 198 additions & 0 deletions framework-guarantees-example.md
@@ -0,0 +1,198 @@
# guarantees on the framework usage level

Let's now discuss the guarantees that the framework will provide to the user. I'll use a slightly simpler version of the storage config here, because in this example my focus is on the UX for people who write business logic.

[TODO:?] Our goals are:
1. to guarantee that running the code is always safe from the data definition's perspective, meaning:
   - you'll have all required fields
   - they will have the correct type
2. to spare you from performing tedious migrations yourself
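
The two checks behind goal 1 can be pictured with a small sketch. It is only an illustration under assumptions: `FieldSpec`, `USER_V1`, and `validate` are hypothetical names that do not come from the framework, and plain Python stands in for whatever the framework would actually do when it hands an object to business logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    type_: type


# Hypothetical in-memory description of a User type with three fields.
USER_V1 = (
    FieldSpec("id", int),
    FieldSpec("name", str),
    FieldSpec("family_name", str),
)


def validate(obj, spec):
    """Return a list of problems; an empty list means the object is safe to use."""
    problems = []
    for field in spec:
        if field.name not in obj:
            problems.append(f"missing field: {field.name}")
            continue
        value = obj[field.name]
        if not isinstance(value, field.type_):
            problems.append(
                f"wrong type for {field.name}: "
                f"expected {field.type_.__name__}, got {type(value).__name__}"
            )
    return problems
```

Business logic would then only ever see objects for which `validate` returns an empty list.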

# Transition example

Let's imagine we have users with a name and a family name. Our code does something and, at the end, takes name + family name and sends a confirmation email. For some reason we want to get rid of the separate name and family name fields and start using a single combined field instead. So we need to change the schema, perform the migration, expose the new field, update the code, remove the old fields, etc.

## step 0
This is what we have for the [storage configuration](./maroong-storage-config.md):
```python
storage(
    types(
        User(
            v1[active](
                id int
                name str
                family_name str
                index (
                    id: hash,
                    name: btree
                )
            ),
        )
    )
    repositories (
        postgresql(
            id: "store-for-everything",
            connectionParams: {...},
            holdTypes(
                User.v1(
                    id -> users.id,
                    name -> users.name,
                    family_name -> users.family_name,
                ),
            )
        )
    ),
    location_rules(
        User.v1("store-for-everything"),
    )
)
```

This is the code we have on [maroon-runner](./maroon-runner.md):
```python
oltp_maroon {
    maroon def (userID: str) {
        # do some logic here
        # will be executed durably, etc, bla bla


        # send email to the user
        var user = User(storage, index.hash==userID)

        send_email(full_name: user.name + ' ' + user.family_name)
    }
}
```

## destination state
We want to add a field that combines name and family name, and use it in our code.

```python
storage(
    types(
        User(
            v1(
                id int
                name str
                family_name str
                index (
                    id: hash,
                    name: btree,
                )
            ),
            v2[active](
                id int
                full_name str # name + family_name
                index (
                    id: hash,
                    full_name: btree,
                )
            ),
            migration(
                # what we want here is to let the user traverse the data back and forth
                # in case we need to roll back the changes, whatever the reason

                # constructor
                v1_v2(obj: v1) -> v2 {
                    # side effects are allowed, but must be idempotent: http/db/etc. calls
                    return v2{
                        full_name: obj.name + ' ' + obj.family_name,
                        v1...
                    }
                }
                # restructor
                v2_v1(obj: v2) -> v1 {
                    # side effects are not allowed
                    # we need this to populate previous versions
                    # when we create newer versions
                    return v1{
                        name: obj.full_name.split(' ').first,
                        family_name: obj.full_name.split(' ').second,
                        v2...
                    }
                }
            )
        )
    )
    repositories (
        postgresql(
            id: "store-for-everything",
            connectionParams: {...},
            holdTypes(
                User.v1(
                    id -> users.id,
                    name -> users.name,
                    family_name -> users.family_name,
                ),
                User.v2(
                    id -> users.id,
                    full_name -> users.full_name,
                ),
            )
        )
    ),
    location_rules(
        User.v1("store-for-everything"),
        User.v2("store-for-everything"),
    )
)
```
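
To make the constructor/restructor pair easier to reason about, here is a rough Python rendering of the same idea. It is a sketch, not the framework's API: `UserV1`, `UserV2`, `v1_to_v2`, and `v2_to_v1` are illustrative names, and the restructor splits on the first space with `str.partition` instead of taking the first and second tokens as the config does, so that nothing after the second word is dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserV1:
    id: int
    name: str
    family_name: str


@dataclass(frozen=True)
class UserV2:
    id: int
    full_name: str


def v1_to_v2(obj: UserV1) -> UserV2:
    """Constructor: builds the new version from the old one."""
    return UserV2(id=obj.id, full_name=f"{obj.name} {obj.family_name}")


def v2_to_v1(obj: UserV2) -> UserV1:
    """Restructor: pure function, no side effects."""
    # Assumption: everything before the first space is the name,
    # everything after it is the family name.
    name, _, family_name = obj.full_name.partition(" ")
    return UserV1(id=obj.id, name=name, family_name=family_name)
```

Note that the round trip is only exact when the first name contains no spaces: "Mary Ann" + "Evans" comes back as "Mary" + "Ann Evans". The restructor still yields a valid v1 object, but not necessarily the original one.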

Keep in mind that at any particular point in time only one type version is active, and the code can't combine two versions. With v2 active, the code uses only v2 fields:

```python
oltp_maroon {
    maroon def (userID: str) {
        # do some logic here
        # will be executed durably, etc, bla bla


        # send email to the user
        var user = User(storage, index.hash==userID)
        send_email(full_name: user.full_name)
    }
}
```
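
The "one active version, no mixing" rule could be enforced by a check along these lines. This is a hedged sketch: the state names come from this document, while `active_version`, `check_deploy`, and the plain-dict representation of version states are assumptions made for illustration.

```python
def active_version(states):
    """Return the single active version, or raise if the config is inconsistent."""
    active = sorted(v for v, state in states.items() if state == "active")
    if len(active) != 1:
        raise ValueError(f"expected exactly one active version, found {active}")
    return active[0]


def check_deploy(states, versions_used_by_code):
    """Reject code that touches any version other than the active one."""
    allowed = {active_version(states)}
    offending = sorted(set(versions_used_by_code) - allowed)
    if offending:
        raise ValueError(f"code uses non-active versions: {offending}")


# v2 is active, so code that only uses v2 passes the check.
check_deploy({"v1": "available", "v2": "active"}, {"v2"})
```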

## execution, step by step (simplified)

- we have type User_v1 active and used by the code on [maroon-runner](./maroon-runner.md)
- create a new type with the added field in the configurator - [admin::role action]
- send the new config to maroon-runner to verify - [admin::role action]
  - rejected. Reasons:
    - no constructor-restructor
    - no repositories updates for the new type
      [TODO:?] do we want to ask admin/dev to provide that information explicitly? Probably that's OK for the first version. Later it would be nice to do these things automatically
    - no location_rules updates for the new type
- admin::role makes the updates and sends the config for verification
- maroon-runner accepts the new config
- maroon-runner starts a background migration job
  - creates necessary indexes/columns/tables in repositories
  - creates User_v2 type with the state [constructing]
  - starts to populate data for v2 objects
  - for each migrated object adds v2 to supported versions in [maroon-source-of-truth](./maroon-source-of-truth.md)
- v2 type is in the [constructing] state and v1 - [active]
  - [!] if dev::role tries to deploy code that uses v2 at this point, they get an error, because v2 is in the [constructing] state
  - [!] at this point admin::role can see the migration progress
    - exposed through some telemetry channel
    - [TODO:?] admin panel is here?
- background migration process finished
  - now v2 is in the [available] state
  - all the indexes are created
  - all the data is migrated
  - now you can change the code and make v2 active
- dev::role makes changes:
  - in the config v2 becomes active, v1 becomes available
  - in the code - now it should use only v2 constructors and fields
- admin::role pushes the changes to maroon-runner
- maroon-runner checks the correctness and accepts the changes
- we still have two types: v1 and v2
  - that means we still have info in the DB for v1 and v2 and can switch any time we want
  - it also means that when we create a v2 object, we save enough data to create a valid v1 object (by using the v2_v1 restructor)
  - so we can switch v1 and v2 between the active/available states back and forth
- if we are sure that we don't need v1 anymore, we can perform a "compaction" operation. That operation will:
  - delete the v1 type
  - accept absence of constructor/restructor info for v1
  - accept absence of repositories and location_rules info for v1
  - start a background process of deleting the columns that are no longer needed
  - [!] it's a destructive operation. After applying it we can't go between v1 and v2 anymore. The information is lost
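
The version lifecycle described in these steps can be summarized as a small state machine. The sketch below is one possible reading, not the actual maroon-runner logic: the state names are taken from the text, while `TypeVersions` and its method names are hypothetical.

```python
from enum import Enum


class State(Enum):
    CONSTRUCTING = "constructing"
    AVAILABLE = "available"
    ACTIVE = "active"


class TypeVersions:
    """Tracks the state of every version of one type, e.g. User."""

    def __init__(self, first_version):
        self.states = {first_version: State.ACTIVE}

    def _require(self, version, expected):
        actual = self.states.get(version)
        if actual is not expected:
            found = actual.value if actual else "unknown"
            raise ValueError(f"{version} must be {expected.value}, but is {found}")

    def start_migration(self, version):
        # Background migration job begins: the new version is not usable yet.
        if version in self.states:
            raise ValueError(f"{version} already exists")
        self.states[version] = State.CONSTRUCTING

    def finish_migration(self, version):
        # All indexes are created and all data is migrated.
        self._require(version, State.CONSTRUCTING)
        self.states[version] = State.AVAILABLE

    def activate(self, version):
        # Only an available version can become active; the old one stays available.
        self._require(version, State.AVAILABLE)
        for other, state in list(self.states.items()):
            if state is State.ACTIVE:
                self.states[other] = State.AVAILABLE
        self.states[version] = State.ACTIVE

    def compact(self, version):
        # Destructive: after this the version can never be activated again.
        self._require(version, State.AVAILABLE)
        del self.states[version]
```

With this shape, deploying code against a version that is still `constructing` fails in `activate`, switching back and forth is just repeated `activate` calls, and `compact` is the only step that cannot be undone.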
5 changes: 5 additions & 0 deletions maroon-runner.md
@@ -0,0 +1,5 @@
Entity that is responsible for durable code execution:
- runs code
- saves checkpoint states
- uses [maroon-storage](./maroon-storage.md)
- [TODO:?] is the replication module here, or will it be a higher-level abstraction?
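
As a rough intuition for "runs code" plus "saves checkpoint states", the sketch below replays named steps and skips the ones that already have a saved result. Everything here is an assumption for illustration: `CheckpointedRun` is a hypothetical name, and a plain dict stands in for state that would really be persisted through maroon-storage.

```python
class CheckpointedRun:
    """Runs named steps, skipping those whose result is already checkpointed."""

    def __init__(self, checkpoints=None):
        # Hypothetical stand-in for durably stored checkpoint state.
        self.checkpoints = {} if checkpoints is None else checkpoints

    def step(self, name, fn):
        if name in self.checkpoints:
            # The step finished before a restart: reuse its saved result.
            return self.checkpoints[name]
        result = fn()
        self.checkpoints[name] = result  # save the checkpoint state
        return result


calls = []


def load_user():
    calls.append("load_user")
    return {"id": 1, "full_name": "Ada Lovelace"}


store = {}
first = CheckpointedRun(store).step("load_user", load_user)
# Simulate a restart: a new run over the same checkpoint store.
second = CheckpointedRun(store).step("load_user", load_user)
```

After the simulated restart, `load_user` is not called a second time; the second run picks up the saved result.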
18 changes: 18 additions & 0 deletions maroon-source-of-truth.md
@@ -0,0 +1,18 @@
- keeps
  - k-v objects
    - k - key of the object
    - v
      - hash(last n versions)
      - type
        - name
        - supported type versions
      - version
      - log of transformations between different versions of the object
  - indexes (for finding the right objects quicker. Ex: which repository to ask)
- lives on maroon-nodes
  - to query quicker
  - to perform validations quicker: which code we can/can't deploy depends on the types

- [!] There is a difference between type version and object version
- [TODO:?] do we need to keep the type version for each object? Or will it be enough to know the current active type?
  - looks like we do. We can use this information to allow active-type switching
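
One way to read the structure above is as a map from object key to an entry that carries both the object version and the set of type versions the object has been migrated to. The sketch below follows that reading; `TypeInfo`, `Entry`, `SourceOfTruth`, and the method names are hypothetical, and the nesting of the fields is an interpretation of the outline rather than a settled design.

```python
from dataclasses import dataclass, field


@dataclass
class TypeInfo:
    name: str
    supported_versions: set = field(default_factory=set)  # type versions


@dataclass
class Entry:
    hashes: list                 # hashes of the last n object versions
    type: TypeInfo
    version: int                 # object version, distinct from the type version
    transformations: list = field(default_factory=list)


class SourceOfTruth:
    def __init__(self):
        self.entries = {}        # key of the object -> Entry

    def put(self, key, entry):
        self.entries[key] = entry

    def mark_migrated(self, key, type_version):
        """Record that the object under `key` now also exists as `type_version`."""
        self.entries[key].type.supported_versions.add(type_version)

    def can_activate(self, type_name, type_version):
        """A type version can become active only if every object supports it."""
        relevant = [e for e in self.entries.values() if e.type.name == type_name]
        return all(type_version in e.type.supported_versions for e in relevant)
```

Keeping the supported type versions per object is what makes a check like `can_activate` possible, which matches the remark that this information allows active-type switching.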
27 changes: 27 additions & 0 deletions maroon-storage.md
@@ -0,0 +1,27 @@
# Problem to solve
- provide consistent (invariants are satisfied) and durable storage built on top of the different storage solutions present in the organization
- we need to solve that problem to make it possible to write simpler, durable code/logic/scenarios/etc. (maroon-engine)

## Function
- maroon-storage works as a black box for (business-logic developer)::role
- uses one logical instance of [source of truth](./maroon-source-of-truth.md)
- can use various (N > 0) [repositories](./repository.md)
- supports ACID transactions
  - A
    - all updates are done or none
  - C
    - invariants are satisfied (across different repositories)

This is misleading imho — the "C" in our case has nothing to do with different repositories directly, the repositories are synced with in an eventually consistent way, decoupled from running ACID transactions.

  - I
    - since we'll have sequential reads/writes, it's not relevant for the migrator
    - [!] fine only as long as we're "single-threaded"
  - D
    - a set of data is only as durable as the repository that holds it (if some repository owns some subset of data, we're limited by this repository's durability)

Heh, I would argue the opposite: the "D" in our case is what the maroon execution framework guarantees — that ultimately all the changes will propagate to whatever repositories exist out there, but not necessarily immediately.

"D" is about not losing the data, and the maroon framework (imho) is quite allowed to (temporarily) be the source of truth for "hot" keys/values. Ideally, 99.9+% of the data is always durably stored in various repositories (including S3-grade ones, or even Iceberg-ed), but I'd refrain from making claims that our "D" implies prompt storage of all the data into the respective repositories.

- admin::role can [configure maroon-storage](./maroong-storage-config.md) to use different repositories
- takes exclusive rights on read/write operations in all connected repositories
  - all the traffic goes through the maroon-engine
- supported operations for the user
  - read an object of a type by id
  - update an object of a type by id (the whole object or some fields)
  - create an object of a type with an id
    - autogenerated id
  - query by some query parameters
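
The user-facing operations listed above could look roughly like the interface below. This is an in-memory sketch under assumptions: `InMemoryStorage` and its method signatures are hypothetical, there are no repositories behind it, and "all updates are done or none" is modeled only by validating an update fully before applying it.

```python
import itertools


class InMemoryStorage:
    def __init__(self):
        self._objects = {}                  # (type_name, id) -> dict of fields
        self._ids = itertools.count(1)

    def create(self, type_name, fields, id=None):
        """Create an object with the given id, or with an autogenerated one."""
        if id is None:
            id = next(i for i in self._ids if (type_name, i) not in self._objects)
        if (type_name, id) in self._objects:
            raise KeyError(f"{type_name} with id {id} already exists")
        self._objects[(type_name, id)] = dict(fields, id=id)
        return id

    def read(self, type_name, id):
        return dict(self._objects[(type_name, id)])

    def update(self, type_name, id, changes):
        """Update some or all fields; either every change is applied or none."""
        current = self._objects[(type_name, id)]
        unknown = sorted(set(changes) - set(current))
        if unknown:
            raise KeyError(f"unknown fields: {unknown}")
        self._objects[(type_name, id)] = {**current, **changes}

    def query(self, type_name, predicate):
        return [
            dict(obj)
            for (name, _), obj in self._objects.items()
            if name == type_name and predicate(obj)
        ]
```

A real implementation would also validate field types and enforce invariants before committing; the sketch only shows the shape of the four operations.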
134 changes: 134 additions & 0 deletions maroong-storage-config.md
@@ -0,0 +1,134 @@
# Configuration language

- declarative approach
- strong type system
- indexes support
  - [TODO:?] do we have indexes at the [source of truth](./maroon-source-of-truth.md) level? Or will indexes be spread to [repositories](./repository.md) as well? Or are these two different kinds of indexes?
- we need to clearly specify external repositories, as our direction right now is to keep the customers' data in their DBs
- different or the same types of objects can live in the same or different repositories. Examples:
  - all the data in one storage
  - part of the users live in mongo and part in postgresql, and some special users can be created only in that AzureDB in that region due to regulations or whatever
- admin can also add/remove repositories at "any" time
  - "any" - of course not any time, but almost. Limitations TBD
- admin can change the parameters of repositories and maroon-engine will migrate the data between repositories accordingly

```python
storage(
    types(
        User(
            v1[active](
                id int
                name string
                country string
                index (
                    name: btree
                )
            ),
            v2(
                id int
                name string
                country string
                active bool
                age int
                index (
                    name: btree, # btree because we'll need to query ranges
                    age: hash # hash because we'll need to query exact values
                )
            ),
            migration(
                # since a new field is non-optional we need to add some code that can perform the transition between v1 and v2
                # under the hood the engine will do the heavy lifting:
                # - introduces a new optional `active` field
                # - starts updating the value of the field
                # - when finished, moves the column from the optional to the non-optional state
                v1_v2(obj: v1) -> v2 {
                    # not very declarative. Other proposals how to solve that situation?
                    is_active := http.call.is_active(obj.id)
                    age := http.call.age_of_user(obj.id)
                    if age == nil {
                        # fall back to the default when no age is returned
                        age = default_age
                    }
                    return v2{ # choose which fields to set up and which to just copy from the old version
                        active: is_active,
                        age: age,
                        v1...
                    }
                },
                # restructor
                # gets the previous version of the object
                # doesn't allow side effects
                # needed if we want to keep the possibility to roll back several versions
                v2_v1(obj: v2) -> v1 {
                    return v1{
                        v2...
                    }
                }
            )
        )
    )
    repositories (
        mongodb(
            id: "eu-users-repository",
            connectionParams: {bla bla},
            holdTypes(
                User.v1(
                    id -> users.id, # mapping between in-memory object and table/field in a table datastore
                    name -> users.name,
                    country -> users.country,
                ),
                User.v2(
                    id -> users.id,
                    name -> users.name,
                    country -> users.country,
                    active -> users.active
                ),
            )
        ),
        postgresql(
            id: "eu-users-repository-new", # imagine that we're migrating users from mongodb to postgres (unified storing approach), but it still should be in some EU-based DC
            connectionParams: {...},
            holdTypes(
                User.v1(
                    id -> users.id,
                    name -> users.name,
                    country -> users_meta.country (foreign_key: users.id), # compound object that lives in different tables
                ),
                User.v2(
                    id -> users.id,
                    name -> users.name,
                    country -> users_meta.country (foreign_key: users.id),
                    active -> users.active,
                ),
            )
        ),
        postgresql(
            id: "us-users-repository",
            connectionParams: {...},
            holdTypes(
                User.v1(
                    id -> users.id,
                    name -> users.name,
                    country -> users_meta.country (foreign_key: users.id),
                ),
                User.v2(
                    id -> users.id,
                    name -> users.name,
                    country -> users_meta.country (foreign_key: users.id),
                    active -> users.active,
                ),
            )
        )
    ),
    location_rules(
        priority_migration(
            # in that case data will be slowly copied from one storage to another
            # TODO: we need to have a requirement here that the transformation should cover all the fields, and it should be checked
            "eu-users-repository" ==> "eu-users-repository-new"

Is this ==> a typo of => or an intentionally different operator? :-)

Contributor Author

intentionally different, it's not equal-or-greater

        )
        User.v1(country == "USA" => "us-users-repository"),

FTR, I personally love this, and am sure @AdamEther would love it too. Even though we didn't speak of this yet.

        User.v2(country == "USA" => "us-users-repository"),
        User.v1(country == "UK" => ["eu-users-repository", "eu-users-repository-new"]),
        User.v2(country == "UK" => ["eu-users-repository", "eu-users-repository-new"]),
    )
)
```
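
To illustrate how `location_rules` might be evaluated, here is a small Python sketch. It mirrors the four country rules from the config above, but the evaluation strategy is an assumption: `RULES`, `repositories_for`, first-match-wins ordering, and raising when nothing matches are all illustrative choices, and `priority_migration` is not modeled.

```python
# Each rule: (type version, condition on the object, target repositories).
RULES = [
    ("User.v1", lambda obj: obj["country"] == "USA", ["us-users-repository"]),
    ("User.v2", lambda obj: obj["country"] == "USA", ["us-users-repository"]),
    ("User.v1", lambda obj: obj["country"] == "UK",
     ["eu-users-repository", "eu-users-repository-new"]),
    ("User.v2", lambda obj: obj["country"] == "UK",
     ["eu-users-repository", "eu-users-repository-new"]),
]


def repositories_for(type_version, obj, rules=RULES):
    """Return the repositories that should hold `obj`, using the first matching rule."""
    for rule_type, condition, repositories in rules:
        if rule_type == type_version and condition(obj):
            return repositories
    raise LookupError(f"no location rule matches {type_version} {obj!r}")


us_user = {"id": 1, "name": "Ada", "country": "USA"}
uk_user = {"id": 2, "name": "Alan", "country": "UK"}
```

With these rules, `repositories_for("User.v2", us_user)` resolves to the single US repository, while a UK user resolves to both EU repositories, which is what allows data to live in the old and the new store during a migration.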