maroon-storage definition. v0.0.1 #5
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,198 @@ | ||
| # Guarantees at the framework usage level | ||
|
|
||
| Let's now discuss the guarantees that the framework will provide for the user. For this I'll use a slightly simpler version of the storage config, since in this example my focus is on the UX for people who write business logic. | ||
|
|
||
| [TODO:?] Our goals are: | ||
| 1. to guarantee that running the code is always safe from the data definition's perspective. Meaning: | ||
| - you'll have all required fields | ||
| - they will have the correct type | ||
| 2. you don't need to perform annoying migrations yourself | ||
|
|
||
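A minimal sketch of what the first goal could look like at runtime, assuming a type version can be reduced to a mapping from field name to required type. `USER_V1_SCHEMA` and `validate` are hypothetical names used only for illustration; they are not part of maroon.

```python
# Hypothetical, simplified schema: field name -> required Python type.
USER_V1_SCHEMA = {"id": int, "name": str, "family_name": str}


def validate(obj: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the object is safe to use."""
    problems = []
    for field, expected in schema.items():
        if field not in obj:
            problems.append(f"missing field: {field}")
        elif not isinstance(obj[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    return problems
```

With a check like this run by the framework, business-logic code never sees an object with a missing or wrongly typed field.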
| # Transition example | ||
|
|
||
| Let's imagine we have users, and each user has a name and a family name. Our code does something and in the end takes name + family_name and sends a confirmation email. For some reason we want to get rid of name and family_name and start using a single full_name field. So we need to change the schema, perform the migration, expose the fields, update the code, remove the old fields, etc. | ||
|
|
||
| ## step 0 | ||
| This is what we have for [storage configuration](./maroong-storage-config.md) | ||
| ```python | ||
| storage( | ||
| types( | ||
| User( | ||
| v1[active]( | ||
| id int | ||
| name str | ||
| family_name str | ||
| index ( | ||
| id: hash, | ||
| name: btree | ||
| ) | ||
| ), | ||
| ) | ||
| ) | ||
| repositories ( | ||
| postgresql( | ||
| id: "store-for-everything", | ||
| connectionParams: {...}, | ||
| holdTypes( | ||
| User.v1( | ||
| id -> users.id, | ||
| name -> users.name, | ||
| family_name -> users.family_name, | ||
| ), | ||
| ) | ||
| ) | ||
| ), | ||
| location_rules( | ||
| User.v1("store-for-everything"), | ||
| ) | ||
| ) | ||
| ``` | ||
|
|
||
| This is the code we have on [maroon-runner](./maroon-runner.md): | ||
| ```python | ||
| oltp_maroon { | ||
| maroon def (userID: str) { | ||
| # do some logic here | ||
| # will be executed durably, etc, bla bla | ||
|
|
||
|
|
||
| # send email to the user | ||
| var user = User(storage, index.hash==userID) | ||
|
|
||
| send_email(full_name: user.name + ' ' + user.family_name) | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| ## destination state | ||
| We want to add a field that combines name and family_name, and use it in our code. | ||
|
|
||
| ```python | ||
| storage( | ||
| types( | ||
| User( | ||
| v1( | ||
| id int | ||
| name str | ||
| family_name str | ||
| index ( | ||
| id: hash, | ||
| name: btree, | ||
| ) | ||
| ), | ||
| v2[active]( | ||
| id int | ||
| full_name str # name + family_name | ||
| index ( | ||
| id: hash, | ||
| full_name: btree, | ||
| ) | ||
| ), | ||
| migration( | ||
| # the goal here is to let the user move data back and forth between versions | ||
| # in case we need to roll back the changes for whatever reason | ||
|
|
||
| # constructor | ||
| v1_v2(obj: v1) -> v2 { | ||
| # side effects (http/db/etc. calls) are allowed, but they must be idempotent | ||
| return v2{ | ||
| full_name: obj.name + ' ' + obj.family_name, | ||
| v1... | ||
| } | ||
| } | ||
| # restructor | ||
| v2_v1(obj: v2) -> v1 { | ||
| # side effects are not allowed | ||
| # we need this to populate previous versions | ||
| # when we create newer versions | ||
| return v1{ | ||
| name: obj.full_name.split(' ').first, | ||
| family_name: obj.full_name.split(' ').second, | ||
| v2... | ||
| } | ||
| } | ||
| ) | ||
| ) | ||
| ) | ||
| repositories ( | ||
| postgresql( | ||
| id: "store-for-everything", | ||
| connectionParams: {...}, | ||
| holdTypes( | ||
| User.v1( | ||
| id -> users.id, | ||
| name -> users.name, | ||
| family_name -> users.family_name, | ||
| ), | ||
| User.v2( | ||
| id -> users.id, | ||
| full_name -> users.full_name, | ||
| ), | ||
| ) | ||
| ) | ||
| ), | ||
| location_rules( | ||
| User.v1("store-for-everything"), | ||
| User.v2("store-for-everything"), | ||
| ) | ||
| ) | ||
| ``` | ||
|
|
||
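The constructor/restructor pair above can be sketched in plain Python to see when a round trip is lossless. This is an illustration under assumptions, not the maroon implementation: `UserV1`, `UserV2` and `round_trips` are hypothetical names, and the restructor uses `str.partition` (split on the first space only) instead of the `.first`/`.second` split from the config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserV1:
    id: int
    name: str
    family_name: str


@dataclass(frozen=True)
class UserV2:
    id: int
    full_name: str


def v1_v2(obj: UserV1) -> UserV2:
    """Constructor: builds the newer version from the older one."""
    return UserV2(id=obj.id, full_name=f"{obj.name} {obj.family_name}")


def v2_v1(obj: UserV2) -> UserV1:
    """Restructor: rebuilds the older version; must be free of side effects."""
    # Split on the first space only, so the rest stays in family_name.
    name, _, family_name = obj.full_name.partition(" ")
    return UserV1(id=obj.id, name=name, family_name=family_name)


def round_trips(obj: UserV1) -> bool:
    """True when v1 -> v2 -> v1 gives back the original object."""
    return v2_v1(v1_v2(obj)) == obj
```

A first name that itself contains a space does not survive the round trip, which is the kind of case a verification step for migrations would probably want to flag.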
| Keep in mind that at any particular point in time only one type version is active, i.e. usable from the code. You can't combine two versions in the code. | ||
|
|
||
| ```python | ||
| oltp_maroon { | ||
| maroon def (userID: str) { | ||
| # do some logic here | ||
| # will be executed durably, etc, bla bla | ||
|
|
||
|
|
||
| # send email to the user | ||
| var user = User(storage, index.hash==userID) | ||
| send_email(full_name: user.full_name) | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| ## execution. Step by step (simplified) | ||
|
|
||
| - we have type User_v1 active and used by the code on [maroon-runner](./maroon-runner.md) | ||
| - create new type with added field in configurator - [admin::role action] | ||
| - send new config to maroon-runner to verify - [admin::role action] | ||
| - rejected. Reasons | ||
| - no constructor-restructor | ||
| - no repositories updates for the new type | ||
| [TODO:?] do we want to ask admin/dev to provide that information explicitly? Probably that's ok for the first version. Later it would be nice to derive these things automatically | ||
| - no location_rules updates for the new type | ||
| - admin::role makes the updates and sends the config for verification | ||
| - maroon-runner accepts the new config | ||
| - maroon-runner starts background migration job | ||
| - creates necessary indexes/columns/tables in repositories | ||
| - creates User_v2 type with the state [constructing] | ||
| - starts to populate data for v2 objects | ||
| - for each migrated object adds v2 to supported versions in [maroon-source-of-truth](./maroon-source-of-truth.md) | ||
| - v2 type is in the [constructing] state and v1 - [active] | ||
| - [!] if dev::role tries to deploy code that uses v2 at this point, they get an error, because v2 is in the [constructing] state | ||
| - [!] here admin::role can see the migration progress | ||
| - exposed through some telemetry channel | ||
| - [TODO:?] admin panel is here? | ||
| - background migration process finished | ||
| - now v2 is in [available] state | ||
| - all the indexes created | ||
| - all the data is migrated | ||
| - now you can change the code and make v2 active | ||
| - dev::role makes changes: | ||
| - in the config v2 becomes active and v1 becomes available | ||
| - the code should now use only v2 constructors and fields | ||
| - admin::role pushes the changes to maroon-runner | ||
| - maroon-runner checks the correctness and accepts the changes | ||
| - we still have two types: v1 and v2 | ||
| - that means we still have info in DB for v1 and v2 and can switch any time we want | ||
| - it also means that when we create a v2 object we save enough data to create a valid v1 object (by using the v2_v1 restructor) | ||
| - so we can switch v1 and v2 between the active/available states back and forth | ||
| - if we are sure that we don't need v1 anymore, we can perform a "compaction" operation. That operation will: | ||
| - delete v1 type | ||
| - accept absence of constructor/restructor info for v1 | ||
| - accept absence of repositories and location_rules info for v1 | ||
| - start a background process that deletes the columns that are no longer needed | ||
| - [!] it's a destructive operation. After applying it - we can't go between v1-v2 anymore. The information is lost |
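The steps above amount to a small state machine per type version. Below is a minimal sketch, assuming exactly the three states named in the list ([constructing], [available], [active]); the class and method names are hypothetical and only illustrate which transitions the walkthrough allows.

```python
from enum import Enum


class State(Enum):
    CONSTRUCTING = "constructing"
    AVAILABLE = "available"
    ACTIVE = "active"


class TypeVersions:
    """Tracks the state of each version of one type, e.g. User."""

    def __init__(self, active: str):
        self.states = {active: State.ACTIVE}

    def add(self, version: str) -> None:
        # A newly accepted version is [constructing] while data is migrated.
        self.states[version] = State.CONSTRUCTING

    def migration_finished(self, version: str) -> None:
        if self.states[version] is not State.CONSTRUCTING:
            raise ValueError(f"{version} is not being constructed")
        self.states[version] = State.AVAILABLE

    def can_deploy_code_using(self, version: str) -> bool:
        # Code may only target a fully migrated version.
        return self.states.get(version) in (State.AVAILABLE, State.ACTIVE)

    def activate(self, version: str) -> None:
        if self.states.get(version) is not State.AVAILABLE:
            raise ValueError(f"{version} must be [available] before activation")
        for name, state in self.states.items():
            if state is State.ACTIVE:
                self.states[name] = State.AVAILABLE
        self.states[version] = State.ACTIVE

    def compact(self, version: str) -> None:
        # Destructive: after this there is no way to switch back.
        if self.states.get(version) is not State.AVAILABLE:
            raise ValueError("only an [available] version can be compacted")
        del self.states[version]
```

Switching back and forth is just calling `activate` again on the other version, and that stays possible until `compact` removes it.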
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| Entity that is responsible for durable code execution: | ||
| - runs code | ||
| - saves checkpoint states | ||
| - uses [maroon-storage](./maroon-storage.md) | ||
| - [TODO:?] does the replication module live here, or will it be a higher-level abstraction? |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| - keeps | ||
| - k-v objects | ||
| - k - key of the object | ||
| - v | ||
| - hash(last n versions) | ||
| - type | ||
| - name | ||
| - supported type versions | ||
| - version | ||
| - log of transformations between different object's versions | ||
| - indexes (to find the right objects more quickly, e.g. which repository to ask) | ||
| - lives on maroon-nodes | ||
| - to query more quickly | ||
| - to perform validations more quickly: which code we can or can't deploy depends on the types | ||
|
|
||
| - [!] There is a difference between type version and object version | ||
| - [TODO:?] do we need to keep the type version for each object? Or will it be enough to know the current active type? | ||
| - looks like we do. We can use this information to allow active-type switching |
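A sketch of how per-object type versions could drive active-type switching, assuming each key maps to a record shaped roughly like the outline above. `ObjectRecord` and `can_switch_active_type` are hypothetical names, and the content hash is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectRecord:
    """Hypothetical value stored under one key in the source of truth."""
    type_name: str
    supported_type_versions: set = field(default_factory=set)
    object_version: int = 0  # bumped on every update of this object
    transformations: list = field(default_factory=list)


def can_switch_active_type(records: dict, type_name: str, target: str) -> bool:
    """A type version can become active only if every object of the type supports it."""
    return all(
        target in rec.supported_type_versions
        for rec in records.values()
        if rec.type_name == type_name
    )
```

Note the two separate notions in the record: `supported_type_versions` is about type versions (v1, v2), while `object_version` counts updates of one object.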
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| # Problem to solve | ||
| - provide consistent (invariants are satisfied) and durable storage built on top of the different storage solutions present in the organization | ||
| - solving this problem makes it possible to write simpler, durable code/logic/scenarios/etc. (maroon-engine) | ||
|
|
||
| ## Function | ||
| - maroon-storage works as a black box for (business-logic developer)::role | ||
| - uses one logical instance of [source of truth](./maroon-source-of-truth.md) | ||
| - can use multiple (N > 0) [repositories](./repository.md) | ||
| - supports ACID transactions | ||
| - A | ||
| - all updates are done or none | ||
| - C | ||
| - invariants are satisfied (across different repositories) | ||
| - I | ||
| - since reads and writes are sequential, it's not relevant for the migrator | ||
| - [!] fine only while we're "single-threaded" | ||
| - D | ||
| - a set of data is only as durable as the repository that holds it (if some repository owns some subset of data, we're limited by this repository's durability) | ||
|
Heh, I would argue the opposite: the "D" in our case is what the maroon execution framework guarantees - that ultimately all the changes will propagate to whatever repositories exist out there, but not necessarily immediately. "D" is about not losing the data, and the maroon framework (imho) is quite allowed to (temporarily) be the source of truth for "hot" keys/values. Ideally, 99.9+% of the data is always durably stored in various repositories (including S3-grade ones, or even Iceberg-ed), but I'd refrain from making claims that our "D" implies prompt storage of all the data into the respective repositories. |
||
| - admin::role can [configure maroon-storage](./maroong-storage-config.md) to use different repositories | ||
| - takes exclusive rights to read/write operations in all connected repositories | ||
| - all the traffic goes through the maroon-engine | ||
| - supported operations for the user | ||
| - read object of a type by id | ||
| - update an object of a type by id (whole object or some fields) | ||
| - create object of a type with id | ||
| - autogenerated id | ||
| - query by some query parameters | ||
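The supported operations can be pictured as a small interface. The sketch below is a toy in-memory stand-in, not the maroon-storage API: the class name, method names and signatures are all assumptions made for illustration, and there are no repositories or transactions behind it.

```python
import itertools
from typing import Any, Callable, Optional


class InMemoryStorage:
    """Toy stand-in for maroon-storage: one dict per type, no repositories."""

    def __init__(self) -> None:
        self._objects = {}  # type name -> {id -> fields}
        self._ids = itertools.count(1)

    def create(self, type_name: str, fields: dict, object_id: Optional[int] = None) -> int:
        table = self._objects.setdefault(type_name, {})
        if object_id is None:
            # Autogenerated id: take the next counter value that is still free.
            object_id = next(i for i in self._ids if i not in table)
        if object_id in table:
            raise KeyError(f"{type_name} {object_id} already exists")
        table[object_id] = dict(fields, id=object_id)
        return object_id

    def read(self, type_name: str, object_id: int) -> dict:
        return dict(self._objects[type_name][object_id])

    def update(self, type_name: str, object_id: int, **changes: Any) -> None:
        # Partial update: only the given fields change.
        self._objects[type_name][object_id].update(changes)

    def query(self, type_name: str, predicate: Callable[[dict], bool]) -> list:
        table = self._objects.get(type_name, {})
        return [dict(obj) for obj in table.values() if predicate(obj)]
```

For example, `storage.query("User", lambda u: u["country"] == "UK")` would play the role of "query by some query parameters".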
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,134 @@ | ||
| # Configuration language | ||
|
|
||
| - declarative approach | ||
| - strong type system | ||
| - indexes support | ||
| - [TODO:?] do we have indexes at the [source of truth](./maroon-source-of-truth.md) level? Or will indexes be spread to [repositories](./repository.md) as well? Or are these two different kinds of indexes? | ||
| - we need to clearly specify external repositories, as our direction right now is to keep the customers' data in their own DBs | ||
| - different or the same types of objects can live in the same or different repositories. Examples: | ||
| - all the data in one storage | ||
| - some users live in mongo, some in postgresql, and some special users can be created only in a particular AzureDB in a particular region due to regulations or whatever | ||
| - admin can also add/remove repositories at "any" time | ||
| - "any" - of course not any time, but almost. Limitations TBD | ||
| - admin can change the parameters of repositories and maroon-engine will migrate the data between repositories accordingly | ||
|
|
||
| ```python | ||
| storage( | ||
| types( | ||
| User( | ||
| v1[active]( | ||
| id int | ||
| name string | ||
| country string | ||
| index ( | ||
| name: btree | ||
| ) | ||
| ), | ||
| v2( | ||
| id int | ||
| name string | ||
| country string | ||
| active bool | ||
| age int | ||
| index ( | ||
| name: btree, # btree because we'll need to query ranges | ||
| age: hash # hash because we'll need to query exact values | ||
| ) | ||
| ) | ||
| migration( | ||
| # since the new fields are non-optional we need to add some code that can perform the transition between v1 and v2 | ||
| # under the hood the engine will do the heavy lifting: | ||
| # - introduce a new optional `active` field | ||
| # - start populating the value of the field | ||
| # - when finished, move the column from the optional to the non-optional state | ||
| v1_v2(obj: v1) -> v2 { | ||
| # not very declarative. Other proposals how to solve that situation? | ||
| is_active := http.call.is_active(obj.id) | ||
| age := http.call.age_of_user(obj.id) | ||
| if age == nil { # fall back to the default only when no age is known | ||
| age = default_age | ||
| } | ||
| return v2{ # choose which fields to setup and which just copy from the old | ||
| active: is_active, | ||
| age: age, | ||
| v1... | ||
| } | ||
| }, | ||
| # restructor | ||
| # gets the previous version of the object | ||
| # doesn't allow side effects | ||
| # needed if we want to keep the possibility to roll back several versions | ||
| v2_v1(obj: v2) -> v1{ | ||
| return v1{ | ||
| v2... | ||
| } | ||
| } | ||
|
|
||
| ) | ||
| ) | ||
| ) | ||
| repositories ( | ||
| mongodb( | ||
| id: "eu-users-repository", | ||
| connectionParams: {bla bla}, | ||
| holdTypes( | ||
| User.v1( | ||
| id -> users.id, # mapping between the in-memory object and a table/field in the datastore | ||
| name -> users.name, | ||
| country -> users.country, | ||
| ), | ||
| User.v2( | ||
| id -> users.id, | ||
| name -> users.name, | ||
| country -> users.country, | ||
| active -> users.active | ||
| ), | ||
| ) | ||
| ), | ||
| postgresql( | ||
| id: "eu-users-repository-new", # imagine that we're migrating users from mongodb to postgres(unified storing approach), but it still should be in some EU-based DC | ||
| connectionParams: {...}, | ||
| holdTypes( | ||
| User.v1( | ||
| id -> users.id, | ||
| name -> users.name, | ||
| country -> users_meta.country (foreign_key: users.id), # compound object that lives in different tables | ||
| ), | ||
| User.v2( | ||
| id -> users.id, | ||
| name -> users.name, | ||
| country -> users_meta.country (foreign_key: users.id), | ||
| active -> users.active, | ||
| ), | ||
| ) | ||
| ), | ||
| postgresql( | ||
| id: "us-users-repository", | ||
| connectionParams: {...}, | ||
| holdTypes( | ||
| User.v1( | ||
| id -> users.id, | ||
| name -> users.name, | ||
| country -> users_meta.country (foreign_key: users.id), | ||
| ), | ||
| User.v2( | ||
| id -> users.id, | ||
| name -> users.name, | ||
| country -> users_meta.country (foreign_key: users.id), | ||
| active -> users.active, | ||
| ), | ||
| ) | ||
| ) | ||
| ), | ||
| location_rules( | ||
| priority_migration( | ||
| # in that case data will be slowly copied from one storage to another | ||
| # TODO: we need to have a requirement here that transformation should cover all the fields and it should be checked | ||
| "eu-users-repository" ==> "eu-users-repository-new" | ||
|
It this
Author: intentionally different, it's not equal-or-greater |
||
| ) | ||
| User.v1(country == "USA" => "us-users-repository"), | ||
|
FTR, I personally love this, and am sure @AdamEther would love it too. Even though we didn't speak of this yet. |
||
| User.v2(country == "USA" => "us-users-repository"), | ||
| User.v1(country == "UK" => ["eu-users-repository", "eu-users-repository-new"]), | ||
| User.v2(country == "UK" => ["eu-users-repository", "eu-users-repository-new"]), | ||
| ) | ||
| ) | ||
| ``` | ||
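To make the `location_rules` part concrete, here is a minimal sketch of how such rules could be evaluated for one object. It assumes a first-match-wins order and an error when nothing matches; neither is stated in the config above, and `Rule`, `RULES` and `repositories_for` are hypothetical names.

```python
from typing import Callable, List, Tuple

# Hypothetical in-code form of a rule: (type version, predicate, target repositories).
Rule = Tuple[str, Callable[[dict], bool], List[str]]

RULES: List[Rule] = [
    ("User.v1", lambda u: u["country"] == "USA", ["us-users-repository"]),
    ("User.v1", lambda u: u["country"] == "UK",
     ["eu-users-repository", "eu-users-repository-new"]),
]


def repositories_for(type_version: str, obj: dict, rules: List[Rule]) -> List[str]:
    """Return the repositories of the first matching rule, or fail loudly."""
    for rule_type, matches, repos in rules:
        if rule_type == type_version and matches(obj):
            return repos
    raise LookupError(f"no location rule matches {type_version} object {obj!r}")
```

Failing loudly on an unmatched object is one possible choice; a real config language might instead require a catch-all rule at verification time.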
This is misleading imho - the "C" in our case has nothing to do with different repositories directly, the repositories are synced with in an eventually consistent way, decoupled from running ACID transactions.