
Batch Edit, Workbench, Query Builder backend improvements #4929

Closed
realVinayak wants to merge 77 commits into production from wb_improvements

Conversation

@realVinayak
Contributor

@realVinayak realVinayak commented May 18, 2024

Some backend improvements

Workbench

Query Builder

  • Generic relationships and fields in trees (backend support only, for now).

Checklist

  • Self-review the PR after opening it to make sure the changes look good
    and self-explanatory (or properly documented)
  • Add automated tests
  • Add relevant issue to release milestone

Testing instructions

Batch-editing

Implementation and design

  1. The current implementation builds on the workbench and the query builder.
  2. Workbench and batch-edit datasets are differentiated at the user level by a new "isupdate" field in the spdataset table. DEV note: there is no difference at the code level; everything is kept as general as possible. In fact, the isupdate field is only used in code to follow a special rollback procedure.
  3. Batch-edit datasets can be seen in the batch-edit overlay, accessible via the batch-edit menu item. Currently, the only way to create a new dataset is via the query builder interface.
  4. To make a batch-edit dataset, go to the query builder and add the relevant fields to the query. Some fields and relationships are not supported. Nested to-manys, for instance, are supported in the workbench but not in batch-edit. Special tree fields like nodenumber, highestchildnodenumber, and fullname are also not supported.
  5. Other than the relationships mentioned above, every field is supported. If an unsupported field like nodenumber is added, it is rendered as readonly. However, you cannot make nested to-manys visible (this is different from being able to map them): you can map nested to-manys, and even filter on them, and as long as they are hidden they won't block batch-edit. You can also freely add formatted and aggregated fields (they are unmapped, and are ignored).
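The support rules above can be sketched as a small classifier. This is a hypothetical illustration, not Specify's actual query-builder code: `QueryField`, `classify`, and the `to_many_depth` attribute are invented names, and "nested to-many" is assumed here to mean more than one to-many hop along a field's join path.

```python
from dataclasses import dataclass

# Hypothetical model of a query-builder column; not Specify's real classes.
READONLY_TREE_FIELDS = {"nodenumber", "highestchildnodenumber", "fullname"}


@dataclass(frozen=True)
class QueryField:
    path: tuple[str, ...]        # join path ending in a field name
    visible: bool = True
    to_many_depth: int = 0       # assumed: number of to-many hops on the path
    is_formatted: bool = False   # formatted / aggregated fields


def classify(field: QueryField) -> str:
    """Return 'ignored', 'readonly' or 'editable'; raise if the field blocks batch-edit."""
    if field.is_formatted:
        return "ignored"  # unmapped, so they never reach the dataset
    if field.to_many_depth > 1:
        if field.visible:
            raise ValueError("visible nested to-many: " + ".".join(field.path))
        return "ignored"  # hidden nested to-manys may still be mapped or filtered on
    if field.path[-1].lower() in READONLY_TREE_FIELDS:
        return "readonly"
    return "editable"
```

A hidden nested to-many used only as a filter passes through as `ignored`; making the same column visible raises instead.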

Testing Instructions

  • Make a query with columns from the base table, and select relationships to edit. In the examples below, I am using CO (CollectionObject) as the base table. In general, there are four types of relationships:

    • To-one dependent (for ex. collectionobjectattribute),
    • To-one independent (for ex. Cataloger, CollectingEvent [when not embedded])
    • To-many dependent (for ex. determinations, preparations),
    • To-many independent (for ex. None for CO as the base table)
  • Fields

    Any field can be modified, other than nodenumber, fullname, and highestchildnodenumber. If only some fields get updated, only those fields are highlighted.

  • To-one dependent (for ex. collectionobjectattribute)

    These relationships get directly updated, and are not matched. If the to-one is not in the db, it'll create one.
    This also includes collectingevent when embedded.

    Test cases to consider:

    • When mapped, the record is directly updated.
    • When mapped, if the record is not present, it'll be created if non-null values are present.
    • If the record previously had values and those values are removed (making the cells completely empty), the to-one dependent record will be deleted. Since the record may have other fields in the database that are not in the query, this could accidentally delete data. Example: the user selected collectionobject -> collectionObjectAttribute -> remarks and set remarks to empty in the spreadsheet, but the integer1 field of that collectionObjectAttribute may still hold a value. To prevent accidental deletion, by default we look at all the fields in the database for that record (other than system fields) to determine whether it can be deleted. This behaviour is controlled by a remote preference (described in a later section).
  • To-many dependent (for ex. determinations)

    Same as to-one dependent. These relationships get directly updated. If the corresponding record is not present, a new one gets created.

    Test cases to consider:

    • When mapped, the record is directly updated.
    • When mapped, if the record is not present, it'll be created if non-null values are present.
    • If the cell data is removed, and if every other field is empty in the database (can be disabled via a preference), the record will be deleted.
  • To-one independent (for ex. cataloger)

    These relationships get matched, and uploaded if no match is found. During upload, the record is cloned (all non-unique fields and dependents are copied). The clone takes mapped relationships into account. That is, if an agent needs to be cloned and you have mapped agentspecialty, it'll use the mapped agentspecialty (rather than cloning the previous record's agentspecialty).

    Test cases to consider:

    • Start from a collectionobject with a cataloger, and map some fields. Change some of the values (say, lastname and firstname) to those of an agent present in the db. Verify that the agent gets matched. Note that matching can use just the visible fields, or can also include database fields not included in the query. This is controlled via a preference. By default, to be cautious in matching, only the fields visible in the query are used.
    • If it is unable to match, it'll clone the existing agent with data from the sheet. Make an agent with addresses/specialties/variants. Make sure the workbench clones the agent correctly and, if you've provided some dependents in the mapping, uses them.
    • Similar to workbench, you could customize the match behaviour by changing the matching options (like "never ignore", "ignore when blank", and "always ignore")
  • To-many independent

    Same as to-many dependent. The only difference is that we always perform an update (we don't delete these). If a mapped record is not present, it'll create one, without any matching.

    Test cases to consider:

    • Create collection objects and assign them a collectingevent. Do a query using collectingevent as the base table, and add fields from the CO table. Verify that clearing all fields does not delete the collection objects (you'll also need to disable the preference that makes null checks look at all fields).
    • If a record is not present, it'll create one, if there is a non-empty field.
  • Trees

    There are two different routes to perform tree updates.

    • Workbench method:

      If you want to modify a specific rank or, say, reassign the species for a determination, add that specific rank to the query. In this case, it always matches and uploads (and possibly clones), so there are no updates.
      The query builder enforces that you select a complete branch of the tree. That is, if your query contains the ranks "species" and "genus", it requires you to add every rank from "genus" down to "species". If the tree is used as part of a relationship, it requires every rank from "genus" down to the lowest rank in the tree.

    • Update method:

      If the query has no visible tree rank field, direct modifications (and, thus, updates) to the tree table are allowed. This is useful if you want to, say, update remarks for records that match the name "ploia".

    In both of the above methods, fullname, nodenumber, and highestchildnodenumber are completely readonly.
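The per-relationship rules above boil down to a small decision table. The sketch below is a hypothetical illustration, not the code in batch_edit.py: `RelKind`, `plan_action`, `is_null`, the action strings, and the `SYSTEM_FIELDS` set are assumed names, and `db_values` is assumed to hold the record's data columns with ids and foreign keys already removed. Matching and cloning of to-one independents is not modeled.

```python
from enum import Enum


class RelKind(Enum):
    TO_ONE_DEPENDENT = "to-one dependent"
    TO_ONE_INDEPENDENT = "to-one independent"
    TO_MANY_DEPENDENT = "to-many dependent"
    TO_MANY_INDEPENDENT = "to-many independent"


# Assumed set of columns that never count toward "has data".
SYSTEM_FIELDS = {"timestampcreated", "timestampmodified", "version"}
EMPTY = (None, "")


def is_null(sheet_values: dict, db_values: dict, defer_for_null: bool) -> bool:
    """True when every mapped cell is empty and, if defer_for_null is set,
    every unmapped, non-system database column is empty as well."""
    if any(v not in EMPTY for v in sheet_values.values()):
        return False
    if not defer_for_null:
        return True
    return all(
        v in EMPTY
        for k, v in db_values.items()
        if k not in SYSTEM_FIELDS and k not in sheet_values
    )


def plan_action(kind, exists, sheet_values, db_values, defer_for_null=True) -> str:
    if kind is RelKind.TO_ONE_INDEPENDENT:
        return "match_or_clone"  # matched first, cloned on a miss
    null = is_null(sheet_values, db_values, defer_for_null)
    if not exists:
        return "skip" if null else "create"
    if null and kind in (RelKind.TO_ONE_DEPENDENT, RelKind.TO_MANY_DEPENDENT):
        return "delete"  # only dependents are ever deleted
    return "update"  # includes clearing fields on to-many independents
```

With the collectionObjectAttribute example, clearing `remarks` while `integer1` still holds a value yields `update` under the default, and `delete` only when the null check is limited to the mapped cells.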

Results

There are four new result types:

  • NoChange

Reported when the record was meant to be updated but no change occurred, i.e., all the values from the db were the same. This is not visible to the user.

  • Updated

Reported when the record's fields were changed. This does not consider relationships (they are reported with a different result).

  • Deleted

Reported when a record is deleted. Happens when a dependent's cells are all empty.

  • MatchedAndChanged

Reported when a to-one independent was matched to a record other than the current one.

  • The side panel also shows the results per table, for different categories.
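A minimal sketch of how the field-level results could be derived, assuming rows are plain dicts of field name to value. `ResultKind`, `changed_fields`, `field_result`, and `to_one_result` are hypothetical names, not the workbench's actual result classes; Deleted is listed but not modeled, and returning `None` stands for "not one of the new result types".

```python
from enum import Enum


class ResultKind(Enum):
    NO_CHANGE = "NoChange"
    UPDATED = "Updated"
    DELETED = "Deleted"
    MATCHED_AND_CHANGED = "MatchedAndChanged"


def changed_fields(db_row: dict, sheet_row: dict) -> dict:
    # Only these fields would be highlighted in the grid.
    return {k: v for k, v in sheet_row.items() if db_row.get(k) != v}


def field_result(db_row: dict, sheet_row: dict):
    """Scalar fields only; relationships are reported separately."""
    changed = changed_fields(db_row, sheet_row)
    return (ResultKind.UPDATED, changed) if changed else (ResultKind.NO_CHANGE, {})


def to_one_result(current_id: int, matched_id: int):
    # A to-one independent that now points at a different record.
    return ResultKind.MATCHED_AND_CHANGED if matched_id != current_id else None
```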

Preferences

There are three different preference options.

  • Remote Preferences (2)

  • Defer For Match
    Set by sp7.batchEdit.deferForMatch. This preference controls whether database fields are included for matching. Defaults to false.

  • Defer For Null
    Set by sp7.batchEdit.deferForNull. This preference controls whether database fields are included when determining whether a record is null. For dependents, null records are deleted, so this preference controls how cautious batch-edit is.

  • User Preferences (1)

  • Number of query rows

Determines how many query results are used for batch-edit. Defaults to 5000.
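The preferences above could be modeled as a small settings object. This is a hypothetical sketch: `BatchEditPrefs`, `match_predicate`, and `limit_rows` are invented names. Only the preference keys and the deferForMatch and row-limit defaults come from the text; the deferForNull default of true is inferred from the to-one dependent section (which says all database fields are checked by default), and letting sheet values win over database values on overlap is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BatchEditPrefs:
    defer_for_match: bool = False   # sp7.batchEdit.deferForMatch
    defer_for_null: bool = True     # sp7.batchEdit.deferForNull (default inferred)
    query_row_limit: int = 5000     # user preference: number of query rows


def match_predicate(sheet_values: dict, db_values: dict, prefs: BatchEditPrefs) -> dict:
    """Fields used to look up an existing to-one independent record."""
    predicate = dict(sheet_values)  # fields visible in the query
    if prefs.defer_for_match:
        for name, value in db_values.items():
            predicate.setdefault(name, value)  # sheet value wins on overlap
    return predicate


def limit_rows(rows: list, prefs: BatchEditPrefs) -> list:
    return rows[: prefs.query_row_limit]
```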

Rollbacks

Rollbacks are complicated to perform. In the current design, whenever a user creates a batch-edit dataset via the query builder, two datasets are made, and the user can only see one of them. The second is a "backer" of the first, and contains an FK to the first (so we can find the backer of a dataset later). When a rollback is requested, for every row in the main dataset we find the original row in the backer and perform the regular batch-edit update with it. Essentially, it reapplies the original snapshot.

This is highly experimental, so it is recommended to always take a backup of the db first, but it should work in many cases.
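The backer-based rollback can be sketched with in-memory datasets. All names here are hypothetical: pairing main and backer rows by position is an assumption, `apply_update` stands in for the regular batch-edit update, and the `rolledback` flag mirrors the `rolledback` field that a later commit added to SpDataset.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Dataset:
    id: int
    rows: list
    parent_id: Optional[int] = None  # set only on the hidden backer
    rolledback: bool = False


def create_batch_edit(next_id: int, query_rows: list):
    # Copy rows so later edits to the main dataset never touch the snapshot.
    main = Dataset(next_id, [dict(r) for r in query_rows])
    backer = Dataset(next_id + 1, [dict(r) for r in query_rows], parent_id=main.id)
    return main, backer


def rollback(datasets: list, main: Dataset, apply_update: Callable[[dict], None]) -> None:
    backer = next(d for d in datasets if d.parent_id == main.id)
    if len(backer.rows) != len(main.rows):
        raise ValueError("backer and main dataset are out of sync")
    for _edited, original in zip(main.rows, backer.rows):
        apply_update(original)  # re-run the normal update with the snapshot row
    main.rolledback = True  # later commits disable editing after this
```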

Misc

  • Queries from record set are supported.

@realVinayak realVinayak changed the title Wb improvements Workbench, Query Builder backend improvements May 21, 2024
@realVinayak realVinayak linked an issue May 21, 2024 that may be closed by this pull request
@realVinayak realVinayak linked an issue Jun 19, 2024 that may be closed by this pull request
@@ -177,33 +181,33 @@ def from_stringid(cls, stringid, is_relation):
extracted_fieldname, date_part = extract_date_part(field_name)
field = node.get_field(extracted_fieldname, strict=False)

Contributor Author


Yikes, you're quite brave for merging production into this branch. Most of the code looks not too bad, except this part, where I thought some insider info would be necessary.

I know getting rid of TreeRankQuery seems like an easy way, but trust me, TreeRankQuery makes batch-edit very simple. Don't take my word for it. Go to the sibling file batch_edit.py, and you'll see just how minimal changes were necessary to support dataset construction when trees are selected. (IIRC, there are like just 3 places, quite isolated from an abstraction perspective).

Here's the idea behind TreeRankQuery. Basically, each rank is treated as a relationship from the tree to itself. So, Kingdom is a to-one relationship from Taxon to Taxon. When a user selects Collectionobject -> Determination -> Taxon (species, fullname), the join path becomes:

determination, taxon, species, fullname

Pros

  1. No need for tree_rank and tree_field fields.
  2. It's quite easy to test if join path ends with relationship (just check the last field is relationship or not). You'd previously also have to think about tree_rank and tree_field.
  3. The code, before this merge, already constructs correct queries for something like determination, taxon, species, createdby, firstname, effectively allowing relationships from tree ranks. Try doing that with tree_rank and tree_field!
  4. Most importantly (/s), it makes batch-edit quite simple. For every row, batch-edit dataset construction needs to look at what the IDs are. When you have TreeRankQuery, tree ranks are effectively just like any other relationship.

Cons:

  1. Hard to merge from production, which is a valid reason
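The TreeRankQuery idea can be illustrated with a toy schema in which each rank name resolves to a to-one relationship from the tree table back to itself. This is a hypothetical sketch, not the real TreeRankQuery class: `Field`, `SCHEMA`, `TREE_RANKS`, `resolve`, and `ends_with_relationship` are invented, and the schema is trimmed to the tables in the example path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    name: str
    is_relationship: bool
    related_table: Optional[str] = None


SCHEMA = {
    "CollectionObject": {"determinations": Field("determinations", True, "Determination")},
    "Determination": {"taxon": Field("taxon", True, "Taxon")},
    "Taxon": {
        "fullname": Field("fullname", False),
        "createdbyagent": Field("createdbyagent", True, "Agent"),
    },
    "Agent": {"firstname": Field("firstname", False)},
}
TREE_RANKS = {"Taxon": ["kingdom", "genus", "species"]}


def get_field(table: str, name: str) -> Field:
    if name in TREE_RANKS.get(table, []):
        return Field(name, True, table)  # a rank is a to-one from the tree to itself
    return SCHEMA[table][name]


def resolve(base: str, path: list) -> list:
    table, fields = base, []
    for name in path:
        field = get_field(table, name)
        fields.append(field)
        if field.is_relationship:
            table = field.related_table
    return fields


def ends_with_relationship(base: str, path: list) -> bool:
    # No tree_rank / tree_field bookkeeping: just inspect the last element.
    return resolve(base, path)[-1].is_relationship
```

Under this model, a path such as determinations, taxon, species, createdbyagent, firstname resolves with no special casing, which is the relationship-from-rank case in point 3.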

Contributor


Thanks! I was reluctant to merge prod here initially but the alternative seemed to be a bigger headache. I'll look into incorporating TreeRankQuery in the PR that follows #5417

@melton-jason melton-jason mentioned this pull request Jan 17, 2025
sharadsw added a commit that referenced this pull request Feb 7, 2025
@sharadsw sharadsw mentioned this pull request Feb 7, 2025
sharadsw added a commit that referenced this pull request Apr 29, 2025
* Enable trees in queries

* Use query construct code from #4929

* Update unit test

* Remove param_count

* Display tree name in query error

* Update TreeRankQuery to fix implicit ORs
- see: #6196 (comment)

* Allow removing last row in Batch Edit

* Make a missing rank info dialog which proceeds to dataset creation

* Augment tree queries with missing ranks for batch edit

* Lint code with ESLint and Prettier

Triggered by 65f7d21 on branch refs/heads/issue-6127

* Adjust apply_batch_edit_pack for multiple trees

* Add a discipline type in tests

* Un-enforce TreeRankRecord in upload plan

* Add loading action to missing ranks dialog

* Move table name to same line in missing fields dialog

* Handle case when rank name has spaces

* Add a close button to missing ranks dialog

* Fix frontend missing field calculation

* Fix frontend to many tree error

* Lint code with ESLint and Prettier

Triggered by 9407376 on branch refs/heads/issue-6127

* Restrict to manys only for tree fields

* Fix multiple rank in row error

* Fix navigator

* Fix tests

* Group missing ranks by tree

* Lint code with ESLint and Prettier

Triggered by 7f2149d on branch refs/heads/issue-6127

* Pass filtered treedef ids to the backend
- Adds checkboxes to tree names in missing ranks dialog
- Splits the main batch edit file into 4 smaller files

* Lint code with ESLint and Prettier

Triggered by 1c29ec9 on branch refs/heads/issue-6127

* Filter trees used when rewriting batch edit dataset

* Fix tests

* Use TreeRankRecord in upload plan

* Remove unused string

* Fix visual order
- For multiple trees, columns will be grouped by tree first

* Revert "Fix visual order"

This reverts commit a8a2ad6.

* Flag to-many in tree only queries

* Lint code with ESLint and Prettier

Triggered by f36521c on branch refs/heads/issue-6127

* Enable nested to-many in Workbench (#6216)

* Enable nested to-many in Workbench

* Update test

* Add upload plan changes

* Update tests

* Lint code with ESLint and Prettier

Triggered by 82033cd on branch refs/heads/issue-2331

* Fix tests

* Lint code with ESLint and Prettier

Triggered by f0822bf on branch refs/heads/issue-2331

* Lint code with ESLint and Prettier

Triggered by f27581b on branch refs/heads/issue-2331

* Lint code with ESLint and Prettier

Triggered by cc1f85b on branch refs/heads/issue-6127

* Check for lowercase tree table names when rewriting tree rank row plan

* Handle None rank

* Fix tree column order

* Fix tests

* Revert back to sort columns
@sharadsw sharadsw closed this Apr 30, 2025
sharadsw added a commit that referenced this pull request May 6, 2025
* Enable trees in queries

* Use query construct code from #4929

* Update unit test

* Enable nested to-many in Workbench

* Update test

* Remove param_count

* Display tree name in query error

* Add upload plan changes

* Update tests

* Lint code with ESLint and Prettier

Triggered by 82033cd on branch refs/heads/issue-2331

* Update TreeRankQuery to fix implicit ORs
- see: #6196 (comment)

* Allow removing last row in Batch Edit

* Make a missing rank info dialog which proceeds to dataset creation

* Augment tree queries with missing ranks for batch edit

* Lint code with ESLint and Prettier

Triggered by 65f7d21 on branch refs/heads/issue-6127

* Adjust apply_batch_edit_pack for multiple trees

* Add a discipline type in tests

* Enable relationships

* Lint code with ESLint and Prettier

Triggered by 350ee9c on branch refs/heads/issue-6126

* Enable data mapper and batch edit preferences

* Fix localizations

* Consider remote to ones as to many in upload plan

* Add remote to ones method

* Un-enforce TreeRankRecord in upload plan

* Add loading action to missing ranks dialog

* Move table name to same line in missing fields dialog

* Handle case when rank name has spaces

* Add a close button to missing ranks dialog

* Fix frontend missing field calculation

* Fix frontend to many tree error

* Lint code with ESLint and Prettier

Triggered by 9407376 on branch refs/heads/issue-6127

* Restrict to manys only for tree fields

* Avoid cloning to-ones when committing
- This was caused because we treat remote to-ones as to-many in the upload plan (affects COGs)

* Fix to many for tree in relationships

* Change revert to rollback in pref localization

* Use TreeRankRecord in upload plan

* Fix multiple rank in row error

* Fix multiple rank in row error

* Fix navigator

* Fix tests

* Group missing ranks by tree

* Lint code with ESLint and Prettier

Triggered by 7f2149d on branch refs/heads/issue-6127

* Pass filtered treedef ids to the backend
- Adds checkboxes to tree names in missing ranks dialog
- Splits the main batch edit file into 4 smaller files

* Lint code with ESLint and Prettier

Triggered by 1c29ec9 on branch refs/heads/issue-6127

* Filter trees used when rewriting batch edit dataset

* Fix tests

* Use TreeRankRecord in upload plan

* Remove unused string

* Fix visual order
- For multiple trees, columns will be grouped by tree first

* Revert "Fix visual order"

This reverts commit a8a2ad6.

* Fix tests

* Lint code with ESLint and Prettier

Triggered by f0822bf on branch refs/heads/issue-2331

* Handle (any rank) mapping for Batch Edit upload plans

* Lint code with ESLint and Prettier

Triggered by 3945527 on branch refs/heads/issue-6126

* Disable spauditlog for BE

* Add title when batch edit is disabled

* Fix deleted cells for many-to-one dependents

* Use variant permissions for creating record sets

* Fix undefined name error
- Using tables doesn't work when the data hasn't loaded correctly

* Disable changing batch edit prefs after upload

* Add validation error for scope change

* Add localization for other WB errors

* Lint code with ESLint and Prettier

Triggered by 34a3459 on branch refs/heads/issue-6126

* Remove description for null record
- Removed for UX reasons. Users do not need to manually remove null record strings

* Ensure at least 1 to-many column gets added to batch edit datasets

* Flag to-many in tree only queries

* Lint code with ESLint and Prettier

Triggered by f36521c on branch refs/heads/issue-6127

* Lint code with ESLint and Prettier

Triggered by 0647fa6 on branch refs/heads/issue-6126

* Enable nested to-many in Workbench (#6216)

* Enable nested to-many in Workbench

* Update test

* Add upload plan changes

* Update tests

* Lint code with ESLint and Prettier

Triggered by 82033cd on branch refs/heads/issue-2331

* Fix tests

* Lint code with ESLint and Prettier

Triggered by f0822bf on branch refs/heads/issue-2331

* Lint code with ESLint and Prettier

Triggered by f27581b on branch refs/heads/issue-2331

* Lint code with ESLint and Prettier

Triggered by cc1f85b on branch refs/heads/issue-6127

* Disable editing any rank tree relationships

* Check for lowercase tree table names when rewriting tree rank row plan

* Lint code with ESLint and Prettier

Triggered by 633a7da on branch refs/heads/issue-6126

* Handle None rank

* Batch edit: Disable editing dataset after rollback (#6428)

* Add rolledback to SpDataset

* Lint code with ESLint and Prettier

Triggered by 7d2a86d on branch refs/heads/issue-6390

* Add text to indicate dataset cannot be edited

* Make hot columns readonly based on context

* Lint code with ESLint and Prettier

Triggered by dcbb593 on branch refs/heads/issue-6390

* Reorder migration

* Fix tree column order

* Fix tests

* Fix tests

* Upgrade celery version (#6437)

* Add rolledback to SpDataset

* Lint code with ESLint and Prettier

Triggered by 7d2a86d on branch refs/heads/issue-6390

* Add text to indicate dataset cannot be edited

* Make hot columns readonly based on context

* Lint code with ESLint and Prettier

Triggered by dcbb593 on branch refs/heads/issue-6390

* Reorder migration

* Upgrade celery and its dependencies

* Revert back to sort columns

* Enable matched and changed when readonly

* Add missing import

* Re-add lost code
sharadsw added a commit that referenced this pull request May 13, 2025
* Enable trees in queries

* Use query construct code from #4929

* Update unit test

* Enable nested to-many in Workbench

* Update test

* Remove param_count

* Display tree name in query error

* Add upload plan changes

* Update tests

* Lint code with ESLint and Prettier

Triggered by 82033cd on branch refs/heads/issue-2331

* Update TreeRankQuery to fix implicit ORs
- see: #6196 (comment)

* Allow removing last row in Batch Edit

* Make a missing rank info dialog which proceeds to dataset creation

* Augment tree queries with missing ranks for batch edit

* Lint code with ESLint and Prettier

Triggered by 65f7d21 on branch refs/heads/issue-6127

* Adjust apply_batch_edit_pack for multiple trees

* Add a discipline type in tests

* Enable relationships

* Lint code with ESLint and Prettier

Triggered by 350ee9c on branch refs/heads/issue-6126

* Enable data mapper and batch edit preferences

* Fix localizations

* Consider remote to ones as to many in upload plan

* Add remote to ones method

* Un-enforce TreeRankRecord in upload plan

* Add loading action to missing ranks dialog

* Move table name to same line in missing fields dialog

* Handle case when rank name has spaces

* Add a close button to missing ranks dialog

* Fix frontend missing field calculation

* Fix frontend to many tree error

* Lint code with ESLint and Prettier

Triggered by 9407376 on branch refs/heads/issue-6127

* Restrict to manys only for tree fields

* Avoid cloning to-ones when committing
- This was caused because we treat remote to-ones as to-many in the upload plan (affects COGs)

* Fix to many for tree in relationships

* Change revert to rollback in pref localization

* Use TreeRankRecord in upload plan

* Fix multiple rank in row error

* Fix multiple rank in row error

* Fix navigator

* Fix tests

* Group missing ranks by tree

* Lint code with ESLint and Prettier

Triggered by 7f2149d on branch refs/heads/issue-6127

* Pass filtered treedef ids to the backend
- Adds checkboxes to tree names in missing ranks dialog
- Splits the main batch edit file into 4 smaller files

* Lint code with ESLint and Prettier

Triggered by 1c29ec9 on branch refs/heads/issue-6127

* Filter trees used when rewriting batch edit dataset

* Fix tests

* Use TreeRankRecord in upload plan

* Remove unused string

* Fix visual order
- For multiple trees, columns will be grouped by tree first

* Revert "Fix visual order"

This reverts commit a8a2ad6.

* Fix tests

* Lint code with ESLint and Prettier

Triggered by f0822bf on branch refs/heads/issue-2331

* Handle (any rank) mapping for Batch Edit upload plans

* Lint code with ESLint and Prettier

Triggered by 3945527 on branch refs/heads/issue-6126

* Disable spauditlog for BE

* Add title when batch edit is disabled

* Fix deleted cells for many-to-one dependents

* Use variant permissions for creating record sets

* Fix undefined name error
- Using tables doesn't work when the data hasn't loaded correctly

* Disable changing batch edit prefs after upload

* Add validation error for scope change

* Add localization for other WB errors

* Lint code with ESLint and Prettier

Triggered by 34a3459 on branch refs/heads/issue-6126

* Remove description for null record
- Removed for UX reasons. Users do not need to manually remove null record strings

* Ensure at least 1 to-many column gets added to batch edit datasets

* Flag to-many in tree only queries

* Lint code with ESLint and Prettier

Triggered by f36521c on branch refs/heads/issue-6127

* Lint code with ESLint and Prettier

Triggered by 0647fa6 on branch refs/heads/issue-6126

* Enable nested to-many in Workbench (#6216)

* Enable nested to-many in Workbench

* Update test

* Add upload plan changes

* Update tests

* Lint code with ESLint and Prettier

Triggered by 82033cd on branch refs/heads/issue-2331

* Fix tests

* Lint code with ESLint and Prettier

Triggered by f0822bf on branch refs/heads/issue-2331

* Lint code with ESLint and Prettier

Triggered by f27581b on branch refs/heads/issue-2331

* Lint code with ESLint and Prettier

Triggered by cc1f85b on branch refs/heads/issue-6127

* Disable editing any rank tree relationships

* Add rolledback to SpDataset

* Lint code with ESLint and Prettier

Triggered by 7d2a86d on branch refs/heads/issue-6390

* Add text to indicate dataset cannot be edited

* Make hot columns readonly based on context

* Lint code with ESLint and Prettier

Triggered by dcbb593 on branch refs/heads/issue-6390

* Check for lowercase tree table names when rewriting tree rank row plan

* Lint code with ESLint and Prettier

Triggered by 633a7da on branch refs/heads/issue-6126

* Reorder migration

* Handle None rank

* Fix result cell link in Batch Edit and Workbench

* Lint code with ESLint and Prettier

Triggered by 4b40eec on branch refs/heads/issue-6164

* Remove auto height

* Remove duplicate imports

* Lint code with ESLint and Prettier

Triggered by 19e6c3e on branch refs/heads/issue-6164
@CarolineDenis CarolineDenis deleted the wb_improvements branch January 2, 2026 14:01

Labels

None yet

Projects

Status: ✅Done

Development

Successfully merging this pull request may close these issues.

10 participants