fix: updated datasets authorization#2721
Conversation
There was a problem hiding this comment.
Hey - I've found 12 issues, and left some high level feedback:
- There are several typos and spelling inconsistencies in the new doc (e.g.
Authoorization,DetasetDeleteOwner,ar applied,ownerhip,Privilieged,DatasetReadyAny) that should be cleaned up to keep the authorization terminology unambiguous. - The names of the instance-level abilities in the tables (
DatasetReadPublic,DatasetReadAccess, etc.) don’t match the earlier listed CASL actions (DatasetReadManyPublic,DatasetReadManyAccess, etc.); aligning these labels or explicitly explaining the mapping would avoid confusion for readers. - The authorization tables are a bit hard to interpret (e.g. the
Readrow has fewer columns than the header, and the last table mixesDeleteas both a column and an operation row); consider tightening the column structure and ensuring each row has consistent columns to make the effective permissions clearer.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There are several typos and spelling inconsistencies in the new doc (e.g. `Authoorization`, `DetasetDeleteOwner`, `ar applied`, `ownerhip`, `Privilieged`, `DatasetReadyAny`) that should be cleaned up to keep the authorization terminology unambiguous.
- The names of the instance-level abilities in the tables (`DatasetReadPublic`, `DatasetReadAccess`, etc.) don’t match the earlier listed CASL actions (`DatasetReadManyPublic`, `DatasetReadManyAccess`, etc.); aligning these labels or explicitly explaining the mapping would avoid confusion for readers.
- The authorization tables are a bit hard to interpret (e.g. the `Read` row has fewer columns than the header, and the last table mixes `Delete` as both a column and an operation row); consider tightening the column structure and ensuring each row has consistent columns to make the effective permissions clearer.
## Individual Comments
### Comment 1
<location path="docs/developer-guide/authorization/authorization_datasets.md" line_range="1" />
<code_context>
+# Datasets Authoorization
+
+## CASL ability actions
</code_context>
<issue_to_address>
**issue (typo):** Fix spelling in the main heading.
Change the heading to `# Datasets Authorization` to correct the typo.
```suggestion
# Datasets Authorization
```
</issue_to_address>
### Comment 2
<location path="docs/developer-guide/authorization/authorization_datasets.md" line_range="5" />
<code_context>
+
+## CASL ability actions
+
+This is the list of the permissions methods available for datasets and all their endpoints
+
+### Endpoint authorization
</code_context>
<issue_to_address>
**suggestion (typo):** Clarify the phrase "permissions methods".
Consider rephrasing to "permission methods" or "the permissions available for datasets and all their endpoints", depending on what you want to emphasize.
Suggested implementation:
```
# Datasets Authorization
```
```
This is the list of the permissions available for datasets and all their endpoints
```
</issue_to_address>
### Comment 3
<location path="docs/developer-guide/authorization/authorization_datasets.md" line_range="24" />
<code_context>
+- DatasetReadAny
+- DatasetUpdateOwner
+- DatasetUpdateAny
+- DetasetDeleteOwner
+- DatasetDeleteAny
+
</code_context>
<issue_to_address>
**issue (typo):** Correct the spelling of "DetasetDeleteOwner".
Probably intended to be `DatasetDeleteOwner` for consistency with the other action names.
```suggestion
- DatasetDeleteOwner
```
</issue_to_address>
### Comment 4
<location path="docs/developer-guide/authorization/authorization_datasets.md" line_range="29" />
<code_context>
+
+### Implementation
+
+How the different level of authorization translates in data condition applied by the backend.
+
+- Public
</code_context>
<issue_to_address>
**suggestion (typo):** Improve grammar in the description of authorization levels.
Consider rephrasing to: "How the different levels of authorization translate into data conditions applied by the backend."
```suggestion
How the different levels of authorization translate into data conditions applied by the backend.
```
</issue_to_address>
### Comment 5
<location path="docs/developer-guide/authorization/authorization_datasets.md" line_range="33" />
<code_context>
+
+- Public
+ - isPublished = true
+- Access (condition ar applied in logical _or_)
+ - isPublished = true
+ - ownerGroup is one of the groups that the user belongs
</code_context>
<issue_to_address>
**issue (typo):** Fix typo and pluralization in the Access description.
Change this to "Access (conditions are applied in logical _or_)" to fix the `ar` → `are` typo and pluralize "condition."
```suggestion
- Access (conditions are applied in logical _or_)
```
</issue_to_address>
### Comment 6
<location path="docs/developer-guide/authorization/authorization_datasets.md" line_range="37" />
<code_context>
+ - isPublished = true
+ - ownerGroup is one of the groups that the user belongs
+ - accessGroups are one of the groups that the user belongs
+ - _sharedWith contains the user's email_ (obsolete, it will removed)
+- Owner
+ - ownerGroup is one of the groups that the user belongs
</code_context>
<issue_to_address>
**issue (typo):** Fix grammar in the parenthetical note.
Change "it will removed" to "it will be removed" for correct grammar.
```suggestion
- _sharedWith contains the user's email_ (obsolete, it will be removed)
```
</issue_to_address>
### Comment 7
<location path="docs/developer-guide/authorization/authorization_datasets.md" line_range="75-80" />
<code_context>
+
+### Special permissions groups
+
+- Dataset Create Basic (DsCB)
+ These groups are allowed to create datasets for any of the group they belong to, although they are not allowed to assigned the pid to the new dataset.
+ Default: _#nogroup_
+ Special values:
+ - _#all_ : all groups are allowed to create datasets with pid assigned by the system.
+- Dataset Create Extended (DsCE)
+ These groups are allowed to create datasets for any of the group they belong to, and they can assign the pid to the new dataset.
+ Default: _#nogroup_
</code_context>
<issue_to_address>
**issue (typo):** Tighten grammar in the Basic and Extended create descriptions.
Consider updating the shared phrase to "for any of the groups they belong to" and correcting "are not allowed to assigned the pid" to "are not allowed to assign the pid" in the Basic description.
Suggested implementation:
```
- Dataset Create Basic (DsCB)
These groups are allowed to create datasets for any of the groups they belong to, although they are not allowed to assign the pid to the new dataset.
```
```
- Dataset Create Extended (DsCE)
These groups are allowed to create datasets for any of the groups they belong to, and they can assign the pid to the new dataset.
```
</issue_to_address>
### Comment 8
<location path="docs/developer-guide/authorization/authorization_datasets.md" line_range="99-100" />
<code_context>
+ Special values:
+ - _#DsCB_ : all groups listed in _Dataset Create Basic_ are allowed to update the datasets they own.
+ - _#DsCE_ : all groups listed in _Dataset Create Basic_ are allowed to update the datasets they own.
+- Dataset Update Privileged (DsUP)
+ These groups are allowed to update any datasets independently from the ownerhip.
+ Default: _#nogroup_
+ Special values:
</code_context>
<issue_to_address>
**issue (typo):** Fix spelling of "ownerhip" and consider preposition.
"Ownerhip" should be "ownership". Also consider rephrasing to "independently of the ownership" or "regardless of ownership" for more natural wording.
Suggested implementation:
```
- Dataset Read Privileged (DsRP)
These groups are allowed to read all datasets regardless of ownership.
Default: _#nogroup_
```
There is likely a similar sentence in the **Dataset Update Privileged (DsUP)** section, currently written as:
`These groups are allowed to update any datasets independently from the ownerhip.`
That should be updated to:
`These groups are allowed to update any datasets regardless of ownership.`
and the spelling of "ownerhip" should be corrected to "ownership" if that text exists elsewhere in the file.
</issue_to_address>
### Comment 9
<location path="docs/developer-guide/authorization/authorization_datasets.md" line_range="118" />
<code_context>
+
+### Authorization special permissions groups
+
+If a user belongs to one of the groups which is listed to any special permission, the permissions listed in this table override the standard permissions.
+When the cell is empty in the following table, the permissions listed in the standard users table are applied.
+A user can belong to multiple groups listed in multiple special permissions. The union of all the permissions is applied.
</code_context>
<issue_to_address>
**suggestion (typo):** Improve grammar in "listed to any special permission".
Consider changing the phrase to "listed in any special permission" to improve the grammar and readability.
```suggestion
If a user belongs to one of the groups which is listed in any special permission, the permissions listed in this table override the standard permissions.
```
</issue_to_address>
### Comment 10
<location path="docs/developer-guide/authorization/authorization_datasets.md" line_range="122" />
<code_context>
+When the cell is empty in the following table, the permissions listed in the standard users table are applied.
+A user can belong to multiple groups listed in multiple special permissions. The union of all the permissions is applied.
+
+| Operation | Endpoint Authorization | Dataset Read Privileged | Dataset Create Basic | Dataset Create Extended | Dataset Create Privileged | Dataset Update Basic | Dataset Update Privileged | Admin | Dataset Delete Basic | Dataset Delete Privilieged | Delete | Notes |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Create | _DatasetCreate_ | | Owner, w/o PID<br/>_DatasetCreateOwnerNoPid_ | Owner, w/ PID<br/>_DatasetCreateOwnerWithPid_ | Any<br/>_DatasetCreateAny_ | | | Any<br/>_DatasetCreateAny_ | | | | |
</code_context>
<issue_to_address>
**issue (typo):** Fix spelling of "Privilieged" in the table header.
The column header "Dataset Delete Privilieged" has a typo; please change it to "Dataset Delete Privileged".
```suggestion
| Operation | Endpoint Authorization | Dataset Read Privileged | Dataset Create Basic | Dataset Create Extended | Dataset Create Privileged | Dataset Update Basic | Dataset Update Privileged | Admin | Dataset Delete Basic | Dataset Delete Privileged | Delete | Notes |
```
</issue_to_address>
### Comment 11
<location path="docs/developer-guide/authorization/authorization_datasets.md" line_range="125" />
<code_context>
+| Operation | Endpoint Authorization | Dataset Read Privileged | Dataset Create Basic | Dataset Create Extended | Dataset Create Privileged | Dataset Update Basic | Dataset Update Privileged | Admin | Dataset Delete Basic | Dataset Delete Privilieged | Delete | Notes |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Create | _DatasetCreate_ | | Owner, w/o PID<br/>_DatasetCreateOwnerNoPid_ | Owner, w/ PID<br/>_DatasetCreateOwnerWithPid_ | Any<br/>_DatasetCreateAny_ | | | Any<br/>_DatasetCreateAny_ | | | | |
+| Read | _DatasetRead_ | Any<br/>_DatasetReadAny_ | | | | | Any<br/>_DatasetReadyAny_ | | |
+| Update | _DatasetUpdate_ | | | | | Owner<br/>_DatasetUpdateOwner_ | Any<br/>_DatasetUpdateAny_ | Any<br/>_DatasetUpdateAny_ | | | | |
+| | | | | | | | | |
</code_context>
<issue_to_address>
**issue (typo):** Correct the action name "DatasetReadyAny".
In the `Any<br/>_DatasetReadyAny_` cell, the action name appears misspelled and should likely be `_DatasetReadAny_` to match the earlier entry.
```suggestion
| Read | _DatasetRead_ | Any<br/>_DatasetReadAny_ | | | | | Any<br/>_DatasetReadAny_ | | |
```
</issue_to_address>
### Comment 12
<location path="docs/developer-guide/authorization/authorization_datasets.md" line_range="133" />
<code_context>
+## Priorities
+
+This section lists the connected special permissions groups in order of importance.
+A user will acquire the permissions from the special permissions groups to the most right in the list, he/she belongs to.
+
+- Read
</code_context>
<issue_to_address>
**suggestion (typo):** Improve wording in the priorities explanation.
Consider rephrasing to: "A user will acquire the permissions from the special permissions groups up to the rightmost group in the list they belong to." This avoids the awkward "to the most right in the list" and reads more clearly.
```suggestion
A user will acquire the permissions from the special permissions groups up to the rightmost group in the list they belong to.
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
…orization_documentation
## Description This PR includes an update to the new revised Dataset Authorization matrix. ## Summary by Sourcery Update dataset authorization documentation to reflect new endpoints and environment-based configuration for special permission groups. Documentation: - Document additional dataset read and update endpoints in the authorization matrix, including lifecycle and logbook operations. - Clarify dataset delete privilege naming in the authorization matrix table. - Describe environment variables used to configure special dataset permission groups and their mapping from legacy variables.
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
abdellah257
left a comment
There was a problem hiding this comment.
Thanks for the Doc, looks clear and comprehensible
| - isPublished = true | ||
| - Access (conditions are applied in logical _or_) | ||
| - isPublished = true | ||
| <<<<<<< HEAD |
| The following list present the environmental variables that should be configured to setup the special groups listed in the previous sections. | ||
| Each variable is a comma separated list of the users' groups that acquired the special permissions linked to the special group. | ||
|
|
||
| - DATASET_READ_PRIVILEGED_GROUPS: groups with __Dataset Read Privileged__ permissions |
There was a problem hiding this comment.
Maybe highlight env variable with `` like : DATASET_READ_PRIVILEGED_GROUPS
fixed mistake in the special permissions groups table
Bold for groups name in list with explanation
There was a problem hiding this comment.
Thanks for this. My main feedback is that it would be helpful for the document to clearly distinguish between the current implementation and the proposed new model.
I'd suggest splitting it into two sections:
Current implemenation:
- The codebase uses CREATE_DATASET_GROUPS, CREATE_DATASET_WITH_PID_GROUPS, CREATE_DATASET_PRIVILEGED_GROUPS, and UPDATE_DATASET_LIFECYCLE_GROUPS.
- UPDATE_DATASET_LIFECYCLE_GROUPS is currently missing from the description.
Proposed authorisation model:
- The proposed permission model (with separate Read Privileged, Update Basic/Privileged, Delete Basic/Privileged groups) is a definite improvement and adds extra granularity
- UPDATE_DATASET_LIFECYCLE_GROUPS isn't discussed in the new permission model - will that stay the same?
- Will there be changes to attachments, origdatablocks, or datablocks to match this new schema? And if so, would those changes be phased in at the same time?
- Setting DATASET_UPDATE_BASIC_GROUPS = "#DsCB" to alias the create basic groups is nice, but to nail down the spec - would these aliases get resolved at startup or dynamically? What happens if the user tries to specify a concrete group and an alias together - is that allowed?
- What's the migration path? Would both the old and new variables work together, or would there be a cutover? Since the defaults differ (CREATE_DATASET_GROUPS defaults to #all; DATASET_CREATE_BASIC_GROUPS defaults to #nogroup), a cutover would have to be well documented as a breaking change.
I hope this feedback is useful; happy to discuss anything further if that'd be helpful. Thanks again.
| - Read | ||
| - GET Datasets | ||
| - GET Datasets/fullquery | ||
| - GET Datasets/fullfacets |
There was a problem hiding this comment.
| - GET Datasets/fullfacets | |
| - GET Datasets/fullfacet |
| Default: _#nogroup_ | ||
| Special values: | ||
| - _#DsCB_ : all groups listed in _Dataset Create Basic_ are allowed to update the datasets they own. | ||
| - _#DsCE_ : all groups listed in _Dataset Create Basic_ are allowed to update the datasets they own. |
There was a problem hiding this comment.
| - _#DsCE_ : all groups listed in _Dataset Create Basic_ are allowed to update the datasets they own. | |
| - _#DsCE_ : all groups listed in _Dataset Create Extended_ are allowed to update the datasets they own. |
| - _#DsCB_ : all groups listed in _Dataset Create Basic_ are allowed to update the datasets they own. | ||
| - _#DsCE_ : all groups listed in _Dataset Create Basic_ are allowed to update the datasets they own. |
There was a problem hiding this comment.
| - _#DsCB_ : all groups listed in _Dataset Create Basic_ are allowed to update the datasets they own. | |
| - _#DsCE_ : all groups listed in _Dataset Create Basic_ are allowed to update the datasets they own. | |
| - _#DsCB_ : all groups listed in _Dataset Create Basic_ are allowed to delete the datasets they own. | |
| - _#DsCE_ : all groups listed in _Dataset Create Extended_ are allowed to delete the datasets they own. |
| These groups are allowed to delete any dataset independently of the ownership. | ||
| Default: _#nogroup_ | ||
| Special values: | ||
| - _#DsCP_ : all groups listed in _Dataset Create Privileged_ are allowed to update any datasets. |
There was a problem hiding this comment.
| - _#DsCP_ : all groups listed in _Dataset Create Privileged_ are allowed to update any datasets. | |
| - _#DsCP_ : all groups listed in _Dataset Create Privileged_ are allowed to delete any datasets. |
| - ADMIN_GROUPS: groups with __Admin__ permissions. This variable effects all the sub-systems. | ||
| - DELETE_GROUPS: groups with __Delete__ permissions. This variable effects all the sub-systems. |
There was a problem hiding this comment.
| - ADMIN_GROUPS: groups with __Admin__ permissions. This variable effects all the sub-systems. | |
| - DELETE_GROUPS: groups with __Delete__ permissions. This variable effects all the sub-systems. | |
| - ADMIN_GROUPS: groups with __Admin__ permissions. This variable affects all the sub-systems. | |
| - DELETE_GROUPS: groups with __Delete__ permissions. This variable affects all the sub-systems. |
| - DATASET_UPDATE_BASIC_GROUPS = "" | ||
| - DATASET_UPDATE_PRIVILEGED_GROUPS = "post_ingestion_tasks" | ||
| - DATASET_DELETE_BASIC_GROUPS = "" | ||
| - DATASET_DELETE_PRIVILEGED_GROUP = "" |
There was a problem hiding this comment.
| - DATASET_DELETE_PRIVILEGED_GROUP = "" | |
| - DATASET_DELETE_PRIVILEGED_GROUPS = "" |
| - DATASET_UPDATE_BASIC_GROUPS = "" | ||
| - DATASET_UPDATE_PRIVILEGED_GROUPS = "" | ||
| - DATASET_DELETE_BASIC_GROUPS = "" | ||
| - DATASET_DELETE_PRIVILEGED_GROUP = "delete_obsolete_datasets" |
There was a problem hiding this comment.
| - DATASET_DELETE_PRIVILEGED_GROUP = "delete_obsolete_datasets" | |
| - DATASET_DELETE_PRIVILEGED_GROUPS = "delete_obsolete_datasets" |
| #### Description | ||
|
|
||
| We need to set up a workflow to delete datasets that are more than 10 years old and are marked with the keyword _obsolete_. | ||
| The process needs to list all datasets that contains the valu e_obsolete_ in the keywords field and have creation time older than 10 years from today. Once the list has been retrieved, it has to iterate through and execute a delte command on each dataset. |
There was a problem hiding this comment.
| The process needs to list all datasets that contains the valu e_obsolete_ in the keywords field and have creation time older than 10 years from today. Once the list has been retrieved, it has to iterate through and execute a delte command on each dataset. | |
| The process needs to list all datasets that contains the value _obsolete_ in the keywords field and have creation time older than 10 years from today. Once the list has been retrieved, it has to iterate through and execute a delete command on each dataset. |
| | Operation | Endpoint Authorization | Dataset Read Privileged | Dataset Create Basic | Dataset Create Extended | Dataset Create Privileged | Dataset Update Basic | Dataset Update Privileged | Admin | Dataset Delete Basic | Dataset Delete Privileged | Delete | Notes | | ||
| | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | ||
| | Create | _DatasetCreate_ | | Owner, w/o PID<br/>_DatasetCreateOwnerNoPid_ | Owner, w/ PID<br/>_DatasetCreateOwnerWithPid_ | Any<br/>_DatasetCreateAny_ | | | Any<br/>_DatasetCreateAny_ | | | | | | ||
| | Read | _DatasetRead_ | Any<br/>_DatasetReadAny_ | | | | | | Any<br/>_DatasetReadAny_ | | | |
There was a problem hiding this comment.
Two extra empty cells needed to match the header
| | Read | _DatasetRead_ | Any<br/>_DatasetReadAny_ | | | | | | Any<br/>_DatasetReadAny_ | | | | |
| | Read | _DatasetRead_ | Any<br/>_DatasetReadAny_ | | | | | | Any<br/>_DatasetReadAny_ | | | | | |
| These groups are allowed to create datasets for any group, and they can also assign the pid to the new dataset. | ||
| Default: _#nogroup_ | ||
| Special values: | ||
| - _#all_ : all groups are allowed to create datasets with pid assigned by the system |
There was a problem hiding this comment.
| - _#all_ : all groups are allowed to create datasets with pid assigned by the system | |
| - _#all_ : all groups can create datasets with for any group with explicit pid. |
|
Two comments from my side:
|
Description
This PR adds the documentation for new proposed authorization for datasets.
We are trying to simplify the documentation and to update the dataset authorization to meet the latest needs.
Motivation
The authorization in general was generated during the BE v4 migration and is in need for a review and update.
Changes:
Tests included
Documentation
Summary by Sourcery
Documentation: