-
Notifications
You must be signed in to change notification settings - Fork 32
Description
Environment
- DiffSync version: 1.9.0
Proposed Functionality
Currently in DiffSync, the tree hierarchy has shown a few limitations, most notably that objects can not form a tree using the same type. For example, a user cannot form a hierarchy of Locations (Site -> Building -> Floor -> Room). Additionally, this parent/child relationship must be declared when creating the Models, which might not be entirely necessary.
Adapting to a new tree implementation may resolve several outstanding feature requests, such as #239 and #225, as well as allow for the implementation of #122 and a get_parent() method which does not exist today.
This change proposes:
- Declare parent/child object relationships when an object is added to the backend instead of declared during type declaration.
- (Implementation detail) Store parent/child relationships in the back-end only instead of in DiffSyncModel
- Add a
get_parentmethod - Eliminate the need for a call to
obj1.add_child(obj2) - Support arbitrary object hierarchies such as:
- Self-referencing models, such as Location trees (Location in a Location in a Location)
- Objects with an optional parent
Use Case
# Note that MyModelA & MyModelB declarations would not include _children
be1 = MyBackEnd()
a1 = MyModelA(...)
a2 = MyModelA(...)
b3 = MyModelB(...)
b4 = MyModelB(...)
be1.add(a1, parent=None) # Note that parent=None is the default
be1.add(a2, parent=a1) # Specify a parent object
be1.add(b3, parent=a2) # Specify a parent
be1.add(b4, parent=None) # Another object of the same type, but without a parent
# Getter methods
be1.get_parent(a2) # Will return a1
be1.get_parent(a1) # Will return None; though it could also raise an Exception
be1.get_children(a2) # Will return something that iterable (that happens to only have b3 in it)
be1.get_children(b4) # Will return an empty iterator/list
# Alternate signatures to support Model + ids style signatures
# IE when you don't have the original object.
be1.add(a2, parent_model=MyModelA, parent_ids={"k": "v", ...})
be1.get_parent(model=MyModelA, ids={"k": "v", ...})
be1.get_children(model=MyModelA, ids={"k": "v", ...})One thing I haven't figured out entirely is if it makes sense to support moving an object between parents without delete/recreate
- What would the API look like?
- What would diffs look like? How would this be represented?
- How would this change with multiple parents like in Allow childrens to have multiple parents #210?
get_parentturns intoget_parents, but do we keep the original and how does it behave?- Most of this feature request assumes only a single parent
- Does it make sense to add this support in the same feature PR?
- Does it need to be implemented with this feature or just designed in a way that doesn't inhibit this feature?
Implementation Thoughts
I've been thinking about the best place to store the parent/child relationship information. I think it's actually better to store them purely in the backend (in a dedicated tree structure) vs having the DiffSyncModel be aware of the parent/child relationships. Using this method would keep DiffSyncModel focused on storing data instead of getting involved in part of the tree.
@dataclass
class TreeNode:
"""Internal data structure to represent an object in a tree"""
object: DiffSyncModel
parent: None | DiffSyncModel = None
children: None | Dict[str, Dict[str, DiffSyncModel]] = None # children[model_name][identifier] = Object
def add_child(self: Node, child: DiffSyncModel) -> TreeNode:
"""Add a tree to a node"""
n = TreeNode(object=child, parent=self)
self.children = parent.children or {} # or use defaultdict()
self.children.setdefault(child._modelname, {}) # or use defaultdict()
self.children[child._modelname][child.get_identifier()] = n
return n
# How TreeNode could be represented in a backend
class DiffSync:
top_level_nodes: Dict[str, Dict[str, TreeNode]] # top_level_nodes[model][identifier] = TreeNode(...)
all_nodes: Dict[str, Dict[str, TreeNode]] # all_nodes[model][identifier] = TreeNode(...)
# alternative - use a synthetic root node to reduce the number of special-cases for top-level objects
root_node: TreeNode(object=None)
all_nodes: Dict[str, Dict[str, TreeNode]] # all_nodes[model][identifier] = TreeNode(...)
# The root_node / top_level_nodes would establish only the top-level nodes
# The all_nodes would allow for get_parent / get_children method to quickly locate an object in the tree,
# as well as allow DiffSync to enforce key-uniqueness.I'm going a bit off-topic, but instead of TreeNode storing a reference to a DiffSyncModel, it could instead store a reference to just the object's identifier; This might make it easier for Python's GC to do its job at the expense of having to look up an object by its key.
@dataclass
class TreeNode:
"""Internal data structure to represent an object in a tree"""
model: str # Model name
id: str # object key
parent_model: None | str = None # Parent model
parent_id: None | str = None # Parent key
children: None | Dict[str, Set[str]] = None # children[model_name] = Set(ids) # Set of IDs
class DiffSync:
# Key lookup of all objects in the backend
all_objects: Dict[str, Dict[str, DiffSyncModel]] # all_objects[model][identifier] = DiffSyncModel()
# Tree structure
top_level_nodes: Dict[str, Dict[str, TreeNode]] # top_level_nodes[model][identifier] = TreeNode(...)
all_nodes: Dict[str, Dict[str, TreeNode]] # all_nodes[model][identifier] = TreeNode(...)Related Issues
- This may be an appropriate feature for 2.0 2.0 Tracking #232
- Would resolve Remove need to call
add_childANDadd#239, Allow for recursive data models #225 - Could potentially resolve Allow children models to be moved between parents without deletion/creation #122
- May be affected by Allow a parent reference in the identifiers #222 and Allow childrens to have multiple parents #210