diff --git a/CHANGELOG.rst b/CHANGELOG.rst index 0552e620..ef873f7e 100644 --- a/CHANGELOG.rst +++ b/CHANGELOG.rst @@ -16,6 +16,8 @@ Change Log Unreleased __________ +* Added support for complex types in dictionaries and lists. + [10.5.0] - 2025-08-19 --------------------- diff --git a/docs/how-tos/add-event-bus-support-to-an-event.rst b/docs/how-tos/add-event-bus-support-to-an-event.rst index 83dc4437..b1a59f53 100644 --- a/docs/how-tos/add-event-bus-support-to-an-event.rst +++ b/docs/how-tos/add-event-bus-support-to-an-event.rst @@ -51,11 +51,56 @@ Complex Data Types -------------------- - Type-annotated Lists (e.g., ``List[int]``, ``List[str]``) +- Type-annotated Dictionaries (e.g., ``Dict[str, int]``, ``Dict[str, str]``) - Attrs Classes (e.g., ``UserNonPersonalData``, ``UserPersonalData``, ``UserData``, ``CourseData``) - Types with Custom Serializers (e.g., ``CourseKey``, ``datetime``) +- Nested Complex Types: + + - Lists containing dictionaries (e.g., ``List[Dict[str, int]]``) + - Dictionaries containing lists (e.g., ``Dict[str, List[int]]``) + - Lists containing attrs classes (e.g., ``List[UserData]``) + - Dictionaries containing attrs classes (e.g., ``Dict[str, CourseData]``) Ensure that the :term:`Event Payload` is structured as `attrs data classes`_ and that the data types used in those classes align with the event bus schema format. +Examples of Complex Data Types Usage +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Here are practical examples of how to use the supported complex data types in your event payloads: + +.. code-block:: python + + # Example 1: Event with type-annotated dictionaries and lists + @attr.s(frozen=True) + class CourseMetricsData: + """ + Course metrics with complex data structures. 
+ """ + # Simple dictionary with string keys and integer values + enrollment_counts = attr.ib(type=dict[str, int], factory=dict) + + # List of dictionaries + grade_distributions = attr.ib(type=List[dict[str, float]], factory=list) + + # Dictionary containing lists + student_groups = attr.ib(type=dict[str, List[str]], factory=dict) + + + # Example 2: Event with nested attrs classes + @attr.s(frozen=True) + class BatchOperationData: + """ + Batch operation with collections of user data. + """ + # List of attrs classes + affected_users = attr.ib(type=List[UserData], factory=list) + + # Dictionary mapping course IDs to course data + courses_mapping = attr.ib(type=dict[str, CourseData], factory=dict) + + # Complex nested structure + operation_results = attr.ib(type=dict[str, List[dict[str, bool]]], factory=dict) + In the ``data.py`` files within each architectural subdomain, you can find examples of the :term:`Event Payload` structured as `attrs data classes`_ that align with the event bus schema format. Step 3: Ensure Serialization and Deserialization @@ -103,7 +148,12 @@ After implementing the serializer, add it to ``DEFAULT_CUSTOM_SERIALIZERS`` at t Now, the :term:`Event Payload` can be serialized and deserialized correctly when sent across services. .. warning:: - One of the known limitations of the current Open edX Event Bus is that it does not support dictionaries as data types. If the :term:`Event Payload` contains dictionaries, you may need to refactor the :term:`Event Payload` to use supported data types. When you know the structure of the dictionary, you can create an attrs class that represents the dictionary structure. If not, you can use a str type to represent the dictionary as a string and deserialize it on the consumer side using JSON deserialization. + The Open edX Event Bus supports type-annotated dictionaries (e.g., ``Dict[str, int]``) and complex nested types. However, dictionaries **without type annotations** are still not supported. 
Always use proper type annotations for dictionaries and lists in your :term:`Event Payload`. For example: + + - ✅ Supported: ``Dict[str, int]``, ``List[Dict[str, str]]``, ``Dict[str, UserData]`` + - ❌ Not supported: ``dict``, ``list``, ``Dict`` (without type parameters) + + If you need to use unstructured data, consider creating an attrs class that represents the data structure. Step 4: Generate the Avro Schema ==================================== diff --git a/docs/how-tos/create-a-new-event.rst b/docs/how-tos/create-a-new-event.rst index 6970bfb3..6232edf0 100644 --- a/docs/how-tos/create-a-new-event.rst +++ b/docs/how-tos/create-a-new-event.rst @@ -165,6 +165,11 @@ In our example, the event definition and payload for the enrollment event could - Try using nested data classes to group related data together. This will help maintain consistency and make the event more readable. For instance, in the above example, we have grouped the data into User, Course, and Enrollment data. - Try reusing existing data classes if possible to avoid duplicating data classes. This will help maintain consistency and reduce the chances of introducing errors. You can review the existing data classes in :ref:`Data Attributes` to see if there is a data class that fits your use case. - Each field in the payload should be documented with a description of what the field represents and the data type it should contain. This will help consumers understand the payload and react to the event. You should be able to justify why each field is included in the payload and how it relates to the event. +- Use type-annotated complex data types when needed. The event bus supports dictionaries and lists with proper type annotations: + + - ``Dict[str, int]`` for dictionaries with string keys and integer values. + - ``List[UserData]`` for lists containing attrs classes. + - ``Dict[str, List[str]]`` for nested complex structures. 
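The supported/unsupported distinction in the guidance above comes down to whether typing introspection can recover the container's argument types, which is how the schema generator in this patch decides what to do. A minimal stdlib sketch of that check (the helper name `has_usable_annotation` is illustrative and not part of openedx-events):

```python
from typing import Dict, List, get_args, get_origin


def has_usable_annotation(data_type) -> bool:
    """
    Return True when a container type carries the parameter annotations
    that Avro schema generation needs. Mirrors the get_origin/get_args
    checks used in avro/schema.py; illustrative only.
    """
    # Dict[str, int] and List[int] report dict/list as their origin...
    if get_origin(data_type) in (dict, list):
        # ...and their parameters via get_args; bare Dict/List report ().
        return bool(get_args(data_type))
    # Bare builtins (dict, list) have no origin and no usable annotation.
    return False
```

For example, `has_usable_annotation(Dict[str, int])` and `has_usable_annotation(List[Dict[str, str]])` are both true, while bare `dict`, `list`, or `Dict` fail the check, matching the supported/not-supported lists above.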
- Use defaults for optional fields in the payload to ensure its consistency in all cases. .. note:: When defining the payload, enforce :ref:`Event Bus` compatibility by ensuring that the data types used in the payload align with the event bus schema format. This will help ensure that the event can be sent by the producer and then be re-emitted by the same instance of `OpenEdxPublicSignal`_ on the consumer side, guaranteeing that the data sent and received is identical. For more information about adding event bus support to an event, refer to :ref:`Add Event Bus Support`. diff --git a/openedx_events/event_bus/avro/deserializer.py b/openedx_events/event_bus/avro/deserializer.py index 0c4f0a63..8b07d596 100644 --- a/openedx_events/event_bus/avro/deserializer.py +++ b/openedx_events/event_bus/avro/deserializer.py @@ -41,7 +41,7 @@ def _deserialized_avro_record_dict_to_object(data: dict, data_type, deserializer return deserializer(data) elif data_type in PYTHON_TYPE_TO_AVRO_MAPPING: return data - elif data_type_origin == list: + elif data_type_origin is list: # Returns types of list contents. # Example: if data_type == List[int], arg_data_type = (int,) arg_data_type = get_args(data_type) @@ -52,7 +52,11 @@ def _deserialized_avro_record_dict_to_object(data: dict, data_type, deserializer # Check whether list items type is in basic types. if arg_data_type[0] in SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING: return data - elif data_type_origin == dict: + + # Complex nested types like List[List[...]], List[Dict[...]], etc. + item_type = arg_data_type[0] + return [_deserialized_avro_record_dict_to_object(sub_data, item_type, deserializers) for sub_data in data] + elif data_type_origin is dict: # Returns types of dict contents. # Example: if data_type == Dict[str, int], arg_data_type = (str, int) arg_data_type = get_args(data_type) @@ -63,6 +67,17 @@ def _deserialized_avro_record_dict_to_object(data: dict, data_type, deserializer # Check whether dict items type is in basic types. 
if all(arg in SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING for arg in arg_data_type): return data + + # Complex dict values that need recursive deserialization + key_type, value_type = arg_data_type + if key_type is not str: + raise TypeError("Avro maps only support string keys. The key type must be 'str'.") + + # Complex nested types like Dict[str, Dict[...]], Dict[str, List[...]], etc. + return { + key: _deserialized_avro_record_dict_to_object(value, value_type, deserializers) + for key, value in data.items() + } elif hasattr(data_type, "__attrs_attrs__"): transformed = {} for attribute in data_type.__attrs_attrs__: diff --git a/openedx_events/event_bus/avro/schema.py b/openedx_events/event_bus/avro/schema.py index fa508f45..20e5e784 100644 --- a/openedx_events/event_bus/avro/schema.py +++ b/openedx_events/event_bus/avro/schema.py @@ -4,8 +4,7 @@ TODO: Handle optional parameters and allow for schema evolution. https://github.com/edx/edx-arch-experiments/issues/53 """ - -from typing import get_args, get_origin +from typing import Any, Type, get_args, get_origin from .custom_serializers import DEFAULT_CUSTOM_SERIALIZERS from .types import PYTHON_TYPE_TO_AVRO_MAPPING, SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING @@ -74,37 +73,19 @@ def _create_avro_field_definition(data_key, data_type, previously_seen_types, raise Exception("Unable to generate Avro schema for dict or array fields without annotation types.") avro_type = PYTHON_TYPE_TO_AVRO_MAPPING[data_type] field["type"] = avro_type - elif data_type_origin == list: - # Returns types of list contents. - # Example: if data_type == List[int], arg_data_type = (int,) - arg_data_type = get_args(data_type) - if not arg_data_type: - raise TypeError( - "List without annotation type is not supported. 
The argument should be a type, for eg., List[int]" - ) - avro_type = SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.get(arg_data_type[0]) - if avro_type is None: - raise TypeError( - "Only following types are supported for list arguments:" - f" {set(SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.keys())}" - ) - field["type"] = {"type": PYTHON_TYPE_TO_AVRO_MAPPING[data_type_origin], "items": avro_type} - elif data_type_origin == dict: - # Returns types of dict contents. - # Example: if data_type == Dict[str, int], arg_data_type = (str, int) - arg_data_type = get_args(data_type) - if not arg_data_type: - raise TypeError( - "Dict without annotation type is not supported. The argument should be a type, for eg., Dict[str, int]" - ) - avro_type = SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.get(arg_data_type[1]) - if avro_type is None: - raise TypeError( - "Only following types are supported for dict arguments:" - f" {set(SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.keys())}" - ) - field["type"] = {"type": PYTHON_TYPE_TO_AVRO_MAPPING[data_type_origin], "values": avro_type} - # Case 3: data_type is an attrs class + # Case 3: data_type is a list (possibly with complex items) + elif data_type_origin is list: + item_avro_type = _get_avro_type_for_list_item( + data_type, previously_seen_types, all_field_type_overrides + ) + field["type"] = {"type": "array", "items": item_avro_type} + # Case 4: data_type is a dictionary (possibly with complex values) + elif data_type_origin is dict: + item_avro_type = _get_avro_type_for_dict_item( + data_type, previously_seen_types, all_field_type_overrides + ) + field["type"] = {"type": "map", "values": item_avro_type} + # Case 5: data_type is an attrs class elif hasattr(data_type, "__attrs_attrs__"): # Inner Attrs Class @@ -135,3 +116,129 @@ def _create_avro_field_definition(data_key, data_type, previously_seen_types, single_type = field["type"] field["type"] = ["null", single_type] return field + + +def _get_avro_type_for_dict_item( + data_type: Type[dict], previously_seen_types: 
set, type_overrides: dict[Any, str] +) -> str | dict[str, str]: + """ + Determine the Avro type definition for a dictionary value based on its Python type. + + This function converts Python dictionary value types to their corresponding + Avro type representations. It supports simple types, complex nested types (like + dictionaries and lists), and custom classes decorated with attrs. + + Args: + data_type (Type[dict]): The Python dictionary type with its type annotation + (e.g., Dict[str, str], Dict[str, int], Dict[str, List[str]]) + previously_seen_types (set): Set of type names that have already been + processed, used to prevent duplicate record definitions + type_overrides (dict[Any, str]): Dictionary mapping custom Python types to + their Avro type representations + + Returns: + One of the following Avro type representations: + - A string (e.g., "string", "int", "boolean") for simple types + - A dictionary with a complex type definition for container types, such as: + - {"type": "array", "items": } for lists + - {"type": "map", "values": } for nested dictionaries + - {"name": "", "type": "record", "fields": [...]} for attrs classes + - A string with a record name for previously defined record types + + Raises: + TypeError: If the dictionary has no type annotation, has non-string keys, + or contains unsupported value types + """ + # Validate dict has type annotation + # Example: if data_type == Dict[str, int], arg_data_type = (str, int) + arg_data_type = get_args(data_type) + if not arg_data_type: + raise TypeError( + "Dict without annotation type is not supported. The argument should be a type, e.g. 
Dict[str, int]" + ) + + value_type = arg_data_type[1] + + # Case 1: Simple types mapped in SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING + avro_type = SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.get(value_type) + if avro_type is not None: + return avro_type + + # Case 2: Complex types (dict, list, or attrs class) + if get_origin(value_type) in (dict, list) or hasattr(value_type, "__attrs_attrs__"): + # Create a temporary field for the value type and extract its type definition + temp_field = _create_avro_field_definition("temp", value_type, previously_seen_types, type_overrides) + return temp_field["type"] + + # Case 3: Unannotated containers (raise specific errors) + if value_type is dict: + raise TypeError("A Dictionary as a dictionary value should have a type annotation.") + if value_type is list: + raise TypeError("A List as a dictionary value should have a type annotation.") + + # Case 4: Unsupported types + raise TypeError(f"Type {value_type} is not supported for dict values.") + + +def _get_avro_type_for_list_item( + data_type: Type[list], previously_seen_types: set, type_overrides: dict[Any, str] +) -> str | dict[str, str]: + """ + Determine the Avro type definition for a list item based on its Python type. + + This function handles conversion of various Python types that can be + contained within a list to their corresponding Avro type representations. + It supports simple types, complex nested types (like dictionaries and lists), + and custom classes decorated with attrs. + + Args: + data_type (Type[list]): The Python list type with its type annotation + (e.g., List[str], List[int], List[Dict[str, str]], etc.) 
+ previously_seen_types (set): Set of type names that have already been + processed, used to prevent duplicate record definitions + type_overrides (dict[Any, str]): Dictionary mapping custom Python types + to their Avro type representations + + Returns: + One of the following Avro type representations: + - A string (e.g., "string", "long", "boolean") for simple types + - A dictionary with a complex type definition for container types, such as: + - {"type": "array", "items": } for lists + - {"type": "map", "values": } for dictionaries + - {"name": "", "type": "record", "fields": [...]} for attrs classes + - A string with a record name for previously defined record types + + Raises: + TypeError: If the list has no type annotation, contains unsupported + types, or contains containers (dict, list) without proper type + annotations + """ + # Validate list has type annotation + # Example: if data_type == List[int], arg_data_type = (int,) + arg_data_type = get_args(data_type) + if not arg_data_type: + raise TypeError( + "List without annotation type is not supported. The argument should be a type, e.g. 
List[int]" + ) + + item_type = arg_data_type[0] + + # Case 1: Simple types mapped in SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING + avro_type = SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.get(item_type) + if avro_type is not None: + return avro_type + + # Case 2: Complex types (dict, list, or attrs class) + if get_origin(item_type) in (dict, list) or hasattr(item_type, "__attrs_attrs__"): + # Create a temporary field for the value type and extract its type definition + temp_field = _create_avro_field_definition("temp", item_type, previously_seen_types, type_overrides) + return temp_field["type"] + + # Case 3: Unannotated containers (raise specific errors) + if item_type is dict: + raise TypeError("A Dictionary as a list item should have a type annotation.") + if item_type is list: + raise TypeError("A List as a list item should have a type annotation.") + + # Case 4: Unsupported types + raise TypeError(f"Type {item_type} is not supported for list items.") diff --git a/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+course+notification+requested+v1_schema.avsc b/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+course+notification+requested+v1_schema.avsc new file mode 100644 index 00000000..eda67fff --- /dev/null +++ b/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+course+notification+requested+v1_schema.avsc @@ -0,0 +1,50 @@ +{ + "name": "CloudEvent", + "type": "record", + "doc": "Avro Event Format for CloudEvents created with openedx_events/schema", + "fields": [ + { + "name": "course_notification_data", + "type": { + "name": "CourseNotificationData", + "type": "record", + "fields": [ + { + "name": "course_key", + "type": "string" + }, + { + "name": "app_name", + "type": "string" + }, + { + "name": "notification_type", + "type": "string" + }, + { + "name": "content_url", + "type": "string" + }, + { + "name": "content_context", + "type": { + "type": "map", + "values": "string" + } + }, + { + "name": "audience_filters", + "type": { + 
"type": "map", + "values": { + "type": "array", + "items": "string" + } + } + } + ] + } + } + ], + "namespace": "org.openedx.learning.course.notification.requested.v1" +} \ No newline at end of file diff --git a/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+discussions+configuration+changed+v1_schema.avsc b/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+discussions+configuration+changed+v1_schema.avsc new file mode 100644 index 00000000..906d98e4 --- /dev/null +++ b/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+discussions+configuration+changed+v1_schema.avsc @@ -0,0 +1,99 @@ +{ + "name": "CloudEvent", + "type": "record", + "doc": "Avro Event Format for CloudEvents created with openedx_events/schema", + "fields": [ + { + "name": "configuration", + "type": { + "name": "CourseDiscussionConfigurationData", + "type": "record", + "fields": [ + { + "name": "course_key", + "type": "string" + }, + { + "name": "provider_type", + "type": "string" + }, + { + "name": "enable_in_context", + "type": "boolean" + }, + { + "name": "enable_graded_units", + "type": "boolean" + }, + { + "name": "unit_level_visibility", + "type": "boolean" + }, + { + "name": "plugin_configuration", + "type": { + "type": "map", + "values": "boolean" + } + }, + { + "name": "contexts", + "type": { + "type": "array", + "items": { + "name": "DiscussionTopicContext", + "type": "record", + "fields": [ + { + "name": "title", + "type": "string" + }, + { + "name": "usage_key", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "group_id", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "external_id", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "ordering", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "context", + "type": { + "type": "map", + "values": "string" + } + } + ] + } + } + } + ] + } + } + ], + "namespace": 
"org.openedx.learning.discussions.configuration.changed.v1" +} \ No newline at end of file diff --git a/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+forum+thread+created+v1_schema.avsc b/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+forum+thread+created+v1_schema.avsc new file mode 100644 index 00000000..7588caba --- /dev/null +++ b/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+forum+thread+created+v1_schema.avsc @@ -0,0 +1,183 @@ +{ + "name": "CloudEvent", + "type": "record", + "doc": "Avro Event Format for CloudEvents created with openedx_events/schema", + "fields": [ + { + "name": "thread", + "type": { + "name": "DiscussionThreadData", + "type": "record", + "fields": [ + { + "name": "body", + "type": "string" + }, + { + "name": "commentable_id", + "type": "string" + }, + { + "name": "id", + "type": "string" + }, + { + "name": "truncated", + "type": "boolean" + }, + { + "name": "url", + "type": "string" + }, + { + "name": "user", + "type": { + "name": "UserData", + "type": "record", + "fields": [ + { + "name": "id", + "type": "long" + }, + { + "name": "is_active", + "type": "boolean" + }, + { + "name": "pii", + "type": { + "name": "UserPersonalData", + "type": "record", + "fields": [ + { + "name": "username", + "type": "string" + }, + { + "name": "email", + "type": "string" + }, + { + "name": "name", + "type": "string" + } + ] + } + } + ] + } + }, + { + "name": "course_id", + "type": "string" + }, + { + "name": "thread_type", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "anonymous", + "type": [ + "null", + "boolean" + ], + "default": null + }, + { + "name": "anonymous_to_peers", + "type": [ + "null", + "boolean" + ], + "default": null + }, + { + "name": "title", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "title_truncated", + "type": [ + "null", + "boolean" + ], + "default": null + }, + { + "name": "group_id", + "type": [ + "null", + "long" + ], 
+ "default": null + }, + { + "name": "team_id", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "category_id", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "category_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "discussion", + "type": [ + "null", + { + "type": "map", + "values": "string" + } + ], + "default": null + }, + { + "name": "user_course_roles", + "type": { + "type": "array", + "items": "string" + } + }, + { + "name": "user_forums_roles", + "type": { + "type": "array", + "items": "string" + } + }, + { + "name": "options", + "type": { + "type": "map", + "values": "boolean" + } + } + ] + } + } + ], + "namespace": "org.openedx.learning.forum.thread.created.v1" +} \ No newline at end of file diff --git a/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+forum+thread+response+comment+created+v1_schema.avsc b/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+forum+thread+response+comment+created+v1_schema.avsc new file mode 100644 index 00000000..e6526f20 --- /dev/null +++ b/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+forum+thread+response+comment+created+v1_schema.avsc @@ -0,0 +1,183 @@ +{ + "name": "CloudEvent", + "type": "record", + "doc": "Avro Event Format for CloudEvents created with openedx_events/schema", + "fields": [ + { + "name": "thread", + "type": { + "name": "DiscussionThreadData", + "type": "record", + "fields": [ + { + "name": "body", + "type": "string" + }, + { + "name": "commentable_id", + "type": "string" + }, + { + "name": "id", + "type": "string" + }, + { + "name": "truncated", + "type": "boolean" + }, + { + "name": "url", + "type": "string" + }, + { + "name": "user", + "type": { + "name": "UserData", + "type": "record", + "fields": [ + { + "name": "id", + "type": "long" + }, + { + "name": "is_active", + "type": "boolean" + }, + { + "name": "pii", + "type": { + "name": "UserPersonalData", + "type": 
"record", + "fields": [ + { + "name": "username", + "type": "string" + }, + { + "name": "email", + "type": "string" + }, + { + "name": "name", + "type": "string" + } + ] + } + } + ] + } + }, + { + "name": "course_id", + "type": "string" + }, + { + "name": "thread_type", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "anonymous", + "type": [ + "null", + "boolean" + ], + "default": null + }, + { + "name": "anonymous_to_peers", + "type": [ + "null", + "boolean" + ], + "default": null + }, + { + "name": "title", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "title_truncated", + "type": [ + "null", + "boolean" + ], + "default": null + }, + { + "name": "group_id", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "team_id", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "category_id", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "category_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "discussion", + "type": [ + "null", + { + "type": "map", + "values": "string" + } + ], + "default": null + }, + { + "name": "user_course_roles", + "type": { + "type": "array", + "items": "string" + } + }, + { + "name": "user_forums_roles", + "type": { + "type": "array", + "items": "string" + } + }, + { + "name": "options", + "type": { + "type": "map", + "values": "boolean" + } + } + ] + } + } + ], + "namespace": "org.openedx.learning.forum.thread.response.comment.created.v1" +} \ No newline at end of file diff --git a/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+forum+thread+response+created+v1_schema.avsc b/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+forum+thread+response+created+v1_schema.avsc new file mode 100644 index 00000000..e7bf42d5 --- /dev/null +++ b/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+forum+thread+response+created+v1_schema.avsc @@ -0,0 +1,183 
@@ +{ + "name": "CloudEvent", + "type": "record", + "doc": "Avro Event Format for CloudEvents created with openedx_events/schema", + "fields": [ + { + "name": "thread", + "type": { + "name": "DiscussionThreadData", + "type": "record", + "fields": [ + { + "name": "body", + "type": "string" + }, + { + "name": "commentable_id", + "type": "string" + }, + { + "name": "id", + "type": "string" + }, + { + "name": "truncated", + "type": "boolean" + }, + { + "name": "url", + "type": "string" + }, + { + "name": "user", + "type": { + "name": "UserData", + "type": "record", + "fields": [ + { + "name": "id", + "type": "long" + }, + { + "name": "is_active", + "type": "boolean" + }, + { + "name": "pii", + "type": { + "name": "UserPersonalData", + "type": "record", + "fields": [ + { + "name": "username", + "type": "string" + }, + { + "name": "email", + "type": "string" + }, + { + "name": "name", + "type": "string" + } + ] + } + } + ] + } + }, + { + "name": "course_id", + "type": "string" + }, + { + "name": "thread_type", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "anonymous", + "type": [ + "null", + "boolean" + ], + "default": null + }, + { + "name": "anonymous_to_peers", + "type": [ + "null", + "boolean" + ], + "default": null + }, + { + "name": "title", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "title_truncated", + "type": [ + "null", + "boolean" + ], + "default": null + }, + { + "name": "group_id", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "team_id", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "category_id", + "type": [ + "null", + "long" + ], + "default": null + }, + { + "name": "category_name", + "type": [ + "null", + "string" + ], + "default": null + }, + { + "name": "discussion", + "type": [ + "null", + { + "type": "map", + "values": "string" + } + ], + "default": null + }, + { + "name": "user_course_roles", + "type": { + "type": "array", + "items": 
"string" + } + }, + { + "name": "user_forums_roles", + "type": { + "type": "array", + "items": "string" + } + }, + { + "name": "options", + "type": { + "type": "map", + "values": "boolean" + } + } + ] + } + } + ], + "namespace": "org.openedx.learning.forum.thread.response.created.v1" +} \ No newline at end of file diff --git a/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+ora+submission+created+v1_schema.avsc b/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+ora+submission+created+v1_schema.avsc new file mode 100644 index 00000000..e008c064 --- /dev/null +++ b/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+ora+submission+created+v1_schema.avsc @@ -0,0 +1,95 @@ +{ + "name": "CloudEvent", + "type": "record", + "doc": "Avro Event Format for CloudEvents created with openedx_events/schema", + "fields": [ + { + "name": "submission", + "type": { + "name": "ORASubmissionData", + "type": "record", + "fields": [ + { + "name": "uuid", + "type": "string" + }, + { + "name": "anonymous_user_id", + "type": "string" + }, + { + "name": "location", + "type": "string" + }, + { + "name": "attempt_number", + "type": "long" + }, + { + "name": "created_at", + "type": "string" + }, + { + "name": "submitted_at", + "type": "string" + }, + { + "name": "answer", + "type": { + "name": "ORASubmissionAnswer", + "type": "record", + "fields": [ + { + "name": "parts", + "type": { + "type": "array", + "items": { + "type": "map", + "values": "string" + } + } + }, + { + "name": "file_keys", + "type": { + "type": "array", + "items": "string" + } + }, + { + "name": "file_descriptions", + "type": { + "type": "array", + "items": "string" + } + }, + { + "name": "file_names", + "type": { + "type": "array", + "items": "string" + } + }, + { + "name": "file_sizes", + "type": { + "type": "array", + "items": "long" + } + }, + { + "name": "file_urls", + "type": { + "type": "array", + "items": "string" + } + } + ] + } + } + ] + } + } + ], + "namespace": 
"org.openedx.learning.ora.submission.created.v1" +} \ No newline at end of file diff --git a/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+user+notification+requested+v1_schema.avsc b/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+user+notification+requested+v1_schema.avsc new file mode 100644 index 00000000..743d3a75 --- /dev/null +++ b/openedx_events/event_bus/avro/tests/schemas/org+openedx+learning+user+notification+requested+v1_schema.avsc @@ -0,0 +1,47 @@ +{ + "name": "CloudEvent", + "type": "record", + "doc": "Avro Event Format for CloudEvents created with openedx_events/schema", + "fields": [ + { + "name": "notification_data", + "type": { + "name": "UserNotificationData", + "type": "record", + "fields": [ + { + "name": "user_ids", + "type": { + "type": "array", + "items": "long" + } + }, + { + "name": "notification_type", + "type": "string" + }, + { + "name": "content_url", + "type": "string" + }, + { + "name": "app_name", + "type": "string" + }, + { + "name": "course_key", + "type": "string" + }, + { + "name": "context", + "type": { + "type": "map", + "values": "string" + } + } + ] + } + } + ], + "namespace": "org.openedx.learning.user.notification.requested.v1" +} \ No newline at end of file diff --git a/openedx_events/event_bus/avro/tests/test_avro.py b/openedx_events/event_bus/avro/tests/test_avro.py index 884760d8..04a22e75 100644 --- a/openedx_events/event_bus/avro/tests/test_avro.py +++ b/openedx_events/event_bus/avro/tests/test_avro.py @@ -2,7 +2,7 @@ import io import os from datetime import datetime -from typing import List, Union +from typing import Any, List, Union, get_args, get_origin from unittest import TestCase from uuid import UUID, uuid4 @@ -32,78 +32,181 @@ from openedx_events.tooling import KNOWN_UNSERIALIZABLE_SIGNALS, OpenEdxPublicSignal, load_all_signals -def generate_test_data_for_schema(schema): # pragma: no cover +def generate_test_data_for_schema(schema: dict[str, Any]) -> dict: # pragma: no 
cover """ - Generates a test data dict for the given schema. + Generates test data dict for the given schema. - Arguments: - schema: A JSON representation of an Avro schema + This function creates sample data that conforms to the provided Avro schema + structure. It handles complex nested schemas including records, arrays, + maps, unions, and references to named types. For each field, it generates + sample data according to the field type. + + Args: + schema (dict[str, Any]): The Avro schema as a dictionary Returns: - A dictionary of test data parseable by the schema + dict: Test data according to the schema """ - defaults_per_type = { - 'long': 1, - 'boolean': True, - 'string': "default", - 'double': 1.0, - 'null': None, - 'map': {'key': 'value'}, + DEFAULT_PER_TYPE = { + "long": 1, + "boolean": True, + "string": "default", + "double": 1.0, + "null": None, } - def get_default_value_or_raise(schema_field_type): - try: - return defaults_per_type[schema_field_type] - # 'None' is the default value for type=null so we can't just check if default_value is not None - except KeyError as exc: - raise Exception(f"Unsupported type {schema_field_type}") from exc # pylint: disable=broad-exception-raised - - data_dict = {} - top_level = schema['fields'] - for field in top_level: - key = field['name'] - field_type = field['type'] - - # some fields (like optional ones) accept multiple types. Choose the first one and run with it. 
- if isinstance(field_type, list): - field_type = field_type[0] - - # if the field_type is a dict, we're either dealing with a list or a custom object - if isinstance(field_type, dict): - sub_field_type = field_type['type'] - if sub_field_type == "array": - # if we're dealing with a list, "items" will be the type of items in the list - data_dict.update({key: [get_default_value_or_raise(field_type['items'])]}) - elif sub_field_type == "record": - # if we're dealing with a record, recurse into the record - data_dict.update({key: generate_test_data_for_schema(field_type)}) - elif sub_field_type == "map": - # if we're dealing with a map, "values" will be the type of values in the map - data_dict.update({key: {"key": get_default_value_or_raise(field_type["values"])}}) - else: - raise Exception(f"Unsupported type {field_type}") # pylint: disable=broad-exception-raised + # Repository for defined types in the schema + defined_types = {} + + def register_defined_types(schema_obj: Any) -> None: + """ + Registers all types defined in the schema for later reference. + + Args: + schema_obj (Any): A schema object which might be a dict, list, + or primitive + """ + if isinstance(schema_obj, dict): + if schema_obj.get("type") == "record" and "name" in schema_obj: + record_name = schema_obj["name"] + defined_types[record_name] = schema_obj + + # Process fields to find more defined types + for field in schema_obj.get("fields", []): + field_type = field.get("type") + register_defined_types(field_type) + + # Process arrays and maps + elif schema_obj.get("type") == "array": + register_defined_types(schema_obj.get("items")) + elif schema_obj.get("type") == "map": + register_defined_types(schema_obj.get("values")) + + def process_schema(schema_obj: Any) -> Any: + """ + Processes a complete schema and generates test data. This is the entry + point for processing. 
+ + Args: + schema_obj (Any): The schema object to process + + Returns: + Generated test data for the schema + """ + # First, we register the types defined throughout the schema + register_defined_types(schema_obj) - # a record is a collection of fields rather than a field itself, so recursively generate and add each field - elif field_type == "record": - data_dict.update([generate_test_data_for_schema(sub_field) for sub_field in field['fields']]) + # Then, we process the schema to generate data + if isinstance(schema_obj, dict) and schema_obj.get("type") == "record": + return process_record(schema_obj) else: - data_dict.update({key: get_default_value_or_raise(field_type)}) + return process_type(schema_obj) + + def process_record(record_schema: dict[str, Any]) -> dict[str, Any]: + """ + Args: + record_schema (dict[str, Any]): A record type schema + + Returns: + A dictionary with all fields populated according to the schema + """ + result = {} + for field in record_schema.get("fields", []): + field_name = field.get("name") + field_type = field.get("type") + + # Process the field type + result[field_name] = process_type(field_type) + + return result + + def process_type(type_spec: Any) -> Any: + """ + Processes any data type in Avro and generates an appropriate test value. + Handles primitive types, complex types, union types, and references to + defined types. + + Args: + type_spec (Any): A type specification which might be a str, dict, or list + + Returns: + Any: An appropriate test value for the specified type + """ + # Primitive types like string, long, boolean, etc. 
+ if isinstance(type_spec, str): + if type_spec in DEFAULT_PER_TYPE: + return DEFAULT_PER_TYPE[type_spec] + + # It's a reference to a previously defined type + if type_spec in defined_types: + return process_type(defined_types[type_spec]) + + return {} + + # Union types (list of possible types) + if isinstance(type_spec, list): + # If null is in the list, we try to return a non-null type + if "null" in type_spec: + for t in type_spec: + if t != "null": + return process_type(t) + # If all types are null, we return None + return None + + # If there's no null, we use the first type + if type_spec: + return process_type(type_spec[0]) + + return None + + # Complex types defined as dictionaries + if isinstance(type_spec, dict): + type_name = type_spec.get("type") + + # Record type (object with fields) + if type_name == "record": + return process_record(type_spec) + + # Array type (list) + elif type_name == "array": + items = type_spec.get("items") + # We create a list with a single element + return [process_type(items)] - return data_dict + # Map type (dictionary/object) + elif type_name == "map": + values = type_spec.get("values") + # We create a dictionary with a single key + return {"key": process_type(values)} + # If the type is directly primitive + elif type_name in DEFAULT_PER_TYPE: + return DEFAULT_PER_TYPE[type_name] -def generate_test_event_data_for_data_type(data_type): # pragma: no cover + # If we can't determine the type, we return None + return None + + # We start processing the schema + return process_schema(schema) + + +def generate_test_event_data_for_data_type(data_type: Any) -> Any: # pragma: no cover """ Generates test data for use in the event bus test cases. Builds data by filling in dummy data for basic data types (int/float/bool/str) and recursively breaks down the classes for nested classes into basic data types. + Also supports complex container types like List[dict[str, str]], Dict[str, List[int]], + List[EventData], and Dict[str, EventData].
- Arguments: - data_type: The type of the data which we are generating data for + Args: + data_type (Any): The type of the data which we are generating data for Returns: - (dict): A data dictionary containing dummy data for all attributes of the class + Any: Test data for the given type: a populated attrs instance, list, or dict + + Raises: + TypeError: If a dictionary has non-string keys (not compatible with Avro) """ defaults_per_type = { int: 1, @@ -150,17 +253,89 @@ def generate_test_event_data_for_data_type(data_type): # pragma: no cover dict[str, Union[str, int]]: {'key': 'value'}, dict[str, Union[str, int, float]]: {'key': 1.0}, } - data_dict = {} - for attribute in data_type.__attrs_attrs__: - result = defaults_per_type.get(attribute.type, None) - if result is not None: - data_dict.update({attribute.name: result}) - elif attribute.type in [dict, list]: - # pylint: disable-next=broad-exception-raised - raise Exception("Unable to generate Avro schema for dict or array fields") - else: - data_dict.update({attribute.name: attribute.type(**generate_test_event_data_for_data_type(attribute.type))}) - return data_dict + + # Handle origin types + origin_type = get_origin(data_type) + + if origin_type is not None: + + args = get_args(data_type) + + # Handle List types + if origin_type is list: + + item_type = args[0] + + # Handle List of simple types, e.g. List[str] + if item_type in defaults_per_type: + return [defaults_per_type[item_type]] + + # Handle List of Dicts, e.g. List[Dict[str, str]] + if get_origin(item_type) is dict: + dict_key_type, dict_value_type = get_args(item_type) + # Only support string keys for Avro compatibility + if dict_key_type is not str: + raise TypeError("Avro maps only support string keys. The key type must be 'str'.") + + sample_dict = {} + if get_origin(dict_value_type) is not None: + # Handle nested types in dictionary values, e.g.
List[str] + sample_dict = {"key": generate_test_event_data_for_data_type(dict_value_type)} + else: + # Handle simple types in dictionary values, e.g. str + default_value = defaults_per_type.get(dict_value_type, "default_value") + sample_dict = {"key": default_value} + + return [sample_dict] + + # Handle List of attrs classes, e.g. List[EventData] + item_data = generate_test_event_data_for_data_type(item_type) + return [item_data] + + # Handle Dict types + elif origin_type is dict: + + key_type, value_type = args[0], args[1] + + # Only support string keys for Avro compatibility + if key_type is not str: + raise TypeError("Avro maps only support string keys. The key type must be 'str'.") + + # Handle Dict of simple types, e.g. Dict[str, str] + if value_type in defaults_per_type: + return {"key": defaults_per_type[value_type]} + + # Handle Dict of List types, e.g. Dict[str, List[int]] + if get_origin(value_type) is list: + list_item_type = get_args(value_type)[0] + return {"key": [defaults_per_type[list_item_type]]} + + # Handle Dict of attrs classes, e.g. 
Dict[str, EventData] + value_data = generate_test_event_data_for_data_type(value_type) + return {"key": value_data} + + # Handle attrs classes + if hasattr(data_type, "__attrs_attrs__"): + + data_dict = {} + + for attribute in data_type.__attrs_attrs__: + + result = defaults_per_type.get(attribute.type, None) + # Handle simple types + if result is not None: + data_dict.update({attribute.name: result}) + else: + # Handle origin types in attributes + origin = get_origin(attribute.type) + if origin is not None: + data_dict.update({attribute.name: generate_test_event_data_for_data_type(attribute.type)}) + # Handle attrs classes + if hasattr(attribute.type, "__attrs_attrs__"): + attr_data = generate_test_event_data_for_data_type(attribute.type) + data_dict.update({attribute.name: attr_data}) + + return data_type(**data_dict) def generate_test_data_for_signal(signal: OpenEdxPublicSignal) -> dict: @@ -178,8 +353,7 @@ def generate_test_data_for_signal(signal: OpenEdxPublicSignal) -> dict: """ test_data = {} for key, curr_class in signal.init_data.items(): - example_data = generate_test_event_data_for_data_type(curr_class) - example_data_processed = curr_class(**example_data) + example_data_processed = generate_test_event_data_for_data_type(curr_class) test_data.update({key: example_data_processed}) return test_data diff --git a/openedx_events/event_bus/avro/tests/test_deserializer.py b/openedx_events/event_bus/avro/tests/test_deserializer.py index f8522b03..7590c286 100644 --- a/openedx_events/event_bus/avro/tests/test_deserializer.py +++ b/openedx_events/event_bus/avro/tests/test_deserializer.py @@ -13,6 +13,7 @@ ComplexAttrs, EventData, NestedAttrsWithDefaults, + NestedComplexAttrs, NestedNonAttrs, NonAttrs, SimpleAttrs, @@ -57,7 +58,7 @@ def setUp(self) -> None: }, }, ], - } + }, ), ( ComplexAttrs, @@ -79,8 +80,54 @@ def setUp(self) -> None: }, }, ], - } - ) + }, + ), + ( + NestedComplexAttrs, + { + "name": "CloudEvent", + "type": "record", + "doc": "Avro Event Format 
for CloudEvents created with openedx_events/schema", + "namespace": "simple.signal", + "fields": [ + { + "name": "data", + "type": { + "name": "NestedComplexAttrs", + "type": "record", + "fields": [ + { + "name": "list_of_attr_field", + "type": { + "type": "array", + "items": { + "name": "SimpleAttrs", + "type": "record", + "fields": [ + {"name": "boolean_field", "type": "boolean"}, + {"name": "int_field", "type": "long"}, + {"name": "float_field", "type": "double"}, + {"name": "bytes_field", "type": "bytes"}, + {"name": "string_field", "type": "string"}, + ], + }, + }, + }, + {"name": "dict_of_attr_field", "type": {"type": "map", "values": "SimpleAttrs"}}, + { + "name": "list_of_dict_field", + "type": {"type": "array", "items": {"type": "map", "values": "long"}}, + }, + { + "name": "dict_of_list_field", + "type": {"type": "map", "values": {"type": "array", "items": "long"}}, + }, + ], + }, + } + ], + }, + ), ) @ddt.unpack def test_schema_string(self, data_cls, expected_schema): @@ -281,6 +328,110 @@ def test_deserialization_of_dict_with_annotation(self): self.assertIsInstance(test_data, dict) self.assertEqual(test_data, expected_event_data) + def test_deserialization_of_dict_of_lists(self): + SIGNAL = create_simple_signal({"dict_input": dict[str, List[int]]}) + initial_dict = {"dict_input": {"key1": [1, 2], "key2": [3, 4]}} + + deserializer = AvroSignalDeserializer(SIGNAL) + event_data = deserializer.from_dict(initial_dict) + expected_event_data = {"key1": [1, 2], "key2": [3, 4]} + test_data = event_data["dict_input"] + + self.assertIsInstance(test_data, dict) + self.assertEqual(test_data, expected_event_data) + + def test_deserialization_of_dict_of_event_data(self): + SIGNAL = create_simple_signal({"dict_input": dict[str, EventData]}) + initial_dict = { + "dict_input": { + "key1": { + "course_id": "bar", + "sub_name": "bar.name", + "sub_test_0": {"course_id": "bar1.course", "sub_name": "bar1.name"}, + "sub_test_1": {"course_id": "bar2.course", "sub_name": 
"bar2.name"}, + }, + "key2": { + "course_id": "foo", + "sub_name": "foo.name", + "sub_test_0": {"course_id": "foo1.course", "sub_name": "foo1.name"}, + "sub_test_1": {"course_id": "foo2.course", "sub_name": "foo2.name"}, + }, + } + } + + deserializer = AvroSignalDeserializer(SIGNAL) + event_data = deserializer.from_dict(initial_dict) + expected_event_data = { + "key1": EventData( + sub_name="bar.name", + course_id="bar", + sub_test_0=SubTestData0(sub_name="bar1.name", course_id="bar1.course"), + sub_test_1=SubTestData1(sub_name="bar2.name", course_id="bar2.course"), + ), + "key2": EventData( + sub_name="foo.name", + course_id="foo", + sub_test_0=SubTestData0(sub_name="foo1.name", course_id="foo1.course"), + sub_test_1=SubTestData1(sub_name="foo2.name", course_id="foo2.course"), + ), + } + test_data = event_data["dict_input"] + + self.assertIsInstance(test_data, dict) + self.assertEqual(test_data, expected_event_data) + + def test_deserialization_of_list_of_dicts(self): + SIGNAL = create_simple_signal({"list_input": List[dict[str, int]]}) + initial_dict = {"list_input": [{"key1": 1, "key2": 2}, {"key1": 3, "key2": 4}]} + + deserializer = AvroSignalDeserializer(SIGNAL) + event_data = deserializer.from_dict(initial_dict) + expected_event_data = [{"key1": 1, "key2": 2}, {"key1": 3, "key2": 4}] + test_data = event_data["list_input"] + + self.assertIsInstance(test_data, list) + self.assertEqual(test_data, expected_event_data) + + def test_deserialization_of_list_of_event_data(self): + SIGNAL = create_simple_signal({"list_input": List[EventData]}) + initial_dict = { + "list_input": [ + { + "course_id": "bar", + "sub_name": "bar.name", + "sub_test_0": {"course_id": "bar1.course", "sub_name": "bar1.name"}, + "sub_test_1": {"course_id": "bar2.course", "sub_name": "bar2.name"}, + }, + { + "course_id": "foo", + "sub_name": "foo.name", + "sub_test_0": {"course_id": "foo1.course", "sub_name": "foo1.name"}, + "sub_test_1": {"course_id": "foo2.course", "sub_name": "foo2.name"}, + 
}, + ] + } + + deserializer = AvroSignalDeserializer(SIGNAL) + event_data = deserializer.from_dict(initial_dict) + expected_event_data = [ + EventData( + sub_name="bar.name", + course_id="bar", + sub_test_0=SubTestData0(sub_name="bar1.name", course_id="bar1.course"), + sub_test_1=SubTestData1(sub_name="bar2.name", course_id="bar2.course"), + ), + EventData( + sub_name="foo.name", + course_id="foo", + sub_test_0=SubTestData0(sub_name="foo1.name", course_id="foo1.course"), + sub_test_1=SubTestData1(sub_name="foo2.name", course_id="foo2.course"), + ), + ] + test_data = event_data["list_input"] + + self.assertIsInstance(test_data, list) + self.assertEqual(test_data, expected_event_data) + def test_deserialization_of_dict_without_annotation(self): """ Check that deserialization raises error when dict data is not annotated. @@ -298,20 +449,6 @@ def test_deserialization_of_dict_without_annotation(self): with self.assertRaises(TypeError): deserializer.from_dict(initial_dict) - def test_deserialization_of_dict_with_complex_types_fails(self): - SIGNAL = create_simple_signal({"dict_input": Dict[str, list]}) - with self.assertRaises(TypeError): - AvroSignalDeserializer(SIGNAL) - initial_dict = {"dict_input": {"key1": [1, 3], "key2": [4, 5]}} - # create dummy signal to bypass schema check while initializing deserializer - # This allows us to test whether correct exceptions are raised while deserializing data - DUMMY_SIGNAL = create_simple_signal({"dict_input": Dict[str, int]}) - deserializer = AvroSignalDeserializer(DUMMY_SIGNAL) - # Update signal with incorrect type info - deserializer.signal = SIGNAL - with self.assertRaises(TypeError): - deserializer.from_dict(initial_dict) - def test_deserialization_of_dicts_with_keys_of_complex_types_fails(self): SIGNAL = create_simple_signal({"dict_input": Dict[CourseKey, int]}) deserializer = AvroSignalDeserializer(SIGNAL) @@ -319,34 +456,32 @@ def test_deserialization_of_dicts_with_keys_of_complex_types_fails(self): with 
self.assertRaises(TypeError): deserializer.from_dict(initial_dict) - def test_deserialization_of_nested_list_fails(self): + def test_deserialization_of_unsupported_data_type(self): """ - Check that deserialization raises error when nested list data is passed. + Check that deserialization raises TypeError when encountering an unsupported data type. + + Create a dummy signal with a custom class that isn't in the deserializers dictionary + and doesn't have __attrs_attrs__ to test the final TypeError case. """ - # create dummy signal to bypass schema check while initializing deserializer - # This allows us to test whether correct exceptions are raised while deserializing data - SIGNAL = create_simple_signal({"list_input": List[int]}) - LIST_SIGNAL = create_simple_signal({"list_input": List[List[int]]}) - initial_dict = {"list_input": [[1, 3], [4, 5]]} - deserializer = AvroSignalDeserializer(SIGNAL) - # Update signal with incomplete type info - deserializer.signal = LIST_SIGNAL - with self.assertRaises(TypeError): + # Create a custom class that isn't in the deserializers and doesn't have __attrs_attrs__ + class CustomUnsupportedType: + pass + + # Create a signal with a valid type first to avoid schema validation errors + VALID_SIGNAL = create_simple_signal({"list_input": List[int]}) + INVALID_SIGNAL = create_simple_signal({"list_input": List[CustomUnsupportedType]}) + initial_dict = {"list_input": [1, 2, 3]} + deserializer = AvroSignalDeserializer(VALID_SIGNAL) + # Update signal with invalid type + deserializer.signal = INVALID_SIGNAL + + # Test that it raises TypeError with appropriate message + with self.assertRaises(TypeError) as context: deserializer.from_dict(initial_dict) - def test_deserialization_of_nested_list_with_complex_types_fails(self): - SIGNAL = create_simple_signal({"list_input": List[list]}) - with self.assertRaises(TypeError): - AvroSignalDeserializer(SIGNAL) - initial_dict = {"list_input": [[1, 3], [4, 5]]} - # create dummy signal to bypass schema 
check while initializing deserializer - # This allows us to test whether correct exceptions are raised while deserializing data - DUMMY_SIGNAL = create_simple_signal({"list_input": List[int]}) - deserializer = AvroSignalDeserializer(DUMMY_SIGNAL) - # Update signal with incorrect type info - deserializer.signal = SIGNAL - with self.assertRaises(TypeError): - deserializer.from_dict(initial_dict) + # Verify the error message mentions the unsupported type + self.assertIn("Unable to deserialize", str(context.exception)) + self.assertIn("CustomUnsupportedType", str(context.exception)) def test_deserialize_bytes_to_event_data(self): """ diff --git a/openedx_events/event_bus/avro/tests/test_schema.py b/openedx_events/event_bus/avro/tests/test_schema.py index a3410643..4c7557ec 100644 --- a/openedx_events/event_bus/avro/tests/test_schema.py +++ b/openedx_events/event_bus/avro/tests/test_schema.py @@ -248,6 +248,10 @@ def test_throw_exception_to_list_or_dict_types_without_annotation(self): DICT_SIGNAL = create_simple_signal({"dict_input": dict}) LIST_WITHOUT_ANNOTATION_SIGNAL = create_simple_signal({"list_input": List}) DICT_WITHOUT_ANNOTATION_SIGNAL = create_simple_signal({"dict_input": Dict}) + LIST_OF_DICT_SIGNAL = create_simple_signal({"list_input": list[dict]}) + LIST_OF_LIST_SIGNAL = create_simple_signal({"list_input": list[list]}) + DICT_OF_LIST_SIGNAL = create_simple_signal({"dict_input": dict[str, list]}) + DICT_OF_DICT_SIGNAL = create_simple_signal({"dict_input": dict[str, dict]}) with self.assertRaises(Exception): schema_from_signal(LIST_SIGNAL) @@ -260,10 +264,29 @@ def test_throw_exception_to_list_or_dict_types_without_annotation(self): with self.assertRaises(TypeError): schema_from_signal(DICT_WITHOUT_ANNOTATION_SIGNAL) - def test_throw_exception_invalid_dict_annotation(self): - INVALID_DICT_SIGNAL = create_simple_signal({"dict_input": Dict[str, NestedAttrsWithDefaults]}) with self.assertRaises(TypeError): - schema_from_signal(INVALID_DICT_SIGNAL) + 
schema_from_signal(LIST_OF_DICT_SIGNAL) + + with self.assertRaises(TypeError): + schema_from_signal(LIST_OF_LIST_SIGNAL) + + with self.assertRaises(TypeError): + schema_from_signal(DICT_OF_LIST_SIGNAL) + + with self.assertRaises(TypeError): + schema_from_signal(DICT_OF_DICT_SIGNAL) + + def test_throw_exception_to_list_or_dict_types_of_unsupported_types(self): + class UnSupportedClass: + pass + + LIST_SIGNAL = create_simple_signal({"list_input": List[UnSupportedClass]}) + DICT_SIGNAL = create_simple_signal({"dict_input": Dict[str, UnSupportedClass]}) + with self.assertRaises(TypeError): + schema_from_signal(LIST_SIGNAL) + + with self.assertRaises(TypeError): + schema_from_signal(DICT_SIGNAL) def test_list_with_annotation_works(self): LIST_SIGNAL = create_simple_signal({"list_input": List[int]}) @@ -294,3 +317,33 @@ def test_dict_with_annotation_works(self): } schema = schema_from_signal(DICT_SIGNAL) self.assertDictEqual(schema, expected_dict) + + def test_dict_of_list_with_annotation_works(self): + DICT_SIGNAL = create_simple_signal({"dict_input": Dict[str, List[int]]}) + expected_dict = { + "name": "CloudEvent", + "type": "record", + "doc": "Avro Event Format for CloudEvents created with openedx_events/schema", + "namespace": "simple.signal", + "fields": [{ + "name": "dict_input", + "type": {"type": "map", "values": {"type": "array", "items": "long"}}, + }], + } + schema = schema_from_signal(DICT_SIGNAL) + self.assertDictEqual(schema, expected_dict) + + def test_list_of_dict_with_annotation_works(self): + LIST_SIGNAL = create_simple_signal({"list_input": List[Dict[str, int]]}) + expected_dict = { + "name": "CloudEvent", + "type": "record", + "doc": "Avro Event Format for CloudEvents created with openedx_events/schema", + "namespace": "simple.signal", + "fields": [{ + "name": "list_input", + "type": {"type": "array", "items": {"type": "map", "values": "long"}}, + }], + } + schema = schema_from_signal(LIST_SIGNAL) + self.assertDictEqual(schema, expected_dict) diff 
--git a/openedx_events/event_bus/avro/tests/test_utilities.py b/openedx_events/event_bus/avro/tests/test_utilities.py index f594d05b..2319aae8 100644 --- a/openedx_events/event_bus/avro/tests/test_utilities.py +++ b/openedx_events/event_bus/avro/tests/test_utilities.py @@ -46,6 +46,15 @@ class ComplexAttrs: dict_field: dict[str, int] +@attr.s(auto_attribs=True) +class NestedComplexAttrs: + """Class with nested complex type fields""" + list_of_attr_field: list[SimpleAttrs] + dict_of_attr_field: dict[str, SimpleAttrs] + list_of_dict_field: list[dict[str, int]] + dict_of_list_field: dict[str, list[int]] + + @attr.s(auto_attribs=True) class SubTestData0: """Subclass for testing nested attrs""" diff --git a/openedx_events/learning/data.py b/openedx_events/learning/data.py index 014c53ec..d4fa4c8b 100644 --- a/openedx_events/learning/data.py +++ b/openedx_events/learning/data.py @@ -5,7 +5,7 @@ pattern. """ from datetime import datetime -from typing import List, Optional +from typing import List import attr from ccx_keys.locator import CCXLocator @@ -198,10 +198,10 @@ class DiscussionTopicContext: title = attr.ib(type=str) usage_key = attr.ib(type=UsageKey, default=None) - group_id = attr.ib(type=Optional[int], default=None) + group_id = attr.ib(type=int, default=None) external_id = attr.ib(type=str, default=None) ordering = attr.ib(type=int, default=None) - context = attr.ib(type=dict, factory=dict) + context = attr.ib(type=dict[str, str], factory=dict) @attr.s(frozen=True) @@ -227,7 +227,7 @@ class CourseDiscussionConfigurationData: enable_in_context = attr.ib(type=bool, default=True) enable_graded_units = attr.ib(type=bool, default=False) unit_level_visibility = attr.ib(type=bool, default=False) - plugin_configuration = attr.ib(type=dict, default={}) + plugin_configuration = attr.ib(type=dict[str, bool], factory=dict) contexts = attr.ib(type=List[DiscussionTopicContext], factory=list) @@ -284,12 +284,12 @@ class UserNotificationData: Data related to a user
notification object. Attributes: - user_ids (List(int)): identifier of the users to which the notification belongs. + user_ids (List[int]): identifier of the users to which the notification belongs. notification_type (str): type of the notification. content_url (str): url of the content. app_name (str): name of the app. - course_key (str): identifier of the Course object. - context (dict): additional structured information about the context of the notification. + course_key (CourseKey): identifier of the Course object. + context (dict[str, str]): additional structured information about the context of the notification. """ user_ids = attr.ib(type=List[int]) @@ -297,7 +297,7 @@ class UserNotificationData: content_url = attr.ib(type=str) app_name = attr.ib(type=str) course_key = attr.ib(type=CourseKey) - context = attr.ib(type=dict, factory=dict) + context = attr.ib(type=dict[str, str], factory=dict) @attr.s(frozen=True) @@ -421,26 +421,26 @@ class DiscussionThreadData: options (dict): options for the thread. 
""" - anonymous = attr.ib(type=bool) - anonymous_to_peers = attr.ib(type=bool) body = attr.ib(type=str) - category_id = attr.ib(type=int) - category_name = attr.ib(type=str) commentable_id = attr.ib(type=str) - group_id = attr.ib(type=int) - id = attr.ib(type=int) - team_id = attr.ib(type=int) - thread_type = attr.ib(type=str) - title = attr.ib(type=str) - title_truncated = attr.ib(type=bool) + id = attr.ib(type=str) truncated = attr.ib(type=bool) url = attr.ib(type=str) user = attr.ib(type=UserData) course_id = attr.ib(type=CourseKey) - discussion = attr.ib(type=dict, factory=dict) + thread_type = attr.ib(type=str, default=None) + anonymous = attr.ib(type=bool, default=None) + anonymous_to_peers = attr.ib(type=bool, default=None) + title = attr.ib(type=str, default=None) + title_truncated = attr.ib(type=bool, default=None) + group_id = attr.ib(type=int, default=None) + team_id = attr.ib(type=int, default=None) + category_id = attr.ib(type=int, default=None) + category_name = attr.ib(type=str, default=None) + discussion = attr.ib(type=dict[str, str], default=None) user_course_roles = attr.ib(type=List[str], factory=list) user_forums_roles = attr.ib(type=List[str], factory=list) - options = attr.ib(type=dict, factory=dict) + options = attr.ib(type=dict[str, bool], factory=dict) @attr.s(frozen=True) @@ -453,10 +453,10 @@ class CourseNotificationData: app_name (str): name of the app requesting the course notification. notification_type (str): type of the notification. content_url (str): url of the content the notification will redirect to. - content_context (dict): additional information related to the content of the notification. + content_context (dict[str, str]): additional information related to the content of the notification. 
Notification content templates are defined in edx-platform here: https://github.com/openedx/edx-platform/blob/master/openedx/core/djangoapps/notifications/base_notification.py#L10 - audience_filters (dict): additional information related to the audience of the notification. + audience_filters (dict[str, list[str]]): additional information related to the audience of the notification. We can have different filters on course level, such as roles, enrollments, cohorts etc. Example of content_context for a discussion notification: @@ -483,8 +483,8 @@ class CourseNotificationData: app_name = attr.ib(type=str) notification_type = attr.ib(type=str) content_url = attr.ib(type=str) - content_context = attr.ib(type=dict, factory=dict) - audience_filters = attr.ib(type=dict, factory=dict) + content_context = attr.ib(type=dict[str, str], factory=dict) + audience_filters = attr.ib(type=dict[str, List[str]], factory=dict) @attr.s(frozen=True) @@ -504,7 +504,7 @@ class ORASubmissionAnswer: file_urls (List[str]): List of file URLs in the ORA submission. """ - parts = attr.ib(type=List[dict], factory=list) + parts = attr.ib(type=List[dict[str, str]], factory=list) file_keys = attr.ib(type=List[str], factory=list) file_descriptions = attr.ib(type=List[str], factory=list) file_names = attr.ib(type=List[str], factory=list)
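The test helpers in this change walk nested container annotations with `typing.get_origin`/`get_args` before falling back to attrs-class handling. As a minimal standalone sketch of that introspection pattern (the `describe_annotation` helper below is illustrative only, not part of openedx-events):

```python
from typing import Any, Dict, List, get_args, get_origin


def describe_annotation(tp: Any) -> str:
    """Render a nested type annotation as an Avro-like type string.

    Illustrative helper: mirrors how the test utilities in this change
    recurse with get_origin()/get_args() over List and Dict annotations.
    """
    origin = get_origin(tp)
    if origin is list:
        (item,) = get_args(tp)  # a List annotation carries exactly one item type
        return f"array<{describe_annotation(item)}>"
    if origin is dict:
        key_type, value_type = get_args(tp)
        # Avro maps only support string keys, matching the TypeError checks above.
        if key_type is not str:
            raise TypeError("Avro maps only support string keys. The key type must be 'str'.")
        return f"map<{describe_annotation(value_type)}>"
    # Primitive fallback, using the Avro names from the schemas in this diff.
    primitives = {int: "long", float: "double", bool: "boolean", str: "string"}
    return primitives.get(tp, getattr(tp, "__name__", str(tp)))


# The nested shapes this change adds support for:
print(describe_annotation(List[Dict[str, int]]))  # array<map<long>>
print(describe_annotation(Dict[str, List[int]]))  # map<array<long>>
```

An attrs class such as `NestedComplexAttrs` would hit the primitive fallback here and render by class name; the real serializer instead recurses into `__attrs_attrs__`, which is why the helper above is only a sketch of the container-introspection step.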