Architecture

One way to handle the review process is to rely on a relational database to store the events, assign them to individual users and record their responses. There are other solutions (such as message queues or key-value stores), but I think this is the simplest base, though not necessarily the most performant. A relational database provides many services (transactions, partitioning, queries etc.), offloading complexity from the application itself.

The proposed architecture of the application (consisting of a backend and frontend) is as follows:

  • Events are sent by the regional data stores to the backend via a REST endpoint, and then are stored in the database. Due to privacy concerns these events contain only data required to (1) uniquely identify and access the original video and associated metadata and (2) distribute the events to the various teams.

    Note that the REST endpoint can be protected by checking the source IP, using mTLS, requiring a secret token etc. Using a push-mechanism to acquire events seems to be simpler than a pull-mechanism, though in a real-world scenario security and performance requirements may favor the pull-mechanism.

  • Users and teams are registered in the database, which also stores which user belongs to which team. A user could belong to multiple teams if necessary (this would only slightly increase the complexity). Each user has a username (possibly an e-mail address) and a password, which is stored in the database as a salted hash rather than in plain text. When a user or team is removed, the record is not deleted from the database, just marked as removed; otherwise keeping a record of who did what would be difficult.

    Note that it may make sense to leverage AWS for authentication in a real-world scenario (allowing the use of MFA without a significant amount of development).

  • When a user logs in, a JWT is issued containing the user ID, the ID of the team the user belongs to and an expiration timestamp after which the JWT can no longer be used.

    Note that the advantage of using a JWT instead of a session stored in the database is that checking its validity does not require any database access. On the other hand, it is not possible to invalidate it immediately (unless a list of invalidated JWTs is maintained in the database).

  • After a user has logged in, he/she can request a batch of events to review from the backend. The backend collects unreviewed events which are either unassigned or whose assignment expired (the expiration period is configurable in the backend), and which can be handled by the team the user belongs to, specified by various rules (see below). At most a given number of events are assigned (configurable in the backend). The assignment is stored in the database.

    Concurrency is handled by the locking and transactions provided by the database. PostgreSQL has a nifty tool, SKIP LOCKED, which selects for update only those records that are not yet locked by another transaction (thus reducing contention during the assignment phase):

    SELECT ... event data ...
      FROM event
      WHERE ... select the appropriate events ...
      ORDER BY created_at
      LIMIT 10
      FOR UPDATE
      SKIP LOCKED

    After selecting the events, their assignment can be recorded in the same transaction. An important thing to watch out for is to use a consistent ordering to avoid deadlocks.

  • Rules determine which team can handle an event. As per the requirements, rules can be changed on-the-fly, so in a real-world scenario attention should be paid to making this evaluation performant. An efficient solution would be to translate the rules into a stored function/procedure (which can cache its output based on its arguments), updating it whenever the rules change.

    However, for the sake of simplicity a set of records is used here. Each record has either a specific value or a NULL in each of the region ID, site ID and device ID columns; a NULL matches any value in the event. It is assumed that records do not overlap, that is, each event is matched by at most one record. If an event is not matched by any record, it is available to any user regardless of team.

  • When a user inspects an event, its data (that is, the small video and associated metadata) is retrieved by the frontend from the appropriate region store and shown in the UI. This way it never reaches the backend.

    Note that there are several ways to block unauthorized access to the original event data. The regional data stores can validate the JWT the user possesses; the backend can generate ephemeral secret tokens when it returns the assignments to the user, which the regional data stores then check; or mTLS can be used, ensuring that as soon as a user is deactivated the corresponding client certificate is invalidated at the regional data store.

  • When a user submits a review, it is stored in the appropriate event record in the database if the assignment has not expired yet. In a real-world scenario a user will probably rarely try to review an event whose assignment has expired; one way to prevent it is to have the frontend show events that will expire soon in a different color and remove them from the UI as soon as they expire.

  • Because the backend also considers events whose assignment has expired when it assigns events to a user, there is no need for any special mechanism to keep track of user inactivity.

  • Scalability can be addressed in several ways. Because reviews of events are independent of each other, one possible approach is to use multiple database instances and send each incoming event to a randomly chosen one. When a user logs in, he/she can be routed to one of the instances. Another option would be to use a distributed key/value store instead of a database, using read-after-write to check whether the intended assignment went through.
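
The assignment step described above (unreviewed events that are unassigned or whose assignment expired, filtered by the team rules, oldest first, up to a batch limit) can be sketched in memory as follows. This is only an illustration, not the project's implementation: the `Event` and `Rule` classes, the `team_for` and `assign_batch` functions, and the `ASSIGNMENT_TTL` and `BATCH_SIZE` values are hypothetical, and a real backend would do the selection and update inside one database transaction rather than in application memory.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

ASSIGNMENT_TTL = timedelta(minutes=15)  # hypothetical expiration period
BATCH_SIZE = 10                         # hypothetical batch limit


@dataclass
class Rule:
    team_id: int
    region_id: Optional[str] = None  # None plays the role of NULL: matches any value
    site_id: Optional[str] = None
    device_id: Optional[str] = None

    def matches(self, event: "Event") -> bool:
        return all(
            wanted is None or wanted == actual
            for wanted, actual in (
                (self.region_id, event.region_id),
                (self.site_id, event.site_id),
                (self.device_id, event.device_id),
            )
        )


@dataclass
class Event:
    region_id: str
    event_id: int
    site_id: str
    device_id: str
    received_at: datetime
    assigned_at: Optional[datetime] = None
    assigned_to: Optional[str] = None
    review: Optional[str] = None


def team_for(event, rules):
    """Return the team of the (assumed unique) matching rule, or None."""
    for rule in rules:
        if rule.matches(event):
            return rule.team_id
    return None


def assign_batch(events, rules, user, team_id, now):
    """Assign up to BATCH_SIZE eligible events to `user` and return them."""
    def eligible(e):
        free = e.assigned_at is None or now - e.assigned_at >= ASSIGNMENT_TTL
        owner = team_for(e, rules)
        # Events matched by no rule (owner is None) are open to every team.
        return e.review is None and free and owner in (None, team_id)

    batch = sorted(filter(eligible, events), key=lambda e: e.received_at)[:BATCH_SIZE]
    for e in batch:
        e.assigned_at, e.assigned_to = now, user
    return batch
```

Because an expired assignment simply makes the event eligible again, a second call made right after the first returns none of the events that were just handed out, which is the behavior the inactivity bullet relies on.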

Data model

Users represent the people who review events. They have the following properties:

  • email address (primary key)
  • name
  • encoded password
  • team ID (a foreign key to the team table)
  • time when the user was created
  • time when the user was removed (nullable)
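
One way to produce the "encoded password" property is a salted, deliberately slow hash. The following is a minimal sketch using only the Python standard library; the function names `encode_password` and `verify_password`, the iteration count and the `salt$hash` storage format are assumptions made for illustration and are not taken from the project.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for the deployment


def encode_password(password: str) -> str:
    """Return 'salt$hash' (both hex) for storage in the user record."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, encoded: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = encoded.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))
```

Since the salt is random, encoding the same password twice yields different strings, so equal passwords are not visible in the table.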

Teams represent groups of users to which events are distributed. They have the following properties:

  • team ID (primary key)
  • name
  • time when the team was created
  • time when the team was removed (nullable)

Events represent the small video and associated metadata stored in a regional data store which should be reviewed. They have the following properties:

  • region ID (part of the primary key)
  • unique ID inside the region (part of the primary key)
  • secret token required to access the small video and associated metadata
  • site ID
  • device ID
  • time when the event was received
  • time when the event was last assigned (nullable)
  • ID of the rule based on which the event was last assigned (nullable)
  • ID of user whom the event was last assigned to (nullable)
  • review result: approve/disapprove/not-sure (nullable)
  • time when the event was reviewed (nullable)
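
Using the assignment and review properties above, the rule that a review is stored only while the assignment is still valid can be sketched as follows. The `submit_review` function and the `ASSIGNMENT_TTL` value are hypothetical; the checks that the submitting user is the assignee and that the event has not already been reviewed are assumptions that go slightly beyond what the text states.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

ASSIGNMENT_TTL = timedelta(minutes=15)  # hypothetical expiration period
VALID_RESULTS = {"approve", "disapprove", "not-sure"}


@dataclass
class Event:
    # Only the properties relevant to reviewing are modeled here.
    assigned_at: Optional[datetime] = None
    assigned_to: Optional[str] = None
    review_result: Optional[str] = None
    reviewed_at: Optional[datetime] = None


def submit_review(event, user, result, now):
    """Record the review and return True, or return False if it is rejected."""
    if result not in VALID_RESULTS:
        raise ValueError(f"unknown review result: {result}")
    still_assigned = (
        event.assigned_to == user
        and event.assigned_at is not None
        and now - event.assigned_at < ASSIGNMENT_TTL
    )
    if not still_assigned or event.review_result is not None:
        return False
    event.review_result, event.reviewed_at = result, now
    return True
```

In a database-backed version the same condition would typically go into the WHERE clause of a single UPDATE, so that the check and the write cannot be separated by a concurrent reassignment.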

Rules represent assignment rules. They have the following properties:

  • rule ID (primary key)
  • team ID (a foreign key to the team table)
  • region ID (nullable)
  • site ID (nullable)
  • device ID (nullable)
  • time when the rule was created
  • time when the rule was removed (nullable)
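
The architecture assumes that rules do not overlap. A small check for that assumption might look like the sketch below; the `Rule` tuple and the `overlaps` and `find_overlaps` functions are hypothetical names. The sketch treats the three ID columns as independent (it does not know, for example, that a site belongs to exactly one region) and leaves out removed rules, which would have to be filtered out first.

```python
from itertools import combinations
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    rule_id: int
    team_id: int
    region_id: Optional[str] = None  # None stands for NULL, i.e. "any value"
    site_id: Optional[str] = None
    device_id: Optional[str] = None


def overlaps(a: Rule, b: Rule) -> bool:
    """Two rules overlap if some event could match both: every ID column
    must be a wildcard in at least one rule or equal in both."""
    pairs = zip(a[2:], b[2:])  # region, site and device columns
    return all(x is None or y is None or x == y for x, y in pairs)


def find_overlaps(rules):
    """Return the rule ID pairs that could both match the same event."""
    return [(a.rule_id, b.rule_id) for a, b in combinations(rules, 2) if overlaps(a, b)]
```

Running such a check whenever a rule is added or changed would turn the non-overlap assumption into something the backend can enforce.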

Usage

In order to run the project, the following is required:

  • PostgreSQL database available at localhost:5432, with a postgres superuser using trust authentication
  • psql tool compatible with the PostgreSQL database
  • Python3 with the requests package installed
  • Node.js with npm (I used Node.js version 20.16.0)

To initialize the database, run:

init_database.sh

To start the API, run:

npm run start

To test drive the API, run:

simulator.py
