|
| 1 | +Blob input/output |
| 2 | +================= |
| 3 | + |
| 4 | +`Blob` objects allow binary data to be returned by an Action. This binary data can be passed between Things, or between Things and client code. Using a `Blob` object allows binary data to be efficiently sent over HTTP if required, and allows the same code to run either on the server (without copying the data) or on a client (where data is transferred over HTTP). |
| 5 | + |
| 6 | +If interactions require only simple data types that can easily be represented in JSON, very little thought needs to be given to data types: strings and numbers are converted to and from JSON automatically, and your Python code should only ever see native Python data types, whether it is running on the server or on a remote client. However, if you want to transfer larger data objects such as images, large arrays or other binary data, you will need to use a `Blob` object. |
| 7 | + |
| 8 | +`Blob` objects are not part of the Web of Things specification, which is most often used with fairly simple data structures in JSON. In LabThings-FastAPI, the `Blob` mechanism is intended to provide an efficient way to work with arbitrary binary data. If a `Blob` is used to transfer data between two `Thing`s on the same server, the data should not be copied or otherwise iterated over. When the data must be transferred over the network, this can be done using a binary transfer, rather than by embedding it in JSON with base64 encoding. |
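To illustrate why a link is preferable to embedding binary data in JSON, the sketch below compares the two using only the Python standard library. It does not use LabThings-FastAPI; the payload is a stand-in for real image data, and the URL is a made-up placeholder rather than a real endpoint.

```python
import base64
import json

# Stand-in for binary data such as an image: 3 MiB of zero bytes.
payload = bytes(3 * 1024 * 1024)

# Embedding binary data in JSON needs a text encoding such as base64,
# which emits 4 output characters for every 3 input bytes.
encoded = base64.b64encode(payload)
embedded = json.dumps({"data": encoded.decode("ascii")})

# A link-style representation stays small regardless of the payload size.
# The URL is a hypothetical placeholder, not a real LabThings endpoint.
linked = json.dumps(
    {"url": "http://localhost:5000/blob/example-id", "content_type": "image/jpeg"}
)

print(f"raw payload:    {len(payload)} bytes")
print(f"base64 in JSON: {len(embedded)} bytes")
print(f"link in JSON:   {len(linked)} bytes")
```

The base64 form is about a third larger than the raw data and has to be encoded and decoded in full, whereas the link has a fixed small size and the data itself can travel as a plain binary download.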
| 9 | + |
| 10 | +A `Blob` consists of some data and a MIME type, which sets how the data should be interpreted. It is best to create a subclass of `Blob` with the content type set: this makes it clear what kind of data is in the `Blob`. In the future, it might be possible to add functionality to `Blob` subclasses, for example to make it simple to obtain a `PIL` `Image` object from a `Blob` containing JPEG data. However, this will not yet work across both client and server code. |
| 11 | + |
| 12 | +Creating and using `Blob` objects |
| 13 | +------------------------------------------------ |
| 14 | + |
| 15 | +Blobs can be created from binary data that is in memory (a `bytes` object), on disk (a file), or using a URL as a placeholder. The intention is that the code that uses a `Blob` should not need to know which of these is the case, and should be able to use the same code regardless of how the data is stored. |
| 16 | + |
| 17 | +Blobs offer three ways to access their data: |
| 18 | + |
| 19 | +* A `bytes` object, obtained via the `data` property. For blobs created with a `bytes` object, this simply returns the original data object with no copying. If the data is stored in a file, the file is opened and read when the `data` property is accessed. If the `Blob` references a URL, it is retrieved and returned when `data` is accessed. |
| 20 | +* An `open()` method providing a file-like object. This returns a `BytesIO` wrapper if the `Blob` was created from a `bytes` object or the file if the data is stored on disk. URLs are retrieved, stored as `bytes` and returned wrapped in a `BytesIO` object. |
| 21 | +* A `save` method, which either writes the data to a file or copies the existing file on disk. If the `Blob` points to a file rather than to data in memory, this should be more efficient than loading `data` and writing it to a file. |
| 22 | + |
| 23 | +The intention here is that `Blob` objects may be used identically with data in memory or on disk or even at a remote URL, and the code that uses them should not need to know which is the case. |
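The idea of storage-independent access can be sketched with a small stand-in class. `SketchBlob` below is hypothetical and is not the LabThings-FastAPI `Blob` implementation: it only mimics the `data`, `open()` and `save()` behaviour described above for the in-memory and on-disk cases, and omits the URL case.

```python
import io
import os
import shutil
import tempfile
from typing import BinaryIO, Optional


class SketchBlob:
    """Hypothetical stand-in for a Blob; not the LabThings-FastAPI class."""

    def __init__(self, data: Optional[bytes] = None, path: Optional[str] = None):
        if (data is None) == (path is None):
            raise ValueError("Provide exactly one of data or path")
        self._data = data
        self._path = path

    @property
    def data(self) -> bytes:
        # In-memory data is returned as-is; file-backed data is read on access.
        if self._data is not None:
            return self._data
        with open(self._path, "rb") as f:
            return f.read()

    def open(self) -> BinaryIO:
        # Wrap in-memory bytes in BytesIO, or open the file on disk.
        if self._data is not None:
            return io.BytesIO(self._data)
        return open(self._path, "rb")

    def save(self, filename: str) -> None:
        # Copy the file directly when one exists, avoiding a load into memory.
        if self._path is not None:
            shutil.copyfile(self._path, filename)
        else:
            with open(filename, "wb") as f:
                f.write(self._data)


def first_two_bytes(blob: SketchBlob) -> bytes:
    # Consumer code: identical for memory-backed and file-backed blobs.
    with blob.open() as f:
        return f.read(2)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.bin")
    with open(source, "wb") as f:
        f.write(b"\xff\xd8 example bytes")

    in_memory = SketchBlob(data=b"\xff\xd8 example bytes")
    on_disk = SketchBlob(path=source)

    same_prefix = first_two_bytes(in_memory) == first_two_bytes(on_disk)
    same_data = in_memory.data == on_disk.data

    copy = os.path.join(tmp, "copy.bin")
    on_disk.save(copy)
    with open(copy, "rb") as f:
        copied = f.read()
```

The `first_two_bytes` function never checks where the data lives, which is the property the real `Blob` interface is meant to provide.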
| 24 | + |
| 25 | +Examples |
| 26 | +-------- |
| 27 | + |
| 28 | +A camera might want to return an image as a `Blob` object. The code for the action might look like this: |
| 29 | + |
| 30 | +.. code-block:: python |
| 31 | + |
| 32 | +     from labthings_fastapi.blob import Blob |
| 33 | +     from labthings_fastapi.thing import Thing |
| 34 | +     from labthings_fastapi.decorators import thing_action |
| 35 | + |
| 36 | +     class JPEGBlob(Blob): |
| 37 | +         content_type = "image/jpeg" |
| 38 | + |
| 39 | +     class Camera(Thing): |
| 40 | +         @thing_action |
| 41 | +         def capture_image(self) -> JPEGBlob: |
| 42 | +             # Capture an image and return it as a Blob |
| 43 | +             image_data = self._capture_image()  # This returns a bytes object holding the JPEG data |
| 44 | +             return JPEGBlob.from_bytes(image_data) |
| 45 | + |
| 46 | +The corresponding client code might look like this: |
| 47 | + |
| 48 | +.. code-block:: python |
| 49 | + |
| 50 | +     from PIL import Image |
| 51 | +     from labthings_fastapi.client import ThingClient |
| 52 | + |
| 53 | +     camera = ThingClient.from_url("http://localhost:5000/camera/") |
| 54 | +     image_blob = camera.capture_image() |
| 55 | +     image_blob.save("captured_image.jpg")  # Save the image to a file |
| 56 | + |
| 57 | +     # We can also open the image directly with PIL |
| 58 | +     with image_blob.open() as f: |
| 59 | +         img = Image.open(f) |
| 60 | +         img.show()  # This will display the image in a window |
| 61 | + |
| 62 | +We could define a more sophisticated camera that can capture raw images and convert them to JPEG, using two actions: |
| 63 | + |
| 64 | +.. code-block:: python |
| 65 | + |
| 66 | +     from labthings_fastapi.blob import Blob |
| 67 | +     from labthings_fastapi.thing import Thing |
| 68 | +     from labthings_fastapi.decorators import thing_action |
| 69 | + |
| 70 | +     class JPEGBlob(Blob): |
| 71 | +         content_type = "image/jpeg" |
| 72 | + |
| 73 | +     class RAWBlob(Blob): |
| 74 | +         content_type = "image/x-raw" |
| 75 | + |
| 76 | +     class Camera(Thing): |
| 77 | +         @thing_action |
| 78 | +         def capture_raw_image(self) -> RAWBlob: |
| 79 | +             # Capture a raw image and return it as a Blob |
| 80 | +             raw_data = self._capture_raw_image()  # This returns a bytes object holding the raw data |
| 81 | +             return RAWBlob.from_bytes(raw_data) |
| 82 | + |
| 83 | +         @thing_action |
| 84 | +         def convert_raw_to_jpeg(self, raw_blob: RAWBlob) -> JPEGBlob: |
| 85 | +             # Convert a raw image Blob to a JPEG Blob |
| 86 | +             jpeg_data = self._convert_raw_to_jpeg(raw_blob.data)  # This returns a bytes object holding the JPEG data |
| 87 | +             return JPEGBlob.from_bytes(jpeg_data) |
| 88 | + |
| 89 | +         @thing_action |
| 90 | +         def capture_image(self) -> JPEGBlob: |
| 91 | +             # Capture an image and return it as a Blob |
| 92 | +             raw_blob = self.capture_raw_image()  # Capture the raw image |
| 93 | +             jpeg_blob = self.convert_raw_to_jpeg(raw_blob)  # Convert the raw image to JPEG |
| 94 | +             return jpeg_blob  # Return the JPEG Blob |
| 95 | +             # NB the `raw_blob` is not retained after this action completes, so it will be garbage collected |
| 96 | + |
| 97 | +On the client, we can use the `capture_image` action directly (as before), or we can capture a raw image and convert it to JPEG: |
| 98 | + |
| 99 | +.. code-block:: python |
| 100 | + |
| 101 | +     from PIL import Image |
| 102 | +     from labthings_fastapi.client import ThingClient |
| 103 | + |
| 104 | +     camera = ThingClient.from_url("http://localhost:5000/camera/") |
| 105 | + |
| 106 | +     # Capture a JPEG image directly |
| 107 | +     jpeg_blob = camera.capture_image() |
| 108 | +     jpeg_blob.save("captured_image.jpg") |
| 109 | + |
| 110 | +     # Alternatively, capture a raw image and convert it to JPEG |
| 111 | +     raw_blob = camera.capture_raw_image()  # NB the raw image is not yet downloaded |
| 112 | +     jpeg_blob = camera.convert_raw_to_jpeg(raw_blob) |
| 113 | +     jpeg_blob.save("converted_image.jpg") |
| 114 | + |
| 115 | +     raw_blob.save("raw_image.raw")  # Download and save the raw image to a file |
| 116 | + |
| 117 | + |
| 118 | +Using `Blob` objects as inputs |
| 119 | +------------------------------ |
| 120 | + |
| 121 | +`Blob` objects may be used as either the input or output of an action. There are relatively few good use cases for `Blob` inputs to actions, but a possible example would be image capture: one action could perform a quick capture of raw data, and another action could convert the raw data into a useful image. The output of the capture action would be a `Blob` representing the raw data, which could be passed to the conversion action. |
| 122 | + |
| 123 | +Because `Blob` outputs are represented in JSON as links, they are downloaded with a separate HTTP request if needed. There is currently no way to create a `Blob` on the server via HTTP, which means remote clients can use `Blob` objects provided in the output of actions but they cannot yet upload data to be used as input. However, it is possible to pass the URL of a `Blob` that already exists on the server as input to a subsequent Action. This means, in the example above of raw image capture, a remote client over HTTP can pass the raw `Blob` to the conversion action, and the raw data need never be sent over the network. |
| 124 | + |
| 125 | +Memory management and retention |
| 126 | +------------------------------- |
| 127 | + |
| 128 | +Management of `Blob` objects is currently very basic: when a `Blob` object is returned in the output of an Action that has been called via the HTTP interface, a fixed 5-minute expiry is used. This should be improved in the future to avoid memory management issues. |
| 129 | + |
| 130 | +The behaviour is different when actions are called from other actions. If `action_a` calls `action_b`, and `action_b` returns a `Blob`, that `Blob` will be subject to Python's usual garbage collection rules when `action_a` ends, i.e. it will not be retained unless it is included in the output of `action_a`. |
| 131 | + |
| 132 | +HTTP interface and serialization |
| 133 | +-------------------------------- |
| 134 | + |
| 135 | +`Blob` objects are subclasses of `pydantic.BaseModel`, which means they can be serialized to and deserialized from JSON. When this happens, the `Blob` is represented as a JSON object with two fields: `url`, a link to the data, and `content_type`, a string giving the MIME type of the data. When a `Blob` is serialized, a URL is generated with a unique ID to allow it to be downloaded. However, only a weak reference is held to the `Blob`. Once an Action has finished running, the only strong reference to the `Blob` should be held by the output property of the action invocation. The `Blob` should therefore be garbage collected once the output is no longer required, i.e. when the invocation is discarded. Currently this happens 5 minutes after the action completes, once the maximum number of invocations has been reached, or when the invocation is explicitly deleted by the client. |
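The weak-reference behaviour can be sketched with the standard library's `weakref` module. This is an illustration of the general technique only, not the LabThings-FastAPI implementation: `SketchBlob`, `registry`, `register` and `invocation_output` are hypothetical names chosen for the example.

```python
import gc
import uuid
import weakref


class SketchBlob:
    """Hypothetical stand-in for a Blob that simply holds some bytes."""

    def __init__(self, data: bytes):
        self.data = data


# Maps download IDs to blobs without keeping the blobs alive.
registry = weakref.WeakValueDictionary()


def register(blob: SketchBlob) -> str:
    """Give a blob a unique ID; a real server would build a URL from it."""
    blob_id = str(uuid.uuid4())
    registry[blob_id] = blob
    return blob_id


# The invocation output holds the only strong reference to the blob.
invocation_output = {"image": SketchBlob(b"example data")}
blob_id = register(invocation_output["image"])
available_before = blob_id in registry

# Discarding the invocation drops the last strong reference.
invocation_output.clear()
gc.collect()  # not needed on CPython, but makes the sketch robust elsewhere
available_after = blob_id in registry

print(available_before, available_after)
```

Because the registry never owns the data, no separate clean-up step is needed for the ID lookup: the entry disappears once the object that owns the blob is discarded.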