
Commit dd6095c

Improved docstrings and docs for Blob
1 parent fdd4943 commit dd6095c

4 files changed: +156 -46 lines changed

docs/source/blobs.rst

Lines changed: 19 additions & 3 deletions
@@ -122,14 +122,30 @@ Using :class:`.Blob` objects as inputs
122122

123123
Because :class:`.Blob` outputs are represented in JSON as links, they are downloaded with a separate HTTP request if needed. There is currently no way to create a :class:`.Blob` on the server via HTTP, which means remote clients can use :class:`.Blob` objects provided in the output of actions but they cannot yet upload data to be used as input. However, it is possible to pass the URL of a :class:`.Blob` that already exists on the server as input to a subsequent Action. This means, in the example above of raw image capture, a remote client over HTTP can pass the raw :class:`.Blob` to the conversion action, and the raw data need never be sent over the network.
124124

125+
126+
HTTP interface and serialization
127+
--------------------------------
128+
129+
:class:`.Blob` objects are subclasses of `pydantic.BaseModel`, which means they can be serialized to JSON and deserialized from JSON. When this happens, the :class:`.Blob` is represented as a JSON object with :prop:`.Blob.url` and :prop:`.Blob.content_type` fields. The :prop:`.Blob.url` field is a link to the data. The :prop:`.Blob.content_type` field is a string representing the MIME type of the data. It is worth noting that models may be nested: this means an action may return many :class:`.Blob` objects in its output, either as a list or as fields in a :class:`pydantic.BaseModel` subclass. Each :class:`.Blob` in the output will be serialized to JSON with its URL and content type, and the client can then download the data from the URL, one download per :class:`.Blob` object.
130+
131+
When a :class:`.Blob` is serialized, the URL is generated with a unique ID to allow it to be downloaded. The URL is not guaranteed to be permanent, and should not be used as a long-term reference to the data. The URL will expire after 5 minutes, and the data will no longer be available for download after that time.
132+
133+
To run an action and download the data, an HTTP client must currently:
134+
135+
* Call the action that returns a :class:`.Blob` object, which will return a JSON object representing the invocation.
136+
* Poll the invocation until it is complete, and the :class:`.Blob` is available in its ``output`` property with the URL and content type.
137+
* Download the data from the URL in the :class:`.Blob` object, which will return the binary data.
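
The three steps above can be sketched as a small polling client. This is a minimal sketch under stated assumptions, not the LabThings client API: the function name `fetch_blob`, the injected transport callables, the invocation field names (`status`, `href`, `output`) and the status values (`"completed"`, `"error"`, `"cancelled"`) are all hypothetical. The link is read from `href` with a fallback to `url`, and the type from `media_type` with a fallback to `content_type`, because this page and the code in `blob.py` name those fields differently.

```python
import time
from typing import Any, Callable

Json = dict[str, Any]


def fetch_blob(
    start_action: Callable[[], Json],
    get_json: Callable[[str], Json],
    get_bytes: Callable[[str], bytes],
    poll_interval: float = 0.5,
    timeout: float = 60.0,
) -> tuple[bytes, str]:
    """Run an action, wait for it to finish, then download its Blob output."""
    # Step 1: start the action; the reply describes the invocation.
    invocation = start_action()
    deadline = time.monotonic() + timeout
    # Step 2: poll the invocation until it reports completion.
    # The field names and status values here are assumptions.
    while invocation.get("status") != "completed":
        if invocation.get("status") in ("error", "cancelled"):
            raise RuntimeError(f"action failed: {invocation}")
        if time.monotonic() > deadline:
            raise TimeoutError("invocation did not complete in time")
        time.sleep(poll_interval)
        invocation = get_json(invocation["href"])
    # Step 3: the output holds a link and a content type, not the data itself,
    # so one further request is needed to download the binary data.
    blob = invocation["output"]
    link = blob.get("href") or blob.get("url")
    media_type = blob.get("media_type") or blob.get("content_type", "*/*")
    return get_bytes(link), media_type
```

In practice the three callables would wrap an HTTP library; passing them in keeps the sketch independent of any particular client and of the exact endpoint layout.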
138+
139+
It may be possible to have actions return binary data directly in the future, but this is not yet implemented.
140+
141+
125142
Memory management and retention
126143
-------------------------------
127144

128145
Management of :class:`.Blob` objects is currently very basic: when a :class:`.Blob` object is returned in the output of an Action that has been called via the HTTP interface, a fixed 5-minute expiry is used. This should be improved in the future to avoid memory management issues.
129146

147+
When a :class:`.Blob` is serialized, a URL is generated with a unique ID to allow it to be downloaded. However, only a weak reference is held to the :class:`.Blob`. Once an Action has finished running, the only strong reference to the :class:`.Blob` should be held by the output property of the action invocation. The :class:`.Blob` should be garbage collected once the output is no longer required, i.e. when the invocation is discarded - currently 5 minutes after the action completes, once the maximum number of invocations has been reached or when it is explicitly deleted by the client.
148+
130149
The behaviour is different when actions are called from other actions. If `action_a` calls `action_b`, and `action_b` returns a :class:`.Blob`, that :class:`.Blob` will be subject to Python's usual garbage collection rules when `action_a` ends - i.e. it will not be retained unless it is included in the output of `action_a`.
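
The retention rule described in this section can be illustrated with the standard library alone. The sketch below is not the LabThings implementation: `FakeBlobData` and the `invocation_output` dictionary are hypothetical stand-ins, and only the use of a `WeakValueDictionary` keyed by UUID mirrors what `BlobDataManager` does.

```python
import gc
import uuid
from weakref import WeakValueDictionary


class FakeBlobData:
    """Hypothetical stand-in for the data store held by a Blob."""

    def __init__(self, content: bytes) -> None:
        self.content = content


# The manager only holds weak references, keyed by a unique ID.
registry = WeakValueDictionary()

# The invocation output is the only strong reference to the data.
invocation_output = {"blob": FakeBlobData(b"raw image bytes")}
blob_id = uuid.uuid4()
registry[blob_id] = invocation_output["blob"]
available_while_retained = blob_id in registry

# Discarding the invocation drops the last strong reference...
invocation_output.clear()
gc.collect()  # not needed on CPython, but makes the sketch robust elsewhere
# ...so the entry vanishes and a download URL would no longer resolve.
available_after_discard = blob_id in registry
```

The same mechanism explains the nested-action case: a `Blob` returned by `action_b` that is not placed in the output of `action_a` has no remaining strong reference once `action_a` returns.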
131150

132-
HTTP interface and serialization
133-
--------------------------------
134151

135-
:class:`.Blob` objects are subclasses of `pydantic.BaseModel`, which means they can be serialized to JSON and deserialized from JSON. When this happens, the :class:`.Blob` is represented as a JSON object with two fields: `url` and `content_type`. The `url` field is a link to the data. The `content_type` field is a string representing the MIME type of the data. When a :class:`.Blob` is serialized, a URL is generated with a unique ID to allow it to be downloaded. However, only a weak reference is held to the :class:`.Blob`. Once an Action has finished running, the only strong reference to the :class:`.Blob` should be held by the output property of the action invocation. The :class:`.Blob` should be garbage collected once the output is no longer required, i.e. when the invocation is discarded - currently 5 minutes after the action completes, once the maximum number of invocations has been reached or when it is explicitly deleted by the client.

docs/source/conf.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,7 @@
2727

2828
autodoc2_packages = ["../../src/labthings_fastapi"]
2929
autodoc2_render_plugin = "myst"
30+
autodoc2_class_docstring = "both"
3031

3132
# autoapi_dirs = ["../../src/labthings_fastapi"]
3233
# autoapi_ignore = []

docs/source/index.rst

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ Documentation for LabThings-FastAPI
1212
dependencies/dependencies.rst
1313
blobs.rst
1414
concurrency.rst
15-
client_code.rst
15+
using_things.rst
1616

1717
apidocs/index
1818

src/labthings_fastapi/outputs/blob.py

Lines changed: 135 additions & 42 deletions
Original file line numberDiff line numberDiff line change
@@ -1,22 +1,45 @@
1-
"""BLOB Output Module
1+
"""
2+
# BLOB Output Module
23
34
The BlobOutput class is used when you need to return something file-like that can't
45
easily (or efficiently) be converted to JSON. This is useful for returning large objects
56
like images, especially where an existing file-type is the obvious way to handle it.
67
8+
There is a [dedicated documentation page on blobs](/blobs.rst) that explains how to use
9+
this mechanism.
10+
711
To return a file from an action, you should declare its return type as a BlobOutput
8-
subclass, defining the `media_type` attribute.
12+
subclass, defining the
13+
[`media_type`](#labthings_fastapi.outputs.blob.Blob.media_type) attribute.
14+
15+
```python
16+
class MyImageBlob(Blob):
17+
media_type = "image/png"
18+
19+
class MyThing(Thing):
20+
@thing_action
21+
def get_image(self) -> MyImageBlob:
22+
# Do something to get the image data
23+
data = self._get_image_data()
24+
return MyImageBlob.from_bytes(data)
25+
```
926
1027
The action should then return an instance of that subclass, with data supplied
1128
either as a `bytes` object or a file on disk. If files are used, it's your
12-
responsibility to ensure the file is deleted after the `BlobOutput` object is
13-
garbage-collected. Constructing it using the class methods `from_bytes` or
14-
`from_temporary_directory` will ensure this is done for you.
29+
responsibility to ensure the file is deleted after the
30+
[`Blob`](#labthings_fastapi.outputs.blob.Blob) object is
31+
garbage-collected. Constructing it using the class methods
32+
[`from_bytes`](#labthings_fastapi.outputs.blob.Blob.from_bytes) or
33+
[`from_temporary_directory`](#labthings_fastapi.outputs.blob.Blob.from_temporary_directory)
34+
will ensure this is done for you.
1535
1636
Bear in mind a `tempfile` object only holds a file descriptor and is not safe for
17-
concurrent use: action outputs may be retrieved multiple times after the action has
18-
completed. Creating a temp folder and making a file inside it is the safest way to
19-
deal with this.
37+
concurrent use, which does not work well with the HTTP API:
38+
action outputs may be retrieved multiple times after the action has
39+
completed, possibly concurrently. Creating a temp folder and making a file inside it
40+
with
41+
[`from_temporary_directory`](#labthings_fastapi.outputs.blob.Blob.from_temporary_directory)
42+
is the safest way to deal with this.
2043
"""
2144

2245
from __future__ import annotations
@@ -53,10 +76,17 @@
5376
@runtime_checkable
5477
class BlobData(Protocol):
5578
"""The interface for the data store of a Blob.
56-
57-
:class:`.Blob` objects can represent their data in various ways. Each of
79+
80+
[`Blob`](#labthings_fastapi.outputs.blob.Blob) objects can represent their data in various ways. Each of
5881
those options must provide three ways to access the data, which are the
5982
`content` property, the `save()` method, and the `open()` method.
83+
84+
This protocol defines the interface needed by any data store used by a
85+
[`Blob`](#labthings_fastapi.outputs.blob.Blob).
86+
87+
Objects that are used on the server will additionally need to implement the
88+
[`ServerSideBlobData`](#labthings_fastapi.outputs.blob.ServerSideBlobData) protocol,
89+
which adds a `response()` method and `id` property.
6090
"""
6191

6292
@property
@@ -80,10 +110,14 @@ def open(self) -> io.IOBase:
80110

81111
class ServerSideBlobData(BlobData, Protocol):
82112
"""A BlobData protocol for server-side use, i.e. including `response()`
83-
84-
:class:`Blob` objects returned by actions must use :class:`.BlobData` objects
85-
that can be downloaded. This protocol extends the :class:`.BlobData` protocol to
86-
include a :meth:`~.ServerSideBlobData.response()` method that returns a FastAPI response object.
113+
114+
[`Blob`](#labthings_fastapi.outputs.blob.Blob) objects returned by actions must use
115+
[`BlobData`](#labthings_fastapi.outputs.blob.BlobData) objects
116+
that can be downloaded. This protocol extends that protocol to
117+
include a [`response()`](#labthings_fastapi.outputs.blob.ServerSideBlobData.response) method that returns a FastAPI response object.
118+
119+
See [`BlobBytes`](#labthings_fastapi.outputs.blob.BlobBytes) or
120+
[`BlobFile`](#labthings_fastapi.outputs.blob.BlobFile) for concrete implementations.
87121
"""
88122

89123
id: Optional[uuid.UUID] = None
@@ -157,13 +191,12 @@ def response(self) -> Response:
157191
class Blob(BaseModel):
158192
"""A container for binary data that may be retrieved over HTTP
159193
160-
See :doc:`blobs` for more information on how to use this class.
161-
162-
A :class:`.Blob` may be created to hold data using the class methods
194+
See the [documentation on blobs](/blobs.rst) for more information on how to use this class.
195+
196+
A [`Blob`](#labthings_fastapi.outputs.blob.Blob) may be created
197+
to hold data using the class methods
163198
`from_bytes` or `from_temporary_directory`. The constructor will
164-
attempt to deserialise a Blob from a URL, and may only be used within
165-
a `blob_serialisation_context_manager`. This is made available when
166-
actions are invoked, or when their output is returned.
199+
attempt to deserialise a Blob from a URL (see `__init__` method).
167200
168201
You are strongly advised to subclass this class and specify the
169202
`media_type` attribute, as this will propagate to the auto-generated
@@ -182,21 +215,29 @@ class Blob(BaseModel):
182215
)
183216

184217
_data: Optional[ServerSideBlobData] = None
185-
"""This object holds the data, either in memory or as a file."""
218+
"""This object holds the data, either in memory or as a file.
219+
220+
If `_data` is `None`, then the Blob has not been deserialised yet, and the
221+
`href` should point to a valid address where the data may be downloaded.
222+
"""
186223

187224
@model_validator(mode="after")
188225
def retrieve_data(self):
189226
"""Retrieve the data from the URL
190-
191-
When a :class:`.Blob` is created using its constructor, :mod:`pydantic`
227+
228+
When a [`Blob`](#labthings_fastapi.outputs.blob.Blob) is created
229+
using its constructor, [`pydantic`](https://docs.pydantic.dev/latest/)
192230
will attempt to deserialise it by retrieving the data from the URL
193-
specified in `href`. Currently, this must be a URL pointing to a
194-
:class:`.Blob` that already exists on this server.
231+
specified in `href`. Currently, this must be a URL pointing to a
232+
[`Blob`](#labthings_fastapi.outputs.blob.Blob) that already exists on
233+
this server.
195234
196235
This validator will only work if the function to resolve URLs to
197-
:class:`.BlobData` objects has been set in the context variable. This
198-
is done when actions are being invoked over HTTP, or when
199-
their outputs are being returned.
236+
[`BlobData`](#labthings_fastapi.outputs.blob.BlobData) objects
237+
has been set in the context variable
238+
[`url_to_blobdata_ctx`](#labthings_fastapi.outputs.blob.url_to_blobdata_ctx).
239+
This is done when actions are being invoked over HTTP by the
240+
[`BlobIOContextDep`](#labthings_fastapi.outputs.blob.BlobIOContextDep) dependency.
200241
"""
201242
if self.href == "blob://local":
202243
if self._data:
@@ -216,17 +257,19 @@ def retrieve_data(self):
216257
@model_serializer(mode="plain", when_used="always")
217258
def to_dict(self) -> Mapping[str, str]:
218259
"""Serialise the Blob to a dictionary and make it downloadable
219-
220-
When :mod:`pydantic` serialises this object, it will call this method
221-
to convert it to a dictionary. There is a significant side-effect, which
222-
is that we will add the blob to the :class:`.BlobDataManager` so it
260+
261+
When [`pydantic`](https://docs.pydantic.dev/latest/) serialises this object,
262+
it will call this method to convert it to a dictionary. There is a
263+
significant side-effect, which is that we will add the blob to the
264+
[`BlobDataManager`](#labthings_fastapi.outputs.blob.BlobDataManager) so it
223265
can be downloaded.
224-
225-
This serialiser will only work if the function to resolve URLs to
226-
:class:`.BlobData` objects has been set in the context variable. This
227-
is done when the outputs of actions are being returned.
228266
229-
Note that the
267+
This serialiser will only work if the function to assign URLs to
268+
[`BlobData`](#labthings_fastapi.outputs.blob.BlobData) objects
269+
has been set in the context variable
270+
[`blobdata_to_url_ctx`](#labthings_fastapi.outputs.blob.blobdata_to_url_ctx).
271+
This is done when the outputs of actions are being returned over HTTP by the
272+
[`BlobIOContextDep`](#labthings_fastapi.outputs.blob.BlobIOContextDep) dependency.
230273
"""
231274
if self.href == "blob://local":
232275
try:
@@ -249,10 +292,30 @@ def to_dict(self) -> Mapping[str, str]:
249292

250293
@classmethod
251294
def default_media_type(cls) -> str:
295+
"""The default media type.
296+
297+
`Blob` should generally be subclassed to define the default media type,
298+
as this forms part of the auto-generated documentation. Using the
299+
`Blob` class directly will result in a media type of `*/*`, which makes
300+
it unclear what format the output is in.
301+
"""
252302
return cls.model_fields["media_type"].get_default()
253303

254304
@property
255305
def data(self) -> ServerSideBlobData:
306+
"""The data store for this Blob
307+
308+
`Blob` objects may hold their data in various ways, defined by the
309+
[`ServerSideBlobData`](#labthings_fastapi.outputs.blob.ServerSideBlobData)
310+
protocol. This property returns the data store for this `Blob`.
311+
312+
If the `Blob` has not yet been downloaded, there may be no data
313+
held locally, in which case this function will raise a `ValueError`.
314+
315+
It is recommended to use the `content` property or `save()` or `open()`
316+
methods rather than accessing this property directly. Those methods will
317+
download data if required, rather than raising an error.
318+
"""
256319
if self._data is None:
257320
raise ValueError("This Blob has no data.")
258321
return self._data
@@ -329,7 +392,16 @@ def response(self):
329392

330393

331394
def blob_type(media_type: str) -> type[Blob]:
332-
"""Create a BlobOutput subclass for a given media type"""
395+
"""Create a BlobOutput subclass for a given media type
396+
397+
This convenience function may confuse static type checkers, so it is usually
398+
clearer to make a subclass instead, e.g.:
399+
400+
```python
401+
class MyImageBlob(Blob):
402+
media_type = "image/png"
403+
```
404+
"""
333405
if "'" in media_type or "\\" in media_type:
334406
raise ValueError("media_type must not contain single quotes or backslashes")
335407
return create_model(
@@ -342,17 +414,20 @@ def blob_type(media_type: str) -> type[Blob]:
342414
class BlobDataManager:
343415
"""A class to manage BlobData objects
344416
345-
The BlobManager is responsible for serving `Blob` objects to clients. It
417+
The `BlobDataManager` is responsible for serving `Blob` objects to clients. It
346418
holds weak references: it will not retain `Blob`s that are no longer in use.
347-
Most `Blob`s will be retained"""
419+
Most `Blob`s will be retained by the output of an action: this holds a strong
420+
reference, and will be expired by the
421+
[`ActionManager`](#labthings_fastapi.actions.ActionManager).
422+
"""
348423

349424
_blobs: WeakValueDictionary[uuid.UUID, ServerSideBlobData]
350425

351426
def __init__(self):
352427
self._blobs = WeakValueDictionary()
353428

354429
def add_blob(self, blob: ServerSideBlobData) -> uuid.UUID:
355-
"""Add a BlobOutput to the manager"""
430+
"""Add a BlobOutput to the manager, generating a unique ID"""
356431
if hasattr(blob, "id") and blob.id is not None:
357432
if blob.id in self._blobs:
358433
return blob.id
@@ -380,12 +455,29 @@ def attach_to_app(self, app: FastAPI):
380455

381456

382457
blobdata_to_url_ctx = ContextVar[Callable[[ServerSideBlobData], str]]("blobdata_to_url")
458+
"""This context variable gives access to a function that makes BlobData objects
459+
downloadable, by assigning a URL and adding them to the
460+
[`BlobDataManager`](#labthings_fastapi.outputs.blob.BlobDataManager).
461+
462+
It is only available within a
463+
[`blob_serialisation_context_manager`](#labthings_fastapi.outputs.blob.blob_serialisation_context_manager)
464+
because it requires access to the `BlobDataManager` and the `url_for` function
465+
from the FastAPI app.
466+
"""
383467

384468
url_to_blobdata_ctx = ContextVar[Callable[[str], BlobData]]("url_to_blobdata")
469+
"""This context variable gives access to a function that makes BlobData objects
470+
from a URL, by retrieving them from the
471+
[`BlobDataManager`](#labthings_fastapi.outputs.blob.BlobDataManager).
472+
473+
It is only available within a
474+
[`blob_serialisation_context_manager`](#labthings_fastapi.outputs.blob.blob_serialisation_context_manager)
475+
because it requires access to the `BlobDataManager`.
476+
"""
385477

386478

387479
async def blob_serialisation_context_manager(request: Request):
388-
"""Set context variables to allow blobs to be serialised"""
480+
"""Set context variables to allow blobs to be [de]serialised"""
389481
thing_server = find_thing_server(request.app)
390482
blob_manager: BlobDataManager = thing_server.blob_data_manager
391483
url_for = request.url_for
@@ -415,3 +507,4 @@ def url_to_blobdata(url: str) -> BlobData:
415507
BlobIOContextDep: TypeAlias = Annotated[
416508
BlobDataManager, Depends(blob_serialisation_context_manager)
417509
]
510+
"""A dependency that enables `Blob`s to be serialised and deserialised."""
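
The context-variable pattern documented above can be sketched in isolation. This is an illustration under stated assumptions, not the code in this module: `to_url_ctx`, `serialisation_context` and `serialise` are hypothetical names, the URL format is made up, and the real context manager is an async FastAPI dependency rather than a synchronous one. It shows why serialisation only works inside the context: outside it, `ContextVar.get()` raises `LookupError`.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Callable, Iterator

# Hypothetical analogue of `blobdata_to_url_ctx`: holds a function that
# turns an ID into a download URL.
to_url_ctx = ContextVar[Callable[[str], str]]("to_url")


@contextmanager
def serialisation_context(base_url: str) -> Iterator[None]:
    """Make a URL-generating function available for the duration of the block."""
    token = to_url_ctx.set(lambda blob_id: f"{base_url}/blob/{blob_id}")
    try:
        yield
    finally:
        to_url_ctx.reset(token)  # restore the previous (unset) state


def serialise(blob_id: str) -> dict[str, str]:
    """Serialise to a dictionary; fails if no context has been set up."""
    try:
        make_url = to_url_ctx.get()
    except LookupError as e:
        raise RuntimeError("blobs can only be serialised inside the context") from e
    return {"href": make_url(blob_id), "media_type": "image/png"}
```

Resetting the variable with the token, rather than setting it to `None`, means nested or concurrent requests each see only the function that was set for them.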
