AzureBlobDataStorageService::Cache issues

I'm noticing a few issues with the implementation of this method that I'd like to discuss...

# Using a sync method to download
The `AzureBlobDataStorageService::Cache` method is itself `async`, but it's calling the synchronous version of `BlockBlobClient::Download` here:

https://github.com/microsoft/AuthJanitor/blob/a1ee7f699edca55a8f7d67e746947c4e15e0cbf2/AuthJanitor.Integrations.DataStores.AzureBlobStorage/AzureBlobStorageDataStore.cs#L144

It should be changed to use the asynchronous `DownloadAsync` at bare minimum.

# Deserialization implementation is suboptimal
Looking at the implementation overall though, I would suggest a more optimal solution which will reduce code as well as runtime allocations:

```CSharp
            using (var blobJsonStream = new MemoryStream())
            {
                await Blob.DownloadToAsync(blobJsonStream);

                CachedCollection = JsonSerializer.Deserialize<List<TStoredModel>>(new ReadOnlySpan<byte>(blobJsonStream.GetBuffer(), 0, (int)blobJsonStream.Length));
            }
```

> NOTE: this code is based off the WIP branch to make the switch to `System.Text.Json`, assumes that will be committed and is meant to build on that.

So, first, it allows the `BlobDowloadClient` to optimize the download and delivery to the actual `MemoryStream` with the additional bonus of reducing a bunch of code in this code base to get to `CopyTo`. Second it removes the allocation/copy of the bytes to a `byte[]` that is caused by the `ToArray()` call right now. It does this by simply using the existing underlying buffer of the `MemoryStream` itself and wrapping it in a `ReadOnlySpan` while being sure to specify that the span ends at the logical `Length` of the stream, not the totally size of the allocated buffer. Third, it avoids the conversion of the `byte[]` into a `string` which not only is another allocation/copy, as it goes through the overhead of the `GetString(...)` conversion. Since we already know the bytes to be UTF8 bytes, we can simply pass those bytes through raw to the `JsonSerializer` and let it work right on top of them directly which is all it wants in the first place. 

Now, the only behavioral difference to consider here is if the writer of the JSON blob happened to put a byte order mark (BOM) in place this approach would not skip over that vs the `GetString` approach which would handle that. Thus, I'm making the assumption that these JSON files are stored without a BOM with this code. If they are _definitively_ stored with a BOM we can account for that. If it's possible that they might be stored with or without a BOM then we can still account for that with more checks too. However, since it's the `Commit` method that seems to be the source of the blob content and we control the writing of that blob content and its implementation will always write without a BOM right now, then I think my change is safe.

# This method doesn't actually "cache"?
Finally, the method is called `Cache` and it's called as a preamble in almost every other method (e.g. `Get`, `GetOne`, etc), but it doesn't actually cache the results... it will execute the download logic every single time. This doesn't seem like a caching method as I traditionally think of it. I would think that _at least_ for the lifetime of the instance once `Cache()` is called it would not download the results again. Is this the intended behavior of this method?

Bigger picture, it's treated as a singleton by the application, so... in effect there is no caching going on here and you're always going to the source to download the latest. If there's multiple writers that might make a lot of sense, but then I guess I'd change the name from "cache" since it's always doing a read-through to the backing store. Or, if we really do want it to still act like a cache, this could be further optimized to remember the `ETag` from the last read and only perform the download/deserialization if the `ETag` has changed. This would at least optimize for the many read, few write scenarios.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AzureBlobDataStorageService::Cache issues #40

Using a sync method to download

Deserialization implementation is suboptimal

This method doesn't actually "cache"?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

AzureBlobDataStorageService::Cache issues #40

Description

Using a sync method to download

Deserialization implementation is suboptimal

This method doesn't actually "cache"?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions