Support streaming code unit sequences by saving incomplete code unit sequences as encoding state

Consuming code unit sequences from a streaming source may result in attempts to decode a partial code unit sequence. At present, an exception is thrown when such underflow occurs. An alternative would be to store the partial code unit sequence in the iterator state and have the iterator compare equal to the end iterator. This would enable code like the following to work correctly even when the end of a buffer does not fall on a code unit sequence boundary.

``` C++
using encoding = utf8_encoding;
auto state = encoding::initial_state();
std::string b;  // Declared outside the loop so the loop condition can see it.
do {
   b = get_more_data();
   auto tv = make_text_view<encoding>(state, begin(b), end(b));
   auto tv_it = begin(tv);
   while (tv_it != end(tv))
     ...;
   state = tv_it.state();  // Trailing state is in tv_it, preserve it
                           // to seed state for the next iteration.
} while(!b.empty());
```
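
The idea of carrying a partial code unit sequence from one buffer to the next can be sketched independently of the library. The following is a minimal, self-contained illustration for UTF-8 only; `utf8_stream_state`, `utf8_sequence_length`, and `decode_chunk` are hypothetical names invented for this sketch and are not part of text_view. It also skips validation of continuation bytes and overlong forms, which a real codec would need.

``` C++
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical state: the code units of an incomplete UTF-8 sequence seen so far.
struct utf8_stream_state {
    unsigned char pending[4] = {};
    std::size_t count = 0;
};

// Total length of a sequence as indicated by its lead byte (0 if not a lead byte).
inline std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

// Decode every complete sequence in `chunk`, appending code points to `out`.
// A trailing partial sequence is left in `state` rather than reported as an error.
// Simplification: continuation bytes and overlong forms are not validated.
inline void decode_chunk(utf8_stream_state& state, std::string_view chunk,
                         std::u32string& out) {
    for (unsigned char byte : chunk) {
        state.pending[state.count++] = byte;
        const std::size_t need = utf8_sequence_length(state.pending[0]);
        if (need == 0) {               // Stray byte: emit U+FFFD and resynchronize.
            out.push_back(U'\uFFFD');
            state.count = 0;
            continue;
        }
        if (state.count < need) continue;  // Still incomplete; keep accumulating.

        // Assemble the code point from the lead byte and its continuation bytes.
        char32_t cp = (need == 1) ? state.pending[0]
                                  : (state.pending[0] & (0xFF >> (need + 1)));
        for (std::size_t i = 1; i < need; ++i)
            cp = (cp << 6) | (state.pending[i] & 0x3F);
        out.push_back(cp);
        state.count = 0;
    }
}
```

Calling `decode_chunk` once per buffer with the same `utf8_stream_state` object lets a sequence that straddles two buffers decode as if the input had been contiguous. After the final buffer, a nonzero `state.count` indicates that the input ended in the middle of a sequence.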

A problem with this approach is that it allows trailing code units (e.g., garbage at the end of the encoded text) to go unnoticed: if the final buffer ends with an incomplete sequence, the iterator simply compares equal to the end iterator and nothing is reported. Because of this, the behavior above probably shouldn't be the default, but it should be possible for code to opt in to it, perhaps via a policy class as suggested in #14.
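
One way to picture the opt-in is a policy value consulted when a buffer ends mid-sequence, paired with an unconditional check once the whole stream has been consumed. This is only a sketch of the idea; `underflow_policy`, `pending_state`, `check_buffer_end`, and `check_stream_end` are hypothetical names, and the design proposed in #14 may look quite different.

``` C++
#include <cstddef>
#include <stdexcept>

// Hypothetical policy: how to treat a buffer that ends mid-sequence.
enum class underflow_policy {
    strict,          // Assumed default: underflow is an error.
    permit_partial   // Opt-in: keep the partial sequence as state and continue.
};

// Hypothetical stand-in for encoding state; only the count of saved
// code units matters for this sketch.
struct pending_state {
    std::size_t pending_count = 0;
};

// Called when the end of a buffer is reached.
inline void check_buffer_end(const pending_state& s, underflow_policy p) {
    if (s.pending_count != 0 && p == underflow_policy::strict)
        throw std::runtime_error("incomplete code unit sequence at end of buffer");
}

// Called once no more data will arrive. Leftover code units are reported
// regardless of policy, so trailing garbage cannot go unnoticed.
inline void check_stream_end(const pending_state& s) {
    if (s.pending_count != 0)
        throw std::runtime_error("incomplete code unit sequence at end of stream");
}
```

Under this arrangement the streaming loop would call `check_buffer_end` after each buffer and `check_stream_end` after the last one, so opting in to partial sequences between buffers does not silence an error at the true end of input.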
