
The iterative batch generation code in Chapter 6 (04_arxiv preprocessing.py) is confusing to me #13

@jeffacode

Description


First, the loop header for i in range(0, len(text) - self.length + 1, self.max_length // 2). Sorry if I'm missing something, but what happens if len(text) is actually smaller than self.length (I assume it's the same as max_length)? And why is this process needed at all?
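
A minimal sketch of what a loop of that shape does, assuming text is a list of token ids and that self.length and self.max_length hold the same window size (the helper sliding_windows and its parameters length and stride are hypothetical, not from the chapter's code). It cuts the text into overlapping windows of exactly length tokens, stepping by half a window. If the text is shorter than one window, the range is empty and that text yields no windows at all, so it is silently skipped rather than causing an error.

```python
from typing import List


def sliding_windows(tokens: List[int], length: int, stride: int) -> List[List[int]]:
    """Cut tokens into overlapping windows of exactly `length` items (hypothetical helper).

    Mirrors: for i in range(0, len(text) - self.length + 1, self.max_length // 2)
    """
    windows = []
    # The last valid start index is len(tokens) - length, so every window is full-size.
    # If len(tokens) < length, the range is empty and nothing is produced.
    # Tokens after the last full window (when the stride does not line up) are dropped.
    for i in range(0, len(tokens) - length + 1, stride):
        windows.append(tokens[i:i + length])
    return windows


tokens = list(range(10))
print(sliding_windows(tokens, length=4, stride=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
print(sliding_windows(tokens[:3], length=4, stride=2))
# []
```

Presumably the point of the process is to turn one long document into many fixed-size training examples; the half-window overlap means a token near the end of one window also appears with more preceding context in the next.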

Second, assert all(len(x) == len(windows[0]) for x in windows). Why must every window have the same length?
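
Note that the assert checks equal lengths rather than enforcing them; if the windows are cut as text[i:i + self.length], the loop above already guarantees it. Presumably the check exists because the windows are later stacked into one dense array or tensor of shape (number of windows, window length), which has to be rectangular. A minimal sketch using NumPy (an assumption; the chapter may use a different array library), with a hypothetical stack_windows helper:

```python
import numpy as np


def stack_windows(windows):
    """Stack fixed-length windows into a (num_windows, window_length) array (hypothetical helper).

    np.stack requires every input array to have the same shape, so a ragged
    list of windows cannot form one rectangular batch.
    """
    assert all(len(x) == len(windows[0]) for x in windows)
    return np.stack([np.asarray(w) for w in windows])


batch = stack_windows([[0, 1, 2, 3], [2, 3, 4, 5]])
print(batch.shape)  # (2, 4)

# Without the assert, ragged windows would fail later and less clearly:
try:
    np.stack([np.asarray([0, 1, 2, 3]), np.asarray([4, 5])])
except ValueError as err:
    print("ragged windows:", err)
```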

Next, the while True loop that follows. Isn't it going to loop forever?
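
A while True inside a generator function (one that uses yield) does not spin on its own: each yield suspends the function until the caller asks for the next item, so the loop only runs as far as the consumer pulls. This is a common way to replay training data epoch after epoch; Keras's Model.fit, for example, accepts such a generator together with steps_per_epoch. A minimal sketch, assuming the chapter's code is a generator of this shape (batch_generator is a hypothetical name):

```python
import itertools


def batch_generator(windows, batch_size):
    """Hypothetical infinite batch generator that replays the data forever."""
    if not windows:
        # Without this guard, an empty list would make `while True` spin
        # forever without ever reaching a yield.
        raise ValueError("no windows to batch")
    while True:  # never finishes on its own...
        for i in range(0, len(windows), batch_size):
            # ...but execution pauses at each yield until the caller asks again
            yield windows[i:i + batch_size]


windows = [[0, 1], [2, 3], [4, 5], [6, 7]]
gen = batch_generator(windows, batch_size=2)

# The consumer decides how many batches to draw (e.g. steps per epoch).
for batch in itertools.islice(gen, 5):
    print(batch)
```

There is one genuine hazard: if no text was long enough to produce a window, windows is empty, the inner loop never yields, and the while True really does loop forever. That is why the sketch checks for it up front.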

Last, batch = windows[i: i + self.batch_size]. I don't think the last batch generated will be the same size as the previous ones in the first dimension.
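
Plain slicing does not pad: with 10 windows and a batch size of 4, windows[i: i + 4] gives batches of 4, 4 and 2. Whether that matters depends on the consumer. Many training loops accept a smaller final batch, while code that needs a fixed first dimension usually drops the remainder (compare drop_last in PyTorch's DataLoader). A minimal sketch of both behaviours, where split_batches and drop_last are hypothetical names, not from the chapter's code:

```python
def split_batches(windows, batch_size, drop_last=False):
    """Slice windows into consecutive batches (hypothetical helper).

    windows[i:i + batch_size] returns whatever is left at the end,
    so the final batch can be smaller than batch_size.
    """
    batches = [windows[i:i + batch_size] for i in range(0, len(windows), batch_size)]
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()  # discard the short remainder so every batch is full-size
    return batches


windows = list(range(10))  # stand-ins for 10 windows
print([len(b) for b in split_batches(windows, 4)])                  # [4, 4, 2]
print([len(b) for b in split_batches(windows, 4, drop_last=True)])  # [4, 4]
```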

I hope someone can answer my questions :)
