attention-heads as samples from posterior distribution in a Bayesian sense #22

@sgbaird

https://aclanthology.org/2020.emnlp-main.17.pdf

I think CrabNet might need to be refitted to draw new samples, though: if you specify N=10, you only get 10 samples from the posterior. Getting more would probably require refitting, and I'm not sure the new samples would be directly comparable to the 10 from the first run. I'm also not exactly sure how this could be converted to individual predictions. Maybe it just needs some basic plumbing in and after:

CrabNet/crabnet/kingcrab.py

Lines 151 to 157 in 9e0d79c

if self.attention:
    encoder_layer = nn.TransformerEncoderLayer(self.d_model,
                                               nhead=self.heads,
                                               dim_feedforward=2048,
                                               dropout=0.1)
    self.transformer_encoder = nn.TransformerEncoder(encoder_layer,
                                                     num_layers=self.N)
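
As a rough illustration of the "convert to individual predictions" part, here is a minimal sketch of what the downstream side could look like once the plumbing exists. It assumes that one scalar prediction per sample has somehow been extracted for a compound; CrabNet does not expose this today as far as I know. The function name `summarize_head_samples` and the numbers are hypothetical, made up purely for illustration.

```python
from statistics import mean, stdev


def summarize_head_samples(head_predictions):
    """Treat each per-sample prediction as one draw and return (mean, std).

    `head_predictions` is a hypothetical list of scalar predictions for a
    single compound, one per sample. This is not part of CrabNet's API.
    """
    if len(head_predictions) < 2:
        raise ValueError("need at least two samples to estimate a spread")
    # Sample mean as the point prediction, sample standard deviation
    # (n - 1 denominator) as a crude uncertainty estimate.
    return mean(head_predictions), stdev(head_predictions)


# Made-up values for one compound, only to show the call.
samples = [1.0, 2.0, 3.0, 4.0]
mu, sigma = summarize_head_samples(samples)
print(f"mean={mu:.3f}, std={sigma:.3f}")
```

Whether the spread across such samples is a meaningful posterior uncertainty is exactly the open question here; the sketch only shows the bookkeeping.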
