Attention heads as samples from a posterior distribution in a Bayesian sense #22

https://aclanthology.org/2020.emnlp-main.17.pdf

Though I think CrabNet might need to be refitted to get new samples (i.e. if you specify `N=10`, you only get `10` samples from the posterior; getting more would probably require refitting, and I'm not sure whether those would be directly comparable to the `10` from the first run). I'm also not exactly sure how this could be converted to individual predictions. Maybe it just needs some basic plumbing in and after:

https://github.com/anthony-wang/CrabNet/blob/9e0d79c5bff56ceae0600015942c54214d78152f/crabnet/kingcrab.py#L151-L157



	if self.attention:
	    encoder_layer = nn.TransformerEncoderLayer(self.d_model,
	                                               nhead=self.heads,
	                                               dim_feedforward=2048,
	                                               dropout=0.1)
	    self.transformer_encoder = nn.TransformerEncoder(encoder_layer,
	                                                     num_layers=self.N)
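To make the "plumbing" idea a bit more concrete, here is a rough sketch of what per-head predictions could look like. As far as I can tell, `nn.TransformerEncoderLayer` only returns the combined output, so the sketch computes the attention by hand to keep the heads separate. Everything here is an assumption for illustration and not CrabNet's code: the sizes, the `q_proj`/`k_proj`/`v_proj` layers, the shared `readout` layer, and the choice to average over elements are all hypothetical, and whether the spread across heads is a meaningful uncertainty estimate is exactly the open question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, not CrabNet's defaults
d_model, heads, n_elements = 32, 4, 5
head_dim = d_model // heads

# Stand-in for the element embeddings of one compound: (batch, elements, d_model)
x = torch.randn(1, n_elements, d_model)

# Hypothetical projections; a real model would reuse its trained weights
q_proj = nn.Linear(d_model, d_model)
k_proj = nn.Linear(d_model, d_model)
v_proj = nn.Linear(d_model, d_model)
readout = nn.Linear(head_dim, 1)  # hypothetical readout shared by all heads


def split_heads(t):
    # (batch, elements, d_model) -> (batch, heads, elements, head_dim)
    b, n, _ = t.shape
    return t.view(b, n, heads, head_dim).transpose(1, 2)


with torch.no_grad():
    q, k, v = (split_heads(p(x)) for p in (q_proj, k_proj, v_proj))

    # Scaled dot-product attention, computed separately for each head
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    attn = scores.softmax(dim=-1)   # (batch, heads, elements, elements)
    per_head = attn @ v             # (batch, heads, elements, head_dim)

    # One scalar per head: average over elements, then apply the shared readout
    head_preds = readout(per_head.mean(dim=2)).squeeze(-1)  # (batch, heads)

    # Treat the heads like an ensemble: mean prediction and spread across heads
    mean_pred = head_preds.mean(dim=1)
    spread = head_preds.std(dim=1)

print(head_preds.shape, mean_pred.shape, spread.shape)
```

With this layout the number of "samples" is fixed by `heads` at construction time, which is consistent with the concern above that getting more samples would mean refitting.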
