dm.cs.tu-dortmund.de/mlbits/neural-nlp-decoders/
Decoder Models – Lecture Notes
[RaGoGo23]
Ravfogel, S., Goldberg, Y. and Goldberger, J. 2023. Conformal nucleus sampling. Association for Computational Linguistics, ACL (2023), 27–34.
[RaNa18]
Radford, A. and Narasimhan, K. 2018. Improving [...]
Decoder-only Models
Early decoder-only models:
As of 2023, pre-layernorm appears to be more popular than post-layernorm [XYHZ20]:
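As a rough illustration of what the two placements mean, here is a minimal sketch in plain Python. It assumes a layer normalization without learned scale and shift, and `toy_sublayer` is a hypothetical stand-in for an attention or feed-forward sublayer; none of these names come from the notes.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (roughly) unit variance.

    Simplification: no learned scale or shift parameters.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_ln_block(x, sublayer):
    """Post-layernorm: add the residual first, then normalize the sum."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])


def pre_ln_block(x, sublayer):
    """Pre-layernorm: normalize only the sublayer input.

    The residual path carries x through unchanged.
    """
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]


def toy_sublayer(x):
    # Hypothetical stand-in for attention or a feed-forward network.
    return [0.5 * v for v in x]


x = [1.0, 2.0, 4.0]
post = post_ln_block(x, toy_sublayer)
pre = pre_ln_block(x, toy_sublayer)
```

In the post-layernorm variant the block output is always re-normalized, whereas in the pre-layernorm variant the input `x` reaches the output unchanged through the residual connection and only the sublayer sees normalized values.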
Causal Language Modeling
Where original BERT …