TU Llama
preference optimization: Your language model is secretly a reward model. arXiv preprint arXiv:2305.18290, 2023.
[5] J. Schulman, F. Wolski, P. Dhariwal, et al. Proximal policy optimization algorithms. arXiv preprint
[...]
H. Touvron, L. Martin, K. Stone, et al. Llama 2: Open foundation and fine-tuned chat models. July 2023.
[7] G. Wenzek, M.-A. Lachaux, A. Conneau, et al. CCNet: Extracting high quality monolingual datasets …