LLM past, present, and future
Thomas Scialom, Research Scientist, Meta
In this talk, Thomas Scialom, Research Scientist at Meta, delves into Llama 2 and explores the world of Reinforcement Learning from Human Feedback (RLHF).
One year ago, Meta released Galactica, a scientific language model, just weeks before the launch of ChatGPT. That technology spread from Silicon Valley to developers worldwide in under a year, an unprecedented pace of adoption.
Large language models (LLMs) are transformer-based models pre-trained on next-token prediction over internet-scale text. Scaling has two levers: more weights and more data. Early GPT versions favored scaling weights, but Chinchilla's research showed that, for a fixed training budget, weights and data should grow together. Llama, Meta's initiative, rethinks compute optimality beyond the training budget: because inference cost dominates once a model is deployed, training a smaller model on far more data than Chinchilla prescribes yields better cost-effectiveness, escaping what Scialom calls the "Chinchilla trap."
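To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python, using the common approximations of ~6 FLOPs per parameter per training token and ~2 FLOPs per parameter per generated token. The specific numbers are illustrative assumptions, not figures from the talk.

```python
def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

def inference_flops_per_token(n_params: float) -> float:
    """Approximate inference compute: ~2 FLOPs per parameter per token."""
    return 2 * n_params

# Chinchilla-optimal for a 7B model: roughly 20 tokens per parameter.
chinchilla = train_flops(7e9, 20 * 7e9)   # ~5.9e21 FLOPs
# Llama-style: the same 7B model, but "over-trained" on ~1T tokens.
llama_style = train_flops(7e9, 1e12)      # ~4.2e22 FLOPs

print(f"Chinchilla-optimal 7B: {chinchilla:.2e} training FLOPs")
print(f"Over-trained 7B:       {llama_style:.2e} training FLOPs")
print(f"Per generated token:   {inference_flops_per_token(7e9):.2e} FLOPs")
```

Training the over-trained model costs roughly 7x more up front, but every token served afterwards is as cheap as the Chinchilla-optimal model's, and cheaper than a larger model of comparable quality, so the smaller model wins once inference volume dominates total compute.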
Llama 2 introduces RLHF instruction tuning, aligning models with human preferences through careful annotation and reward models. Because judging an output is easier than writing one, this feedback loop can yield training data that surpasses what human annotators would write from scratch. The future of LLMs involves multimodality, enabling models to process image inputs and align modalities within a single model. Access to external tools, plugins, and the web expands their capabilities further, and RLHF opens avenues for novel data creation, pushing the boundaries of what models can achieve.
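At the heart of RLHF is a reward model trained on pairwise human preferences. Below is a minimal sketch of a Bradley-Terry style pairwise loss, a standard formulation for this step; the scores are assumed to come from a hypothetical scalar-output reward model, and the exact loss used for Llama 2 differs in details such as a margin term.

```python
import torch
import torch.nn.functional as F

def reward_loss(chosen_scores: torch.Tensor,
                rejected_scores: torch.Tensor) -> torch.Tensor:
    """Push the reward model to score human-preferred answers higher.

    Implements -log(sigmoid(r_chosen - r_rejected)), averaged over the batch.
    """
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Toy usage: scalar scores a reward model assigned to each response pair.
chosen = torch.tensor([1.2, 0.4, 2.0])    # human-preferred responses
rejected = torch.tensor([0.3, 0.9, 1.1])  # dispreferred responses
print(reward_loss(chosen, rejected).item())
```

Once trained, the reward model stands in for the human annotator, scoring candidate generations so that reinforcement learning can steer the policy toward preferred behavior.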
Robotics integration, though a long-term goal, is on the horizon, connecting language models to real-world sensors. While individual breakthroughs are unpredictable, the steady growth of compute devoted to these models suggests ongoing improvements over the next decade. As we witness a Copernican moment for intelligence, the realization that intelligence can emerge from matrix multiplications, the future of AI holds exciting possibilities.