CRCD-Hosted NVIDIA Workshops

Adding New Knowledge to LLMs
Nov. 12, 2-4pm ET
Maximum Number of Attendees: 40

Large Language Models (LLMs) are powerful, but their knowledge is often general-purpose and may lack the specific, up-to-date, or specialized information required for enterprise applications. The "Adding New Knowledge to LLMs" workshop provides a comprehensive, hands-on guide to the essential techniques for augmenting and customizing LLMs.

This workshop takes you on a complete journey from raw data to a fine-tuned, optimized model. You will begin by learning how to curate high-quality datasets and generate synthetic data with NVIDIA NeMo Curator. Next, you will dive deep into the crucial process of model evaluation, using benchmarks, LLM-as-a-judge, and the NeMo Evaluator to rigorously assess model performance. With a solid foundation in evaluation, you will then explore a suite of powerful customization techniques, including Continued Pretraining to inject new knowledge, Supervised Fine-Tuning to teach new skills, and Direct Preference Optimization (DPO) to align model behavior with human preferences.
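
As a taste of the preference-alignment step, below is a minimal sketch of the DPO loss in plain PyTorch. It is an illustration only, not the NeMo-based workflow the workshop teaches; the tensor names, batch size, and beta value are assumptions.

```python
# Illustrative sketch of the Direct Preference Optimization (DPO) loss
# in plain PyTorch; a toy stand-in for the NeMo workflow, with assumed
# tensor names, batch size, and beta.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each argument holds the summed log-probability each model
    (trainable policy or frozen reference) assigns to the preferred
    ("chosen") or dispreferred ("rejected") response in a pair."""
    # How much more likely each response became under the policy
    # relative to the frozen reference model.
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Push the chosen margin above the rejected margin.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage: random log-probabilities for a batch of 4 preference pairs.
lp = lambda: torch.randn(4)
print(dpo_loss(lp(), lp(), lp(), lp()).item())
```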

Finally, you will learn to make your customized models efficient for real-world deployment by exploring essential optimization techniques like quantization, pruning, and knowledge distillation using TensorRT-LLM and the NeMo framework. The workshop culminates in a hands-on assessment where you will apply your new skills to align an LLM to a specific conversational style, solidifying your ability to tailor models for any application.
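
To give a flavor of the optimization material, here is a toy sketch of symmetric per-tensor int8 weight quantization in plain PyTorch. It is a simplified stand-in for the TensorRT-LLM quantization pipeline covered in the workshop; the matrix size and rounding scheme are illustrative assumptions.

```python
# Toy sketch of symmetric per-tensor int8 weight quantization; a
# simplified illustration of the idea, not the TensorRT-LLM pipeline.
import torch

def quantize_int8(w: torch.Tensor):
    # Map the largest-magnitude weight to 127 with a single scale factor.
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor):
    # Recover an approximate float tensor from the int8 weights.
    return q.to(torch.float32) * scale

w = torch.randn(256, 256)            # stand-in for a weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
print(f"max abs reconstruction error: {(w - w_hat).abs().max().item():.4f}")
```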

Domain-Adaptive Pre-Training: Tailoring LLMs for Specialized Applications
Dec. 3, 2-4pm ET
Maximum Number of Attendees: 40

While Large Language Models (LLMs) are broadly capable, their general knowledge often falls short of the specialized, domain-specific information required for enterprise applications. This hands-on lab provides a focused, end-to-end approach to building domain-specific LLMs. You'll learn how to curate domain-specific datasets, design and train custom tokenizers, and execute the pre-training process that tailors an LLM to a specialized application. You'll gain the practical skills and knowledge needed to adapt LLMs to your unique domain requirements and real-world use cases. The course takes you on a practical journey from initial data preparation to a domain-adapted, fine-tuned model.
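
As a preview of the custom-tokenizer step, the sketch below trains a byte-pair-encoding (BPE) tokenizer on a domain corpus using the open-source Hugging Face tokenizers library rather than the NeMo tooling used in the lab; the corpus file name, vocabulary size, and special tokens are assumptions.

```python
# Train a domain-specific BPE tokenizer with Hugging Face `tokenizers`.
# "domain_corpus.txt" is a hypothetical plain-text file of in-domain text.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=32_000,  # illustrative vocabulary size
    special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"],
)
tokenizer.train(files=["domain_corpus.txt"], trainer=trainer)
tokenizer.save("domain_tokenizer.json")

# In-domain terms should now split into fewer, more meaningful pieces.
print(tokenizer.encode("pharmacokinetics of acetaminophen").tokens)
```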

Register at: https://pitt.co1.qualtrics.com/jfe/form/SV_cZVnNvb3ZM7eX7E