self-supervised nlp model training curriculum
Provides structured educational progression through self-supervised learning techniques for NLP, covering masked language modeling, contrastive learning, and representation learning approaches. The curriculum is organized as a semester-long course with lectures, assignments, and projects that build foundational understanding of how modern language models learn from unlabeled data without explicit supervision signals.
Unique: University-level curriculum specifically focused on self-supervised NLP at Johns Hopkins, combining theoretical foundations with hands-on implementation of techniques like masked prediction, contrastive objectives (SimCLR, MoCo), and momentum-based learning (see the sketch after this entry); taught by NLP researchers actively publishing in this space
vs alternatives: Deeper theoretical grounding and research-oriented perspective compared to industry bootcamp courses; provides access to cutting-edge self-supervised techniques before they become mainstream, with faculty expertise in representation learning
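To give a flavor of the "momentum-based learning" component, here is a minimal sketch of a MoCo-style momentum (EMA) encoder update in PyTorch. The tiny encoder and the momentum value are illustrative assumptions, not course materials.

```python
# Minimal sketch of a MoCo-style momentum (EMA) encoder update in PyTorch.
# The toy encoder and the momentum coefficient are illustrative assumptions.
import copy
import torch
import torch.nn as nn

encoder_q = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
encoder_k = copy.deepcopy(encoder_q)   # key encoder starts as a copy of the query encoder
for p in encoder_k.parameters():
    p.requires_grad = False            # key encoder is never updated by backprop

momentum = 0.999                       # typical MoCo value

@torch.no_grad()
def momentum_update(q: nn.Module, k: nn.Module, m: float) -> None:
    """Exponential moving average: theta_k <- m * theta_k + (1 - m) * theta_q."""
    for p_q, p_k in zip(q.parameters(), k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)

# Called once per training step, after the optimizer has updated encoder_q:
momentum_update(encoder_q, encoder_k, momentum)
```

The slow-moving key encoder keeps the negatives in the contrastive dictionary consistent across steps, which is the design point MoCo makes.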
hands-on self-supervised model implementation assignments
Structured programming assignments that guide students through implementing core self-supervised learning algorithms from first principles, including masked language model training loops (a minimal sketch follows this entry), contrastive loss functions, and evaluation frameworks. Assignments progress from implementing basic objectives to building complete training pipelines with data loading, optimization, and validation.
Unique: Assignments are designed by active NLP researchers and are built around real self-supervised techniques used in production models; includes debugging guidance and common pitfalls specific to self-supervised training (e.g., representation collapse in contrastive learning, convergence issues with masked prediction)
vs alternatives: More rigorous and research-aligned than generic deep learning assignments; focuses on implementation details that matter for production self-supervised systems rather than simplified toy problems
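For concreteness, here is a minimal sketch of the core of such an assignment: BERT-style random masking with cross-entropy computed only on masked positions. The vocabulary size, mask rate, and toy model are assumptions for illustration.

```python
# Minimal sketch of one masked language modeling training step in PyTorch.
# Vocabulary size, mask rate, and the toy model are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, MASK_ID, MASK_RATE = 1000, 4, 0.15

model = nn.Sequential(                 # stand-in for a transformer encoder
    nn.Embedding(VOCAB_SIZE, 64),
    nn.Linear(64, VOCAB_SIZE),
)

tokens = torch.randint(5, VOCAB_SIZE, (8, 32))   # (batch, seq_len) of token ids

# Choose ~15% of positions and replace them with [MASK].
# (Full BERT also leaves 10% unchanged and randomizes 10%; omitted here.)
mask = torch.rand(tokens.shape) < MASK_RATE
inputs = tokens.clone()
inputs[mask] = MASK_ID

logits = model(inputs)                            # (batch, seq_len, vocab)

# ignore_index skips unmasked positions, so loss flows only through masked ones.
labels = tokens.clone()
labels[~mask] = -100
loss = F.cross_entropy(logits.view(-1, VOCAB_SIZE), labels.view(-1),
                       ignore_index=-100)
loss.backward()
```

Computing the loss only at masked positions is the detail that distinguishes this objective from ordinary language modeling, and getting it wrong is a classic source of the convergence issues mentioned above.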
research paper reading and analysis seminar
Structured seminar component where students read, present, and critically analyze recent self-supervised NLP research papers. The seminar covers landmark papers (BERT, RoBERTa, SimCLR, MoCo) and recent advances, with student presentations and group discussions that develop research literacy and understanding of the field's evolution.
Unique: Seminar is led by faculty actively publishing in self-supervised NLP; paper selection reflects current research frontiers and includes unpublished work or preprints from the research group, providing an insider perspective on research directions
vs alternatives: More curated and research-focused than generic paper reading groups; provides direct access to researchers' perspectives on which papers matter and why, rather than relying on citation counts or popularity
final project guidance for self-supervised model development
Capstone project framework where students design and implement novel self-supervised learning approaches or apply existing techniques to new domains. Projects are guided through proposal, implementation, and evaluation phases with feedback from instructors and peers, culminating in a research-quality report and code release.
Unique: Projects are mentored by NLP researchers with active publication records; guidance includes not just technical feedback but also research methodology, experimental rigor, and publication-readiness standards that align with top-tier venues
vs alternatives: More research-oriented than typical course projects; emphasizes reproducibility, statistical significance, and novelty of contribution rather than just technical correctness, preparing students for research careers
self-supervised learning theory and mathematical foundations
Comprehensive coverage of the mathematical and theoretical underpinnings of self-supervised learning, including information theory perspectives (mutual information maximization; a representative bound is sketched after this entry), contrastive learning theory (noise contrastive estimation, triplet loss), and convergence analysis. Lectures bridge intuitive explanations with rigorous mathematical proofs and derivations.
Unique: Theory lectures are taught by researchers with publications in theoretical self-supervised learning; includes recent theoretical advances (e.g., understanding collapse in contrastive learning, sample complexity bounds) not yet in textbooks
vs alternatives: Deeper theoretical rigor than industry courses; connects self-supervised learning to broader mathematical frameworks (information theory, statistical learning theory) rather than treating it as isolated techniques
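As one example of the kind of result such lectures cover, the InfoNCE objective lower-bounds mutual information; a sketch of the standard statement from contrastive predictive coding (van den Oord et al.), with notation assumed for illustration:

```latex
% InfoNCE loss for one positive pair (x, y^+) scored against N-1 negatives,
% where f(x, y) is a similarity score (e.g., a scaled dot product):
\mathcal{L}_{\mathrm{InfoNCE}}
  = -\,\mathbb{E}\!\left[
      \log \frac{\exp f(x, y^{+})}{\sum_{i=1}^{N} \exp f(x, y_i)}
    \right]
% Minimizing the loss maximizes a lower bound on mutual information:
I(x; y) \;\ge\; \log N - \mathcal{L}_{\mathrm{InfoNCE}}
```

The bound also explains a practical observation from the implementation assignments: it tightens as the number of negatives N grows, which is one reason large batch sizes or memory queues help contrastive training.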