{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Advanced Deep Learning, SS 2025\n", "\n", "## Overall Goals\n", "\n", "- Achieve mastery of deep neural networks.\n", "- Cover a wide range of topics at a fast tempo and in depth.\n", "- Coding whenever possible; conceptual and technical depth.\n", "- After the course you will be well-equipped for a Master's thesis.\n", "\n", "## Side-Effects\n", "\n", "- Increased earning potential.\n", "- You will come to appreciate the [bitter lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html).\n", "- You will be able to follow and critically assess the scientific literature in deep learning.\n", "- You will become more productive and be able to iterate faster.\n", "- You will develop the intuition to discern:\n", "  - What is fundamentally important, and what will fade away and be forgotten?\n", "  - Where do performance limitations come from, and how can they be overcome?\n", "- Get to know a few good blogs, Twitter accounts, etc.\n", "\n", "## Requirements\n", "\n", "- Necessary: (Intro to) Deep Learning, Machine Learning.\n", "- Desirable: Reinforcement Learning.\n", "  - [David Silver's course](https://www.davidsilver.uk/teaching/)\n", "  - [RL lecture from Uni Paderborn](https://github.com/upb-lea/reinforcement_learning_course_materials)\n", "  - [Sutton's book](http://www.incompleteideas.net/book/the-book-2nd.html)\n", "- I expect a big time investment on the students' side. I invest effort as well."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Organization\n", "\n", "- Communication will go through the [Discord server](https://discord.gg/ARF8NFJ3).\n", "- Lecturer: Paul Swoboda, email [paul.swoboda@hhu.de](mailto:paul.swoboda@hhu.de).\n", "- Assistant: Paweł Batorski, email [pawel.batorski@hhu.de](mailto:pawel.batorski@hhu.de).\n", "- Tutors:\n", "  - Malte Günnewig, email [guennewig.malte@googlemail.com](mailto:guennewig.malte@googlemail.com).\n", "  - Abtin Pourhadi, email [abtin.pourhadi@hhu.de](mailto:abtin.pourhadi@hhu.de).\n", "- Other organizational questions (exam registration, study regulations, etc.) go to Annette Peitsch, [annette.peitsch@hhu.de](mailto:annette.peitsch@hhu.de)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Topics (tentative)\n", "\n", "### Transformers (More than you want to know)\n", "\n", "- GPT-2 reimplementation, as in [this video](https://www.youtube.com/watch?v=l8pRSuU81PU):\n", "  - Mixed precision\n", "  - Kernel fusion\n", "  - Distributed training\n", "  - ...\n", "- Efficient Inference\n", "- Supervised Fine-Tuning (SFT)\n", "- Reinforcement Learning from Human Feedback (RLHF)\n", "- Efficient and Scalable Attention:\n", "  - FlashAttention\n", "  - Ring Attention\n", "  - Tree Attention\n", "  - Multi-Head Latent Attention\n", "- Positional Encodings:\n", "  - RoPE\n", "  - Context extension (YaRN)\n", "- Efficient Fine-Tuning:\n", "  - (Q)LoRA, LongLoRA\n", "  - Adapters\n", "- Reasoning\n", "\n", "### Transformer Variants\n", "\n", "- Long-context Transformers\n", "- Vision Transformers:\n", "  - ViT\n", "  - Axial Attention\n", "  - Swin Transformer\n", "  - Slot Attention\n", "- General-Purpose Attention Models:\n", "  - Perceiver, Perceiver-AR\n", "- Attention Replacements:\n", "  - MLP-Mixer\n", "  - PoolFormer\n", "  - FNet\n", "\n", "### General DL & Training\n", "\n", "- Large batch sizes: On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima\n", "- Understanding Batch Normalization\n", "- Scaling laws\n", "- How to train 
transformers\n", "- Label smoothing\n", "- Natural gradient\n", "- Automatic differentiation, forward and reverse mode\n", "- Training tips & tricks\n", "\n", "### Sequence Models\n", "\n", "- Recurrent Neural Networks (RNNs):\n", "  - Vanilla RNNs\n", "  - GRU\n", "  - (x)LSTM\n", "- State Space Models (SSMs):\n", "  - Mamba(2)\n", "\n", "### Parallelism\n", "\n", "- Data parallelism\n", "- Model parallelism\n", "\n", "### Meta-Learning\n", "\n", "- MAML\n", "- Hypernetworks\n", "- [Lilian Weng blog](https://lilianweng.github.io/posts/2018-11-30-meta-learning/)\n", "\n", "### Self-Supervised Learning\n", "\n", "- InfoNCE\n", "- VICReg\n", "- Triplet loss\n", "- Barlow Twins\n", "- DINO\n", "- Metric learning\n", "- [Lilian Weng blog](https://lilianweng.github.io/posts/2019-11-10-self-supervised/)\n", "\n", "### Handling Non-Differentiability\n", "\n", "- Straight-Through Estimator (STE)\n", "- (A)IMLE\n", "- Blackbox backpropagation\n", "\n", "### Compression\n", "\n", "- Knowledge distillation\n", "- Quantization\n", "\n", "### Tentative Other Topics\n", "\n", "- Generative models: diffusion, flow matching\n", "- Neural Tangent Kernel\n", "- Agentic AI" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }