Multimodal Generation

From text-to-image to text-to-video. Diffusion models, DiT architectures, evaluation metrics, and responsible generation.

Generative Foundations

Generative Model Overview (VAE, GAN, Diffusion)
Text-to-Image Architecture (U-Net, DiT)
Diffusion Training and Sampling

Evaluation & Video

T2I Evaluation (IS, FID, CLIP Score)
Text-to-Video Pipeline and Scaling
Multimodal Evaluation and Alignment

Architecture & Responsibility

Vision Transformers & Cross-Modal Attention
End-to-End Multimodal System Architecture
Responsible Multimodal Generation