Software Development

Create a Large Language Model from Scratch with Python – Tutorial



Learn how to build your own large language model, from scratch. This course goes into the data handling, math, and transformers behind large language models. You’ll use Python.

✏️ Course developed by @elliotarledge

💻 Code and course resources:

Join Elliot’s Discord server:
Elliot on X:

⭐️ Contents ⭐️
(0:00:00) Intro
(0:03:25) Install Libraries
(0:06:24) Pylzma build tools
(0:08:58) Jupyter Notebook
(0:12:11) Download Wizard of Oz
(0:14:51) Experimenting with text file
(0:17:58) Character-level tokenizer
(0:19:44) Types of tokenizers
(0:20:58) Tensors instead of Arrays
(0:22:37) Linear Algebra heads up
(0:23:29) Train and validation splits
(0:25:30) Premise of Bigram Model
(0:26:41) Inputs and Targets
(0:29:29) Inputs and Targets Implementation
(0:30:10) Batch size hyperparameter
(0:32:13) Switching from CPU to CUDA
(0:33:28) PyTorch Overview
(0:42:49) CPU vs GPU performance in PyTorch
(0:47:49) More PyTorch Functions
(1:06:03) Embedding Vectors
(1:11:33) Embedding Implementation
(1:13:06) Dot Product and Matrix Multiplication
(1:25:42) Matmul Implementation
(1:26:56) Int vs Float
(1:29:52) Recap and get_batch
(1:35:07) nn.Module subclass
(1:37:05) Gradient Descent
(1:50:53) Logits and Reshaping
(1:59:28) Generate function and giving the model some context
(2:03:58) Logits Dimensionality
(2:05:17) Training loop + Optimizer + Zerograd explanation
(2:13:56) Optimizers Overview
(2:17:04) Applications of Optimizers
(2:18:11) Loss reporting + Train VS Eval mode
(2:32:54) Normalization Overview
(2:35:45) ReLU, Sigmoid, Tanh Activations
(2:45:15) Transformer and Self-Attention
(2:46:55) Transformer Architecture
(3:17:54) Building a GPT, not Transformer model
(3:19:46) Self-Attention Deep Dive
(3:25:05) GPT architecture
(3:27:07) Switching to Macbook
(3:31:42) Implementing Positional Encoding
(3:36:57) GPTLanguageModel initialization
(3:40:52) GPTLanguageModel forward pass
(3:46:56) Standard Deviation for model parameters
(4:00:50) Transformer Blocks
(4:04:54) FeedForward network
(4:07:53) Multi-head Attention
(4:12:49) Dot product attention
(4:19:43) Why we scale by 1/sqrt(dk)
(4:26:45) Sequential VS ModuleList Processing
(4:30:47) Overview Hyperparameters
(4:32:14) Fixing errors, refining
(4:34:01) Start training
(4:35:46) OpenWebText download and Survey of LLMs paper
(4:37:56) How the dataloader/batch getter must change
(4:41:20) Extract corpus with winrar
(4:43:44) Python data extractor
(4:49:23) Adjusting for train and val splits
(4:57:55) Adding dataloader
(4:59:04) Training on OpenWebText
(5:02:22) Training works well, model loading/saving
(5:04:18) Pickling
(5:05:32) Fixing errors + GPU Memory in task manager
(5:14:05) Command line argument parsing
(5:18:11) Porting code to script
(5:22:04) Prompt: Completion feature + more errors
(5:24:23) nn.Module inheritance + generation cropping
(5:27:54) Pretraining vs Finetuning
(5:33:07) R&D pointers
(5:44:38) Outro
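
To give a flavor of the early chapters, here is a minimal sketch of the character-level tokenizer built around the 0:17:58 mark. The filename is an assumption; point it at whatever text file you downloaded:

# Minimal character-level tokenizer sketch; 'wizard_of_oz.txt' is an assumed filename.
with open('wizard_of_oz.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# The vocabulary is simply every unique character in the corpus.
chars = sorted(set(text))
vocab_size = len(chars)

# Map each character to an integer id and back.
string_to_int = {ch: i for i, ch in enumerate(chars)}
int_to_string = {i: ch for i, ch in enumerate(chars)}

encode = lambda s: [string_to_int[c] for c in s]
decode = lambda ids: ''.join(int_to_string[i] for i in ids)

print(decode(encode(text[:25])))  # round-trips the first 25 characters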
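
The train and validation splits (0:23:29) and the get_batch helper (1:29:52) can be sketched the same way. This version reuses text and encode from the snippet above; the 80/20 split, block_size, and batch_size values are illustrative assumptions, not the course's exact settings:

import torch

# Encode the whole corpus as a 1-D tensor of token ids.
data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.8 * len(data))  # assumed 80/20 split
train_data, val_data = data[:n], data[n:]

block_size, batch_size = 8, 4  # illustrative hyperparameters

def get_batch(split):
    d = train_data if split == 'train' else val_data
    # Pick random starting offsets, one per batch element.
    ix = torch.randint(len(d) - block_size, (batch_size,))
    x = torch.stack([d[i:i + block_size] for i in ix])
    # Targets are the inputs shifted one position right: predict the next character.
    y = torch.stack([d[i + 1:i + block_size + 1] for i in ix])
    return x, y

xb, yb = get_batch('train')
print(xb.shape, yb.shape)  # torch.Size([4, 8]) torch.Size([4, 8])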
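
Finally, the dot product attention chapter (4:12:49) and the scaling discussion (4:19:43) boil down to a few lines. Dividing the scores by sqrt(d_k) keeps their variance near 1 as head size grows, so softmax doesn't saturate into near one-hot weights. This is only a sketch: a full GPT block would also apply a causal mask so each position attends only to earlier ones:

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, time, head_size)
    d_k = q.size(-1)
    # Raw affinities, scaled by 1/sqrt(d_k) so magnitudes don't grow with head size.
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (batch, time, time)
    weights = F.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v                             # (batch, time, head_size)

q = torch.randn(4, 8, 16)
k = torch.randn(4, 8, 16)
v = torch.randn(4, 8, 16)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([4, 8, 16])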

🎉 Thanks to our Champion and Sponsor supporters:
👾 davthecoder
👾 jedi-or-sith
👾 南宮千影
👾 Agustín Kussrow
👾 Nattira Maneerat
👾 Heather Wcislo
👾 Serhiy Kalinets
👾 Justin Hual
👾 Otis Morgan

Learn to code for free and get a developer job:

Read hundreds of articles on programming:
