LLMs Learning Path

Fundamentals

From Seq-to-Seq Models and RNNs to Attention and Transformers

Attention In LLMs

- Self-Attention: Computes attention from queries, keys, and values that all come from the same block (encoder or decoder); see the scaled dot-product sketch after this list.
- Cross-Attention: Used in encoder-decoder architectures, where the queries come from the decoder and the key-value pairs come from the encoder outputs.
- Sparse Attention: To speed up self-attention, sparse attention restricts each token to a subset of positions, e.g., by computing attention within sliding windows instead of over the full sequence.
- Flash Attention: To speed up attention on GPUs, flash attention tiles the inputs so that computation stays in on-chip SRAM, minimizing reads and writes between the GPU's high-bandwidth memory (HBM) and SRAM.
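
A minimal sketch of scaled dot-product self-attention in PyTorch; the shapes, projection matrices, and toy usage below are illustrative assumptions, not taken from the source.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (batch, seq_len, d_model); w_q/w_k/w_v: (d_model, d_head)
    q = x @ w_q          # queries, keys, and values are all projections
    k = x @ w_k          # of the same input sequence x
    v = x @ w_v
    d_head = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # (batch, seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)                # attention weights
    return weights @ v                                 # weighted sum of values

# toy usage with illustrative sizes
x = torch.randn(2, 5, 16)
w_q, w_k, w_v = (torch.randn(16, 8) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)   # (2, 5, 8)
```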

NLP Fundamentals

Tokenization

- Wordpiece
- Byte pair encoding (BPE); see the merge sketch after this list
- UnigramLM
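
A minimal pure-Python sketch of the BPE training loop: count adjacent symbol pairs, merge the most frequent pair, and repeat. The function name and toy corpus are illustrative assumptions.

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    # words: iterable of strings; each word starts as a sequence of characters
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # count adjacent symbol pairs, weighted by word frequency
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for word, freq in vocab.items():   # merge the chosen pair everywhere
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges

print(learn_bpe_merges(["low", "low", "lower", "lowest"], num_merges=3))
```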

Encoding Positions

- ALiBi (attention with linear biases)
- RoPE (rotary position embedding); see the sketch after this list
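
A minimal PyTorch sketch of one common formulation of each scheme: RoPE rotates pairs of query/key channels by position-dependent angles, while ALiBi adds a distance-proportional bias to attention scores. The shapes, the base of 10000, and the slope value are illustrative assumptions.

```python
import torch

def rope(x, base=10000.0):
    # RoPE: rotate pairs of channels by a position-dependent angle so that
    # q.k dot products depend on the relative offset between positions.
    # x: (batch, seq_len, dim) with even dim (queries or keys).
    _, seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()          # (seq_len, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin,
                      x1 * sin + x2 * cos], dim=-1)

def alibi_bias(seq_len, slope=0.5):
    # ALiBi: a head-specific linear penalty added to attention scores that
    # grows with the distance between query position i and key position j.
    pos = torch.arange(seq_len)
    dist = (pos[:, None] - pos[None, :]).clamp(min=0)   # causal distance i - j
    return -slope * dist.float()                        # add to q @ k^T scores

q = rope(torch.randn(1, 4, 8))      # same shape, positions encoded by rotation
bias = alibi_bias(seq_len=4)        # (4, 4) bias matrix
```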

Language Modeling and LLMs

- Full Language Modeling (see the loss sketch after this list)
- Prefix Language Modeling
- Masked Language Modeling
- Unified Language Modeling
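
A minimal PyTorch sketch of how full (causal) and prefix language modeling differ only in which positions contribute to the loss; the function name and toy tensors are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def lm_loss(logits, input_ids, prefix_len=0):
    # logits: (batch, seq_len, vocab_size); each position predicts the next token
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:].clone()
    if prefix_len > 0:
        # prefix LM: condition on the prefix but do not score it
        shift_labels[:, : prefix_len - 1] = -100
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )

logits = torch.randn(2, 10, 100)
input_ids = torch.randint(0, 100, (2, 10))
full_lm = lm_loss(logits, input_ids)                  # full (causal) LM
prefix_lm = lm_loss(logits, input_ids, prefix_len=4)  # prefix LM
# Masked LM instead replaces random input tokens with a mask token and scores
# only those positions; unified LM mixes these objectives via attention masks.
```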

HF Transformers and PyTorch
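
A minimal text-generation example with Hugging Face Transformers on top of PyTorch; the "gpt2" checkpoint, prompt, and decoding settings are illustrative choices, not prescribed by the source.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Attention is", return_tensors="pt")   # PyTorch tensors
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```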

LLMs Adaptation

Generative AI Agents

AI Apps

Efficient LLMs

Basic LLMs Tasks

Question Answering

Conversational AI

Text Summarization

Language Translation

Paraphrasing

Ethical and Bias Evaluation

Content Personalization

Sentiment Analysis

Semantic Search

Text-to-Text Transformation

Information Extraction

Content Generation and Correction

Business Sectors

e-Business

Finance and Banking

Sales and Marketing

Customer Relationship Management (CRM)

Regulatory Compliance

Education and Training

Healthcare

Research and Development

Human Resources and Talent Management

Supply Chain Management

Knowledge Management

Manufacturing

Technology and Software Development