Machine Learning for
Time-Series Problems
ML Fundamentals
(Build data lakes, lakehouses, and data warehouses; implement cutting-edge analytics; and develop Big Data applications)
ML Theory
Predictive Modeling
Gradient Boosting and XGBoost overview (see the sketch after this list)
Big Data Analytics
(Distributed and scalable ML modeling)
Project
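A minimal sketch of the gradient-boosting workflow named above, using scikit-learn's GradientBoostingRegressor (xgboost's XGBRegressor exposes the same fit/predict interface); the synthetic data and hyperparameters are illustrative assumptions, not a prescribed setup.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    # Illustrative synthetic regression data (assumption, not course data).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 8))
    y = X[:, 0] * 2.0 + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=1000)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Boosted ensemble of shallow trees: each stage fits the residual
    # errors of the ensemble built so far.
    model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, max_depth=3)
    model.fit(X_train, y_train)
    print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))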
Time Series Forecasting (With ML)
(Deliver insights from data and model interpretation)
Time Series Terminology and Nomenclature
Taxonomy of Forecasting
EDA and Data preparation for forecasting
Time Series Forecasting as Supervised Learning (see the sketch after this list)
Forecasting Evaluation
Project
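The "forecasting as supervised learning" framing above is usually the sliding-window trick: lagged values become the features and the next value becomes the target. A minimal pandas sketch under that assumption (series values and lag count are illustrative):

    import pandas as pd

    def make_supervised(series: pd.Series, n_lags: int = 3) -> pd.DataFrame:
        """Turn a univariate series into a lag-feature table for any regressor."""
        df = pd.DataFrame({"y": series})
        for lag in range(1, n_lags + 1):
            df[f"lag_{lag}"] = series.shift(lag)  # value `lag` steps in the past
        return df.dropna()                        # drop rows without full history

    s = pd.Series([10, 12, 13, 12, 15, 16, 18, 17], name="demand")
    table = make_supervised(s, n_lags=2)
    X, y = table.drop(columns="y"), table["y"]    # ready for model.fit(X, y)
    print(table)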
Time Series Forecasting (With Deep Learning)
(Deliver insights from data and model interpretation)
Deep Learning Fundamentals for Time-Series
Predictive Modeling
Forecasting
Recommendation Systems
Anomaly Detection
Signal Processing
Project
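A minimal sketch of a deep-learning forecaster for the module above: an LSTM encodes a window of past values and a linear head regresses the next value. Written in PyTorch; the window length, layer sizes, and random batch are illustrative assumptions.

    import torch
    import torch.nn as nn

    class LSTMForecaster(nn.Module):
        """One-step-ahead forecaster: encode a window with an LSTM, regress the next value."""
        def __init__(self, hidden_size: int = 32):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
            self.head = nn.Linear(hidden_size, 1)

        def forward(self, x):                  # x: (batch, window, 1)
            out, _ = self.lstm(x)
            return self.head(out[:, -1, :])    # regress from the last hidden state

    # Illustrative training step on random windows (real data would replace this).
    model = LSTMForecaster()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.randn(16, 24, 1)                 # 16 windows of 24 time steps
    y = torch.randn(16, 1)                     # next value per window
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()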
Time Series Classification
(Deliver insights from data and model interpretation)
Machine Learning Consulting and Optimization
Predictive Modeling
Forecasting
Recommendation Systems
Anomaly Detection
Signal Processing
Project
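As a concrete anchor for the classification module above, a minimal sketch that collapses fixed-length windows into simple statistical features and fits an off-the-shelf classifier; the two synthetic classes are illustrative assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def summary_features(windows: np.ndarray) -> np.ndarray:
        """Collapse each fixed-length window into simple statistical features."""
        return np.column_stack([
            windows.mean(axis=1), windows.std(axis=1),
            windows.min(axis=1), windows.max(axis=1),
        ])

    rng = np.random.default_rng(0)
    # Two illustrative classes: flat noise vs. noisy sine segments.
    flat = rng.normal(size=(100, 50))
    sine = np.sin(np.linspace(0, 6, 50)) + rng.normal(scale=0.3, size=(100, 50))
    X = summary_features(np.vstack([flat, sine]))
    y = np.array([0] * 100 + [1] * 100)

    clf = RandomForestClassifier(random_state=0).fit(X, y)
    print("train accuracy:", clf.score(X, y))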
Time-Series Anomaly Detection
Computer Vision
Natural Language Processing (NLP)
Project
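For the anomaly-detection project above, one simple baseline is a rolling z-score: flag points that sit far from a trailing mean, measured in units of the trailing standard deviation. A minimal sketch (window size and threshold are illustrative):

    import pandas as pd

    def rolling_zscore_anomalies(s: pd.Series, window: int = 24, thresh: float = 3.0) -> pd.Series:
        """Flag points far from the trailing rolling mean, in rolling-std units."""
        mean = s.rolling(window).mean()
        std = s.rolling(window).std()
        z = (s - mean) / std
        return z.abs() > thresh                     # boolean mask of anomalies

    s = pd.Series([1.0] * 48 + [9.0] + [1.0] * 10)  # one injected spike
    print(rolling_zscore_anomalies(s).sum(), "anomaly flagged")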
Time-Series Clustering
Computer Vision
Natural Language Processing (NLP)
Project
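For the clustering module above, a common baseline is to z-normalize each series (so clusters reflect shape rather than scale or offset) and run k-means on the resulting vectors. A minimal sketch with synthetic shape families (all sizes and parameters illustrative):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    t = np.linspace(0, 2 * np.pi, 60)
    # Two illustrative shape families: sines and linear ramps, plus noise.
    sines = np.sin(t) + rng.normal(scale=0.2, size=(50, 60))
    ramps = np.linspace(-1, 1, 60) + rng.normal(scale=0.2, size=(50, 60))
    X = np.vstack([sines, ramps])

    # z-normalize each series so clustering compares shape, not scale/offset.
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(np.bincount(labels))                      # roughly a 50/50 split expected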
Time Series Forecasting (With Transfer Learning)
(Deliver insights from data and model interpretation)
Deep Learning Fundamentals for Time-Series
Predictive Modeling
Forecasting
Recommendation Systems
Anomaly Detection
Signal Processing
Project
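A minimal sketch of the transfer-learning recipe outlined above: freeze a feature extractor standing in for a pre-trained network and fine-tune only a new head on the target series. The backbone here is randomly initialized purely for illustration; in practice you would load pre-trained weights.

    import torch
    import torch.nn as nn

    # Stand-in for a network pre-trained on a large source series (assumption:
    # real code would load saved weights instead).
    backbone = nn.Sequential(nn.Linear(24, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())

    # Freeze the pre-trained feature extractor...
    for p in backbone.parameters():
        p.requires_grad = False

    # ...and fine-tune only a small new head on the target series.
    head = nn.Linear(64, 1)
    model = nn.Sequential(backbone, head)
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)

    x, y = torch.randn(32, 24), torch.randn(32, 1)  # illustrative target-domain batch
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()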
- Byte-Pair Encoding
- WordPiece Encoding
- SentencePiece Encoding
- Absolute Positional Embeddings
- Relative Positional Embeddings
- Rotary Position Embeddings
- Relative Positional Bias
- Decoder-Only
- Encoder-Decoder
- Hybrid
- Supervised Fine-tuning
- General Fine-tuning
- Multi-turn Instructions
- Instruction Following
Basic
Advanced
Text Embedding
- Masked Language Modeling
- Causal Language Modeling
- Next Sentence Prediction
- Mixture of Experts
LLM Capabilities
Basic
Coding
World Knowledge
Multilingual
Translation
Crosslingual Tasks
Crosslingual QA
Comprehension
Summarization
Simplification
Reading Comprehension
Emerging
In-context learning
Step-by-step solving
Symbolic reference
Pos/Neg example
Instruction following
Task definition
Turn-based
Few-shot
Task definition
Reasoning
Logical
Common Sense
Symbolic
Arithmetic
Augmented
Self-improvement
Self-criticism
Self-refinement
Tool utilization
Tool planning
Knowledge base utilization
Task decomposition
Interacting with users
Assignment planning
Virtual acting
LLM Components
Tokenization
Positional Encoding
LLM Architectures
Model Pre-training
Fine-tuning and Instruction Tuning
Alignment
Decoding Strategies
Adaptation
LLM Essentials
Attention in LLMs
- Self-Attention: computes attention with queries, keys, and values drawn from the same block (encoder or decoder).
- Cross-Attention: used in encoder-decoder architectures; the queries come from the decoder, while the key-value pairs come from the encoder outputs.
- Sparse Attention: speeds up self-attention by letting each token attend only within a sliding window (or another sparse pattern) rather than over the full sequence.
- Flash Attention: speeds up attention on GPUs by tiling the inputs to minimize memory reads and writes between the GPU's high-bandwidth memory (HBM) and on-chip SRAM.
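A minimal NumPy sketch of the plain (dense) self-attention described above; cross-attention would differ only in taking the queries from the decoder and the keys/values from the encoder output. All shapes and weights are illustrative.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """Scaled dot-product self-attention: Q, K, V all come from the same input X."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])          # (seq, seq) similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
        return weights @ V                               # weighted sum of values

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))                          # 5 tokens, dim 8 (illustrative)
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)           # (5, 8)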
NLP Fundamentals
Tokenization
- WordPiece
- Byte Pair Encoding (BPE) (see the sketch after this list)
- UnigramLM
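A toy sketch of the BPE training loop referenced above: repeatedly find the most frequent adjacent symbol pair in the corpus and merge it into one symbol. Production tokenizers operate on bytes and word frequencies; this is purely illustrative.

    from collections import Counter

    def most_frequent_pair(tokens):
        """Count adjacent symbol pairs across the corpus of token lists."""
        pairs = Counter()
        for word in tokens:
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += 1
        return max(pairs, key=pairs.get)

    def bpe_merge(tokens, pair):
        """Replace every occurrence of the pair with a single merged symbol."""
        merged = []
        for word in tokens:
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                    out.append(word[i] + word[i + 1]); i += 2
                else:
                    out.append(word[i]); i += 1
            merged.append(out)
        return merged

    corpus = [list("lower"), list("lowest"), list("newer")]
    for _ in range(3):                      # learn 3 merges (illustrative)
        corpus = bpe_merge(corpus, most_frequent_pair(corpus))
    print(corpus)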
Encoding Positions
- ALiBi
- RoPE (see the sketch after this list)
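A minimal sketch of rotary position embeddings (RoPE) in the common "rotate-half" formulation: each half-dimension pair of a token's vector is rotated by an angle proportional to the token's position, so query-key dot products depend on relative offsets. The base constant and shapes are illustrative.

    import numpy as np

    def rope(x, base: float = 10000.0):
        """Rotate each feature pair of token i by an angle proportional to i."""
        seq, dim = x.shape
        half = dim // 2
        freqs = base ** (-np.arange(half) / half)          # per-pair frequencies
        angles = np.outer(np.arange(seq), freqs)           # (seq, half)
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[:, :half], x[:, half:]                  # split into rotation pairs
        return np.concatenate([x1 * cos - x2 * sin,
                               x1 * sin + x2 * cos], axis=-1)

    x = np.random.default_rng(0).normal(size=(4, 8))       # 4 tokens, dim 8
    print(rope(x).shape)                                   # (4, 8)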
Fine-Tuning
- Instruction-tuning
- Alignment-tuning
- Transfer Learning
Transformers Architectures
- Encoder-Decoder: processes inputs through the encoder and passes the intermediate representation to the decoder to generate the output.
- Causal Decoder: an architecture with no encoder; a decoder both processes the input and generates the output, and each predicted token depends only on previous time steps.
- Prefix Decoder: attention over the prefix is bidirectional rather than strictly limited to past information, while generation of the output remains autoregressive.
- Mixture-of-Experts: a variant of the transformer architecture with parallel independent experts and a router that routes tokens to experts.
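The causal-decoder property above (each predicted token depends only on previous time steps) reduces to a lower-triangular attention mask; a tiny NumPy illustration (a prefix decoder would simply leave the prefix columns unmasked):

    import numpy as np

    def causal_mask(seq_len: int) -> np.ndarray:
        """Lower-triangular mask: position i may attend only to positions <= i."""
        return np.tril(np.ones((seq_len, seq_len), dtype=bool))

    def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
        scores = np.where(mask, scores, -1e9)              # block future positions
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        return w / w.sum(axis=-1, keepdims=True)

    scores = np.random.default_rng(0).normal(size=(4, 4))  # illustrative Q.K^T scores
    print(masked_softmax(scores, causal_mask(4)).round(2)) # upper triangle is ~0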
Language Modeling
- Full Language Modeling
- Prefix Language Modeling
- Masked Language Modeling
- Unified Language Modeling
Prompting
- Zero-Shot Prompting
- In-context Learning
- Single- and Multi-Turn Instructions
Background
Attention in LLMs
Architecture
Language Modeling
LLMs Adaptation Stages
Pre-Training
Fine-Tuning
Alignment-tuning
RLHF
Transfer Learning
Instruction-tuning
Prompting
Zero-Shot
In-context
Reasoning in LLMs
Single-Turn Instructions
Multi-Turn Instructions
Fine-Tuning
Fine-Tuning I
A Large Labeled Dataset is Available
Fine-Tuning II
Our Dataset Differs from the Pre-Training Data
PEFT
Limited Computational Resources
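PEFT targets exactly this limited-compute case: keep the pre-trained weights frozen and train only a small number of new parameters. A minimal sketch of one popular PEFT method, a LoRA-style low-rank adapter (rank and scaling are illustrative assumptions):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """LoRA-style adapter: freeze the big weight W, learn a low-rank update B @ A."""
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False                 # frozen pre-trained weights
            self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scale = alpha / rank

        def forward(self, x):
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(nn.Linear(512, 512))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print("trainable params:", trainable)               # far fewer than 512 * 512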
Distributed LLM Training
Data Parallelism
Replicates the entire model across devices; easy to implement but limited by per-device memory constraints (see the sketch after this list).
Model Parallelism
Splits the model itself across devices (the umbrella over tensor and pipeline parallelism); highly scalable but requires complex implementation.
Pipeline Parallelism
Divides the model into stages (groups of layers) and assigns each stage to a different device; reduces memory usage but introduces latency.
Tensor Parallelism
Shards a single tensor within a layer across devices; efficient for computation but requires careful communication management.
Hybrid Parallelism
Combines pipeline and tensor parallelism for optimal performance based on the model architecture and available resources.
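A skeletal sketch of the data-parallel setup described above, using PyTorch's DistributedDataParallel; the tiny model, random batch, and gloo backend are illustrative assumptions (real runs typically use nccl on GPUs and a properly sharded DataLoader).

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Launch with e.g. `torchrun --nproc_per_node=N script.py`, which sets the
    # environment variables init_process_group reads (illustrative setup).
    def main():
        dist.init_process_group(backend="gloo")            # "nccl" on GPUs
        rank = dist.get_rank()

        model = DDP(nn.Linear(10, 1))                      # full model replica per process
        opt = torch.optim.SGD(model.parameters(), lr=0.1)

        x, y = torch.randn(32, 10), torch.randn(32, 1)     # each rank sees its own shard
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()                                    # gradients all-reduced across replicas
        opt.step()

        if rank == 0:
            print("step done, loss:", loss.item())
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()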