Machine Learning for
Time-Series Problems

ML Fundamentals
(Build data lakes, lakehouses, and data warehouses; implement cutting-edge analytics; and develop Big Data applications)

ML Theory

Predictive Modeling

Gradient Boosting and XGBoost overview

Big Data Analytics
(Distributed and scalable ML modeling)

Project

Time Series Forecasting (With ML)
(Deliver insights from data and model interpretation)

Time Series Terminology and Nomenclature

Taxonomy of Forecasting

EDA and Data preparation for forecasting

Time Series forecasting as Supervised learning

Forecasting Evaluation

Project

Time Series Forecasting (With Deep Learning)
(Deliver insights from data and model interpretation)

Deep Learning Fundamentals for Time-Series

Predictive Modeling

Forecasting

Recommendation Systems

Anomaly Detection

Signal Processing

Project

Time Series Classification
(Deliver insights from data and model interpretation)

Machine Learning Consulting and Optimization

Predictive Modeling

Forecasting

Recommendation Systems

Anomaly Detection

Signal Processing

Project

Time-Series Anomaly Detection

Computer Vision

Natural Language Processing (NLP)

Project

Time-Series Clustering

Computer Vision

Natural Language Processing (NLP)

Project

Time Series Forecasting (With Transfer Learning)
(Deliver insights from data and model interpretation)

Deep Learning Fundamentals for Time-Series

Predictive Modeling

Forecasting

Recommendation Systems

Anomaly Detection

Signal Processing

Project

- BytePairEncoding
- WordPieceEncoding
- SentencePieceEncoding

- Absolute Positional Embeddings
- Relative Positional Embeddings
- Rotary Position Embeddings
- Relative Positional Bias

- Decoder-Only
- Encoder-Decoder
- Hybrid

- Supervised Fine-tuning
- General Fine-tuning
- Multi-turn Instructions
- Instruction Following

Basic

Advanced

Text Embedding

- Masked Language Modeling
- Causal Language Modeling
- Next Sentence Prediction
- Mixture of Experts

LLM Capabilities

Basic

Coding

World Knowledge

Multilingual

Translation

Crosslingual Tasks

Crosslingual QA

Comprehension

Summarization

Simplification

Reading Comprehension

Emerging

In-context learning

Step-by-step solving

Symbolic reference

Pos/Neg example

Instruction following

Task definition

Turn based

Few-shot

Task definition

Reasoning

Logical

Common Sense

Symbolic

Arithmetic

Augmented

Self-improvement

Self-criticism

Self-refinement

Tool utilization

Tool planning

Knowledge base utilization

Task decomposition

Interacting with users

Assignment planning

Virtual acting

LLM Components

Tokenization

Positional Encoding

LLM Architectures

Model Pre-training

Fine-tuning and Instruction Tuning

Alignment

Decoding Strategies

Adaptation

LLM Essentials

Attention in LLMs

- Self-Attention: Computes attention using queries, keys, and values that all come from the same block (encoder or decoder).
- Cross-Attention: Used in encoder-decoder architectures, where the queries come from the decoder and the key-value pairs come from the encoder outputs.
- Sparse Attention: Speeds up self-attention by computing it over a sparse pattern, e.g. within sliding windows, instead of over all token pairs.
- Flash Attention: Speeds up attention on GPUs by tiling the inputs to minimize memory reads and writes between the GPU's high-bandwidth memory (HBM) and on-chip SRAM.
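
The only difference between self-attention and cross-attention is where the queries and key-value pairs come from. A minimal NumPy sketch makes that explicit; projection matrices, multi-head splitting, and masking are omitted, and all shapes and variable names are illustrative assumptions, not any particular library's API.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core scaled dot-product attention shared by self- and cross-attention."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                             # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)              # row-wise softmax
    return weights @ V                                          # weighted sum of values

rng = np.random.default_rng(0)
x_dec = rng.normal(size=(4, 8))   # decoder-side token representations (4 tokens, dim 8)
x_enc = rng.normal(size=(6, 8))   # encoder outputs (6 tokens, dim 8)

# Self-attention: queries, keys, and values all come from the same block.
self_out = scaled_dot_product_attention(x_dec, x_dec, x_dec)

# Cross-attention: queries from the decoder, keys/values from the encoder outputs.
cross_out = scaled_dot_product_attention(x_dec, x_enc, x_enc)

print(self_out.shape, cross_out.shape)   # (4, 8) (4, 8)
```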

NLP Fundamentals

Tokenization

- Wordpiece
- Byte pair encoding (BPE)
- UnigramLM
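
Of these, BPE is the simplest to illustrate: it repeatedly merges the most frequent adjacent symbol pair. The sketch below is a toy merge loop only, assuming a flat character sequence with no per-word end-of-word markers or vocabulary bookkeeping, so it shows the idea rather than any library's implementation.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent symbol pairs and return the most frequent one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Toy corpus as one character sequence; real BPE works per word over a corpus.
tokens = list("low lower lowest".replace(" ", "_"))
for _ in range(5):                      # learn 5 merge operations
    pair = most_frequent_pair(tokens)
    tokens = merge_pair(tokens, pair)
print(tokens)
```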

Encoding Positions

- Alibi
- RoPE
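
As a rough sketch, RoPE rotates each pair of query/key features by an angle proportional to the token position, so relative offsets appear as phase differences in the dot product. The rotate-half formulation below is one common variant; the feature pairing convention and the base frequency are assumptions that differ between implementations.

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary position embeddings to a (seq_len, dim) array (dim must be even)."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)        # per-pair rotation frequencies
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) feature pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.ones((5, 8))                                  # toy queries: 5 positions, dim 8
print(rotary_embed(q).shape)                         # (5, 8)
```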

Fine Tuning

- Instruction-tuning
- Alignment-tuning
- Transfer Learning

Transformers Architectures

- Encoder-Decoder: Processes inputs through the encoder and passes the intermediate representation to the decoder to generate the output.
- Causal Decoder: An architecture with no encoder; a single decoder processes the input and generates the output, with each predicted token depending only on previous time steps.
- Prefix Decoder: A decoder in which attention over the prefix (input) portion is bidirectional rather than strictly limited to past tokens, while generation remains causal.
- Mixture-of-Experts: A transformer variant with parallel independent experts and a router that routes tokens to experts.
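
The practical difference between a causal decoder and a prefix decoder shows up in the attention mask. A small NumPy sketch under assumed toy lengths (True means the position may be attended to):

```python
import numpy as np

def causal_mask(seq_len):
    """Lower-triangular mask: token i may only attend to positions <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def prefix_mask(seq_len, prefix_len):
    """Bidirectional attention inside the prefix, causal attention afterwards."""
    mask = causal_mask(seq_len)
    mask[:prefix_len, :prefix_len] = True   # prefix tokens see each other fully
    return mask

print(causal_mask(5).astype(int))
print(prefix_mask(5, prefix_len=3).astype(int))
```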

Language Modeling

- Full Language Modeling
- Prefix Language Modeling
- Masked Language Modeling
- Unified Language Modeling

Prompting

- Zero-Shot Prompting
- In-context Learning
- Single- and Multi-Turn Instructions

Background

Attention in LLMs

Architecture

Language Modeling

LLMs Adaptation Stages

Pre-Training

Fine-Tuning

Alignment-tuning

RLHF

Transfer Learning

Instruction-tuning

Prompting

Zero-Shot

In-context

Reasoning in LLMs

Single-Turn Instructions

Multi-Turn Instructions

Fine-Tuning

Fine-Tuning I

Large Labeled Dataset is Available

Fine-Tuning II

Our Dataset is Different from the Pre-Training Data

PEFT

Limited Computational Resources

Distributed LLM Training

Data Parallelism
Replicates the entire model across devices;
easy to implement but limited by per-device
memory constraints.

Model Parallelism
Partitions the model itself across devices
(the umbrella term covering tensor and pipeline
parallelism); enables models too large for one
device but requires complex implementation.

Pipeline Parallelism
Divides the model itself into stages
(layers) and assigns each stage to a
different device, reduces memory
usage but introduces latency.

Tensor Parallelism
Shards a single tensor within
a layer across devices, efficient
for computation but requires
careful communication management.

Hybrid Parallelism
Combines pipeline and tensor parallelism
for optimal performance based on the model
architecture and available resources.

Optimizer Parallelism
Focuses on partitioning optimizer state
and gradients to reduce memory consumption
on individual devices.
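
As a concrete illustration of tensor parallelism, the sketch below shards a single linear layer's weight matrix column-wise across two simulated "devices" and reassembles the output. It is only a NumPy analogy for the idea, not a real multi-GPU implementation; in practice a framework handles the sharding and the all-gather communication.

```python
import numpy as np

# A single linear layer y = x @ W, with W sharded column-wise across two "devices".
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))              # batch of 4 token vectors, hidden size 16
W = rng.normal(size=(16, 32))             # full weight matrix

W_dev0, W_dev1 = np.split(W, 2, axis=1)   # each device holds half of the output columns

# Each device computes its partial output independently...
y_dev0 = x @ W_dev0
y_dev1 = x @ W_dev1

# ...and an all-gather style concatenation reassembles the full activation.
y_parallel = np.concatenate([y_dev0, y_dev1], axis=1)

assert np.allclose(y_parallel, x @ W)     # matches the single-device result
```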

PaLM Family

Med-PaLM

Med-PaLM2

Med-PaLM M

Flan-PaLM

PaLM

PaLM2

PaLM-E

U-PaLM