by Eric Nic, 5 days ago
vLLM, DeepSpeed-Inference
Hardware-Assisted Attention
FlashAttention, vAttention
Learnable Pattern Strategies
HyperAttention
Fixed Pattern Strategies
Sparse Transformer, Longformer, Lightning Attention-2
Quantization-Aware Training
Post-Training Quantization
Weight-Activation Co-Quantization
RPTQ, QLLM
Weight-Only Quantization
GPTQ, AWQ, SpQR
Unsloth
Prompt Tuning
Prefix Tuning
Adapter-based Tuning
Low-Rank Adaptation (LoRA, DoRA)
Gradio
Jan
Anyscale
Hugging Face Inference Endpoints
vLLM
BentoML
LlamaIndex
Indexes
Weaviate
Faiss
Pinecone
Qdrant
Chroma
Prompt
Agent
Chains
Memory