Adapter Composition And Inference With Merged Weight Strategies

1

PEFT · Repository · 55/100

via “adapter merging and unmerging”

Parameter-efficient fine-tuning — LoRA, QLoRA, adapter methods for LLMs on consumer GPUs.

Unique: Implements reversible weight merging: merge_adapter() computes merged_weight = base_weight + adapter_delta in place, and unmerge_adapter() restores the original state by subtracting the same delta instead of keeping a separate copy of the base weights. The arithmetic is simple, but reversibility depends on careful state management (tracking which adapters are currently merged).

vs others: Eliminates adapter inference overhead (a reported 5-10% latency reduction) and, once the adapter is merged and unloaded, removes the PEFT runtime dependency so the model can be deployed as a standard transformers model. The cost is losing adapter modularity and storage efficiency.
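The merge/unmerge bookkeeping described in this entry can be sketched in plain Python. This is a minimal sketch, not PEFT's implementation: `MergeableLayer`, `merge`, `unmerge`, and `add_scaled` are hypothetical names, and the sketch assumes unmerging subtracts the same delta rather than restoring a saved copy of the base weights. The `merged` flag is the piece of state that makes both calls safe to repeat.

```python
# Minimal sketch of reversible weight merging (hypothetical class and method names).
Matrix = list[list[float]]


def add_scaled(a: Matrix, b: Matrix, sign: float = 1.0) -> Matrix:
    """Return the element-wise sum a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


class MergeableLayer:
    """Holds a live weight plus one adapter delta that can be folded in or out."""

    def __init__(self, base_weight: Matrix, delta: Matrix) -> None:
        self.weight = [row[:] for row in base_weight]  # copy; this is the weight used at inference
        self.delta = delta
        self.merged = False  # the state flag that keeps merge/unmerge reversible

    def merge(self) -> None:
        if self.merged:
            return  # guard: merging twice would add the delta twice
        self.weight = add_scaled(self.weight, self.delta)
        self.merged = True

    def unmerge(self) -> None:
        if not self.merged:
            return  # guard: subtracting a delta that was never added would corrupt the base
        self.weight = add_scaled(self.weight, self.delta, sign=-1.0)
        self.merged = False


base = [[1.0, 2.0], [3.0, 4.0]]
layer = MergeableLayer(base, [[0.5, 0.0], [0.0, -0.25]])
layer.merge()
layer.merge()  # no-op thanks to the flag
print(layer.weight)  # [[1.5, 2.0], [3.0, 3.75]]
layer.unmerge()
print(layer.weight == base)  # True
```

The example values are exact binary fractions, so subtraction restores the base bit for bit. With arbitrary floats, and especially with low-precision weights, subtract-based unmerging can leave small rounding residue, which is one reason an implementation might snapshot the base weights instead.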

2

lora · Model · 31/100

via “lora model composition and interpolation”

Using Low-rank adaptation to quickly fine-tune diffusion models.

Unique: Implements weight-space composition by directly summing low-rank updates (ΔW = A₁B₁ᵀ + A₂B₂ᵀ) without retraining, so models can be blended at no training cost. Supports learnable composition weights for automatic optimization.
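Summing low-rank updates is itself a low-rank update: a weighted sum ΔW = w₁A₁B₁ᵀ + w₂A₂B₂ᵀ equals a single product of the concatenated factors [w₁A₁ | w₂A₂][B₁ | B₂]ᵀ, with rank at most r₁ + r₂. The sketch below checks this in plain Python; the helper names are hypothetical, the weights wᵢ are fixed here (learning them would be a separate optimization step), and it assumes Aᵢ is d×rᵢ and Bᵢ is k×rᵢ.

```python
# Sketch: a weighted sum of LoRA updates equals one product of concatenated factors.
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def compose_by_sum(adapters: list[tuple[float, Matrix, Matrix]]) -> Matrix:
    """Delta W = sum_i w_i * A_i @ B_i^T, materializing each adapter's d x k update."""
    deltas = [[[w * x for x in row] for row in matmul(a, transpose(b))] for w, a, b in adapters]
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*deltas)]


def compose_by_concat(adapters: list[tuple[float, Matrix, Matrix]]) -> Matrix:
    """Same Delta W from one product of concatenated factors (rank r_1 + r_2 + ...)."""
    d = len(adapters[0][1])
    k = len(adapters[0][2])
    a_cat = [[w * x for w, a, _ in adapters for x in a[i]] for i in range(d)]  # scaled A_i side by side
    b_cat = [[x for _, _, b in adapters for x in b[j]] for j in range(k)]      # B_i side by side
    return matmul(a_cat, transpose(b_cat))


adapters = [
    (1.0, [[1.0], [2.0]], [[3.0], [1.0]]),   # rank-1 adapter, weight 1.0
    (0.5, [[0.0], [1.0]], [[1.0], [-1.0]]),  # rank-1 adapter, weight 0.5
]
print(compose_by_sum(adapters))     # [[3.0, 1.0], [6.5, 1.5]]
print(compose_by_concat(adapters))  # [[3.0, 1.0], [6.5, 1.5]]
```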

vs others: Enables compositional generation without retraining (unlike full fine-tuning) while keeping files roughly 100× smaller; composition is instantaneous compared with training a new model.
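The size advantage follows from simple parameter counting: a dense d×k update stores d·k numbers, while the factors A (d×r) and B (k×r) store r(d + k). The layer widths and ranks below are illustrative assumptions, not values from this project, and the ratio is per adapted matrix; whole-file ratios also depend on which layers are adapted and on the storage dtype.

```python
def lora_to_full_ratio(d: int, k: int, r: int) -> float:
    """How many times fewer parameters a rank-r update stores than a dense d x k update."""
    full = d * k        # dense update: every entry of the d x k matrix
    lora = r * (d + k)  # factors A (d x r) and B (k x r)
    return full / lora


print(lora_to_full_ratio(768, 768, 4))    # 96.0
print(lora_to_full_ratio(4096, 4096, 8))  # 256.0
```

For a fixed rank the ratio grows with layer width, which is why the savings are largest on big models.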

3

trl · Framework · 28/100

via “model-merging-and-adapter-composition”

Train transformer language models with reinforcement learning.

Unique: Provides utilities for merging and composing LoRA adapters, with support for weighted combinations and sequential stacking, enabling multi-task inference without separate model instances.

vs others: More flexible than single-adapter inference because it supports adapter composition, and more efficient than maintaining separate models because it combines adapters into a single set of merged weights.
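If sequential stacking is read as folding one weighted adapter delta after another into the running merged weight, the final weight is base + Σ wᵢΔᵢ regardless of order. The sketch below assumes that reading; `axpy` and `merge_sequentially` are hypothetical helpers, not trl's API.

```python
from itertools import permutations

# Sketch: stacking several weighted adapter deltas into one merged weight (hypothetical helpers).
Matrix = list[list[float]]


def axpy(w: float, delta: Matrix, base: Matrix) -> Matrix:
    """Return base + w * delta element-wise."""
    return [[b + w * d for b, d in zip(rb, rd)] for rb, rd in zip(base, delta)]


def merge_sequentially(base: Matrix, weighted_deltas: list[tuple[float, Matrix]]) -> Matrix:
    """Fold each (weight, delta) into the running merged weight, one adapter at a time."""
    merged = base
    for w, delta in weighted_deltas:
        merged = axpy(w, delta, merged)
    return merged


base = [[1.0, 0.0], [0.0, 1.0]]
task_a = (1.0, [[0.5, 0.5], [0.0, 0.0]])
task_b = (0.5, [[0.0, 0.0], [0.25, -0.5]])
for order in permutations([task_a, task_b]):
    print(merge_sequentially(base, list(order)))
# [[1.5, 0.5], [0.125, 0.75]]
# [[1.5, 0.5], [0.125, 0.75]]
```

The example uses exact binary fractions; with general floats, different orders can differ by rounding in the last bits. Either way, one merged weight serves every task, so any interference between the adapters is baked in rather than switchable.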

4

exllamav2 · Repository · 24/100

via “multi-lora adapter composition and switching”

Python AI package: exllamav2

Unique: Implements in-place LoRA composition with dynamic adapter switching that does not reload base weights, using a cached adapter registry that precomputes rank-decomposed products for zero-copy switching between adapters.

vs others: Faster adapter switching than Hugging Face PEFT (no model reload); lower memory overhead than storing separate full models; simpler composition API than manual adapter blending.
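Switching without touching base weights works because an unmerged adapter is applied at runtime as y = Wx + s·U(Dx), so changing adapters only changes which (D, U, s) triple is read. The sketch below illustrates that idea in plain Python; `SwitchableLinear`, `register`, and `set_active` are hypothetical names, not exllamav2's API, and it does not reproduce that library's kernels or caching.

```python
from __future__ import annotations

# Sketch of unmerged, switchable LoRA adapters (hypothetical names; not exllamav2's API).
Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class SwitchableLinear:
    """Frozen base weight plus a registry of adapters that are applied, never merged."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight  # d_out x d_in, never modified
        self.adapters: dict[str, tuple[Matrix, Matrix, float]] = {}
        self.active: str | None = None

    def register(self, name: str, down: Matrix, up: Matrix, scale: float = 1.0) -> None:
        # down: r x d_in (projects input to rank r); up: d_out x r (projects back)
        self.adapters[name] = (down, up, scale)

    def set_active(self, name: str | None) -> None:
        if name is not None and name not in self.adapters:
            raise KeyError(name)
        self.active = name  # switching only changes a key; base weights are untouched

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.weight, x)
        if self.active is None:
            return y
        down, up, scale = self.adapters[self.active]
        delta = matvec(up, matvec(down, x))  # low-rank path: up @ (down @ x)
        return [yi + scale * di for yi, di in zip(y, delta)]


layer = SwitchableLinear([[1.0, 0.0], [0.0, 1.0]])
layer.register("a", down=[[1.0, 1.0]], up=[[0.5], [0.0]])
layer.register("b", down=[[0.0, 0.5]], up=[[0.0], [1.0]], scale=2.0)
x = [2.0, 4.0]
for name in (None, "a", "b"):
    layer.set_active(name)
    print(name, layer.forward(x))
# None [2.0, 4.0]
# a [5.0, 4.0]
# b [2.0, 8.0]
```

The trade-off against merging is an extra low-rank matmul per layer on every forward pass in exchange for instant switching.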

5

peft · Fine-tune · 23/100

via “adapter merging and unmerging with weight fusion”

Parameter-Efficient Fine-Tuning (PEFT)

Unique: Implements reversible adapter merging through method-specific merge logic that fuses adapter weights into base weights mathematically (e.g., LoRA: W' = W + alpha/r * A @ B^T), enabling both merged and unmerged states from the same checkpoint. The unmerge operation recovers original weights by subtracting the adapter contribution.

vs others: More flexible than permanent merging because unmerge() enables recovery of original weights and adapter separation, while merged models achieve inference latency parity with non-adapter baselines. Supports both merged and adapter-based deployment strategies from the same training run.
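The latency-parity claim rests on an identity: with the LoRA merge formula quoted in this entry, W' = W + (α/r)·ABᵀ, the merged forward pass W'x equals the unmerged one Wx + (α/r)·A(Bᵀx). The sketch below checks that identity in plain Python. Helper names are hypothetical, and the factor shapes (A is d_out×r, B is d_in×r) follow the formula as written here rather than any library's internal layout.

```python
# Sketch: merged and unmerged LoRA forward passes agree (hypothetical helper names).
Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_delta(a: Matrix, b: Matrix, alpha: float, r: int) -> Matrix:
    """(alpha / r) * A @ B^T, with A: d_out x r and B: d_in x r."""
    s = alpha / r
    return [[s * sum(x * y for x, y in zip(a_row, b_row)) for b_row in b] for a_row in a]


def forward_merged(w: Matrix, a: Matrix, b: Matrix, alpha: float, r: int, x: Vector) -> Vector:
    # Fold the delta into W once; inference then costs one ordinary matvec.
    dw = lora_delta(a, b, alpha, r)
    merged = [[wi + di for wi, di in zip(rw, rd)] for rw, rd in zip(w, dw)]
    return matvec(merged, x)


def forward_unmerged(w: Matrix, a: Matrix, b: Matrix, alpha: float, r: int, x: Vector) -> Vector:
    # Keep W frozen and add the low-rank path: W x + (alpha / r) * A (B^T x).
    h = [sum(bj * xj for bj, xj in zip(col, x)) for col in zip(*b)]  # B^T x, length r
    return [y + (alpha / r) * d for y, d in zip(matvec(w, x), matvec(a, h))]


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0], [0.0]]   # d_out x r, with r = 1
B = [[0.5], [0.25]]  # d_in x r
x = [2.0, 4.0]
print(forward_merged(W, A, B, 2.0, 1, x))    # [6.0, 4.0]
print(forward_unmerged(W, A, B, 2.0, 1, x))  # [6.0, 4.0]
```

In general floating point the two paths agree only up to rounding, but the merged path needs no extra matmuls, which is where the parity with non-adapter baselines comes from.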

6

QLoRA: Efficient Finetuning of Quantized LLMs (QLoRA) · Product · 22/100


Unique: Provides systematic adapter composition strategies (sequential, weighted ensemble) with automatic precision handling when merging full-precision adapters into quantized base weights, enabling flexible multi-task model construction; prior LoRA work focused on single-adapter inference.

vs others: Enables multi-task inference without maintaining separate models or adapter routing logic, and supports weighted ensemble composition that would otherwise require custom inference code or model-ensembling infrastructure.
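One way to handle precision when merging full-precision adapters into quantized weights is to dequantize, accumulate the weighted deltas in float, and requantize once, so each weight incurs a single rounding step. The sketch below illustrates that general idea using simple absmax int8 quantization as a stand-in; it is not QLoRA's NF4 data type or its actual merge code, and all function names are hypothetical.

```python
# Sketch: merging weighted full-precision deltas into absmax-int8 weights (not QLoRA's NF4).

def quantize_absmax(values: list[float], bits: int = 8) -> tuple[list[int], float]:
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax or 1.0  # fall back to 1.0 for an all-zero tensor
    return [round(v / scale) for v in values], scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [qi * scale for qi in q]


def merge_into_quantized(
    q: list[int], scale: float, weighted_deltas: list[tuple[float, list[float]]], bits: int = 8
) -> tuple[list[int], float]:
    """Dequantize, add the weighted full-precision deltas, then requantize once."""
    w = dequantize(q, scale)
    for weight, delta in weighted_deltas:
        w = [wi + weight * di for wi, di in zip(w, delta)]  # accumulate in float
    return quantize_absmax(w, bits)


q, s = quantize_absmax([0.5, -1.0, 0.25, 0.0])
q2, s2 = merge_into_quantized(q, s, [(1.0, [0.1, 0.0, -0.05, 0.2]), (0.5, [0.0, 0.4, 0.0, 0.0])])
print(dequantize(q2, s2))
```

After requantization each merged weight is within half a quantization step of its float value, so small adapter deltas can be partly rounded away; that loss is the price of keeping a single quantized model for all tasks.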
