How to Merge LLMs for Custom Models: The Ultimate Guide
Learn how to merge different large language models (LLMs) to create powerful, specialized custom models. This guide explores the benefits, key techniques, and a practical step-by-step example.

In the rapidly evolving landscape of artificial intelligence, the desire to create more powerful, specialized, and cost-effective models is a constant driver of innovation. While training a large language model (LLM) from scratch requires immense computational resources and vast datasets, a powerful new trend has emerged from the open-source community: model merging. This guide explores how to merge LLMs for custom models, a technique that allows developers to combine existing pre-trained models to create unique "Frankenmodels" with enhanced capabilities.
This approach isn't just a novel experiment; it's a pragmatic solution for developers and organizations looking to achieve state-of-the-art performance without state-of-the-art budgets. By understanding the nuances of model merging, you can unlock new possibilities, combining the creative strengths of one model with the logical reasoning of another. This article provides a comprehensive deep dive into the methods, benefits, and practical steps for creating your own custom-merged LLMs.
What is LLM Model Merging? The "Frankenmodel" Phenomenon
LLM model merging is the process of combining the neural network parameters (or "weights") of two or more pre-trained language models to create a single, new model. The resulting model inherits a blend of traits, knowledge, and skills from its "parent" models. The community affectionately calls these creations "Frankenmodels" or "merges."
Think of it like creating a hybrid engine. You might take the fuel efficiency from a small, economic car engine and combine it with the raw horsepower of a sports car engine. The goal is to create a new engine that is both powerful and efficient. In the world of AI, you might merge a model that excels at creative writing with one that is a master of Python code generation. The resulting merged model could potentially excel at both tasks, a feat that might be difficult to achieve with a single, general-purpose model.
This process is made possible because many open-source models share similar underlying architectures (like the Transformer architecture). This architectural consistency allows their parameters to be combined mathematically, creating a new, coherent set of weights that function as a new model.
Why Merge Language Models? The Core Benefits
The popularity of model merging stems from several key advantages over training or fine-tuning from scratch.
- Cost-Effectiveness: Training a foundational LLM can cost millions of dollars in GPU time. Merging models, in contrast, is computationally cheap. It often requires only a decent consumer-grade GPU and can be completed in minutes or hours, not weeks or months.
- Capability Combination: The primary driver for merging is to combine the distinct strengths of different models. No single model is the best at everything. One model might be SOTA (State-of-the-Art) in logical reasoning, another in creative prose, and a third in a specific language or domain like medicine or law. Merging allows you to create a "specialist generalist" that inherits these desired traits.
- Rapid Innovation and Experimentation: The low cost and high speed of merging enable a rapid cycle of experimentation. Developers can quickly create and test dozens of model combinations to find the optimal blend for a specific use case. This has led to an explosion of powerful new models on platforms like Hugging Face, driven entirely by the community.
- Circumventing Training Data Limitations: Fine-tuning a model on a new dataset can be effective, but it requires a high-quality, curated dataset. Merging provides an alternative path to specialization by leveraging the knowledge already baked into existing, expertly trained models.
Key Model Merging Techniques Explained
Several methods have been developed for merging LLM parameters. They range from simple averaging to more complex, task-aware algorithms. Here are some of the most popular techniques.
Simple Linear Averaging (Linear)
The most straightforward method. The weights of the new model are a simple weighted average of the parent models' weights. For example, with two models (A and B), the merged model's weights (W_merge) would be W_merge = (1-α) * W_A + α * W_B, where α (alpha) is the interpolation factor between 0 and 1. It's fast and simple but can sometimes "average out" the unique capabilities of the parents.
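As a rough illustration of the formula above, here is a minimal pure-Python sketch. The function name `linear_merge` and the toy three-value vectors are made up for this example; a real merge applies the same arithmetic to every tensor in the model.

```python
def linear_merge(w_a, w_b, alpha):
    """Return (1 - alpha) * w_a + alpha * w_b, element by element."""
    if len(w_a) != len(w_b):
        raise ValueError("parent weight vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(w_a, w_b)]


# Toy stand-ins for one flattened weight tensor from each parent.
w_a = [1.0, -2.0, 0.5]
w_b = [3.0, 2.0, 0.5]

merged = linear_merge(w_a, w_b, alpha=0.5)
print(merged)
```

With these toy values the second entry of the merged vector is zero: the parents disagree in sign there, so the average cancels them. That is a small-scale picture of the "averaging out" problem.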
Spherical Linear Interpolation (Slerp)
Slerp is a more advanced interpolation technique borrowed from computer graphics. Instead of moving in a straight line between the two models' parameter vectors (like linear averaging), Slerp moves along the shortest arc on the surface of a high-dimensional sphere. In our testing, this method is often better at preserving the unique features of the parent models, preventing the "averaging out" problem and leading to more capable merges.
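The idea can be sketched in a few lines of pure Python. This is an illustrative version of the standard Slerp formula on toy vectors, not the code any particular tool uses; the function name `slerp`, the `eps` threshold, and the fallback to linear interpolation for near-parallel vectors are assumptions made for this example.

```python
import math


def slerp(w_a, w_b, t, eps=1e-8):
    """Spherical linear interpolation between two weight vectors."""
    norm_a = math.sqrt(sum(a * a for a in w_a))
    norm_b = math.sqrt(sum(b * b for b in w_b))

    def lerp():
        return [(1.0 - t) * a + t * b for a, b in zip(w_a, w_b)]

    # A zero-length vector has no direction, so fall back to linear.
    if norm_a < eps or norm_b < eps:
        return lerp()

    # Angle between the two directions, clamped against rounding error.
    dot = sum(a * b for a, b in zip(w_a, w_b))
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)

    # Nearly parallel (or opposite) vectors make the formula unstable.
    if abs(sin_omega) < eps:
        return lerp()

    coeff_a = math.sin((1.0 - t) * omega) / sin_omega
    coeff_b = math.sin(t * omega) / sin_omega
    return [coeff_a * a + coeff_b * b for a, b in zip(w_a, w_b)]


# Two unit-length toy vectors at a right angle to each other.
halfway = slerp([1.0, 0.0], [0.0, 1.0], t=0.5)
length = math.sqrt(sum(v * v for v in halfway))
print(halfway, length)
```

For these two unit vectors, the halfway point from Slerp still has length 1, whereas a plain linear average would give [0.5, 0.5] with a length of about 0.71. Keeping the magnitude intact is the property that distinguishes Slerp from linear averaging.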
Task Arithmetic and DARE
Task Arithmetic is a technique where a "task vector" is calculated by subtracting the weights of a base model from a fine-tuned model. This vector represents the "skill" learned during fine-tuning. You can then add this task vector to a different model to impart that skill. DARE (Drop and Rescale) is a recent improvement on this. It randomly sets a large portion of the task vector's values to zero and then rescales the remaining ones by 1 / (1 - p), where p is the drop rate, so the expected value of the task vector stays the same. This sounds counterintuitive, but sparser task vectors overlap less with each other, which reduces conflicts when several fine-tuned models are merged onto the same base.
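Both ideas fit in a short sketch. The helper names (`task_vector`, `dare`, `apply_task_vector`) and the toy weights below are hypothetical; the sketch assumes a drop rate `p` with survivors rescaled by 1 / (1 - p). In `mergekit` configurations the related `density` parameter is the fraction of values kept, that is, 1 - p.

```python
import random


def task_vector(finetuned, base):
    """The 'skill' learned in fine-tuning: fine-tuned weights minus base weights."""
    return [f - b for f, b in zip(finetuned, base)]


def dare(delta, drop_rate, rng):
    """Randomly zero a fraction of the task vector and rescale the survivors."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    keep = 1.0 - drop_rate
    # Dividing by the keep probability preserves the expected value of each entry.
    return [d / keep if rng.random() >= drop_rate else 0.0 for d in delta]


def apply_task_vector(base, delta, weight=1.0):
    """Add a (possibly sparsified) task vector onto a base model's weights."""
    return [b + weight * d for b, d in zip(base, delta)]


# Toy stand-ins for one flattened tensor from a base and a fine-tuned model.
base = [0.0, 1.0, -1.0, 0.5]
finetuned = [0.2, 1.0, -1.4, 0.9]

rng = random.Random(0)  # seeded only so repeated runs drop the same entries
delta = task_vector(finetuned, base)
sparse_delta = dare(delta, drop_rate=0.5, rng=rng)
merged = apply_task_vector(base, sparse_delta, weight=0.5)
print(delta, sparse_delta, merged)
```

Every surviving entry of `sparse_delta` is twice its original value here, because half the entries are expected to be dropped.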
TIES-Merging
TIES-Merging (Trim, Elect Sign, and Merge) is a more sophisticated method that focuses on resolving parameter-sign disagreements between models. First, it trims each model's task vector, keeping only the values that changed the most during fine-tuning and resetting the rest to zero. Then, it elects a sign for each parameter: the sign that carries the greater total magnitude across the models being merged. Finally, it averages only the values that agree with the elected sign. This focus on resolving conflicts at the parameter sign level can lead to very coherent and high-performing merged models.
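A simplified sketch of the trim, elect, and merge steps on toy task vectors follows. The function names and the `density` argument (fraction of values kept per task vector) are choices made for this illustration, and the sketch leaves out refinements such as per-model weights and a global scaling factor.

```python
def trim(delta, density):
    """Keep the largest-magnitude entries of a task vector; zero the rest."""
    k = max(1, round(len(delta) * density))
    threshold = sorted((abs(d) for d in delta), reverse=True)[k - 1]
    # Entries tied with the threshold are all kept, so slightly more than k may survive.
    return [d if abs(d) >= threshold else 0.0 for d in delta]


def ties_merge(base, deltas, density=0.5):
    """Trim each task vector, elect a sign per parameter, average the agreeing values."""
    trimmed = [trim(d, density) for d in deltas]
    merged = []
    for i, b in enumerate(base):
        column = [t[i] for t in trimmed]
        # The sign of the sum is the sign that carries more total magnitude.
        elected_positive = sum(column) >= 0
        agreeing = [v for v in column if v != 0 and (v > 0) == elected_positive]
        update = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + update)
    return merged


# Toy task vectors from two fine-tunes of the same (all-zero) base tensor.
base = [0.0, 0.0, 0.0, 0.0]
delta_1 = [1.0, 0.1, -0.8, 0.05]
delta_2 = [-0.2, 0.9, 0.6, 0.05]

merged = ties_merge(base, [delta_1, delta_2], density=0.5)
print(merged)
```

In the third position the two fine-tunes disagree (-0.8 versus 0.6). The negative side has more magnitude, so the merge keeps -0.8 and discards 0.6 instead of averaging them to -0.1.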
Comparison of Popular Merging Techniques
| Technique | Complexity | Compute Cost | Key Advantage | Best For |
|---|---|---|---|---|
| Linear Averaging | Very Low | Very Low | Simplicity and speed. | Quick experiments and merging very similar models. |
| Slerp | Low | Low | Better preservation of parent model capabilities. | Creating a balanced hybrid of two distinct models. |
| DARE | Medium | Medium | Excellent at combining multiple, diverse skills without conflict. | Merging multiple fine-tuned models onto a base model. |
| TIES-Merging | Medium | Medium | Resolves sign conflicts to create coherent merges. | Complex merges where parameter conflicts are a major issue. |
How to Merge LLMs for Custom Models: A Step-by-Step Guide
Creating your own merged model is surprisingly accessible thanks to open-source tools. The most popular is mergekit, a command-line tool that implements most of the techniques discussed above. Here’s a simplified, actionable walkthrough.
- Install `mergekit`: First, you need to set up your Python environment and install the library. It's usually as simple as running `pip install mergekit` in your terminal.
- Choose Your Parent Models: This is the most crucial step. Identify the models you want to merge from the Hugging Face Hub. For this example, let's say we want to merge a coding model (`Model-Coder`) with a creative writing model (`Model-Creative`). You'll need their Hugging Face repository IDs.
- Create a Merge Configuration File: `mergekit` uses a YAML file to define the merge process. Let's create a `config.yml` file. Here's an example using the DARE method:

      models:
        - model: Org/Model-Coder
          # No parameters needed for the base model
        - model: Org/Model-Creative
          parameters:
            density: 0.5  # A DARE parameter
            weight: 0.5   # A DARE parameter
      merge_method: dare_ties
      base_model: Org/Model-Coder
      dtype: float16

- Run the Merge Command: Open your terminal, navigate to the directory with your `config.yml`, and execute the merge command:

      mergekit-yaml config.yml merge --copy-tokenizer

- Wait for the Process to Complete: `mergekit` will now download the parent models (if you don't have them locally) and perform the mathematical operations to combine their weights. The output will be a new directory, typically named `merge`, containing the new model files.
- Test Your New Model: The `merge` directory contains your new, ready-to-use LLM. You can load it into any framework that supports Hugging Face models (like Transformers, llama.cpp, etc.) and start interacting with it. Evaluate its performance on both coding and creative writing tasks to see if you successfully combined their strengths.
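For the final testing step, a small harness along these lines can help compare merges side by side. It is a sketch under assumptions: the helper names and sample prompts are hypothetical, the merged model is assumed to sit in a local `merge` directory, and loading it requires the `transformers` and `torch` packages plus enough memory for the model.

```python
def make_generator(model_dir, max_new_tokens=128):
    """Load a merged model from disk and return a prompt -> text callable."""
    # Imported lazily so the rest of this sketch runs without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype="auto")

    def generate(prompt):
        inputs = tokenizer(prompt, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return generate


def run_prompts(generate, prompts_by_skill):
    """Run every prompt through `generate` and group the outputs by skill."""
    return {
        skill: [(prompt, generate(prompt)) for prompt in prompts]
        for skill, prompts in prompts_by_skill.items()
    }


# Hypothetical spot-check prompts, one group per skill the merge should keep.
PROMPTS = {
    "coding": ["Write a Python function that reverses a string."],
    "creative": ["Write a four-line poem about autumn rain."],
}

# Real use, once the merged weights exist on disk:
#   results = run_prompts(make_generator("merge"), PROMPTS)
```

Because `run_prompts` accepts any callable, the same prompt set can be run against each parent model and the merge, which makes it easier to see whether a skill was kept or lost.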
Mini Case Study: The "Goliath-120b" Merge
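A prominent real-world example that showcases the power of this technique is the "Goliath-120b" model. This model was not trained by a large corporation; a community member created it by merging two fine-tuned Llama-2 70B models. Rather than averaging weights, this merge stacked interleaved ranges of layers taken from both parents, which is why the result has more parameters than either 70B parent.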
At the time of its release, Goliath-120b demonstrated remarkable performance, often outperforming the individual parent models and even rivaling some proprietary models on certain benchmarks. It combined the knowledge and nuances from two different fine-tuning runs into one powerful package. This case study is a testament to how open-source collaboration and clever techniques like model merging can push the boundaries of AI without requiring foundational training resources.
Common Pitfalls and What to Avoid
While powerful, model merging is not a magic bullet. Here are some common pitfalls:
- Merging Incompatible Models: Merging models with different architectures or tokenizers will fail or produce a nonsensical result. Always stick to models within the same family (e.g., Llama-2 based, Mistral based).
- The "Averaging to Mediocrity" Problem: A poorly executed linear merge can result in a model that is mediocre at everything and excels at nothing. Using more advanced techniques like DARE or Slerp can help mitigate this.
- Ignoring the Tokenizer: When you run `mergekit`, always use the `--copy-tokenizer` flag to copy the tokenizer from one of the parent models. A model without a correct tokenizer is useless.
- Not Testing and Iterating: Your first merge is unlikely to be your best. The key to success is to try different model combinations, different merge techniques, and different parameters, and rigorously test the results to find what works for your specific goal.
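The first pitfall can be partly caught before any merge is run. The sketch below compares a few fields that Hugging Face `config.json` files for Llama-style models commonly contain. The helper name `config_mismatches` and the choice of fields are assumptions for illustration. Matching values are necessary but not sufficient: two tokenizers can have the same vocabulary size and still differ in content.

```python
# Fields that typically need to match for weights to line up one-to-one.
ARCH_KEYS = (
    "model_type",
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "vocab_size",
)


def config_mismatches(cfg_a, cfg_b, keys=ARCH_KEYS):
    """Return {field: (value_a, value_b)} for every field that differs."""
    return {
        key: (cfg_a.get(key), cfg_b.get(key))
        for key in keys
        if cfg_a.get(key) != cfg_b.get(key)
    }


# Illustrative dicts; in practice, load each model's config.json with json.load().
cfg_a = {"model_type": "llama", "hidden_size": 4096, "num_hidden_layers": 32,
         "num_attention_heads": 32, "vocab_size": 32000}
cfg_b = {"model_type": "mistral", "hidden_size": 4096, "num_hidden_layers": 32,
         "num_attention_heads": 32, "vocab_size": 32000}

problems = config_mismatches(cfg_a, cfg_b)
print(problems or "configs look compatible")
```

An empty result means the listed fields agree; any entry in the result is a reason to stop and check the model cards before merging.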
By being mindful of these challenges, you can approach your merging experiments with a higher chance of success and create a truly useful custom model.
About the Author
The neural.ai editorial team consists of expert SEO strategists and senior tech journalists dedicated to providing E-E-A-T compliant content. Our team leverages hands-on analysis and deep industry knowledge to create articles that are insightful, accurate, and engineered to rank. We are passionate about demystifying complex AI topics for our audience.
Key Takeaways
- Model merging is a powerful technique to combine two or more LLMs into a single, more capable custom model.
- It is significantly cheaper and faster than training a model from scratch, enabling rapid experimentation.
- Key merging techniques include simple linear averaging, Slerp, DARE, and TIES-merging, each with unique advantages.
- Tools like `mergekit` make it easy for developers to create their own merged models using a simple configuration file.
- Successful merging requires careful model selection, the right technique, and iterative testing to avoid common pitfalls.
Frequently Asked Questions
What is LLM model merging?
LLM model merging is a machine learning technique that combines the weights of two or more pre-trained language models to create a single new model. This new model, often called a "Frankenmodel," inherits a blend of skills and knowledge from its parent models, allowing for the creation of powerful custom AIs without training from scratch.
Is merging models better than fine-tuning?
Merging and fine-tuning are different techniques for different goals. Fine-tuning adapts a model to a specific dataset or task. Merging combines the existing, broad capabilities of entire models. Merging is often faster and better for combining disparate skills (like coding and poetry), while fine-tuning is better for specializing in a narrow domain.
What are the best tools for merging LLMs?
The most popular and powerful open-source tool for merging LLMs is `mergekit`. It's a command-line utility that implements various advanced merging techniques like DARE, TIES, and Slerp. It operates using a simple YAML configuration file, making the process highly accessible for developers with a basic Python environment.
Can I merge any two LLMs?
No, you cannot merge any two models. For a successful merge, the models must share the same underlying architecture (e.g., both based on Llama or Mistral) and tokenizer. Attempting to merge models with different architectures will result in an error or a non-functional model. Always check for compatibility before attempting a merge.
