How to Deduplicate a Contact List Before Importing Into Your CRM
CRM duplicate detection misses most company name variations. Before importing any list, deduplicate it internally with fuzzy matching, then match it against your existing CRM accounts. Split into two batches: records to update and records to create. This 10-minute process prevents weeks of cleanup later.
Every conversation I have with sales ops teams starts the same way.
"We have a duplicate problem in Salesforce."
I ask when it started. They're not sure — it's been building for years. I ask what they've tried. They've turned on duplicate matching rules, maybe run a few merge jobs. But the duplicates keep coming.
Then I ask: "How do duplicates usually get created?"
The answer, almost always: imports.
Someone had a spreadsheet — purchased leads, tradeshow data, partner referrals, enrichment data — and imported it. Green checkmarks, no errors. And then slowly, over weeks, the problems appeared. Reps complaining about duplicate accounts. Marketing sending duplicate emails. Reports that don't add up.
The duplicates weren't created inside the CRM. They were imported into it. And once they're in, getting them out is painful.
Why CRM Duplicate Detection Doesn't Catch What You Think
Salesforce has Duplicate Rules. HubSpot has duplicate management. Dynamics 365 has duplicate detection. They all claim to prevent duplicates on import.
Here's the problem: they're looking for near-exact matches. They might catch "Acme Inc." vs "Acme Inc" (with and without period). They won't catch what real-world data actually looks like:
| Your CRM | Import File | Same Company? | CRM Detects It? |
|---|---|---|---|
| Acme Corporation | ACME Corp | Yes | No |
| Johnson & Johnson | J&J | Yes | No |
| International Business Machines | IBM | Yes | No |
| PricewaterhouseCoopers | PwC | Yes | No |
| The Procter & Gamble Company | P&G | Yes | No |
| McKinsey & Company | McKinsey | Yes | No |
| Ernst & Young | EY | Yes | No |
| Goldman Sachs Group, Inc. | Goldman Sachs | Yes | No |
Every single one of these passes through CRM duplicate detection without a warning. The system creates new accounts for companies you already have — just with slightly different names.
This is why Excel's Remove Duplicates fails on company names too. Any tool that requires exact or near-exact text matching will miss the majority of real-world duplicates.
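To make the gap concrete, here is a minimal sketch of what "near-exact" matching amounts to. It is a simplified stand-in I wrote for illustration, not how any particular CRM implements its rules: the hypothetical `near_exact_key` function just lowercases, strips punctuation, and collapses spaces before comparing.

```python
import string

def near_exact_key(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace (illustrative only)."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(name.lower().translate(table).split())

# Pairs taken from the examples above: (name in CRM, name in import file).
pairs = [
    ("Acme Inc.", "Acme Inc"),
    ("Acme Corporation", "ACME Corp"),
    ("Johnson & Johnson", "J&J"),
    ("Goldman Sachs Group, Inc.", "Goldman Sachs"),
]

for crm_name, import_name in pairs:
    caught = near_exact_key(crm_name) == near_exact_key(import_name)
    print(f"{crm_name!r} vs {import_name!r}: {'caught' if caught else 'missed'}")
```

Only the first pair produces identical keys. The rest differ by whole words or abbreviations, which no amount of punctuation cleanup will reconcile.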
The Real Damage Happens Slowly
The insidious thing about imported duplicates is that they don't cause immediate problems. The import completes. Everyone moves on. The damage accumulates over weeks and months:
Wasted sales time. A rep researches "Goldman Sachs Group, Inc." for 20 minutes, not realizing another rep already has "Goldman Sachs" as an account with full notes and history. This happens dozens of times per week across a sales team.
Broken lead routing. A new lead comes in for "IBM." Your routing rules should assign it to rep A, who owns "International Business Machines." Instead, the lead gets matched to the newly imported "IBM" account, which has no owner. It falls into a queue. Days pass.
Incorrect reporting. How many active opportunities do you have? How many accounts per territory? What's your average deal size per account? Every metric that relies on accurate account counts is wrong when 15-30% of your accounts are duplicates.
Marketing email problems. You send a product update to all customers. "J&J" and "Johnson & Johnson" both get it. Same person at the same company receives the same email twice. Or worse: different contacts at the same company receive it, and one asks the other, "Why did we get this twice?"
The cleanup tax. Eventually someone has to fix it. That project, the one detailed in this Salesforce duplicate cleanup guide, takes weeks, not hours, because by then every duplicate account has contacts, activities, opportunities, and notes attached. Merging two records inside a CRM is far harder than preventing the duplicate in the first place.
The 10-Minute Pre-Import Process
Here's the process that prevents all of this. It adds about 10 minutes to any import — and saves days of cleanup later.
Step 1: Deduplicate the Import File Internally
Before comparing against your CRM, check if the import file has duplicates within itself.
This is especially common with:
- Purchased lead lists (vendors often have the same company multiple times)
- Tradeshow attendee data (multiple people from the same company)
- Combined files from multiple sources
Run fuzzy matching on the company name column against itself. You're looking for rows like "Acme Corp" and "ACME Corporation" that appear to be different but are actually the same company.
For files under 500 rows, you can do this free with DedupFuzzy: upload the CSV, select the company name column, and see duplicates in about 60 seconds. For larger files, Python's rapidfuzz library works well if you're comfortable writing code.
Merge or remove the internal duplicates before proceeding. Decide which row to keep — usually the one with more complete data.
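If you want to see the shape of this step in code, here is a rough sketch. It uses Python's built-in `difflib` as a stand-in for a dedicated fuzzy matcher so it runs without installing anything. The suffix list, the column names, and the 75 cutoff are all assumptions for illustration, and the pairwise loop is only sensible for small files.

```python
from difflib import SequenceMatcher

# Hypothetical suffix list; extend it for your own data.
SUFFIXES = {"inc", "corp", "corporation", "co", "company", "llc", "ltd"}

def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(w for w in cleaned.split() if w not in SUFFIXES)

def similarity(a: str, b: str) -> float:
    """Similarity on a 0-100 scale after normalization."""
    return 100 * SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def completeness(row: dict) -> int:
    """Count the non-empty fields in a row."""
    return sum(1 for value in row.values() if value)

def dedupe(rows: list[dict], threshold: float = 75.0) -> list[dict]:
    """Keep one row per fuzzy-matched company, preferring the most complete row."""
    kept: list[dict] = []
    for row in rows:
        for i, existing in enumerate(kept):
            if similarity(row["company"], existing["company"]) >= threshold:
                if completeness(row) > completeness(existing):
                    kept[i] = row  # the newer row has more filled-in fields
                break
        else:
            kept.append(row)  # no similar company seen so far
    return kept

rows = [
    {"company": "Acme Corp", "email": "", "phone": ""},
    {"company": "ACME Corporation", "email": "ops@acme.example", "phone": "555-0100"},
    {"company": "Globex LLC", "email": "hi@globex.example", "phone": ""},
]
unique_rows = dedupe(rows)
print([r["company"] for r in unique_rows])
```

In practice you would review the matched pairs before dropping anything rather than letting a script pick the survivor on its own.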
Step 2: Match Against Your Existing CRM
Now compare the deduplicated import file against your current CRM accounts.
Export your CRM accounts to CSV (just the Account Name column is fine for matching). Run fuzzy matching between the two files:
- File A: Your import file (company names)
- File B: Your CRM export (account names)
The result is a list of matches with similarity scores. Set a threshold (75% is usually a good starting point) and treat anything above it as a likely case of "this import record already exists in your CRM."
Review the matches before acting on them, especially in the 65-80% range around the threshold, where you'll find a mix of real matches and false positives.
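Here is one way the scoring and review logic could look, again using the standard library's `difflib` as a stand-in scorer. The function names and the sample CRM list are hypothetical, and scores from a different scorer will not line up exactly with these, so treat 75 and 65-80 as starting points to tune rather than fixed values.

```python
from difflib import SequenceMatcher

def score(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_match(name: str, crm_names: list[str]) -> tuple[str, float]:
    """Return the closest CRM account name and its score."""
    best = max(crm_names, key=lambda candidate: score(name, candidate))
    return best, score(name, best)

def classify(match_score: float, threshold: float = 75.0,
             review_band: tuple[float, float] = (65.0, 80.0)) -> str:
    """Label a score as 'match', 'new', or 'review' (needs a human look)."""
    low, high = review_band
    if low <= match_score <= high:
        return "review"
    return "match" if match_score >= threshold else "new"

# Hypothetical CRM export (account names only).
crm_names = ["Goldman Sachs Group, Inc.", "Globex Inc.", "Acme Corporation"]

for import_name in ["Goldman Sachs", "Globex Inc", "Initech"]:
    crm_name, s = best_match(import_name, crm_names)
    print(f"{import_name!r} -> {crm_name!r}: {s:.1f} ({classify(s)})")
```

Note that plain string similarity like this will not connect acronyms such as "IBM" to their full names. Those need a matcher built for company names or a small alias table you maintain yourself.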
Step 3: Split Into Two Import Batches
After matching, you have two categories:
- Matched records: Companies that already exist in your CRM under a different name. Don't create new accounts for these. Instead, update the existing accounts with the new contact information.
- Unmatched records: Companies that genuinely don't exist in your CRM. These can be imported as new accounts.
Split your import file accordingly. The matched records become an "update" batch. The unmatched records become a "create" batch.
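The split itself is simple once the matches have been reviewed. A minimal sketch, assuming a hypothetical `matches` dictionary that maps each matched import name to the CRM account name you confirmed for it:

```python
def split_batches(import_rows: list[dict], matches: dict[str, str]):
    """Split rows into (update_batch, create_batch) based on reviewed matches."""
    update_batch, create_batch = [], []
    for row in import_rows:
        crm_name = matches.get(row["company"])
        if crm_name is not None:
            # Record which existing account this row belongs to.
            update_batch.append({**row, "crm_account_name": crm_name})
        else:
            create_batch.append(row)
    return update_batch, create_batch

import_rows = [
    {"company": "J&J", "contact": "Dana Lee"},
    {"company": "Initech", "contact": "Sam Ortiz"},
]
matches = {"J&J": "Johnson & Johnson"}  # confirmed during review

update_batch, create_batch = split_batches(import_rows, matches)
print(len(update_batch), "to update,", len(create_batch), "to create")
```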
Step 4: Use CRM Import Settings Wisely
When importing the "create" batch, use your CRM's duplicate detection as a backstop — not your primary defense. You've already cleaned the data; the CRM rules are just catching anything you might have missed.
For the "update" batch, you'll typically need to include the existing Account ID or use your CRM's match-and-update import option. This ensures the new contacts get associated with the correct existing accounts.
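As a sketch of preparing the update batch, the snippet below looks up the Account ID for each matched row and writes a small CSV. The IDs, the column names, and the `to_update_csv` helper are made up for illustration; your CRM's import tool will have its own required headers.

```python
import csv
import io

# Hypothetical lookup built from a CRM export: account name -> account ID.
crm_ids = {"Johnson & Johnson": "001A", "Acme Corporation": "001B"}

update_batch = [
    {"contact": "Dana Lee", "company": "J&J", "crm_account_name": "Johnson & Johnson"},
]

def to_update_csv(batch: list[dict], ids: dict[str, str]) -> str:
    """Return CSV text that ties each contact to an existing account ID."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["account_id", "contact"],
                            lineterminator="\n")
    writer.writeheader()
    for row in batch:
        writer.writerow({"account_id": ids[row["crm_account_name"]],
                         "contact": row["contact"]})
    return buffer.getvalue()

print(to_update_csv(update_batch, crm_ids))
```

A missing account name raises a `KeyError` here on purpose: a row that cannot be tied to an existing account should not slip quietly into the update batch.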
For more detail on import settings, see the guide on cleaning data before CRM import.
Why Post-Import Cleanup Is Worse Than Prevention
Some teams skip the pre-import process because they figure they can "clean it up later." This is almost always a mistake.
In a spreadsheet, a duplicate company is two rows. Delete one, done.
In a CRM, a duplicate company has:
- Contacts — which record keeps them?
- Opportunities — some might be linked to the wrong account
- Activities — emails, calls, meetings, all attached to records
- Notes — institutional knowledge from the sales team
- Custom fields — different values on each record
- Workflow history — automations that ran on both accounts
Merging two account records in Salesforce requires deciding what to do with each of these elements. And CRMs have limitations: Salesforce's built-in merge, for example, handles at most three accounts at a time, which becomes painful when you have thousands of duplicates.
A 10-minute pre-import process prevents a 10-hour post-import cleanup. The math is not close.
Real Example: Importing a Purchased Lead List
A SaaS company bought a list of 2,500 "decision makers at mid-market companies." Marketing wanted to add them to HubSpot for an ABM campaign.
Here's what pre-import deduplication found:
| Step | Records | Issue Found |
|---|---|---|
| Original import file | 2,500 | — |
| After internal dedup | 2,247 | 253 duplicates within the list itself |
| Matched to existing HubSpot accounts | 892 | Companies already in CRM |
| Net-new companies to create | 1,355 | — |
Without pre-import cleaning: 2,500 records imported, creating 253 internal duplicates + 892 duplicates of existing accounts = 1,145 duplicate records to eventually clean up.
With pre-import cleaning: 1,355 clean net-new records imported, plus 892 contacts added to existing accounts with proper attribution. Zero duplicates created.
Time spent on pre-import process: 25 minutes.
Time that would have been spent on post-import cleanup: conservatively, 15-20 hours over the following months.
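The numbers in this example are easy to sanity-check. The short calculation below only restates the figures from the table and works out the share of the list that would have become duplicates.

```python
original = 2500
internal_dupes = 253
matched_existing = 892

after_internal_dedup = original - internal_dupes        # rows left after Step 1
net_new = after_internal_dedup - matched_existing       # rows for the "create" batch
duplicates_avoided = internal_dupes + matched_existing  # rows that would have been duplicates
share = duplicates_avoided / original

print(after_internal_dedup, net_new, duplicates_avoided, f"{share:.1%}")
```

Put differently, a little under half of the purchased list would have landed in HubSpot as duplicates.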
The Types of Imports That Cause the Most Damage
Not all imports are equally risky. Based on what I've seen, here's a ranking from most to least problematic:
- Tradeshow and event attendee lists — Event organizers use different naming conventions than your CRM. Names are often truncated, abbreviated, or entered by the attendees themselves. Plus, the same companies appear repeatedly across different contacts. See the detailed guide on matching tradeshow attendee lists against your CRM.
- Purchased lead lists — Data vendors compile information from multiple sources, each with their own naming conventions. Quality control is usually minimal. The same company appears under 3-4 different names in a single purchased list.
- Partner referrals — Partners use their own naming conventions. If a partner sends you "J&J" and you have "Johnson & Johnson," that's a duplicate waiting to happen.
- Marketing campaign responses — Web forms capture whatever the user types. "Google," "Google Inc," "Google LLC," "Alphabet," and "google.com" might all appear for the same company.
- System migrations — When consolidating from multiple CRMs or merging after an acquisition, you're combining datasets that were never standardized. This is usually the largest volume of duplicates, but it's also a one-time event that teams typically handle more carefully.
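For the web-form case in particular, a little normalization before matching goes a long way. This is a sketch under assumptions: the suffix list and the top-level domains handled are examples I picked, not a complete set.

```python
import re

# Hypothetical lists; extend them to fit your own data.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co", "company"}
TLD_PATTERN = re.compile(r"\.(com|net|org|io)$")

def canonical(name: str) -> str:
    """Reduce a typed-in company name to a rough grouping key."""
    text = TLD_PATTERN.sub("", name.strip().lower())  # "google.com" -> "google"
    words = re.sub(r"[^a-z0-9 ]", " ", text).split()
    while words and words[-1] in LEGAL_SUFFIXES:      # drop trailing "inc", "llc", ...
        words.pop()
    return " ".join(words)

entries = ["Google", "Google Inc", "Google LLC", "google.com", "Alphabet"]
groups: dict[str, list[str]] = {}
for entry in entries:
    groups.setdefault(canonical(entry), []).append(entry)

print(groups)
```

Four of the five entries collapse into one group. "Alphabet" stays separate, because linking a parent company to its subsidiary is a business decision that string cleanup cannot make for you.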
Building a Habit, Not a One-Time Fix
The pre-import process isn't something you do once and forget. It needs to become a standard operating procedure for anyone who imports data into your CRM.
Document the process. Create a checklist. Add it to your onboarding for new ops team members. Make it so routine that nobody even thinks about importing data without deduplicating first.
Some teams create a shared folder where import files go before they're loaded into the CRM. A quick review step catches problems before they become permanent.
The goal is to shift from "cleanup mode" (constantly fixing duplicates) to "prevention mode" (stopping them before they exist). It's the difference between bailing water out of a leaky boat and actually fixing the leak.
Quick Comparison: Deduplication Methods
| Method | Catches Exact Dupes | Catches Name Variations | Speed | Technical Skill |
|---|---|---|---|---|
| Excel Remove Duplicates | Yes | No | Fast | None |
| VLOOKUP / INDEX MATCH | Yes | No | Medium | Basic |
| Excel Power Query Fuzzy Merge | Yes | Limited | Slow | Medium |
| Python (rapidfuzz) | Yes | Yes | Fast | High |
| DedupFuzzy | Yes | Yes | Fast | None |
If you're comfortable with Python, rapidfuzz is excellent and free. If you're not, an online tool gets you the same results without the learning curve. The key is using something that does fuzzy matching, not just exact matching.
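For the Python route, a starting point might look like the sketch below. It uses `fuzz.token_sort_ratio` from rapidfuzz, which compares names after sorting their words, and falls back to the standard library's `difflib` if rapidfuzz is not installed. The fallback is my own addition so the snippet runs anywhere, and the two scorers will not give identical numbers.

```python
try:
    from rapidfuzz import fuzz  # third-party: pip install rapidfuzz

    def similarity(a: str, b: str) -> float:
        """Word-order-insensitive similarity on a 0-100 scale."""
        return fuzz.token_sort_ratio(a.lower(), b.lower())

except ImportError:
    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        """Rough stand-in: sort the words, then compare with difflib."""
        a_sorted = " ".join(sorted(a.lower().split()))
        b_sorted = " ".join(sorted(b.lower().split()))
        return 100 * SequenceMatcher(None, a_sorted, b_sorted).ratio()

for left, right in [("Sachs Goldman", "Goldman Sachs"),
                    ("Acme Corp", "Acme Corporation"),
                    ("Acme Corp", "Zenith Holdings")]:
    print(f"{left!r} vs {right!r}: {similarity(left, right):.1f}")
```

Whichever scorer you use, calibrate your threshold on a sample of your own data before trusting it on a full import.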
The Process, Summarized
Before every import:
- Deduplicate internally: Remove duplicates within the import file itself using fuzzy matching
- Match against CRM: Compare import company names against existing CRM accounts
- Split the batch: Separate into "update existing" and "create new" groups
- Import carefully: Use CRM duplicate detection as a backup, not primary defense
Total time: 10-30 minutes depending on file size.
Time saved: Hours to days of cleanup work, plus all the downstream problems avoided.
This isn't complicated. It's just a habit that most teams never build — until they're drowning in duplicates and wishing they had.
Need to deduplicate a list before importing to your CRM? Upload your CSV and find company name duplicates — including abbreviations, variations, and typos — in about 60 seconds. Free for 500 rows, no signup required.