I Had 4,000 Salesforce Accounts. Turns Out 1,200 Were Duplicates. Here's How I Fixed It.
Last month, our VP of Sales walked into my office with a question that seemed simple: "How many customers do we actually have?"
I pulled up Salesforce. 4,127 accounts.
"About four thousand," I said.
"Then why does finance say we have 2,800 paying customers?"
That question kicked off a two-week project that taught me more about data quality than I ever wanted to know. The short answer: roughly 1,200 of our 4,127 Salesforce accounts were duplicates. Same companies, different records, created by different reps at different times with slightly different names.
And Salesforce's built-in duplicate management had caught almost none of them.
How We Got Here
The classic way. We'd been using Salesforce for four years. Three different sales teams had entered data during that time. We'd imported lists from trade shows, marketing campaigns, and partner referrals.
Nobody had ever done a serious data cleanup.
The symptoms were everywhere once we started looking:
- Reps were fighting over accounts that turned out to be the same company
- Our "new business" pipeline was inflated by deals attached to duplicate accounts
- Marketing was sending the same email to the same company three times because each account had different contacts
- Our territory mapping was a mess because one company might show up in two territories under different names
We just hadn't connected the dots until the VP asked the obvious question.
Why Salesforce Duplicate Rules Didn't Help
Salesforce has built-in Duplicate Rules and Matching Rules. When they work, they're great — they pop up a warning when someone creates a record that looks like a duplicate.
Ours were turned on. They just weren't catching much.
Here's why: Salesforce's standard matching rules use exact and fuzzy matching, but their fuzzy matching is… not very fuzzy. It catches "Acme Inc" vs "Acme Inc." (with a period). It might catch "Acme" vs "Acme Inc."
It does not catch:
- "Acme Corporation" vs "ACME Corp" (case + abbreviation difference)
- "The Walt Disney Company" vs "Disney" (common name vs legal name)
- "Johnson Controls International" vs "JCI" (acronym)
- "Ernst & Young Global Ltd" vs "EY" (complete rebrand)
- "PricewaterhouseCoopers" vs "PwC"
These are exactly the kinds of duplicates that accumulate in a real CRM. Nobody types the same company name twice the same way.
There's a great breakdown of why this happens in this article on why VLOOKUP can't match company names — the same fundamental limitation applies to Salesforce's built-in matching. Any tool that requires near-exact text overlap will miss most real-world company name duplicates.
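To make the limitation concrete, here is a minimal sketch that scores a few of the pairs above with plain character similarity. It uses Python's standard-library difflib as a stand-in. This is not how Salesforce's matching rules are implemented; it only illustrates what an overlap-based score does with abbreviations and acronyms.

```python
from difflib import SequenceMatcher

def char_similarity(a: str, b: str) -> float:
    """Character-overlap similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

pairs = [
    ("Acme Inc", "Acme Inc."),
    ("Johnson Controls International", "JCI"),
    ("PricewaterhouseCoopers", "PwC"),
    ("Ernst & Young Global Ltd", "EY"),
]

for a, b in pairs:
    print(f"{a!r} vs {b!r}: {char_similarity(a, b):.2f}")
```

The punctuation variant scores above 0.9, while the acronym pairs stay below 0.3 because a three-letter name can only share three characters with a thirty-letter one. No single threshold separates those real duplicates from unrelated names.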
The Cleanup Process
Here's exactly what I did, step by step. The whole project took about 12 working hours spread over two weeks.
Week 1: Assessment and Matching
Day 1: Full export. I exported all account records from Salesforce to CSV. Every field: Account Name, Website, Industry, Owner, Created Date, Last Activity, everything. 4,127 rows.
Day 2: Fuzzy matching. I needed to find duplicate account names, including all those abbreviation and formatting variations. Salesforce couldn't do it internally, and I wasn't about to scan 4,000 names by eye.
I uploaded the CSV to DedupFuzzy, selected the Account Name column, and let it run. In about two minutes, it had identified 1,847 potential duplicate pairs. I exported the results.
Now, not all 1,847 pairs were real duplicates. Some were false positives — "American Express" and "American Airlines" share a word but are different companies. But the tool scored each pair, so I could sort by confidence and review from the top.
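I don't know how DedupFuzzy computes its scores internally, so the following is only a rough sketch of the general shape of this step: score every pair of names, keep those above a cutoff, and sort so the most confident pairs come first. The scoring function (difflib's SequenceMatcher), the 0.6 cutoff, and the sample names are all assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

def score(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1] (a stand-in scorer)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def candidate_pairs(names, cutoff=0.6):
    """Return (score, name_a, name_b) for pairs at or above cutoff, best first."""
    scored = [(score(a, b), a, b) for a, b in combinations(names, 2)]
    return sorted((p for p in scored if p[0] >= cutoff), reverse=True)

names = ["Acme Inc", "Acme Inc.", "American Express", "American Airlines", "Globex"]
for s, a, b in candidate_pairs(names):
    print(f"{s:.2f}  {a}  <->  {b}")
```

On this sample, "American Express" and "American Airlines" also clear the cutoff, which is exactly the kind of false positive that makes the review step necessary. Comparing every pair is quadratic: 4,127 names means about 8.5 million comparisons, so a pure-Python loop like this is fine for a demo but slow at full scale.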
Day 3-4: Manual review. This was the tedious part. I went through each pair and categorized them:
- Definite duplicate (score 85% or higher, same website domain): 1,040 pairs
- Probable duplicate (score 70% to under 85%, similar details): 210 pairs
- Needs investigation (score 60% to under 70%, unclear): 145 pairs
- Not duplicates (false positives): 452 pairs
For the "needs investigation" batch, I cross-referenced with the Website field. If two accounts with similar names shared the same domain, they were definitely duplicates. This resolved most of the ambiguous cases.
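The triage above can be sketched in a few lines. This is a hedged approximation of my manual process, not a rule the tool applies: the function names are hypothetical, scores are expressed as fractions of 1, and I'm assuming that any pair scoring 0.60 or more with a shared domain counts as definite.

```python
from urllib.parse import urlparse

def domain(website: str) -> str:
    """Reduce a Website field value to a bare lowercase domain ('' if blank)."""
    website = website.strip().lower()
    if not website:
        return ""
    if "://" not in website:
        website = "http://" + website  # urlparse needs a scheme to find the host
    host = urlparse(website).hostname or ""
    return host[4:] if host.startswith("www.") else host

def triage(score: float, site_a: str, site_b: str) -> str:
    """Bucket a candidate pair using score bands plus a shared-domain check."""
    same_domain = domain(site_a) != "" and domain(site_a) == domain(site_b)
    if score >= 0.60 and same_domain:
        return "definite"
    if score >= 0.70:
        return "probable"
    if score >= 0.60:
        return "investigate"
    return "not duplicate"

print(triage(0.65, "acme.com", "https://www.acme.com/about"))
```

Normalizing the Website field first matters, because the same company often appears with and without "www." or a trailing path.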
Final count: 1,194 duplicate accounts. Just under 30% of our entire database.
Week 2: Merge Strategy and Execution
Day 5: Decide what to keep. For each duplicate pair, I needed to pick a "master" record. My rules:
- Keep the record with more associated contacts
- If tied, keep the one with more activity history (emails, calls, tasks)
- If still tied, keep the older record (it probably has more institutional context)
- Always keep the more complete record (more fields filled in)
I built a spreadsheet mapping each duplicate pair to the master record ID and the record ID to merge/delete.
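If you'd rather script the master-record choice than fill in the spreadsheet by hand, the rules can be expressed as a sort key. This is a minimal sketch under assumptions: the Account fields and the IDs are hypothetical, and it treats record completeness as the last tie-breaker, which is my reading of the rules above. If completeness should override everything else, move fields_filled to the front of the tuple.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Account:
    record_id: str       # hypothetical ID, not a real Salesforce record ID
    contacts: int        # number of associated contacts
    activities: int      # emails, calls, tasks logged
    created: date
    fields_filled: int   # count of non-empty fields

def rank(acct: Account) -> tuple:
    """Sort key: more contacts, then more activity, then older, then more complete."""
    # An earlier created date should win, so negate its ordinal.
    return (acct.contacts, acct.activities, -acct.created.toordinal(), acct.fields_filled)

def pick_master(a: Account, b: Account) -> tuple:
    """Return (master, duplicate) for one pair."""
    return (a, b) if rank(a) >= rank(b) else (b, a)

a = Account("ACC-001", contacts=4, activities=12, created=date(2021, 3, 1), fields_filled=9)
b = Account("ACC-002", contacts=4, activities=12, created=date(2019, 6, 15), fields_filled=7)
master, duplicate = pick_master(a, b)
print(f"keep {master.record_id}, merge {duplicate.record_id} into it")
```

Here the two records tie on contacts and activity, so the older one (ACC-002) becomes the master even though it has fewer fields filled in.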
Day 6: Backup everything. Before touching Salesforce, I did a full data export and saved it to our shared drive. Twice. Labeled "BACKUP BEFORE DEDUP — DO NOT DELETE."
If you're about to mass-edit a CRM, backup is not optional. There's a good explanation of why in the CRM data cleaning guide: undoing a botched merge inside a CRM is far harder than redoing a spreadsheet cleanup.
Day 7-8: Merge in batches. I used Salesforce's built-in Merge Accounts feature for the first 50 pairs to make sure everything worked correctly. Contacts, opportunities, activities — all transferred to the master record.
Then I used a Salesforce merge tool (DemandTools) for the remaining 1,144 merges. Doing them one-by-one in-app would have taken a week.
Day 9: Validation. After merging, I re-exported and ran fuzzy matching again to check for anything I missed. Found 23 more duplicate pairs that the first pass had masked. For example, Company A had three records, and after merging two of them, the third still existed. Merged those manually.
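In hindsight, the three-record case can be caught before merging by grouping pairs into clusters instead of treating each pair on its own. Here is a small sketch of that idea using a union-find structure. It is not what I actually ran, and the record IDs are made up.

```python
def group_duplicates(pairs):
    """Cluster record IDs so that chained pairs (A-B, B-C) end up in one group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups fast
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for record in list(parent):
        groups.setdefault(find(record), set()).add(record)
    return [g for g in groups.values() if len(g) > 1]

pairs = [("A1", "A2"), ("A2", "A3"), ("B1", "B2")]
for group in group_duplicates(pairs):
    print(sorted(group))
```

With groups in hand, you pick one master per group and merge every other record into it, so a third copy doesn't survive the first pass.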
Day 10: Set up prevention. The cleanup is pointless if duplicates just accumulate again. I:
- Tightened Salesforce Duplicate Rules to be more aggressive
- Created a monthly task to export accounts and run fuzzy matching externally
- Added a "standardized name" field that strips suffixes and normalizes formatting
- Briefed the sales team on naming conventions
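For the "standardized name" field, the normalization logic looks roughly like the sketch below. The suffix list is an assumption for illustration and should be extended for your own data; how the value gets populated inside Salesforce is outside the scope of this example.

```python
import re

# Assumed suffix list for illustration; extend it for your own data.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company",
            "ltd", "limited", "llc", "plc", "gmbh"}

def standardized_name(name: str) -> str:
    """Lowercase, drop punctuation, a leading 'the', and trailing legal suffixes."""
    cleaned = re.sub(r"[^\w\s&]", "", name.lower())  # remove punctuation, keep '&'
    tokens = cleaned.split()
    if tokens and tokens[0] == "the":
        tokens = tokens[1:]
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

for raw in ["Acme Corporation", "ACME Corp", "Acme, Inc.", "The Walt Disney Company"]:
    print(f"{raw!r} -> {standardized_name(raw)!r}")
```

The three Acme variants collapse to the same key, so an exact match on this field catches them. It does not turn "The Walt Disney Company" into "Disney" or expand acronyms, which is why the monthly external fuzzy match is still on the list.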
The Results
Our Salesforce database went from 4,127 accounts to 2,910: 1,194 removed in the main merge plus 23 more from the validation pass. The VP of Sales finally had an accurate customer count.
But the real impact was operational:
Pipeline accuracy improved. We'd been double-counting deals because the same company had two accounts with opportunities attached to each. Removing this inflation showed our actual pipeline was 15% smaller than reported — painful to see, but better to know.
Rep productivity went up. Reps stopped wasting time researching companies that another rep already knew. Account assignments became clear.
Marketing targeting improved. With clean data, our account-based marketing campaigns became sharper. No more sending the same company three different versions of the same email.
Reporting became trustworthy. Our board deck suddenly made sense. Revenue per customer, accounts per territory, conversion rates — all the metrics that depend on accurate account counts were now reliable.
What I'd Do Differently
Start earlier. We should have done this after year one, not year four. Cleaning 4,000 records is a project. Cleaning 1,000 records is an afternoon.
Clean before import. Every time we imported a trade show list or partner referral batch, we should have deduplicated it against our existing data first. The article on matching and merging two company lists covers exactly this workflow.
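A pre-import check can be as simple as comparing each incoming name against what's already in the CRM and holding back anything that looks close. The sketch below uses difflib.get_close_matches from the Python standard library; the function name, the 0.85 cutoff, and the sample names are assumptions, and a real list would benefit from normalizing names first.

```python
from difflib import get_close_matches

def split_import(incoming, existing, cutoff=0.85):
    """Split incoming names into (new, possible_duplicates) against existing names."""
    existing_by_key = {name.lower().strip(): name for name in existing}
    new, possible = [], []
    for name in incoming:
        key = name.lower().strip()
        hits = get_close_matches(key, list(existing_by_key), n=1, cutoff=cutoff)
        if hits:
            possible.append((name, existing_by_key[hits[0]]))
        else:
            new.append(name)
    return new, possible

existing = ["Acme Inc", "Globex Corporation"]
incoming = ["Acme Inc.", "Initech"]
new, possible = split_import(incoming, existing)
print("import as new:", new)
print("review first:", possible)
```

Only the rows in the first list go straight into the import file; the second list gets a human look before anything is created.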
Automate more. Monthly fuzzy matching should be automated, not a manual reminder. I'm looking into setting up a recurring process for this.
Lessons for Your Salesforce Cleanup
If you're in a similar situation — CRM full of duplicate accounts, no clear ownership — here's the practical advice:
Don't trust Salesforce's built-in dedup for existing data. The matching rules are fine for preventing new duplicates at creation time. They're inadequate for finding duplicates that already exist, especially when names vary significantly.
Export and match externally. Pull your accounts to CSV. Run fuzzy matching with a tool built for this. Import the results back as your merge plan.
Do it in phases. Don't try to merge 1,200 records in one sitting. Do 50, verify everything is correct, then batch the rest.
Get buy-in first. When I told the VP the database was 30% duplicates, his reaction was "fix it yesterday." That executive support made it easy to block time for the project. Without it, this kind of cleanup always gets deprioritized.
Set up prevention. Cleanup without prevention is just creating work for your future self. Tighten duplicate rules, train the team on naming conventions, and schedule recurring checks.
The Bottom Line
Your CRM is only as good as the data in it. If 30% of your accounts are duplicates, every report, every forecast, and every decision based on that data is wrong.
The cleanup took me two weeks of part-time work. The impact on our pipeline accuracy, rep productivity, and marketing targeting was worth months of work.
Don't wait for the VP to ask the hard question. Check your data now.
Think your Salesforce might have a duplicate problem? Export your Account Name column to CSV and run a quick check. DedupFuzzy finds duplicate company names — including abbreviations, formatting variations, and typos — in about 60 seconds. Free for 500 rows, no signup needed.