How to Remove Duplicates in Excel When the Names Don't Match Exactly
If you've ever clicked Excel's "Remove Duplicates" button and felt like it didn't actually remove all your duplicates, you're not imagining it.
Excel's built-in Remove Duplicates is a fine tool — for a very narrow problem. It removes rows that are identical, character for character. That's it. The moment your data has any inconsistency — a different capitalization, an extra space, a "Corp." instead of "Corporation" — it stops working.
For most real-world spreadsheets, that's a problem. Because real data is messy.
Let's walk through what Excel can and can't do, and how to actually find the duplicates hiding in your file.
What Excel's Remove Duplicates Actually Does
Open Excel. Highlight your data. Click Data → Remove Duplicates. Excel scans for rows that are exactly the same and deletes the extras.
If your data looks like this:
| Company |
|---|
| Acme Corp |
| Acme Corp |
| Acme Corp |
Excel will reduce it to a single "Acme Corp" row. Perfect.
But if your data looks like this:
| Company |
|---|
| Acme Corp |
| ACME Corporation |
| Acme Corp. |
| Acme, Corp |
Excel sees four unique rows. It keeps all of them. To Excel, "Acme Corp" and "Acme Corp." (with a period) are completely different strings.
This is technically correct — those are different strings. But they're obviously the same company to a human. And in any spreadsheet that's been touched by more than one person over more than a few months, this is what your data looks like.
Why This Happens to Almost Everyone
You probably didn't create these variations on purpose. They accumulate naturally:
- Sales rep A types "Acme Corp"
- Sales rep B types "ACME Corporation" (the legal name)
- The data import from your trade show software adds "Acme, Corp"
- A manual entry months later adds "Acme Corp." with a period
By the time you sit down to clean the file, you have four versions of the same company, and Excel can't help you collapse them.
The Trick That Sometimes Works
There's an intermediate technique that catches some of these duplicates. It involves using a helper column to "normalize" your text before running Remove Duplicates.
Here's how. Let's say your company names are in column A, starting at A2. In cell B2, type:
=UPPER(TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,".",""),",","")," "," ")))
This formula does four things:
SUBSTITUTE(A2,".","")removes periods- The next
SUBSTITUTEremoves commas - The next removes double spaces
TRIMremoves leading/trailing spacesUPPERconverts everything to uppercase
Drag this formula down to apply it to every row. Now column B has a "cleaned" version of each company name.
Run Remove Duplicates on column B (or use it as the matching column with column A as your data). You'll catch more duplicates than before.
For our earlier example:
| Original (A) | Normalized (B) |
|---|---|
| Acme Corp | ACME CORP |
| ACME Corporation | ACME CORPORATION |
| Acme Corp. | ACME CORP |
| Acme, Corp | ACME CORP |
Three of the four now match in column B. Progress! But "ACME Corporation" is still seen as different — it has the full word "Corporation" instead of "Corp."
You could extend the formula to strip suffixes like "Corp," "Corporation," "Inc," "LLC," and so on. But the formula gets unwieldy fast, and you'll never catch every variation.
The Limitation You Can't Get Around
Here's the fundamental problem with Excel-based approaches: they all require exact string matching at some point. Even with helper columns and normalization, you're just making the strings more similar so they'll match exactly.
But many real-world duplicates can't be fixed by normalization alone:
| Duplicate Pair | Even Excel formulas can't match these |
|---|---|
| "Acme" vs "Acme Holdings" | Different words, no formula fixes this |
| "GE" vs "General Electric" | Abbreviation vs full name |
| "JPMorgan" vs "J.P. Morgan Chase" | Word reordering and partial overlap |
| "Johnsen Industries" vs "Johnson Industries" | Typo (one letter difference) |
These are all the same company, but no Excel formula can make their strings identical.
To catch these, you need something Excel doesn't have: fuzzy matching.
What Fuzzy Matching Does Differently
Fuzzy matching doesn't ask "are these strings identical?" It asks "how similar are these strings?"
It compares two pieces of text and gives you a similarity score, usually as a percentage. "Acme Corp" and "ACME Corporation" might score 85% similar. "Apple Inc" and "Microsoft" might score 12%.
Then you set a threshold — say, anything above 80% is probably a duplicate — and review the matches.
We covered this in detail in fuzzy matching explained, but here's the short version: fuzzy matching catches the duplicates that exact-match tools miss, including abbreviations, typos, word reordering, and missing words.
How to Actually Do This (Three Options)
You have three practical paths forward. Pick whichever fits your situation.
Option 1: Use a Web Tool (Fastest)
This is the easiest path for most people. Export your data to CSV, upload it to a fuzzy matching tool, review matches, download the cleaned result.
Here's how it works with DedupFuzzy:
- In Excel, save your file as CSV (File → Save As → CSV)
- Go to app.dedupfuzzy.com
- Upload the CSV
- Select the column with company names
- Wait about 60 seconds — the tool finds all fuzzy matches above your similarity threshold
- Review the matches, mark any false positives
- Download the cleaned CSV
- Open it back in Excel
Total time: about 5 minutes for a list of a few thousand rows. Free for files up to 500 rows, no signup needed.
Option 2: Excel Power Query (Built-in but Tricky)
Microsoft 365 versions of Excel include Power Query, which has a "Fuzzy Merge" option. It can do fuzzy matching, but with significant limitations.
To access it: Data → Get Data → From Other Sources → Blank Query → use Power Query's Merge feature with the "Use fuzzy matching" checkbox.
Honest warning: Power Query's fuzzy merge struggles specifically with company names. It misses abbreviations entirely (won't match "GE" to "General Electric") and tends to produce false positives on common words. We covered the specific problems in Excel Power Query fuzzy merge not working. For typo correction it's fine. For real company name matching, it's frustrating.
Option 3: Excel + Python (For Coders)
If you're comfortable with Python, you can do this entirely outside Excel using the rapidfuzz library. We walked through the exact code in how to remove duplicate company names from CSV.
This gives you maximum control but requires programming knowledge. Most people skip this option.
A Real Example
Let's say you have 1,000 rows of company names in Excel. Here's what each method would actually find:
| Method | Duplicates Found | Time Required |
|---|---|---|
| Excel Remove Duplicates | ~20 | 5 seconds |
| Helper column normalization | ~45 | 15 minutes |
| Power Query Fuzzy Merge | ~70 | 30 minutes (with cleanup) |
| Web fuzzy matching tool | ~110 | 5 minutes |
In a typical messy dataset, Excel's built-in tool catches less than 20% of real duplicates. The helper column trick gets you to maybe 40%. Fuzzy matching tools catch most of the rest.
The difference matters. If your "1,000-customer database" is actually 850 unique customers, every report you generate from it is wrong.
Avoiding the Problem in the First Place
Once you've cleaned your file, here's how to prevent it from getting messy again:
Use data validation. Excel can restrict input to a predefined list. If "Acme Corp" is the canonical name, set up validation so users have to pick it from a dropdown rather than typing it.
Standardize at import. When importing data from CRMs, exports, or trade show lists, run it through a normalization step before merging it with your master file.
Schedule regular cleanups. Even with prevention, drift happens. Run a fuzzy match check quarterly. It's much faster to clean 50 new duplicates than 500 accumulated ones.
Document your conventions. If your team uses "Inc." instead of "Incorporated," write that down somewhere people will see it. Most data inconsistency comes from unclear conventions, not bad intent.
When Excel Is Enough
Don't overcomplicate this. If your file is small (under 200 rows) and the duplicates are mostly formatting issues (capitalization, periods, spaces), the helper column approach probably gets you 90% of the way there. You don't always need a fuzzy matching tool.
The fuzzy matching path is for when you have:
- More than a few hundred rows
- Real name variations (abbreviations, word reordering, typos)
- Data merged from multiple sources
- Need for accuracy in reporting
If that's your situation, stop fighting Excel. Export, run fuzzy matching externally, import the cleaned result back. It's faster than spending hours on increasingly complex helper column formulas.
Working with messy duplicates in Excel right now? Save your file as CSV and run a quick fuzzy match check — most people are shocked at how many duplicates Excel was missing. Free for 500 rows, no signup required.
🚀 Try DedupFuzzy Free