How to Remove Duplicate Lines from a List

What counts as a duplicate?

A duplicate is any line that appears more than once in a list. But “appears more than once” has two interpretations depending on whether you treat uppercase and lowercase letters as the same character.

Case-sensitive deduplication

apple and Apple are treated as different lines, so both are kept. Only an exact repeat, such as a second apple, is removed.

apple ✓
Apple ✓
apple ×
banana ✓

Case-insensitive deduplication

apple and Apple are treated as the same line. Only the first occurrence is kept, with its original capitalisation.

apple ✓
Apple ×
apple ×
banana ✓

For most practical tasks (email lists, keyword lists, product names, URLs), case-insensitive deduplication is what you want. Two email addresses that differ only in capitalisation almost always reach the same mailbox.

For code, file paths, or other data where case carries meaning, use case-sensitive mode. /Users/john and /users/john are different paths on a case-sensitive filesystem.
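The difference between the two modes comes down to which key is used to decide whether a line has been seen before. The following is a minimal Python sketch of that idea, not the implementation of any particular tool; the function name dedupe and its case_sensitive parameter are illustrative assumptions.

```python
def dedupe(lines, case_sensitive=True):
    """Return lines with later repeats removed, keeping first occurrences in order."""
    seen = set()
    result = []
    for line in lines:
        # Comparison key: the line itself, or a case-folded copy of it.
        key = line if case_sensitive else line.casefold()
        if key not in seen:
            seen.add(key)
            result.append(line)  # keep the spelling of the first occurrence
    return result


items = ["apple", "Apple", "apple", "banana"]
print(dedupe(items))                        # ['apple', 'Apple', 'banana']
print(dedupe(items, case_sensitive=False))  # ['apple', 'banana']
```

The sketch uses str.casefold() rather than str.lower() because casefold is designed for caseless comparison and handles a few non-English letters that lower() leaves distinct.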

Three deduplication modes

Removing duplicates is not the only option. Sometimes you want to do the opposite.

Keep unique lines only

The standard mode. Removes all duplicate occurrences and returns each line exactly once. The first occurrence of each line is kept; subsequent copies are removed.

Use when: Cleaning an email list, deduplicating a keyword set, removing redundant tags.

Keep only the duplicates

Returns only the lines that appear more than once, each shown once. This is useful for auditing: it shows you exactly which items were repeated without showing the full list.

Use when: Finding which items were double-entered in a spreadsheet export. Checking which keywords appear in multiple campaigns.
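One way to sketch this mode in Python is to count every line first, then walk the list again and report each repeated line once, in order of first appearance. The function name duplicates_only is an assumption made for illustration, and the comparison here is case-sensitive.

```python
from collections import Counter


def duplicates_only(lines):
    """Return each line that occurs more than once, listed once, in first-seen order."""
    counts = Counter(lines)  # how many times each exact line occurs
    reported = set()
    result = []
    for line in lines:
        if counts[line] > 1 and line not in reported:
            reported.add(line)
            result.append(line)
    return result


print(duplicates_only(["red", "blue", "red", "green", "blue", "red"]))  # ['red', 'blue']
```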

Mark duplicates in place

Returns the full original list with duplicate lines labelled. The first occurrence is marked as the original; subsequent occurrences are marked as duplicates. Useful when you need to see context around duplicates before deciding what to remove.

Use when: Auditing a large list before cleanup. Identifying patterns in where duplicates appear.
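A possible Python sketch of marking in place: every line is returned in its original position, paired with a label. The function name mark_duplicates, the label strings, and the tab-separated output are assumptions chosen for illustration; a real tool may present the labels differently.

```python
def mark_duplicates(lines):
    """Pair every line with 'original' (first occurrence) or 'duplicate' (any later one)."""
    seen = set()
    marked = []
    for line in lines:
        label = "duplicate" if line in seen else "original"
        seen.add(line)
        marked.append((line, label))
    return marked


# Print the full list with a label next to each line.
for line, label in mark_duplicates(["apple", "banana", "apple"]):
    print(f"{line}\t{label}")
```

Because nothing is removed, the output has the same number of rows as the input, which makes it easy to paste back alongside the source data.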

Common sources of duplicates

Duplicates usually appear in one of a few predictable ways:

CSV exports from databases or CRMs: Rows get duplicated when joining tables, merging exports, or running the same export twice.
Copy-paste from multiple sources: Combining lists from two spreadsheet tabs, two emails, or two search results will almost always produce overlapping entries.
Form submissions: Users who submit a form twice (double-click, back button, retry) create duplicate records.
Tag and category lists: The same tag entered with different capitalisation or spacing — "machine learning" vs "Machine Learning" vs "machine-learning" — creates near-duplicates that are technically different.
URL lists and sitemaps: Trailing slashes, query parameters, and case differences all create URL variants that point to the same page but appear as separate entries.
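Near-duplicates like the tag variants above can be caught by normalising each line before comparing it. The Python sketch below shows one hypothetical set of rules for tags (fold case, treat hyphens and underscores as spaces, collapse repeated whitespace). The names tag_key and dedupe_by are assumptions, and the rules themselves should be adjusted to the data at hand.

```python
import re


def tag_key(tag):
    """Hypothetical normalisation for tags; these rules are an assumption, not a standard."""
    text = tag.casefold().replace("-", " ").replace("_", " ")
    return re.sub(r"\s+", " ", text).strip()


def dedupe_by(lines, key):
    """Keep the first line for each distinct key(line), preserving order."""
    seen = set()
    result = []
    for line in lines:
        k = key(line)
        if k not in seen:
            seen.add(k)
            result.append(line)
    return result


tags = ["machine learning", "Machine Learning", "machine-learning", "deep  learning"]
print(dedupe_by(tags, tag_key))  # ['machine learning', 'deep  learning']
```

The same pattern applies to URLs with a different key function, but which differences are safe to ignore (trailing slashes, query parameters, letter case in the path) depends on how the site in question treats them.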