Lead lists from different sources almost always have overlaps. Here's how to merge and dedupe them without losing data — or the most valuable row.
Quick answer
To deduplicate a LinkedIn lead list from multiple CSV exports, dedupe by LinkedIn profile URL (the most stable identifier for a person across sources), not by email or name. When the same person appears twice, merge the rows so that each field keeps its most complete value, rather than keeping whichever row came first. Scrupp handles this automatically when you bulk-upload multiple CSVs: it normalizes LinkedIn URLs (stripping query params), merges rows on URL match, and keeps the "best" version of each field across duplicates.
Step by step
6 steps — about 10-15 minutes end-to-end.
1. Copy rows from every source CSV into a single sheet. Add a "source" column to track where each row came from.
2. Strip the scheme (https://), query parameters (?miniProfileUrn=...), and trailing slashes from each LinkedIn URL, then convert it to lowercase. Put the result in its own column. Example: https://linkedin.com/in/john-doe/ → linkedin.com/in/john-doe
3. Sort the combined sheet alphabetically by the normalized URL column. This groups duplicates together.
4. For each group of duplicates, create a single merged row. For each field, keep the non-empty value (or the longest/most complete one when several sources filled it in). This preserves data across sources.
5. Keep only the merged row for each URL. In Google Sheets, use =UNIQUE(A:A) on the normalized URL column and look up the merged values for each key. In Excel, use Remove Duplicates on the URL column; it keeps the first occurrence, so make sure the merged row sits at the top of each group.
6. Compare the final row count with the sum of the source CSV row counts. Merging 2-3 sources typically removes 15-30% of rows. Re-run enrichment on the merged list if emails are stale.
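The steps above can also be sketched in a few lines of Python for lists too large to handle comfortably in a spreadsheet. This is a minimal sketch, not Scrupp's implementation: the column names (linkedin_url, source, email, phone), the sample rows, and the choice to drop a leading "www." are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_linkedin_url(url: str) -> str:
    """Reduce a profile URL to a comparison key such as 'linkedin.com/in/john-doe'."""
    url = url.strip().lower()
    if "://" not in url:
        url = "https://" + url  # urlsplit only detects the host when a scheme is present
    parts = urlsplit(url)  # splits off the query string and fragment
    host = parts.netloc
    if host.startswith("www."):  # assumption: treat www and non-www as the same profile
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_rows(rows, url_field="linkedin_url"):
    """Group rows by normalized URL and keep the longest non-empty value per field."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize_linkedin_url(row[url_field])].append(row)

    merged = []
    for key in sorted(groups):  # sorted keys mirror the "sort by URL" step
        best = {}
        for row in groups[key]:
            for field, value in row.items():
                value = (value or "").strip()
                if len(value) > len(best.setdefault(field, "")):
                    best[field] = value
        best[url_field] = key
        # Record every source that contributed to this lead, for attribution.
        sources = {r["source"] for r in groups[key] if r.get("source")}
        best["source"] = ", ".join(sorted(sources))
        merged.append(best)
    return merged


# Hypothetical sample rows standing in for three combined CSV exports.
rows = [
    {"source": "sales_nav", "linkedin_url": "https://www.linkedin.com/in/John-Doe/",
     "email": "john@example.com", "phone": ""},
    {"source": "apollo", "linkedin_url": "linkedin.com/in/john-doe?miniProfileUrn=abc",
     "email": "", "phone": "+1 555 0100"},
    {"source": "manual", "linkedin_url": "https://linkedin.com/in/jane-roe",
     "email": "jane@example.com", "phone": ""},
]

merged = merge_rows(rows)
removed = len(rows) - len(merged)
print(f"{len(merged)} unique leads, {removed} duplicates removed ({removed / len(rows):.0%})")
```

"Longest value wins" is only a rough proxy for "most complete"; for fields where recency matters more than length, a different rule per field is likely a better fit.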
Pro tips
Dedupe by LinkedIn URL, not email. Emails change when people change jobs; profile URLs don't (except for rare renames). Name-based dedupe is the worst option because different people share names.
Keep a "source" column. You'll want to know which source contributed each lead for attribution later.
Merge, don't discard. If Source A has the email and Source B has the phone, the merged row should have both.
Don't dedupe before enrichment. Enrich each source separately first, then merge. This maximizes data coverage per person.
FAQ
They'll have different LinkedIn URLs — URL-based dedupe keeps them separate. This is why URL-based dedupe is correct and name-based dedupe is dangerous.
Use an array formula built on IFERROR(INDEX(..., MATCH(...))) to pick the first non-empty value for each field across duplicates. Or, if your spreadsheet tool offers one, use a pivot table with a "First" or "Last" aggregation. Scrupp handles this in one click on upload.
The LinkedIn URL stays the same, but the company field will differ. Keep the most recent export's version as authoritative.
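One way to apply the "most recent export wins" rule is to sort a person's duplicate rows by export date and let later rows overwrite the job-related fields only. The sketch below assumes each row carries a hypothetical exported_at column in ISO format (YYYY-MM-DD) and that company and title are the fields that describe the current job; neither assumption comes from any particular export format.

```python
from datetime import date

# Fields where the newest export is authoritative (assumed names, for illustration).
NEWEST_WINS = ("company", "title", "exported_at")


def resolve_job_change(duplicates):
    """Merge rows for one person, letting the newest export decide job fields."""
    # Oldest first, so rows from later exports are processed last.
    ordered = sorted(duplicates, key=lambda row: date.fromisoformat(row["exported_at"]))
    merged = {}
    for row in ordered:
        for field, value in row.items():
            if not value:
                continue  # an empty value never overwrites existing data
            # Job fields are overwritten by newer rows; other fields are only filled once.
            if field in NEWEST_WINS or not merged.get(field):
                merged[field] = value
    return merged


# Hypothetical duplicate rows for the same profile from two exports.
old = {"linkedin_url": "linkedin.com/in/john-doe", "company": "Acme",
       "title": "AE", "phone": "+1 555 0100", "exported_at": "2024-01-15"}
new = {"linkedin_url": "linkedin.com/in/john-doe", "company": "Globex",
       "title": "Sales Manager", "phone": "", "exported_at": "2024-09-01"}

person = resolve_job_change([new, old])
```

A work email taken from the older export probably belongs to the previous employer, so it is a natural candidate for re-enrichment after a merge like this.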
Yes — bulk upload tools accept multiple CSVs and deduplicate + merge on LinkedIn URL automatically. No manual spreadsheet work needed.
When merging 2-3 sources (e.g. Sales Navigator export + Apollo export + manual research), expect 15-30% overlap. The more targeted your ICP, the higher the overlap because multiple tools find the same people.
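To see where your own lists fall relative to that range, the overlap can be measured directly from the normalized URLs. The following is a rough sketch; the source names and URLs are made up, and it assumes each source has already been deduplicated internally (each source is represented as a set).

```python
from itertools import combinations


def overlap_report(sources):
    """sources maps a source name to the set of normalized profile URLs it contains."""
    total_rows = sum(len(urls) for urls in sources.values())
    unique = set().union(*sources.values())
    # Share of combined rows that disappear once duplicates are merged.
    removed_share = (total_rows - len(unique)) / total_rows if total_rows else 0.0
    # Number of profiles shared by each pair of sources.
    pairwise = {
        (a, b): len(sources[a] & sources[b])
        for a, b in combinations(sorted(sources), 2)
    }
    return removed_share, pairwise


# Hypothetical sources with 4 + 3 + 3 = 10 rows and 7 distinct profiles.
sources = {
    "sales_nav": {"linkedin.com/in/a", "linkedin.com/in/b",
                  "linkedin.com/in/c", "linkedin.com/in/d"},
    "apollo": {"linkedin.com/in/c", "linkedin.com/in/d", "linkedin.com/in/e"},
    "manual": {"linkedin.com/in/a", "linkedin.com/in/f", "linkedin.com/in/g"},
}

removed_share, pairwise = overlap_report(sources)
print(f"{removed_share:.0%} of combined rows are duplicates")
```

The pairwise counts show which two tools keep finding the same people, which can help when deciding whether a source is worth paying for.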
After. Enrich each source separately first — this maximizes data coverage per person. Then merge the enriched CSVs. During merge, keep the version of each field with the most complete data (e.g. Source A has email, Source B has phone → merged row has both).
Before importing, cross-check the deduped list against existing CRM contacts by email address. Most CRMs (HubSpot, Salesforce, Pipedrive) have built-in import deduplication — set the merge key to email and choose "update existing" to fill empty fields without overwriting.
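If you prefer to run the cross-check yourself before the import, a simple email comparison is enough to separate new leads from contacts the CRM already holds. This is a sketch under assumptions: crm_emails stands for an email column exported from your CRM, and the email field name is illustrative rather than tied to any specific CRM.

```python
def split_by_crm_match(leads, crm_emails):
    """Separate leads whose email already exists in the CRM from new ones."""
    # Compare case-insensitively and ignore blank CRM entries.
    known = {e.strip().lower() for e in crm_emails if e and e.strip()}
    new, existing = [], []
    for lead in leads:
        email = (lead.get("email") or "").strip().lower()
        # Leads without an email cannot be matched, so they are treated as new.
        (existing if email in known else new).append(lead)
    return new, existing


# Hypothetical deduped leads and CRM export.
leads = [
    {"linkedin_url": "linkedin.com/in/john-doe", "email": "John@Example.com"},
    {"linkedin_url": "linkedin.com/in/jane-roe", "email": "jane@example.com"},
    {"linkedin_url": "linkedin.com/in/sam-poe", "email": ""},
]
crm_emails = ["john@example.com", "", "other@example.com"]

new, existing = split_by_crm_match(leads, crm_emails)
```

Rows in existing are the ones to send through the CRM's "update existing" path; rows in new can be imported as fresh contacts.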
Yes — normalize URLs first by stripping query parameters, trailing slashes, and converting to lowercase. linkedin.com/in/John-Doe/ and linkedin.com/in/john-doe?miniProfileUrn=abc become the same key after normalization.