You have scraped data to enrich an existing pipeline but you have probably scraped some profiles that already exist in your database…

If you are lucky, your ATS system will deduplicate your data… but this option works sometimes & sometimes not…

For just one profile, the validation is super easy… a quick search in the platform you’ve got the answer… but what about 20, 50 or more profiles…?

 

 

This tutorial shows how to use the match function to avoid duplicate profiles…

Match Function

We have an existing pipeline about 167 profiles – We have scraped 267 profiles from a new search. Before importing these profiles we want to compare the 2 files to avoid duplicate.

The “Match function” will help us – Open your favorite spreadsheet editor – It works either in Google or Excel.

Import your “Existing Pipeline” in a first sheet and create a second one where you’ll import your “scraped data” – Now your data are in the same spreadsheet in two different sheets.

The MATCH function searches for a specified item in a range of cells and then returns the relative position of that item in the range.

MATCH(lookup_value, lookup_array, [match_type])

The MATCH function syntax has the following arguments:

  • lookup_value    Required. The value that you want to match in lookup_array. For example, when you look up someone’s number in a telephone book, you are using the person’s name as the lookup value, but the telephone number is the value you want. The lookup_value argument can be a value (number, text, or logical value) or a cell reference to a number, text, or logical value.
  • lookup_array    Required. The range of cells being searched.
  • match_type    Optional. The number -1, 0, or 1. The match_type argument specifies how your editor matches lookup_value with values in lookup_array. The default value for this argument is 1.

In our example, we’ll check if the “Scraped data” already exists or not in the “Existing Pipeline” sheet.

We’ll use the Full name to deduplicate our scraped data.

We will match the Full name of the “scraped data” sheet and look if it exists in the “existing pipeline” full name column.

Match function building :

I create a Column called Match – the title of the column doesn’t matter.

I select “Adrien Mater” – the first cell of the column “Full Name”

I select the column where I want to research my first element “Adrien Mater”.

Match Type will be “0” for a strict matching.

This formula is for the first cell of our sheet: “=MATCH(B2,‘Existing Pipeline’!A:A,0)”

We can now increment the function of all the Row & check the result…

 

We have to create a filtered view and exclude all the row with a number…

Filter by values :

 

Select all & Clear – then select #N/A

 

After filtering – We obtain only the data that aren’t present in the “Existing Pipeline” – We had 26 duplicates according to the counter…

 

We can now add these data to our pipeline without any risk of duplicate.