The goal of this tutorial is to use a simple scraping tool to scrape a Google search, then prepare the data for integration into an existing pipeline using the custom option of “Split text to columns” in Google Sheets.

For example:

We are looking for Java developers in Paris, France who are on LinkedIn.

We have 34 results:

We want to add these search results to our existing pipeline. We’ll scrape all the results and add them to our pipeline.

Scraping with a bookmarklet

We will use a bookmarklet to scrape all the results. Of course, we could use another tool to scrape, but for people who aren’t authorized to use Google Chrome or Chromium, this is a good alternative.

What is a Bookmarklet?

According to Wikipedia:
A bookmarklet is a bookmark stored in a web browser that contains JavaScript commands that add new features to the browser. 
Bookmarklets are unobtrusive JavaScripts stored as the URL of a bookmark in a web browser or as a hyperlink on a web page. 
Bookmarklets are usually JavaScript programs.

The “Installation” of a bookmarklet is performed by creating a new bookmark, and pasting the code into the URL destination field. Alternatively, if the bookmarklet is presented as a link, under some browsers it can be dragged and dropped onto the bookmark bar. The bookmarklet can then be run by loading the bookmark normally.
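To make the “pasting the code into the URL destination field” step more concrete, here is a minimal sketch of how JavaScript source can be packaged into a `javascript:` URL. The helper name `make_bookmarklet` and the sample snippet are hypothetical illustrations (this is not the cognitiveSEO scraper), and the sketch assumes the source ends every statement with a semicolon and contains no `//` comments, since everything is collapsed onto one line.

```python
from urllib.parse import quote


def make_bookmarklet(js_source: str) -> str:
    """Return a javascript: URL wrapping js_source in a self-invoking function."""
    # A bookmark URL field holds a single line, so join the non-empty lines.
    one_line = " ".join(
        line.strip() for line in js_source.splitlines() if line.strip()
    )
    # The wrapper function keeps the snippet's variables out of the page's globals.
    wrapped = f"(function(){{{one_line}}})();"
    # Percent-encode spaces and other URL-unsafe characters,
    # leaving common JavaScript punctuation readable.
    return "javascript:" + quote(wrapped, safe="(){};=.,'!*")


# Hypothetical snippet for illustration only: it shows the page title.
SAMPLE_JS = """
var t = document.title;
alert('Title: ' + t);
"""

bookmarklet_url = make_bookmarklet(SAMPLE_JS)
print(bookmarklet_url)
```

The printed string is what you would paste into the URL field of a new bookmark.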

For this tutorial we will use the code below. I found it on an SEO blog, cognitiveSEO, in the article “69 Amazing SEO Bookmarklets to SuperCharge Your Internet Marketing”. By the way, there is a lot of cool stuff there.

Instant Google SERP Scraper Bookmarklet

Quickly SCRAPE the Rankings for any search you do with the click of a button. You can Export the Links, Anchors or the Full Data as CSV for more processing, if needed. You could use this to extract all the indexed pages for a site for example or identify the ranking sites for a specific keyword.

Go back to your search and click on your bookmarklet. Happy scraping! 😉

Cleaning the imported data

Copy and paste the results into your favorite text editor and save them as a .txt file.

Then go to your spreadsheet editor (for this tutorial we’ll use Google Sheets) and import your .txt file.

File > Import > Upload > “your doc”

Our file has two columns. We need to split the second one to obtain a “Full Name” column and further columns with the title and company.

To split column B, we will use the “Split text to columns” item in the Data menu. Select all of column B, then go to Data > Split text to columns.

Select the “Custom” option and enter your own separator to split your text.

The first iteration is quite good, but we can adjust it for a better result.

After a few iterations using “ | ” and “…” as custom separators, which takes only a few seconds, the result is perfect. We can now compare these results with our existing pipeline to complete it.
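For readers who prefer to script this clean-up, the same splitting idea can be sketched in Python. This is only an illustration under assumptions: the sample line, the name `split_columns`, and the extra three-dot separator are hypothetical, and unlike the spreadsheet, where each custom separator is applied in its own pass, the sketch applies all of them at once.

```python
import re

# Separators used in the tutorial, plus a plain three-dot variant (an assumption).
SEPARATORS = [" | ", "…", "..."]


def split_columns(text, separators=SEPARATORS):
    """Split one scraped title into columns on any of the given separators."""
    # Escape each separator so characters like "|" and "." are matched literally.
    pattern = "|".join(re.escape(sep) for sep in separators)
    parts = re.split(pattern, text)
    # Trim whitespace and drop the empty pieces left by trailing separators.
    return [part.strip() for part in parts if part.strip()]


# Hypothetical scraped title, shaped like the second column described above.
sample = "Jane Doe | Java Developer | ExampleCorp…"
row = split_columns(sample)
print(row)
```

Each element of `row` corresponds to one of the columns obtained in the spreadsheet: full name, title, company.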

To compare your existing pipeline with the scraped data using the MATCH function, read the article.
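The comparison can also be sketched outside the spreadsheet. In Google Sheets, MATCH returns the position of a value in a range, or an error when the value is absent. The Python below mimics that “is this name already in my pipeline?” check. The function names and sample names are hypothetical, and matching on a lower-cased, whitespace-normalized full name is an assumption that will not catch every duplicate.

```python
def normalize(name):
    """Lower-case a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def find_new_candidates(scraped_names, pipeline_names):
    """Return scraped names that are not yet in the existing pipeline."""
    known = {normalize(name) for name in pipeline_names}
    return [name for name in scraped_names if normalize(name) not in known]


# Hypothetical data for illustration only.
pipeline = ["Jane Doe", "John Smith"]
scraped = ["jane  doe ", "Alice Martin"]

new_candidates = find_new_candidates(scraped, pipeline)
print(new_candidates)
```

Only the names returned by `find_new_candidates` would need to be added to the pipeline.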