How to create scraping recipes

In this short tutorial we are going to run through how to create your own scraping recipes and use these inside a workflow.

Scraping recipes enable you to create your own bots to scrape data from any website via a point and click browser.

There are two ways of using scraping recipes:

- You can create a scraping recipe based on a single search result or category page to automatically capture all the listing names and URLs (paginating through all the available pages).

- Or you can provide a list of product or listing page URLs (using the data input automation) and scrape specific fields from each URL (keep in mind that all the pages need to share the same HTML template).

In this tutorial we will cover both scenarios and show you how you can combine both approaches to scrape a wide range of different websites.

How to create a scraping recipe to capture a list of product or listing URLs

To get started, go to the website you want to scrape and find the search result page that shows all the listings or products you want to capture. We recommend setting the maximum number of products per page if there is such an option, then copy the URL from your browser.

Then head over to the scraping recipes section and click the “Create new recipe” button.

Next, paste the URL you captured and click preview.

Once the page has loaded click on any element you want to capture.

Then choose whether to select that specific element only or have Hexomatic select all the matching elements found on the page. In this case, use the “select all” option to get all the product names and product destination URLs.


Next, label your field, for example “Product URL”, and choose the type of data you want to capture from the field. For example:

- Text
- Number
- Link URLs
- Source URLs (for example, image URLs)

You can specify as many fields to capture as needed on the page, but in this scenario we are only interested in getting the product names and URLs.
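As a rough illustration only (not Hexomatic’s actual implementation), a “select all” step behaves like a selector applied to every matching element on the page, producing one row per match. The sketch below uses Python’s standard library and a made-up snippet of product markup:

```python
from html.parser import HTMLParser

# Hypothetical search-result markup; any site with a repeated
# product-card template looks similar in spirit.
PAGE = """
<ul class="results">
  <li><a class="product" href="/p/red-mug">Red Mug</a></li>
  <li><a class="product" href="/p/blue-mug">Blue Mug</a></li>
  <li><a class="product" href="/p/green-mug">Green Mug</a></li>
</ul>
"""

class SelectAll(HTMLParser):
    """Collects a (name, URL) row for every element matching one
    selector, mimicking the recipe's "select all" option."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "product":
            self._href = attrs.get("href")  # remember the link URL

    def handle_data(self, data):
        if self._href is not None:  # text inside a matched link
            self.rows.append({"Product name": data,
                              "Product URL": self._href})
            self._href = None

parser = SelectAll()
parser.feed(PAGE)
print(parser.rows)  # three rows, one per matching product link
```

In the recipe itself you get the same result by pointing and clicking, with no code involved.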

If your page contains products or listings spread over more than one page, find the pagination options (typically at the bottom of the page) and click the “next page” link or button.

Then choose the pagination option from the menu and specify how many additional search result pages to paginate through. For example, if you have 5 pages of products in total, use 4, since the first page is already loaded.
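The number to enter is simply the total page count minus one, because the recipe starts on page one and the setting counts “next page” clicks. As a tiny sketch (the helper name is made up):

```python
# The recipe loads the first page itself, so the pagination value is
# the number of extra "next page" clicks: total pages minus one.
def next_clicks(total_pages: int) -> int:
    return max(total_pages - 1, 0)

print(next_clicks(5))  # 5 result pages in total -> enter 4
```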

Lastly name your recipe at the top of the page and click save.

Your recipe will be saved in the scraping recipes section.

To run your scraping recipe, simply create a new workflow and drag and drop your scraping recipe into it. Workflows also enable you to chain additional scraping recipes or automations, as well as specify whether to run the workflow one time or on a recurring schedule.

When the workflow has completed, you will be able to download the result in a .csv file. 
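The downloaded .csv has one column per labelled field and one row per captured element. A minimal sketch of that shape, using Python’s standard csv module and made-up rows:

```python
import csv
import io

# Hypothetical rows, as captured by a "select all" recipe:
# one dict per matched element, keyed by the field labels.
rows = [
    {"Product name": "Red Mug",  "Product URL": "/p/red-mug"},
    {"Product name": "Blue Mug", "Product URL": "/p/blue-mug"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Product name", "Product URL"])
writer.writeheader()          # field labels become the header row
writer.writerows(rows)        # one CSV row per captured element
print(buf.getvalue())
```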

How to create a scraping recipe to capture specific fields from single listing or product pages

In this scenario, we already have a list of product or listing URLs that all share the same HTML template, and we want to capture the product or listing details for each.

To get started, go to one of the single listing or product pages in your browser and copy the URL.

Then head over to the scraping recipes section and click the “Create new recipe” button.

Next, paste the URL you captured and click preview.


Then click on each of the elements you want to capture, choosing the “Select Single” option each time. For example, you could capture the:

- Product title
- Product description
- Product image
- Product price
- Etc.

Label each field and choose whether you want to capture text, numbers, source URLs (for example, image URLs) or link URLs.
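Conceptually, each “Select Single” field maps exactly one element on the page to one labelled column in your results. As an illustration only (the markup and field labels are hypothetical, and kept well-formed so the stdlib XML parser can read it; the real recipe handles arbitrary HTML via point and click):

```python
import xml.etree.ElementTree as ET

# Hypothetical single product page.
PAGE = """
<div class="product">
  <h1>Red Mug</h1>
  <img src="https://example.com/red-mug.jpg" />
  <span class="price">12.99</span>
</div>
"""

root = ET.fromstring(PAGE)

# One "select single" field per element, labelled like recipe fields:
record = {
    "Product title": root.find("h1").text,          # text field
    "Product image": root.find("img").get("src"),   # source URL field
    "Product price": float(root.find("span").text), # number field
}
print(record)
```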


Next name your recipe and click “Save”.

To run your scraping recipe, create a new workflow for it. You then have two ways to run it:

1/ Use the “Data input” automation and copy-paste the list of URLs you generated with the previous scraping recipe, then chain the scraping recipe to fetch the product details (specifying the data input as the source of the scraping recipe).

2/ Chain both scraping recipes one after the other.


To do this, simply create a new workflow and first add the scraping recipe that fetches the listing/product URLs.
Then add the scraping recipe that fetches the product/listing details, specifying the URL field as its source.
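Conceptually, chaining works as two passes: the first recipe yields a list of URLs, and the second recipe runs once per URL. In this sketch, plain functions and an in-memory dict stand in for the point-and-click recipes and the pages themselves (all names and data are made up; no network calls):

```python
# Hypothetical detail pages, keyed by the URLs the first recipe finds.
LISTING_PAGES = {
    "/p/red-mug":  {"Product title": "Red Mug",  "Product price": 12.99},
    "/p/blue-mug": {"Product title": "Blue Mug", "Product price": 14.99},
}

def recipe_urls():
    """First recipe: capture all product URLs from the results page."""
    return list(LISTING_PAGES)

def recipe_details(url):
    """Second recipe: capture the fields from one product page."""
    return {"Product URL": url, **LISTING_PAGES[url]}

# Chaining: the URL field from pass 1 feeds pass 2, one result row per URL.
rows = [recipe_details(u) for u in recipe_urls()]
print(rows)
```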

Next, name and save your workflow, then choose whether to run it now or schedule a recurring task.

When the workflow has completed, you will be able to download the result in a .csv file. 


Automate & scale time-consuming tasks like never before

Hexomatic. The no-code, point and click work automation platform.

Harness the internet as your own data source, build your own scraping bots and leverage ready-made automations to delegate time-consuming tasks and scale your business.

No coding or PhD in programming required.