How to perform 2-step scraping

Two-step web scraping is a method for capturing large amounts of data from websites quickly and efficiently. It breaks the process into two stages:

The first step gathers information from an initial web page (for example, product URLs from a product listing page). The second step then visits each of those URLs and captures data from the page behind it (for example, an individual product page).
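For readers who like to see the idea in code, here is a minimal Python sketch of the two stages. It is not how Hexomatic works internally; the function names, the `shop.example` URLs, and the in-memory `FAKE_SITE` are all hypothetical stand-ins so the example runs without network access.

```python
from typing import Callable, Dict, List


def two_step_scrape(
    listing_url: str,
    fetch: Callable[[str], str],
    extract_links: Callable[[str], List[str]],
    extract_details: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Step 1: collect URLs from the listing page. Step 2: visit each one."""
    # Step 1: one request to the listing page yields the list of URLs.
    listing_page = fetch(listing_url)
    product_urls = extract_links(listing_page)

    # Step 2: one request per collected URL yields one row of data each.
    rows = []
    for url in product_urls:
        details = extract_details(fetch(url))
        rows.append({"url": url, **details})
    return rows


# Hypothetical stand-in for a website: maps each URL to its page content.
FAKE_SITE = {
    "https://shop.example/category": "https://shop.example/p/1\nhttps://shop.example/p/2",
    "https://shop.example/p/1": "Blue Mug",
    "https://shop.example/p/2": "Red Mug",
}

rows = two_step_scrape(
    "https://shop.example/category",
    fetch=FAKE_SITE.__getitem__,          # "download" a page from the dict
    extract_links=str.splitlines,         # listing page: one URL per line
    extract_details=lambda page: {"title": page},
)
print(rows)
```

Passing the fetch and extract functions in as parameters keeps the two stages separate, which mirrors the two recipes built in the rest of this tutorial.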

If you’re an online retailer, this tactic can help you scrape a large number of product listings and capture every product in a structured data format.

Doing this manually would take ages, but with a web scraper like Hexomatic you can automate the task and save a great deal of time and effort.

In this tutorial, we will show you how to perform 2-step scraping: how to capture product listings, how to capture single product data, and how to run both in a single workflow that delivers all the data in a structured format as a Google Sheets or CSV file.

To get started, be sure to have a Hexomatic account.

How to scrape product listings 

Let’s first see how to scrape product listings from a category page of an eCommerce website.

Before getting started, go to the targeted website and capture the desired page URL. 

Step 1: Create a new scraping recipe

From your dashboard, create a new blank scraping recipe.

Step 2: Add the page URL to Hexomatic

Next, paste the copied page URL into Hexomatic and click Preview.

For this website, and for other websites that rely on JS and may have previewing issues, we recommend setting Full-Stack as the browser mode.

You will now see a preview of the page. If a pop-up window appears on the page, you can close it using the Click action.

Step 3: Select elements to scrape

Once the page has loaded, you can select the desired elements to scrape. In this case, we are going to scrape product URLs.

To do this, click on the first product URL and choose Select All.

Next, set Link URL as the type of the element.

Then, Save the recipe.
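As a rough code analogue of this first recipe, the sketch below collects link URLs from a listing page using only the Python standard library. The sample HTML and the `product-link` class name are assumptions made up for illustration; a real site will use its own markup, and this is not Hexomatic's own selection logic.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag that carries a given CSS class."""

    def __init__(self, base_url: str, css_class: str):
        super().__init__()
        self.base_url = base_url
        self.css_class = css_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if href and self.css_class in classes:
            # Resolve relative links against the listing page URL.
            absolute = urljoin(self.base_url, href)
            if absolute not in self.urls:  # skip duplicate links
                self.urls.append(absolute)


def extract_product_urls(html: str, base_url: str, css_class: str = "product-link"):
    collector = LinkCollector(base_url, css_class)
    collector.feed(html)
    return collector.urls


# Hypothetical listing page markup.
SAMPLE_LISTING = """
<ul>
  <li><a class="product-link" href="/p/blue-mug">Blue Mug</a></li>
  <li><a class="product-link featured" href="/p/red-mug">Red Mug</a></li>
  <li><a href="/about">About us</a></li>
</ul>
"""

product_urls = extract_product_urls(SAMPLE_LISTING, "https://shop.example/category")
print(product_urls)
```

Matching on a shared class plays the same role as Select All here: one rule picks up every similar link, while unrelated links such as the "About us" entry are left out.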

How to scrape detailed page data

Now that we have the URLs of our desired product pages from a category page, let’s see how to scrape detailed data from a single product page. 

This time, we will navigate to one of the single product pages from our list and copy the page URL. 

Step 1: Create a new scraping recipe

From your dashboard, create a new blank scraping recipe as we did in the first section.

Step 2: Add the page URL to Hexomatic

Next, paste the copied page URL into Hexomatic and click Preview.

Step 3: Select elements to scrape

Once the page has loaded, you can specify the elements to scrape. 

Let’s scrape the product title, the price, and the main image.

Click on each element, choose Select single, and set the element type. Set Text for the product title and Source URL for the image. For the price, set Text if you want to scrape the price along with the currency, or Number if you want only the numeric value.
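To illustrate the difference between these element types in code, here is a small standard-library sketch that pulls a title, a price, and an image source out of a product page. The markup, class names, and the `price_to_number` helper are hypothetical, and the number parsing assumes commas are thousands separators, which will not hold for every locale.

```python
import re
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Reads a title, a price text, and a main image URL from assumed markup."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._capturing = None  # name of the text field being read, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "h1" and "product-title" in classes:
            self._capturing = "title"            # "Text" element
        elif tag == "span" and "price" in classes:
            self._capturing = "price_text"       # "Text" element
        elif tag == "img" and "main-image" in classes:
            self.fields["image_url"] = attr_map.get("src")  # "Source URL"

    def handle_data(self, data):
        if self._capturing and data.strip():
            self.fields[self._capturing] = data.strip()

    def handle_endtag(self, tag):
        if tag in ("h1", "span"):
            self._capturing = None


def price_to_number(price_text: str) -> float:
    """'Number' style: keep only the numeric part, dropping the currency.

    Assumes "," is a thousands separator and "." the decimal point.
    """
    match = re.search(r"\d+(?:[.,]\d+)*", price_text)
    if match is None:
        raise ValueError(f"no number found in {price_text!r}")
    return float(match.group().replace(",", ""))


# Hypothetical product page markup.
SAMPLE_PRODUCT = """
<h1 class="product-title">Espresso Machine</h1>
<span class="price">$1,299.50</span>
<img class="main-image" src="https://shop.example/img/espresso.jpg">
"""

parser = ProductParser()
parser.feed(SAMPLE_PRODUCT)
product = parser.fields
product["price_number"] = price_to_number(product["price_text"])
print(product)
```

Keeping both `price_text` and `price_number` shows the trade-off: the text form preserves the currency symbol, while the numeric form is easier to sort and compare in a spreadsheet.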

After selecting all the elements, save the recipe by clicking on the Save button.

How to combine 2 recipes to get data from a list of URLs

Now, let’s explore how to combine the 2 recipes so that the detailed data is scraped for every URL in the list.

Step 1: Create a new workflow

Go back to your dashboard and open the “Scraping recipes” section. Then, find the first recipe you created and use it in a workflow.

Step 2: Add the second recipe

Next, add the product detail page recipe to the workflow, selecting the product URLs as the source.

Then, click Continue.

Step 3: Run the workflow

You can run your workflow by clicking Run Now.

Step 4: View and save the results

Once the workflow has finished running, you can export the results to CSV or Google Sheets. 
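If you ever need to produce a similar file yourself, the sketch below shows one way to turn scraped rows into CSV text with Python's built-in `csv` module. The row data and the `rows_to_csv` helper are made up for illustration and do not reflect the exact columns of a Hexomatic export.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text, one column per field name."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical scraped rows: one dict per product page.
rows = [
    {"url": "https://shop.example/p/1", "title": "Mug, blue", "price": 12.5},
    {"url": "https://shop.example/p/2", "title": "Red Mug", "price": 9.0},
]

csv_text = rows_to_csv(rows, fieldnames=["url", "title", "price"])
print(csv_text)
```

The writer quotes values that contain commas, so a title like "Mug, blue" stays in a single column. To write straight to disk instead, the same `DictWriter` can be given a file opened with `open(path, "w", newline="")`.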

As you can see, we automatically scraped product URLs from a category page, along with detailed data for each URL, in just a few clicks using Hexomatic’s 2-step scraping method.
