Two-step web scraping is a method for capturing large amounts of data from websites quickly and efficiently. It breaks the process into two stages: the first step gathers information from an initial web page (for example, URLs from a product listing page), and the second step captures data from each of those URLs (for example, individual product or listing pages).
If you’re an online retailer, this tactic lets you scrape a large number of product listings and capture every product you need in a structured data format.
Doing this manually would take ages, but with a web scraper like Hexomatic you can automate the task and save a lot of time and effort.
In this tutorial, we will show you how to perform 2-step scraping to capture product listings and single product data, and how to run both steps in a single workflow that delivers the data in a structured format, either in Google Sheets or as a CSV file.
To get started, be sure to have a Hexomatic.com account.
How to scrape product listings
Let’s first see how to scrape product listings from a category page of an eCommerce website.
Before getting started, go to the targeted website and capture the desired page URL.
Step 1: Create a new scraping recipe
From your dashboard, create a new blank scraping recipe.
Step 2: Add the page URL to Hexomatic
Next, paste the copied page URL to Hexomatic and click Preview.
For this website, and for other websites that rely heavily on JavaScript and may have preview issues, we recommend setting the browser mode to Full-Stack.
You will now see a preview of the page. If a pop-up window appears on the page, you can close it using the Click action.
Step 3: Select elements to scrape
Once the page has loaded, you can select the desired elements to scrape. In this case, we are going to scrape product URLs.
To do this, click on the first product URL and choose Select All.
Next, set Link URL as the type of the element.
Then, Save the recipe.
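For readers curious about what this first recipe does conceptually, here is a minimal Python sketch of collecting product URLs from a category page. It is not Hexomatic’s implementation: the sample markup, the product-link class name, and the shop.example.com URLs are assumptions made up for illustration, and a real page would need its own selector.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ProductLinkParser(HTMLParser):
    """Collect href values from <a> tags that carry a given CSS class."""

    def __init__(self, base_url, link_class):
        super().__init__()
        self.base_url = base_url
        self.link_class = link_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if href and self.link_class in classes:
            # Resolve relative links against the category page URL.
            absolute = urljoin(self.base_url, href)
            if absolute not in self.urls:
                self.urls.append(absolute)


def extract_product_urls(html, base_url, link_class="product-link"):
    """Return the de-duplicated product URLs found in the listing HTML."""
    parser = ProductLinkParser(base_url, link_class)
    parser.feed(html)
    parser.close()
    return parser.urls


# Hypothetical category page markup, used only for illustration.
SAMPLE_LISTING = """
<ul>
  <li><a class="product-link" href="/products/red-mug">Red mug</a></li>
  <li><a class="product-link" href="/products/blue-mug">Blue mug</a></li>
  <li><a class="nav" href="/about">About</a></li>
</ul>
"""

product_urls = extract_product_urls(SAMPLE_LISTING, "https://shop.example.com/mugs")
print(product_urls)
```

Choosing Select All in the recipe plays roughly the same role as matching every link with the same class here: one example element stands in for all similar elements on the page.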
How to scrape detailed page data
Now that we have the URLs of our desired product pages from a category page, let’s see how to scrape detailed data from a single product page.
This time, we will navigate to one of the single product pages from our list and copy the page URL.
Step 1: Create a new scraping recipe
From your dashboard, create a new blank scraping recipe as we did in the first section.
Step 2: Add the page URL to Hexomatic
Next, paste the copied page URL to Hexomatic and click Preview.
Step 3: Select elements to scrape
Once the page has loaded, you can specify the elements to scrape.
Let’s scrape the product title, the price, and the main image.
Click on each element, choose Select single, and set the element type. You need to set Text for the product title and Source URL for the image. You can set Text as the element type for the price if you want to scrape the price along with the currency, and Number if you want to get only the number.
After selecting all the elements, save the recipe by clicking on the Save button.
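To make the difference between the Text and Number element types for the price more concrete, here is a small Python sketch. It only illustrates the idea of keeping the raw string versus extracting the numeric part; the price_as_number helper is hypothetical, it assumes a comma is a thousands separator and a period is the decimal point, and it says nothing about how Hexomatic parses prices internally.

```python
import re
from decimal import Decimal


def price_as_number(price_text):
    """Return the numeric part of a price string as a Decimal, or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", price_text)
    if match is None:
        return None
    # Assumption: "," separates thousands and "." is the decimal point.
    return Decimal(match.group(0).replace(",", ""))


price_text = "$1,299.99"                    # comparable to a Text element
price_number = price_as_number(price_text)  # comparable to a Number element
print(price_text, price_number)
```

Keeping the text version is useful when the currency matters, while the numeric version is easier to sort, filter, or sum in a spreadsheet.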
How to combine 2 recipes to get data from a list of URLs
Now, let’s explore how to combine the 2 recipes so that the detailed page recipe runs on every URL collected by the product listing recipe.
Step 1: Create a new workflow
Go back to your dashboard and open the “Scraping recipes” section. Then, find the first recipe you created and use it in a workflow.
Step 2: Add the second recipe
Next, add the product detail page recipe to the workflow, selecting the product URLs as the source.
Then, click Continue.
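The chaining of the two recipes can be sketched in Python as follows: the output of the listing step becomes the input of the detail step. This is a simplified illustration, not Hexomatic’s workflow engine. FAKE_SITE is a hypothetical in-memory stand-in for a website (so the example runs without network access), and the class names title, price, main-image, and product-link are assumptions.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "website": URL -> HTML. A real run would fetch pages.
FAKE_SITE = {
    "https://shop.example.com/mugs": (
        '<a class="product-link" href="https://shop.example.com/p/red-mug">Red</a>'
        '<a class="product-link" href="https://shop.example.com/p/blue-mug">Blue</a>'
    ),
    "https://shop.example.com/p/red-mug": (
        '<h1 class="title">Red mug</h1><span class="price">$12.50</span>'
        '<img class="main-image" src="https://shop.example.com/img/red.jpg">'
    ),
    "https://shop.example.com/p/blue-mug": (
        '<h1 class="title">Blue mug</h1><span class="price">$14.00</span>'
        '<img class="main-image" src="https://shop.example.com/img/blue.jpg">'
    ),
}


class ClassCollector(HTMLParser):
    """Record text and attributes of elements, keyed by CSS class."""

    def __init__(self):
        super().__init__()
        self.text = {}     # class name -> list of text contents
        self.attrs = {}    # class name -> list of attribute dicts
        self._open = None  # class of the element whose text is being read

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        for cls in (attr_map.get("class") or "").split():
            self.attrs.setdefault(cls, []).append(attr_map)
            # <img> has no text content, so only track text for other tags.
            if tag != "img":
                self._open = cls

    def handle_data(self, data):
        if self._open and data.strip():
            self.text.setdefault(self._open, []).append(data.strip())

    def handle_endtag(self, tag):
        self._open = None


def parse(html):
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return collector


def scrape_listing(url):
    """Step 1: return the product URLs found on a category page."""
    page = parse(FAKE_SITE[url])
    return [a["href"] for a in page.attrs.get("product-link", [])]


def scrape_detail(url):
    """Step 2: return title, price, and main image for one product page."""
    page = parse(FAKE_SITE[url])
    return {
        "url": url,
        "title": page.text["title"][0],
        "price": page.text["price"][0],
        "image": page.attrs["main-image"][0]["src"],
    }


def run_workflow(category_url):
    """Feed every URL from step 1 into step 2 and collect one row per product."""
    return [scrape_detail(url) for url in scrape_listing(category_url)]


rows = run_workflow("https://shop.example.com/mugs")
for row in rows:
    print(row)
```

Selecting the product URLs as the source in the workflow corresponds to the loop in run_workflow: each URL produced by the first step is handed to the second step in turn.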
Step 3: Run the workflow
You can run your workflow simply by clicking Run Now.
Step 4: View and save the results
Once the workflow has finished running, you can export the results to CSV or Google Sheets.
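If you later want to process the exported data yourself, the CSV format is simple to produce and read. The sketch below writes a few rows to CSV text with Python’s standard csv module. The column names and sample rows are assumptions for illustration and may not match the columns in an actual Hexomatic export.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical scraped rows, one per product page.
rows = [
    {"url": "https://shop.example.com/p/red-mug", "title": "Red mug", "price": "12.50"},
    {"url": "https://shop.example.com/p/blue-mug", "title": "Blue mug, large", "price": "14.00"},
]

csv_text = rows_to_csv(rows, ["url", "title", "price"])
print(csv_text)
```

Note that the csv module quotes values containing commas (such as the second title), which keeps the columns aligned when the file is opened in a spreadsheet.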
As you can see, with Hexomatic’s 2-step scraping method we automatically scraped the product URLs from a category page, along with detailed data for each URL, in just a few clicks.
Marketing Specialist | Content Writer
Experienced in SaaS content writing, helps customers automate time-consuming tasks and solve complex scraping cases with step-by-step tutorials and in-depth articles.