Few websites put all their information on a single page.
There are two reasons: a large amount of information simply does not fit on one page, and splitting content across multiple pages improves navigation and the overall user experience.
This is especially true for e-commerce websites such as Amazon and Etsy, where listing every product on one page would be impractical.
The good news is that Hexomatic can natively handle pagination so you can extract data from multi-page websites such as web directories, search results, or product catalogs across multiple pages.
In this tutorial, we will walk you through the different pagination types with concrete use cases.
The first step is to create a blank scraping recipe.
Next, add the web page you want to scrape and select the elements to extract. You are then ready to set up pagination.
How to use Automatic pagination
Automatic pagination is the fastest way to get content from all the necessary pages while scraping.
Let's start with how to use it.
Step 1: Click a page number in the pagination area
After selecting the elements you want to scrape, you can set up pagination.
To begin, click directly on the page numbers at the bottom of the web page. Then, click Paginate (Automatic) in the pop-up window.
Step 2: Specify the pages to paginate
Next, specify the pages to paginate. Select Number if the pages are marked with numbers, and Letter if the pages are marked with letters.
Step 3: Click Proceed
After proceeding, pagination will appear on the right side of the window.
Once you run the recipe in a workflow, you can export the results to CSV or Google Sheets.
How to use Advanced pagination (for number-based pagination)
If you cannot click the pagination area at the bottom of the page, or if automatic pagination does not work for any other reason, you can use advanced pagination.
In this section, we will show you how to handle number-based pagination, where each subsequent results page is identified by a number.
Please note that the advanced pagination option is available only in Full-stack browser mode, so be sure to select that mode after adding the web page and previewing it in the scraping recipe builder.
Paginated websites structure their page URLs in different ways. For example:
A. https://hexomatic.com/academy/page/2/
https://hexomatic.com/academy/page/3/
B. https://www.etsy.com/ca/shop/CreativeBubblesLab?page=1#items
https://www.etsy.com/ca/shop/CreativeBubblesLab?page=2#items
Now let’s see what steps to take with different URL types.
Example A
Step 1: Select Advanced pagination
To get started, you need to click Add action and select Advanced pagination from the pop-up window.
Step 2: Capture the URL
For example, if you have added https://hexomatic.com/academy/ to the scraping recipe builder and now want to paginate, open the second page (https://hexomatic.com/academy/page/2/) and copy its URL.
Step 3: Paste the URL into the pop-up window
Next, paste the captured URL into the pop-up window that appears after you select Advanced pagination. Then, replace the page number with %page% and specify the range of pages to scrape in the fields below.
Step 4: Click Proceed to see the results
After proceeding, the URLs of the paginated pages will be displayed on the right side of the window.
Afterward, save the recipe.
To view the scraped data and export it to Google Sheets, run the recipe in a workflow.
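To make the role of the %page% placeholder concrete, here is a minimal Python sketch of how a template URL can be expanded into one URL per page. It only illustrates the idea and is not Hexomatic's own implementation; the function name expand_pages and the page range 2 to 4 are assumptions chosen for the example.

```python
def expand_pages(template: str, first: int, last: int) -> list[str]:
    """Replace the %page% placeholder with each page number from first to last."""
    # Hypothetical helper for illustration; not part of Hexomatic.
    if "%page%" not in template:
        raise ValueError("template must contain the %page% placeholder")
    return [template.replace("%page%", str(n)) for n in range(first, last + 1)]


# Template built from the captured second-page URL of Example A.
urls = expand_pages("https://hexomatic.com/academy/page/%page%/", 2, 4)
for url in urls:
    print(url)
```

Each generated URL corresponds to one entry in the list of paginated pages shown after you click Proceed.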
Example B
Now, let’s see how to go through the same process using a page URL with a different structure.
Step 1: Select Advanced pagination
To get started, you need to click Add action and select Advanced pagination from the pop-up window.
Step 2: Capture the URL
In this example, we are scraping https://www.etsy.com/ca/shop/CreativeBubblesLab?page=1#items, so open the second page and copy its URL.
Step 3: Paste the URL into the pop-up window
Next, paste the captured URL into the pop-up window that appears after you select Advanced pagination. Then, replace the page number with %page% and specify the range of pages to scrape in the fields below.
Step 4: Click Proceed to see the results
After proceeding, the URLs of the paginated pages will be displayed on the right side of the window.
Afterward, save the recipe.
To view the scraped data and export it to Google Sheets, run the recipe in a workflow.
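When the page number lives in a query parameter, as in Example B, only the value after page= should become %page%; the rest of the URL, including the #items fragment, stays as it is. The following Python sketch shows one way to turn a captured URL into such a template. The helper name to_template and the default parameter name "page" are assumptions for illustration, and in Hexomatic you make this replacement by hand in the pop-up window.

```python
import re


def to_template(captured_url: str, param: str = "page") -> str:
    """Swap the numeric value of a query parameter for the %page% placeholder."""
    # Hypothetical helper for illustration; not part of Hexomatic.
    pattern = rf"([?&]{re.escape(param)}=)\d+"
    template, count = re.subn(pattern, r"\g<1>%page%", captured_url, count=1)
    if count == 0:
        raise ValueError(f"no numeric '{param}' parameter found in URL")
    return template


# Captured second-page URL from Example B.
template = to_template("https://www.etsy.com/ca/shop/CreativeBubblesLab?page=2#items")
print(template)
```

A plain string substitution is used here rather than re-encoding the query string, because URL-encoding helpers would escape the percent signs in %page%.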
How to use advanced pagination with pages in alphabetical order
In the previous section, we explained how to paginate pages in numeric order.
Hexomatic also lets you paginate pages that are organized in alphabetical order.
Step 1: Select Advanced pagination
After selecting all the necessary elements on the page to scrape, click Add action and select Advanced pagination from the pop-up window.
Here, the URLs look as follows:
https://www.netgalley.com/catalog/publishers/all/B
https://www.netgalley.com/catalog/publishers/all/C
https://www.netgalley.com/catalog/publishers/all/D
So, first go to page B and copy its URL. Paste it into the pop-up window in the form shown there as an example (example.com/%page%), with %page% in place of the letter. Select Letter as the type, then enter the range of pages to scrape (for example, from b to d).
Step 2: Click Proceed to see the results
After proceeding, the URLs of the paginated pages will be displayed on the right side of the window.
Once you run the recipe in a workflow, you can export the results to CSV or Google Sheets.
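Letter-based pagination follows the same placeholder idea, with letters taking the place of numbers. Below is a small Python sketch, assuming single ASCII letters and uppercase letters in the URL (as in the NetGalley URLs above). The function name expand_letters is hypothetical and this is not Hexomatic's implementation.

```python
def expand_letters(template: str, first: str, last: str) -> list[str]:
    """Replace %page% with each letter from first to last, inclusive."""
    # Hypothetical helper for illustration; assumes single ASCII letters.
    for value in (first, last):
        if len(value) != 1 or not value.isascii() or not value.isalpha():
            raise ValueError("first and last must be single ASCII letters")
    start, end = ord(first.upper()), ord(last.upper())
    if start > end:
        raise ValueError("first must not come after last")
    return [template.replace("%page%", chr(code)) for code in range(start, end + 1)]


urls = expand_letters("https://www.netgalley.com/catalog/publishers/all/%page%", "b", "d")
for url in urls:
    print(url)
```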
How to use pagination when there is an increment gap between the pages
Most websites increment page numbers by one (1, 2, 3), but some use larger gaps. For example, page URLs may look like this:
domain.com/results/10
domain.com/results/20
domain.com/results/30
In this section, we will demonstrate how you can paginate in such cases.
Before getting started with pagination, open the targeted web page in your browser, go to the second page of the listings (job listings in our example), and copy its URL. Paste the URL into the scraping recipe builder and click Preview.
Step 1: Use advanced pagination
After choosing the Advanced pagination option from the pop-up window, add the URL of the targeted website in the form shown in the window as an example:
https://example.com/%page%
Because the page numbers increase in gaps, use the Gap option of Advanced pagination. In the Gap field, enter the difference between consecutive pages (for example, if the sequence is 10, 20, 30, the gap size is 10).
Step 2: Click Proceed to see the results
After proceeding, the URLs of the paginated pages will be displayed on the right side of the window.
Once you run the recipe in a workflow, you can export the results to CSV or Google Sheets.
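The effect of the Gap setting can be sketched in Python as a range with a step. This is only an illustration under stated assumptions: the function name expand_with_gap is hypothetical, and it treats the start and end values as the numbers that appear in the URL, which may differ from how Hexomatic's fields interpret them.

```python
def expand_with_gap(template: str, first: int, last: int, gap: int) -> list[str]:
    """Replace %page% with first, first + gap, ... up to and including last."""
    # Hypothetical helper for illustration; not part of Hexomatic.
    if gap <= 0:
        raise ValueError("gap must be a positive integer")
    return [template.replace("%page%", str(n)) for n in range(first, last + 1, gap)]


# The 10, 20, 30 sequence from the example above has a gap of 10.
urls = expand_with_gap("domain.com/results/%page%", 10, 30, 10)
for url in urls:
    print(url)
```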
How to use Dynamic pagination
What if the website you are scraping uses JavaScript-based pagination? In this case, you won’t find actual links to results pages in this format:
domain.com/results/10
domain.com/results/20
domain.com/results/30
For websites that use JavaScript-based pagination, Hexomatic offers another method called dynamic pagination.
Use the dynamic pagination method when the URL does not change as you move to the next page.
Step 1: Select Advanced pagination
Click the Next button in the pagination area at the bottom of the page and select Advanced pagination in the pop-up window.
Step 2: Select Dynamic pagination
Next, add the web page URL and choose the Dynamic pagination option.
The Path will automatically appear in the field.
Then, specify the pages that you want to paginate.
Step 3: Click Proceed to see the results
After proceeding, the pagination cards will be displayed on the right side of the window.
After running the recipe in a workflow, you can export the results to CSV or Google Sheets.
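Conceptually, dynamic pagination differs from the URL-based methods above in that there is no list of URLs to generate: the scraper stays on one URL and repeatedly clicks Next. The Python sketch below models that loop with a stand-in FakePage class, so it runs without a browser. Both FakePage and scrape_dynamic are hypothetical names invented for this illustration and say nothing about how Hexomatic works internally.

```python
class FakePage:
    """Stand-in for a browser page whose URL never changes (hypothetical)."""

    def __init__(self, batches: list[list[str]]):
        self.url = "https://example.com/results"
        self._batches = batches
        self._index = 0

    def items(self) -> list[str]:
        # Items currently rendered on the page.
        return self._batches[self._index]

    def click_next(self) -> bool:
        # Simulate clicking Next; return False when there is no further page.
        if self._index + 1 >= len(self._batches):
            return False
        self._index += 1
        return True


def scrape_dynamic(page: FakePage, max_pages: int) -> list[str]:
    """Collect items from up to max_pages views reached by clicking Next."""
    collected: list[str] = []
    for page_no in range(max_pages):
        collected.extend(page.items())
        if page_no == max_pages - 1 or not page.click_next():
            break
    return collected


page = FakePage([["item 1", "item 2"], ["item 3"], ["item 4"]])
print(scrape_dynamic(page, max_pages=2))
print(page.url)  # the URL is the same before and after paginating
```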