How to use pagination when scraping websites

Not all websites place all the information on one page.

There are two reasons for this. First, a large amount of information often cannot fit on a single page. Second, splitting content across multiple pages makes navigation easier and improves the user experience.

This is especially true for e-commerce websites like Amazon and Etsy, where it would be impractical to list every product on one page.

The good news is that Hexomatic handles pagination natively, so you can extract data from multi-page websites such as web directories, search results, or product catalogs.

In this tutorial, we will run you through different pagination types with concrete use cases.

The first step is to create a blank scraping recipe.

Next, add the web page you want to scrape and select the elements to scrape. You are then ready to set up pagination.

How to use Automatic pagination

Automatic pagination is the fastest way to get content from all the necessary pages while scraping.

Let’s see how to use it.

Step 1: Click a page number in the pagination area

After selecting the elements you want to scrape, you can get to the pagination.

To begin, click exactly on the page numbers at the bottom of the webpage for automatic pagination. Then, click Paginate (Automatic) in the pop-up window. 

Step 2: Specify the pages to paginate

Next, specify the pages to paginate. Select Number if the pages are marked with numbers, or Letter if the pages are marked with letters.

Step 3: Click Proceed

After proceeding, pagination will appear on the right side of the window.

Once you run the recipe in a workflow, you can export the results to CSV or Google Sheets.

How to use Advanced pagination (for number-based pagination)

If you cannot click the pagination area at the bottom of the page, or automatic pagination doesn’t work for any other reason, you can use advanced pagination.

In this case, we will show you how to deal with number-based pagination, where each subsequent page in the results uses a number.


Please note that advanced pagination is available only in Full-stack browser mode, so be sure to select that mode after adding the web page and previewing it in the scraping recipe builder.

Paginated URLs can follow different structures. For example:

A. https://hexomatic.com/academy/page/2/

https://hexomatic.com/academy/page/3/

B. https://www.etsy.com/ca/shop/CreativeBubblesLab?page=1#items

https://www.etsy.com/ca/shop/CreativeBubblesLab?page=2#items

Now let’s see what steps to take with different URL types.

Example A

Step 1: Select Advanced pagination

To get started, you need to click Add action and select Advanced pagination from the pop-up window.


Step 2: Capture the URL

For example, if you have added https://hexomatic.com/academy/ to the scraping recipe builder and now want to use pagination, go to the second page (https://hexomatic.com/academy/page/2/) and copy its URL.

Step 3: Paste the URL into the pop-up window

Next, paste the captured URL into the pop-up window that appears after you select Advanced pagination. Then replace the page number in the URL with %page% and specify the target pages in the fields below.
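To make the role of the %page% placeholder concrete, here is a minimal Python sketch of the substitution it stands for. This only illustrates the idea and is not Hexomatic's actual implementation; the function name expand_pages and the page range 2 to 4 are assumptions chosen for the example.

```python
def expand_pages(template: str, first: int, last: int) -> list[str]:
    """Replace the %page% placeholder with each page number from first to last.

    Hypothetical helper for illustration only; not part of Hexomatic.
    """
    if "%page%" not in template:
        raise ValueError("template must contain the %page% placeholder")
    return [template.replace("%page%", str(n)) for n in range(first, last + 1)]


# The captured URL from Step 2, with the page number swapped for %page%.
urls = expand_pages("https://hexomatic.com/academy/page/%page%/", 2, 4)
for url in urls:
    print(url)
```

Each generated URL corresponds to one page in the range you enter in the pop-up window.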

Step 4: Click Proceed to see the results

After proceeding, the URLs of the paginated pages will be displayed on the right side of the window.

Then save the recipe.

To view the scraped data and export it to Google Sheets, you should run the recipe in a workflow.

Example B

Now, let’s see how to go through the same process using a page URL with a different structure.

Step 1: Select Advanced pagination

To get started, you need to click Add action and select Advanced pagination from the pop-up window. 

Step 2: Capture the URL

In this example, we are scraping https://www.etsy.com/ca/shop/CreativeBubblesLab?page=1#items, so we go to the second page and copy its URL.

Step 3: Paste the URL into the pop-up window

Next, paste the captured URL into the pop-up window that appears after you select Advanced pagination. Then replace the page number in the URL with %page% and specify the target pages in the fields below.
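In this URL structure the page number sits in a query parameter (page=2) rather than in the path, and the #items fragment stays in place. The following Python sketch shows one way to picture the manual edit: it swaps the numeric value of the page parameter for %page%. The helper name to_page_template is hypothetical and only mirrors what you type into the pop-up window; it is not a Hexomatic feature.

```python
import re


def to_page_template(captured_url: str, param: str = "page") -> str:
    """Swap the numeric value of a query parameter for the %page% placeholder.

    Hypothetical helper for illustration only; not part of Hexomatic.
    """
    pattern = rf"([?&]{re.escape(param)}=)\d+"
    template, replaced = re.subn(pattern, r"\g<1>%page%", captured_url, count=1)
    if replaced == 0:
        raise ValueError(f"no numeric '{param}' parameter found in URL")
    return template


template = to_page_template(
    "https://www.etsy.com/ca/shop/CreativeBubblesLab?page=2#items"
)
print(template)
```

The rest of the URL, including the fragment, is left untouched, which is what you want when editing the captured URL by hand.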

Step 4: Click Proceed to see the results

After proceeding, the URLs of the paginated pages will be displayed on the right side of the window.

Then save the recipe.

To view the scraped data and export it to Google Sheets, you should run the recipe in a workflow. 

How to use advanced pagination with pages in alphabetical order

The previous section covered number-based pagination.

Hexomatic can also paginate pages that are ordered alphabetically.

Step 1: Select advanced pagination

After selecting all the necessary elements on the page to scrape, click Add action and select Advanced pagination from the pop-up window.

Here, the URLs look like this:

https://www.netgalley.com/catalog/publishers/all/B
https://www.netgalley.com/catalog/publishers/all/C
https://www.netgalley.com/catalog/publishers/all/D

First, go to page B and capture its URL. Paste it into the window in the form shown there as an example (example.com/%page%), with %page% in place of the letter. Note that you need to select Letter as the type. Then, add the pages to be scraped (for example, from b to d).
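The Letter type can be pictured as the same placeholder substitution, applied to a range of letters instead of numbers. The Python sketch below is an illustration under that assumption, not Hexomatic's implementation; the helper name expand_letter_pages is hypothetical, and the choice to emit uppercase letters is made only to match the example URLs above.

```python
import string


def expand_letter_pages(template: str, first: str, last: str) -> list[str]:
    """Replace %page% with each letter from first to last (inclusive).

    Hypothetical helper for illustration only; not part of Hexomatic.
    Letters are emitted in uppercase to match the example URLs.
    """
    letters = string.ascii_uppercase
    start = letters.index(first.upper())
    end = letters.index(last.upper())
    return [template.replace("%page%", letter) for letter in letters[start:end + 1]]


letter_urls = expand_letter_pages(
    "https://www.netgalley.com/catalog/publishers/all/%page%", "b", "d"
)
for url in letter_urls:
    print(url)
```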

Step 2: Click Proceed to see the results

After proceeding, the URLs of the paginated pages will be displayed on the right side of the window.

Once you run the recipe in a workflow, you can export the results to CSV or Google Sheets.

How to use pagination when there is an increment gap between the pages

Most websites increment their page numbers by one (1, 2, 3), but some use larger gaps. For example, the page URLs may look like this:

domain.com/results/10
domain.com/results/20
domain.com/results/30

In this section, we will demonstrate how you can paginate in such cases.

Before getting started with pagination, open the target web page in your browser, go to the second page of the results (a job listing in this example), and copy its URL. Paste the URL into the scraping recipe builder and click Preview.

Step 1: Use advanced pagination

After choosing the advanced pagination option from the pop-up window, add the URL of the target website in the form shown in the window as an example (https://example.com/%page%).

Because the page numbers increase in gaps, you need to use the Gap option of Advanced pagination. In the Gap field, enter the size of the gap between pages. For example, if the sequence is 10, 20, 30, the gap size is 10.
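The Gap value behaves like the step of a counting sequence. As a rough illustration (not Hexomatic's implementation), the Python sketch below generates the URLs for a first value of 10, a last value of 30, and a gap of 10. The helper name expand_gap_pages and the URL https://example.com/results/%page% are assumptions made for the example.

```python
def expand_gap_pages(template: str, first: int, last: int, gap: int) -> list[str]:
    """Replace %page% with first, first + gap, ... up to and including last.

    Hypothetical helper for illustration only; not part of Hexomatic.
    """
    if gap <= 0:
        raise ValueError("gap must be a positive integer")
    return [template.replace("%page%", str(n)) for n in range(first, last + 1, gap)]


gap_urls = expand_gap_pages("https://example.com/results/%page%", 10, 30, 10)
for url in gap_urls:
    print(url)
```

With a gap of 1 the same sketch reduces to ordinary number-based pagination.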

Step 2: Click Proceed to see the results

After proceeding, the URLs of the paginated pages will be displayed on the right side of the window.

Once you run the recipe in a workflow, you can export the results to CSV or Google Sheets.

How to use Dynamic pagination


What if the website you are scraping uses JavaScript-based pagination? In this case, you won’t find actual links to results pages in this format:

domain.com/results/10
domain.com/results/20
domain.com/results/30

Hexomatic offers another method for scraping websites that use JavaScript-based pagination called dynamic pagination.

The dynamic pagination method is used when the URL does not change as you move to the next page.

Step 1: Select advanced pagination

Click the Next button in the pagination area at the bottom of the page and select the advanced pagination button in the pop-up window.

Step 2: Click Proceed to see the results

After proceeding, the pagination cards will be displayed on the right side of the window.

After running the recipe in a workflow, you can export the results to CSV or Google Sheets. 

