Regex 101 tutorial — A quick Regex cheatsheet with examples

A regular expression (regex for short) is a text string that defines a pattern for matching, locating, and managing text. Regular expressions are commonly used for search-and-replace operations and are a powerful tool for many parsing tasks.

You may wonder how regex can be used in web scraping and what advantages it can bring.

In web scraping, regex is ideal for extracting only the relevant information from a large amount of data. It’s especially useful for websites that lack a clear structure from which to pull key elements. Additionally, regex can be used to validate character combinations (for example, to check for special characters).
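As a rough illustration of both uses, the Python sketch below pulls price-like values out of a line of unstructured text and then checks whether a string contains any special characters. The sample text, the patterns, and the function names are assumptions made up for this example; they are not part of Hexomatic.

```python
import re

# Hypothetical sample text; not taken from any real page.
RAW_TEXT = "Widget A now $19.99 (was $25.00), ships in 3 days. Widget B: $7.50."

# A dollar sign followed by digits, a dot, and exactly two digits.
PRICE_PATTERN = re.compile(r"\$\d+\.\d{2}")

# Any character that is not an ASCII letter, a digit, or whitespace.
SPECIAL_CHAR_PATTERN = re.compile(r"[^A-Za-z0-9\s]")


def extract_prices(text):
    """Return every price-like substring found in text."""
    return PRICE_PATTERN.findall(text)


def has_special_chars(text):
    """Return True if text contains at least one special character."""
    return SPECIAL_CHAR_PATTERN.search(text) is not None


print(extract_prices(RAW_TEXT))          # ['$19.99', '$25.00', '$7.50']
print(has_special_chars("hello world"))  # False
print(has_special_chars("hello@world"))  # True
```

What counts as a "special character" depends on your data, so the character class here is only one possible definition.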

Hexomatic lets you use regular expressions to scrape data in two ways:

#1 Using our Regex automation.

#2 Applying regular expressions in our scraping recipe builder.

This tutorial presents the regex cheatsheet, explains how to use our Regex automation versus applying regex in the scraping recipe builder, and shows when to use each.

Table Of Contents

#1 How to use regex automation to scrape data
#2 How to use regex to scrape data using only our scraping recipe builder
- Example

#1 How to use regex automation to scrape data

This section demonstrates how to use our regex automation.

This automation is useful when you need to exclude unnecessary text and scrape only the relevant content.

For example, suppose you want to scrape only the full names from a list, but the scraping recipe automatically selects other data along with them.

This extra data can’t be excluded in the scraping recipe builder. This is where our Regex automation comes in handy.

Example A

Step 1: Create a blank scraping recipe

To get started, go to your dashboard and create a blank scraping recipe.

Step 2: Add the web page URL

Next, add the URL of the web page you want to scrape.

Step 3: Select elements to scrape

Here, we are going to scrape the full names of experts by choosing the Select All option.

The problem is that when the full names are selected, the academic titles are selected automatically as well, and the scraping recipe builder doesn’t allow you to exclude them.

That’s why we need to scrape the full names together with the academic titles, then run our Regex automation to get the desired results.

After selecting the elements, Save the recipe.

Step 4: Use the recipe in a workflow

To run the Regex automation on the scraped data, you need to use the previously created recipe in a workflow.

Step 5: Add the Regex automation

Next, add the Regex automation, then select “experts_names” as the source and “Text” as the source type.

Then, enter the regex ^[^—]*, which matches everything from the start of the text up to (but not including) the first em-dash.
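To try this pattern outside Hexomatic, here is a minimal Python sketch. It assumes each scraped value has the form "full name, em-dash, academic title"; the sample values and the function name are invented for illustration and are not taken from the actual page.

```python
import re

# The pattern from Step 5: anchored at the start of the text, match
# every character that is not an em-dash.
NAME_PATTERN = re.compile(r"^[^—]*")

# Hypothetical scraped values; the real page content may differ.
scraped = [
    "Jane Doe — PhD, Professor of Biology",
    "John Smith—MD",
    "Alex Lee",
]


def extract_name(value):
    """Return the text before the first em-dash, without surrounding spaces."""
    # The pattern can match an empty string, so match() never returns None.
    match = NAME_PATTERN.match(value)
    return match.group(0).strip()


for value in scraped:
    print(extract_name(value))
```

Note that the raw match keeps any space that precedes the em-dash; the sketch trims it with strip(). A value with no em-dash at all, such as the last sample, is returned unchanged.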

Step 6: Run the workflow

Finally, you can run your workflow.

Step 7: View the results

Once the workflow has finished running, you can view the results and export them to CSV or Google Sheets.

Example B

Now let’s see another example of using our Regex automation.

Often, the element you want to scrape is not displayed directly on the page but exists in the HTML code. The good news is that Hexomatic can solve this problem in a few clicks.

In this case, you need to combine our HTML grabber automation with the Regex automation.

In this example, we are going to get a publication date from the HTML code of a specific web page. The HTML grabber retrieves the page’s HTML code, and the regular expression saves time and effort in finding the specific element to scrape.
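The idea can be sketched in Python as follows. The HTML snippet, the article:published_time meta tag, and the function name are assumptions for illustration only; the actual page may store its publication date differently, in which case the pattern would need to be adapted. The hard-coded string below simply stands in for the page source that the HTML grabber automation would supply.

```python
import re

# Hypothetical HTML source standing in for the HTML grabber's output.
HTML = """
<html>
  <head>
    <title>Sample article</title>
    <meta property="article:published_time" content="2021-09-14T08:30:00+00:00" />
  </head>
  <body><p>Article text without a visible date.</p></body>
</html>
"""

# Capture the YYYY-MM-DD part of the content attribute that follows the
# article:published_time property (this attribute order is assumed).
DATE_PATTERN = re.compile(
    r'property="article:published_time"\s+content="(\d{4}-\d{2}-\d{2})'
)


def extract_publication_date(html):
    """Return the publication date as YYYY-MM-DD, or None if not found."""
    match = DATE_PATTERN.search(html)
    return match.group(1) if match else None


print(extract_publication_date(HTML))  # 2021-09-14
```

The capture group keeps only the date portion, so the time and timezone that follow it in the attribute are left out of the result.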

Here we go.

Step 1: Create a new workflow

Go to your dashboard and create a new workflow by choosing the blank option.