What is Regex? A Guide to Regular Expressions

 






A regex, short for regular expression, is a sequence of characters that defines a search pattern. It is a powerful tool for matching, manipulating, and validating text. Unlike a simple text search, a regex can find patterns, such as email addresses, phone numbers, or specific data formats, within a larger body of text.


How Regex Works

A regex pattern is composed of two main types of characters:


Literal Characters: These are characters that match themselves directly (e.g., a, 1, _).


Metacharacters: These are special characters that have a unique meaning and give regex its power (e.g., . for any character, * for zero or more occurrences).
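To make the distinction concrete, here is a minimal sketch in Python. Python's re module is used only as a stand-in (it is not the RE2 engine), but the basic metacharacters shown here behave the same way in both. The sample strings are made up for illustration.

```python
import re

# Literal characters match themselves: the pattern "cat" finds the text "cat",
# including the "cat" hidden inside "concatenate".
literal_hits = re.findall(r"cat", "cat, cot, cut, concatenate")

# The metacharacter "." stands for any single character,
# so "c.t" matches "cat", "cot", and "cut" alike.
dot_hits = re.findall(r"c.t", "cat, cot, cut")

# "*" means zero or more of the preceding character,
# so "ab*" matches "a", "ab", and "abbb".
star_hits = re.findall(r"ab*", "a ab abbb")

print(literal_hits)
print(dot_hits)
print(star_hits)
```

The literal pattern only ever finds the exact text it spells out, while the metacharacter patterns describe a family of strings.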



Why Use Regex in Google Looker Studio?


Looker Studio (formerly Google Data Studio) uses regex to transform data. This is essential for:

  1. Data Cleaning: Standardizing text entries.
  2. Filtering: Precisely selecting data rows that match a specific pattern.
  3. Extraction: Pulling out specific information, such as product IDs from URLs.



The regex syntax in Looker Studio is based on the RE2 engine. Here's a table of common metacharacters and their functions.


Common Regex Metacharacters and Their Meanings

Character(s)  Meaning                                                     Example Pattern  Matches
.             Matches any single character.                               a.c              abc, a1c, a-c
*             Matches zero or more of the preceding character or group.   colou*r          color, colour
+             Matches one or more of the preceding character or group.    go+gle           google, gooogle
?             Matches zero or one of the preceding character or group.    favou?rite       favorite, favourite
[ ]           Matches any single character within the brackets.           [aeiou]          a, e, i, o, u
[^ ]          Matches any single character NOT within the brackets.       [^0-9]           a, B, @
|             Matches this OR that.                                       col(ou|o)r       color, colour
( )           Groups characters together.                                 (ab)+            ab, abab, ababab
^             Matches the beginning of the string.                        ^Start           Starting now
$             Matches the end of the string.                              End$             This is the End
\d            Matches any digit (0-9).                                    \d\d\d           123, 456
\w            Matches any word character (letters, numbers, underscore).  \w+              Hello_World, v1
\s            Matches any whitespace character (space, tab).              \s               a space, a tab

Practical Examples in Looker Studio

Example 1: Extracting a Product ID

Query: How do I extract a product ID like PROD12345 from a URL in Looker Studio?

Formula: REGEXP_EXTRACT(URL_Field, '/products/(PROD\\d+)/')

  • /products/: Matches the literal text.

  • (): The parentheses create a capturing group. REGEXP_EXTRACT returns only the text matched inside the group, here the product ID.

  • PROD\\d+: Matches "PROD" followed by one or more digits.
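The same pattern can be tried outside Looker Studio. This is a rough Python equivalent, not the Looker Studio implementation: re.search with group(1) stands in for REGEXP_EXTRACT, the URLs are hypothetical, and a Python raw string lets the pattern use a single backslash in \d.

```python
import re

# Same pattern as the formula above; r"..." avoids doubling the backslash.
PRODUCT_ID = re.compile(r"/products/(PROD\d+)/")

def extract_product_id(url):
    """Return the captured product ID, or None when the pattern does not match."""
    match = PRODUCT_ID.search(url)
    return match.group(1) if match else None

# Hypothetical sample URLs for illustration.
with_slash = extract_product_id("https://example.com/products/PROD12345/details")
without_slash = extract_product_id("https://example.com/products/PROD12345")

print(with_slash)
print(without_slash)
```

Note that the pattern ends with a slash, so a URL that stops right after the ID yields no match. If such URLs occur in your data, dropping the trailing / from the pattern is one option.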

Example 2: Cleaning Data

Query: How do I remove tracking codes like ?source=email from a URL in Looker Studio?

Formula: REGEXP_REPLACE(URL_Field, '\\?.+', '')

  • \\?: Matches the literal question mark.

  • .+: Matches one or more of any character, so the match runs from the question mark to the end of the string.

  • '': Replaces the matched pattern with an empty string.
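As a sketch of what this replacement does, here is the same pattern applied with Python's re.sub, which stands in for REGEXP_REPLACE. The URLs are hypothetical examples.

```python
import re

def strip_query(url):
    """Remove everything from the first '?' onward, as the formula above does."""
    return re.sub(r"\?.+", "", url)

tracked = strip_query("https://example.com/page?source=email")
untracked = strip_query("https://example.com/page")
# .+ needs at least one character after the "?", so a bare trailing "?" is left in place.
bare = strip_query("https://example.com/page?")

print(tracked)
print(untracked)
print(bare)
```

The pattern removes the entire query string, not only tracking parameters, so it suits cases where nothing after the question mark is needed.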

Example 3: Matching a Company Name or Gmail

Query: How do I find all emails that end in either @emai.com or @gmail.com?

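Formula: REGEXP_CONTAINS(Email_Field, '@(emai|gmail)\\.com$')

  • @: Matches the literal at sign, so addresses at other domains that merely end in the same letters (such as @notgmail.com) are excluded.

  • (): This is a capturing group that applies the OR logic to the domain names.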

  • emai|gmail: Matches either the literal string emai or gmail.

  • \\.: Matches the literal dot (.) character. The backslash is needed because a dot is a metacharacter. A double backslash is used because Looker Studio requires it to represent a single literal backslash in the pattern.

  • com: Matches the literal com.

  • $: Asserts that the pattern must be at the end of the string.
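Whether the @ is part of the pattern makes a difference. The sketch below compares two variants in Python, with re.search standing in for REGEXP_CONTAINS. The addresses are made up for illustration.

```python
import re

# Hypothetical sample addresses.
emails = ["ana@gmail.com", "bo@emai.com", "cy@notgmail.com", "di@gmail.com.br"]

# Without the "@", any domain that merely ends in "gmail.com" or "emai.com" matches.
loose = re.compile(r"(emai|gmail)\.com$")
# With the "@", the domain must be exactly emai.com or gmail.com.
strict = re.compile(r"@(emai|gmail)\.com$")

loose_hits = [e for e in emails if loose.search(e)]
strict_hits = [e for e in emails if strict.search(e)]

print(loose_hits)
print(strict_hits)
```

In both variants the $ anchor rejects di@gmail.com.br, because the address does not end in .com.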


Summary

Regex is an essential skill for data analysis. By understanding the core metacharacters and applying Looker Studio's regex functions, you can efficiently clean, filter, and extract valuable insights from your data.

Use online tools like Regex101.com or RegExr.com to build and test your patterns in real time.
