Python Regular Expressions: A Complete Guide from Basics to Practical Applications
Release time: 2024-11-25 12:01:29
Copyright Statement: This article is an original work of the website and follows the CC 4.0 BY-SA copyright agreement. Please include the original source link and this statement when reprinting.

Article link: https://haoduanwen.com/en/content/aid/2034?s=en%2Fcontent%2Faid%2F2034

Getting Started

Have you ever needed to pull specifically formatted content, such as email addresses or phone numbers, out of a large block of text, or to check whether a user's password meets certain requirements? With plain string methods, that code quickly becomes messy and long. This is where regular expressions shine: like a Swiss Army knife, they let you handle these problems elegantly.

As a Python programmer, I know that regular expressions look like hieroglyphics to many people. Don't worry: today I'll use plain, straightforward language to demystify them.

Concept

Simply put, a regular expression is a pattern that describes what a piece of text should look like. You can think of it as a smart template for finding content that follows specific rules.

Here's a simple example: suppose you want to find all the mobile phone numbers in an article. Mobile numbers in mainland China are typically 11 digits long, start with 1, and have a second digit between 3 and 9. With a regular expression, you can write it like this:

import re

text = "Zhang San's phone is 13812345678, Li Si's phone is 13987654321"
pattern = r"1[3-9]\d{9}"
phone_numbers = re.findall(pattern, text)
print(phone_numbers)

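One thing to watch: this pattern will also match 11 digits buried inside a longer run of digits, such as an order number. The sketch below briefly borrows lookaround assertions, `(?<!\d)` and `(?!\d)`, to guard both ends so a match cannot be preceded or followed by another digit. The sample text and the `PHONE` name are made up for illustration.

```python
import re

# Same pattern as in the example above, plus (?<!\d) and (?!\d):
# the 11 digits may not be preceded or followed by another digit.
PHONE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")

# Sample text for illustration; the order number contains 13812345678 as a substring
text = "Order 9913812345678000 shipped; call 13812345678 or 13987654321"

print(PHONE.findall(text))               # ['13812345678', '13987654321']
print(re.findall(r"1[3-9]\d{9}", text))  # ['13812345678', '13812345678', '13987654321']
```

A plain `\b` is less reliable here, because in Python 3 string patterns Chinese characters count as word characters, so there is no word boundary between a Chinese character and a digit that directly follows it.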

Applications

Let's look at the power of regular expressions through some practical examples.

Email Validation

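Did you know? Many websites validate email addresses with regular expressions. Here's a practical validator. It covers common address formats, but it is a pragmatic approximation rather than a full implementation of the email address standard (RFC 5322):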

import re

def validate_email(email):
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))


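# Sample inputs; example.com is a reserved placeholder domain
test_emails = [
    "user@example.com",          # valid
    "invalid.email@",            # invalid: nothing after @
    "no@domain",                 # invalid: no top-level domain
    "first.last@example.co.uk"   # valid
]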

for email in test_emails:
    print(f"{email}: {'valid' if validate_email(email) else 'invalid'}")

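One subtlety with anchoring: in Python's `re` module, `$` also matches just before a single trailing newline, so a `^...$` pattern checked with `re.match` accepts an address that ends in `\n`. `re.fullmatch` requires the entire string to match. A small sketch, using a made-up address:

```python
import re

# The validator's pattern, without the ^ and $ anchors
EMAIL = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# A sample address with a trailing newline, as you might get when reading lines from a file
candidate = "user@example.com\n"

print(bool(re.match('^' + EMAIL + '$', candidate)))  # True: $ also matches before a final \n
print(bool(re.fullmatch(EMAIL, candidate)))          # False: the whole string must match
```

If your input can carry stray newlines, `re.fullmatch` (or calling `.strip()` on the input first) is the safer choice.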

Password Strength Check

Many websites now require passwords to include uppercase and lowercase letters, numbers, and special characters. Implementing this feature with regular expressions is straightforward:

import re

def check_password_strength(password):
    # Check length
    if len(password) < 8:
        return "Password too short, minimum 8 characters required"

    # Use regular expressions to check various characters
    patterns = {
        r"[A-Z]": "uppercase letter",
        r"[a-z]": "lowercase letter",
        r"\d": "number",
        r"[!@#$%^&*(),.?\":{}|<>]": "special character"
    }

    missing = [desc for pattern, desc in patterns.items() 
              if not re.search(pattern, password)]

    if missing:
        return f"Password missing: {', '.join(missing)}"
    return "Password strength acceptable"

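Why does the checker use `re.search` rather than `re.match`? `re.match` only tries to match at the beginning of the string, while `re.search` scans for a match anywhere. This short comparison, using a made-up password, shows the difference:

```python
import re

password = "abcDEF12!"  # sample password for illustration

# re.match only tries position 0, so it fails unless the password starts with an uppercase letter
print(re.match(r"[A-Z]", password))           # None
# re.search scans the whole string, which is what a "contains at least one" check needs
print(re.search(r"[A-Z]", password).group())  # D
```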

Data Cleaning

In data analysis, we often need to clean text data. For example, when web-scraped data is full of HTML tags, regular expressions are handy for a quick cleanup (for complex or malformed HTML, a dedicated HTML parser is more reliable):

import re

def clean_html(html_text):
    # Remove HTML tags
    clean_text = re.sub(r'<[^>]+>', '', html_text)
    # Remove excess whitespace
    clean_text = re.sub(r'\s+', ' ', clean_text)
    return clean_text.strip()


html = """
<div class="content">
    <h1>Welcome</h1>
    <p>This is an <strong>example</strong> text</p>
</div>
"""

print(clean_html(html))

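Stripping tags alone leaves HTML character entities such as `&amp;` and `&lt;` in the text. A minimal extension, assuming you also want those decoded, passes the result through the standard library's `html.unescape`. The sample markup is made up for illustration.

```python
import html
import re

def clean_html(html_text):
    # Same two substitutions as above: drop tags, then collapse whitespace
    text = re.sub(r'<[^>]+>', '', html_text)
    text = re.sub(r'\s+', ' ', text)
    # Decode entities such as &amp; and &lt; that tag stripping leaves behind
    return html.unescape(text.strip())

print(clean_html("<p>Fish &amp; chips &lt;3</p>"))  # Fish & chips <3
```

Unescaping happens after the tags are removed on purpose: decoding `&lt;` first would turn it into a `<` that the tag regex could then mistake for the start of a tag.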

Tips

At this point, I want to share some practical tips for using regular expressions:

  1. Use re.compile() to improve performance

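If you use the same regular expression many times, consider compiling it once with re.compile() and reusing the resulting pattern object. The re module already caches recently used patterns, so the speedup is often modest, but compiling up front skips the cache lookup on every call and gives the pattern a name you can reuse: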

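import re
import time

# Sample text; example.com is a reserved placeholder domain
text = "user@example.com " * 100000

# Without compilation: re.search looks the pattern up in re's internal cache on every call
start = time.perf_counter()
for _ in range(100):
    re.search(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text)
print(f"Time without compilation: {time.perf_counter() - start:.4f} seconds")

# With compilation: the pattern object is built once and reused
pattern = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')
start = time.perf_counter()
for _ in range(100):
    pattern.search(text)
print(f"Time with compilation: {time.perf_counter() - start:.4f} seconds")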

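For small measurements like this, the standard library's `timeit` module tends to give steadier numbers than timing a loop by hand. Here is a sketch comparing the two styles on a short sample line. The exact figures depend on your machine and Python version, and because `re` caches compiled patterns, expect the gap to reflect mostly per-call lookup overhead rather than recompilation.

```python
import re
import timeit

EMAIL = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
compiled = re.compile(EMAIL)
line = "contact: user@example.com"  # short sample line; example.com is a placeholder domain

# timeit calls each function `number` times and returns the total elapsed seconds
t_module = timeit.timeit(lambda: re.search(EMAIL, line), number=50_000)
t_compiled = timeit.timeit(lambda: compiled.search(line), number=50_000)

print(f"re.search with a pattern string: {t_module:.3f} s")
print(f"pre-compiled pattern.search:     {t_compiled:.3f} s")
```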

  2. Use raw strings (r'')

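In Python, prefixing a string literal with r creates a raw string, in which backslashes are kept as-is rather than interpreted as escape sequences. This lets you write regex escapes such as \d exactly as the regex engine expects them: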

import re


print(re.findall('\\d+', 'abc123def456'))  # Requires double backslash


print(re.findall(r'\d+', 'abc123def456'))  # Clearer and more readable

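Raw strings matter most when a regex escape is also a valid Python string escape. `\b` is the clearest case: in a normal string literal it becomes the backspace character, while the regex engine needs the two characters backslash and `b` to mean "word boundary". A small sketch with made-up sample text:

```python
import re

text = "cat concatenate cat"

# In a normal string, '\b' is the backspace character (\x08),
# so this pattern looks for literal backspaces and finds nothing.
print(re.findall('\bcat\b', text))   # []

# As a raw string, \b reaches the regex engine as a word boundary,
# so "cat" inside "concatenate" is skipped.
print(re.findall(r'\bcat\b', text))  # ['cat', 'cat']
```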

  3. Use named groups

When you need to extract multiple pieces of information from text, using named groups can make the code more maintainable:

import re

text = "Name: John Smith, Age: 25 years, Phone: 13812345678"


# [\w ]+ also allows spaces, so the name group captures "John Smith", not just "John"
pattern = r'Name: (?P<name>[\w ]+), Age: (?P<age>\d+) years, Phone: (?P<phone>\d+)'
match = re.search(pattern, text)

if match:
    print(f"Name: {match.group('name')}")
    print(f"Age: {match.group('age')}")
    print(f"Phone: {match.group('phone')}")

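Named groups pair nicely with `Match.groupdict()`, which returns all named groups as a dictionary, and with `finditer`, which yields one match per record. A sketch over two sample records (the second one is made up for illustration):

```python
import re

# Two sample records, made up for illustration
records = """Name: John Smith, Age: 25 years, Phone: 13812345678
Name: Jane Doe, Age: 31 years, Phone: 13987654321"""

pattern = re.compile(
    r'Name: (?P<name>[\w ]+), Age: (?P<age>\d+) years, Phone: (?P<phone>\d+)'
)

# finditer yields one Match per record; groupdict() maps each group name to its text
people = [m.groupdict() for m in pattern.finditer(records)]
for person in people:
    print(person)
```

Note that every captured value is a string, so convert fields such as the age with `int()` if you need numbers.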

Summary

Regular expressions are a mini-language of their own, and mastering them takes time and practice. I suggest starting with simple patterns and gradually working up to more complex ones. Remember: their power lies in their flexibility, but overly complex regular expressions hurt code readability and maintainability.
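
One way to keep a longer pattern readable is the `re.VERBOSE` flag, which ignores whitespace in the pattern (outside character classes) and lets you add `#` comments. As a sketch, here is the email pattern from the validator earlier, laid out that way:

```python
import re

# The validator's email pattern, written with re.VERBOSE:
# whitespace in the pattern is ignored and # starts a comment.
EMAIL = re.compile(r"""
    [a-zA-Z0-9._%+-]+    # local part
    @
    [a-zA-Z0-9.-]+       # domain name
    \.[a-zA-Z]{2,}       # dot plus top-level domain (2+ letters)
""", re.VERBOSE)

print(bool(EMAIL.fullmatch("user@example.com")))  # True
print(bool(EMAIL.fullmatch("no@domain")))         # False
```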

In practice, I often use online regex testing tools to check that my expressions are correct. Try it yourself: seeing the matches instantly helps you understand how regular expressions work.

What do you find most challenging about regular expressions? Feel free to share your thoughts and experiences in the comments. If you'd like to learn more about regular expressions, we can delve into some advanced topics next time, such as lookaround assertions and greedy versus non-greedy matching.
