Python searching and matching: mastering search() and match() from beginner to advanced

Overview

In Python, regular expressions are a powerful tool for searching and manipulating strings. search() and match() are two commonly used functions in the re module of the Python standard library. This article explains how to use both, from the basics to practical examples.


Table of contents

  1. Introduction to regular expressions

  2. Use of the search() method

  3. Use of the match() method

  4. Important regular expression metacharacters

  5. The difference between search() and match()

  6. Using compiled regular expressions

  7. Example: Matching valid email addresses

  8. Example: Matching a date format

  9. Summary


1. Introduction to regular expressions

A regular expression is an expression that describes a string pattern and is used to search, match, and replace strings in text. It uses specific grammar rules to define a pattern for a sequence of characters. In Python, the re module provides support for regular expressions. By using the search() and match() methods, we can perform string matching and searching.

2. Use of the search() method

The search() method scans the string from left to right and stops at the first position where the regular expression matches. It returns a match object if a match is found, or None otherwise.

import re

# Define the regular expression
pattern = r'\d+'

# Define the target string
text = "Hello 123 World 456"

# Use search() to find the first matching substring
match = re.search(pattern, text)

if match:
    print("Found a match:", match.group())  # Output: Found a match: 123
else:
    print("No match found")

In the above code, we first define a simple regular expression r'\d+' that matches one or more digits. Then we define the target string text, which contains the numbers "123" and "456". search() scans the string from left to right and returns a match object for the first match, "123"; the later "456" is not reported.
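The match object returned by search() carries more than the matched text. As a minimal sketch using the same string as above, the following shows where the match sits in the string, and uses re.findall() (not covered elsewhere in this article) to collect every match instead of only the first. The variable names are illustrative.

```python
import re

text = "Hello 123 World 456"
m = re.search(r"\d+", text)

# A match object records what matched and where.
matched_text = m.group()   # the matched substring
start, end = m.span()      # slice indices into text
print(matched_text, start, end)  # 123 6 9

# search() stops at the first hit; findall() returns every hit as a list.
all_numbers = re.findall(r"\d+", text)
print(all_numbers)  # ['123', '456']
```

Since text[start:end] equals m.group(), the span is handy when you need to cut the string around the match.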

3. Use of the match() method

The match() method tries the regular expression only at the beginning of the string. It returns a match object if the pattern matches there, or None otherwise. The pattern does not have to cover the whole string; only the start has to match.

import re

# Define the regular expression
pattern = r'\d+'

# Define the target string
text = "123 Hello World 456"

# Use match() to match from the beginning of the string
match = re.match(pattern, text)

if match:
    print("Found a match:", match.group())  # Output: Found a match: 123
else:
    print("No match found")

In the above code, the number "123" sits at the beginning of the target string text. match() tries the pattern only at the beginning of the string, so it finds "123". The trailing "456" is never considered.
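When the digits are not at the start, match() returns None, and calling .group() on None raises AttributeError. The sketch below wraps the check in a small helper; leading_number is a hypothetical name used only for illustration.

```python
import re

def leading_number(text):
    """Return the digits at the start of text, or None if there are none."""
    m = re.match(r"\d+", text)
    # Calling m.group() on None would raise AttributeError, so check first.
    return m.group() if m else None

print(leading_number("123 Hello World 456"))  # 123
print(leading_number("Hello 123 World 456"))  # None
print(leading_number("123abc"))               # 123
```

The last call shows that match() anchors only the start: whatever follows the matched prefix is ignored.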

4. Important regular expression metacharacters

In regular expressions, there are some special characters called metacharacters, which have special meanings. Here are some important regular expression metacharacters:

  • .: Matches any single character except a newline (the re.DOTALL flag makes it match newlines too).

  • *: Matches the preceding element 0 or more times.

  • +: Matches the preceding element 1 or more times.

  • ?: Matches the preceding element 0 or 1 time.

  • ^: Matches the beginning of the string (and, with re.MULTILINE, the beginning of each line).

  • $: Matches the end of the string, or just before a newline at the end of the string.

  • []: Matches any one character listed in the brackets.

  • |: Matches any one of two or more alternative expressions.

These metacharacters are available in both the search() and match() methods.
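To make the list above concrete, here is a small sketch that runs one search() per metacharacter. The patterns and sample strings are made up for illustration.

```python
import re

# (pattern, text) pairs; search() returns the first match in each text.
cases = [
    (r"c.t", "cut the cat"),           # .  any single character
    (r"ab*", "xabbbx"),                # *  zero or more 'b'
    (r"ab+", "a ab"),                  # +  one or more 'b'
    (r"colou?r", "color"),             # ?  optional 'u'
    (r"^Hello", "Hello World"),        # ^  start of string
    (r"World$", "Hello World"),        # $  end of string
    (r"[aeiou]", "rhythm and blues"),  # [] one character from the set
    (r"cat|dog", "hot dog"),           # |  either alternative
]

results = [re.search(pattern, text).group() for pattern, text in cases]
print(results)
# ['cut', 'abbb', 'ab', 'color', 'Hello', 'World', 'a', 'dog']
```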

5. The difference between search() and match()

The main difference between the search() and match() methods is the starting position of the search:

  • The search() method searches the entire string for the first matching substring, without limiting the starting position of the search.

  • The match() method tries the pattern only at the beginning of the string. It succeeds only if the pattern matches there, and it does not require the pattern to cover the whole string.

import re

# Define the regular expression
pattern = r'\d+'

# Define the target string
text = "123 Hello World 456"

# Use search() to find the first matching substring
match_search = re.search(pattern, text)

# Use match() to match from the beginning of the string
match_match = re.match(pattern, text)

if match_search:
    print("search() found a match:", match_search.group())  # Output: search() found a match: 123
else:
    print("search() found no match")

if match_match:
    print("match() found a match:", match_match.group())  # Output: match() found a match: 123
else:
    print("match() found no match")

In the above code, we call search() and match() on the same string. Both find "123", but only because "123" is exactly at the beginning of the string. If the string started with "Hello" instead, search() would still find "123", while match() would return None.
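The difference shows up as soon as the digits move away from the start. The sketch below uses an illustrative string for that case. It also shows that the re.MULTILINE flag does not change where match() looks, and it introduces re.fullmatch(), which this article does not otherwise cover, for the case where the whole string must match.

```python
import re

pattern = r"\d+"
text = "Hello 123 World 456"   # digits are not at the start

found_by_search = re.search(pattern, text)
found_by_match = re.match(pattern, text)
print(found_by_search.group())  # 123
print(found_by_match)           # None

# Even with re.MULTILINE, match() only tries position 0,
# while search() with ^ can match at the start of any line.
lines = "first line\n42 second line"
multiline_match = re.match(r"^\d+", lines, re.MULTILINE)
multiline_search = re.search(r"^\d+", lines, re.MULTILINE)
print(multiline_match)           # None
print(multiline_search.group())  # 42

# fullmatch() succeeds only if the entire string matches the pattern.
whole = re.fullmatch(pattern, "123")
partial = re.fullmatch(pattern, "123 abc")
print(whole.group())  # 123
print(partial)        # None
```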

6. Using compiled regular expressions

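When we need to use the same regular expression many times, we can compile it once with re.compile() and reuse the resulting pattern object. The re module already caches recently used patterns, so the speed gain is usually modest; the main benefits are skipping the cache lookup in hot loops and giving the pattern a name.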

import re

# Define the regular expression
pattern = r'\d+'

# Define the target string
text = "Hello 123 World 456"

# Compile the regular expression
regex = re.compile(pattern)

# Search with the compiled regular expression
match = regex.search(text)

if match:
    print("Found a match:", match.group())  # Output: Found a match: 123
else:
    print("No match found")

In the above code, we first use the re.compile() function to compile the regular expression into a pattern object, regex. We can then call regex.search() (or regex.match()) as many times as needed without passing the pattern string again.
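Compiling pays off mainly when one pattern is applied to many strings. The following sketch reuses a single compiled pattern across a list of made-up lines. It also shows that the compiled object's match() accepts an optional start position, which the module-level re.match() does not.

```python
import re

NUMBER = re.compile(r"\d+")

lines = ["order 17 shipped", "no digits here", "42 items left"]

# Reuse the same compiled pattern for every line.
first_numbers = []
for line in lines:
    m = NUMBER.search(line)
    first_numbers.append(m.group() if m else None)
print(first_numbers)  # ['17', None, '42']

# match() on the compiled pattern still anchors at the start of each line.
starts_with_number = [bool(NUMBER.match(line)) for line in lines]
print(starts_with_number)  # [False, False, True]

# Compiled patterns accept a start position: begin matching at index 3.
from_offset = NUMBER.match("abc123", 3)
print(from_offset.group())  # 123
```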

7. Example: Matching valid email addresses

Let us look at search() and match() more closely through an example: a regular expression that checks whether a string looks like a valid email address.

import re

# Define the regular expression
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

# Define the target strings
emails = [
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "invalid_email"
]

# Use search() to check each email address
for email in emails:
    match = re.search(pattern, email)
    if match:
        print("Valid email address:", match.group())
    else:
        print("Invalid email address")

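In the above code, we define a more complex regular expression that accepts common email address shapes: a local part, an @ sign, a domain, and a top-level domain of at least two letters. It is a simplified check, not a full implementation of the email address standard. Then we define a list emails containing candidate addresses and test them one by one. Because the pattern is anchored with ^ and $, search() succeeds here only when the whole string has that shape, so match() would give the same results.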
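As an alternative sketch, the same check can be written with re.fullmatch(), which makes the ^ and $ anchors unnecessary. The addresses below are hypothetical placeholders chosen for illustration and are not taken from the original list.

```python
import re

# Same character classes as the pattern above, without the ^ and $ anchors.
EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Hypothetical sample addresses (assumptions for illustration).
candidates = [
    "alice@example.com",
    "bob.smith+news@mail.example.org",
    "no-at-sign.example.com",
    "carol@localhost",
    "invalid_email",
]

valid = [c for c in candidates if EMAIL.fullmatch(c)]
print(valid)  # ['alice@example.com', 'bob.smith+news@mail.example.org']

# One subtle difference: $ also matches just before a trailing newline,
# so the anchored search() accepts this string while fullmatch() rejects it.
anchored = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
with_newline = "alice@example.com\n"
accepted_by_search = re.search(anchored, with_newline) is not None
accepted_by_fullmatch = EMAIL.fullmatch(with_newline) is not None
print(accepted_by_search, accepted_by_fullmatch)  # True False
```

If input may carry a trailing newline (for example, lines read from a file), fullmatch() or an explicit strip() is the stricter choice.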

8. Example: Matching a date format

Let's look at another example: a regular expression that matches a date format.

import re

# Define the regular expression
pattern = r'\d{4}-\d{2}-\d{2}'

# Define the target strings
dates = [
    "2023-07-30",
    "2023/07/30",
    "30-07-2023",
    "07-30-2023",
    "2023-13-30"
]

# Use search() to check each date string
for date in dates:
    match = re.search(pattern, date)
    if match:
        print("Matching date format:", match.group())
    else:
        print("Invalid date format")

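In the above code, we define a simple regular expression r'\d{4}-\d{2}-\d{2}' to match dates in the format "YYYY-MM-DD". Then we define a list dates containing some date strings and check them one by one with search(). The pattern checks only the shape of the string, not whether it is a real date: "2023-13-30" matches even though there is no month 13. Because search() is used without anchors, the pattern would also match a date embedded in a longer string.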
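One way to reject impossible dates such as "2023-13-30" is to let the regular expression check the shape and then hand the string to datetime.strptime() for the calendar check. This is a sketch under that assumption, and is_valid_date is a hypothetical helper name.

```python
import re
from datetime import datetime

DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")

def is_valid_date(value):
    """Return True if value has the shape YYYY-MM-DD and is a real calendar date."""
    # fullmatch() rejects strings with extra text around the date.
    if not DATE_SHAPE.fullmatch(value):
        return False
    try:
        # strptime() raises ValueError for out-of-range months or days.
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True

dates = ["2023-07-30", "2023/07/30", "30-07-2023", "07-30-2023", "2023-13-30"]
results = {d: is_valid_date(d) for d in dates}
print(results)
```

With this check, only "2023-07-30" is accepted; the last entry passes the shape test but fails the calendar test.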

9. Summary

In this article, we covered search() and match(), two commonly used regular expression methods in Python, from the basics to practical examples.

  • The search() method is used to search the entire string for the first occurrence of a matching regular expression.

  • The match() method tries the regular expression only at the beginning of the string.

We also looked at some important regular expression metacharacters and at how to compile a regular expression once and reuse it. Finally, the email and date examples showed how search() and match() behave in practice, including the limits of checking shape alone. With these basics in place, we can handle strings more reliably and write clear, flexible Python code for matching and searching.

Origin blog.csdn.net/Rocky006/article/details/132181008