====== Chapter 26: Regular Expressions ======

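===== Chapter Objectives =====

After completing this chapter, you will be able to:
  * Use the re module to work with regular expressions
  * Recognize and write common regex patterns
  * Match and replace text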

===== Regex Basics =====

<code python>
import re

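# Common pattern elements
# .      any character (except a newline)
# \d     a digit
# \w     a word character (letter, digit, or underscore)
# \s     a whitespace character
# ^      start of the string
# $      end of the string
# *      0 or more repetitions
# +      1 or more repetitions
# ?      0 or 1 repetition
# {n}    exactly n repetitions
# {n,m}  between n and m repetitions
# []     character class
# |      alternation (or)
# ()     grouping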
</code>
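The table above lists the building blocks without showing them in action. The following is a minimal sketch that combines a few of them; the sample strings and variable names are made up for illustration only.

```python
import re

# ^ and $ anchor the pattern, \d matches a digit, {n} repeats exactly n times
date_pattern = r'^\d{4}-\d{2}-\d{2}$'
is_date = bool(re.search(date_pattern, '2024-01-15'))     # True
not_date = bool(re.search(date_pattern, '24-1-15'))       # False

# \w+ matches runs of word characters
words = re.findall(r'\w+', 'red, green  blue')            # ['red', 'green', 'blue']

# ? makes the preceding character optional
spellings = re.findall(r'colou?r', 'color colour colr')   # ['color', 'colour']

# | chooses between alternatives
animals = re.findall(r'cat|dog', 'cat dog bird')          # ['cat', 'dog']

# [] matches any single character from the set
vowels = re.findall(r'[aeiou]', 'regex')                  # ['e', 'e']

# {n,m} matches between n and m repetitions (as many as possible)
runs = re.findall(r'\d{2,3}', '1 22 333 4444')            # ['22', '333', '444']

print(is_date, not_date, words, spellings, animals, vowels, runs)
```

Note that the last example stops at three digits inside `4444`: the quantifier is greedy but capped at m, and the single leftover digit is too short to match again.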

===== re Module Functions =====

<code python>
import re

text = "The price is $42.50"

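# search: find the first match anywhere in the string
match = re.search(r'\$([\d.]+)', text)
if match:
    print(match.group())   # $42.50
    print(match.group(1))  # 42.50

# findall: return all matches (the captured group, since the pattern has one)
prices = re.findall(r'\$([\d.]+)', text)  # ['42.50']

# match: match only at the start of the string
result = re.match(r'The', text)

# sub: replace every match
new_text = re.sub(r'\$([\d.]+)', r'\1 USD', text)  # 'The price is 42.50 USD'

# split: split the string on a pattern
parts = re.split(r'\s+', text)  # ['The', 'price', 'is', '$42.50']

# compile: precompile a pattern for repeated use
pattern = re.compile(r'\$([\d.]+)')
matches = pattern.findall(text)  # ['42.50']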
</code>
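The difference between `match`, `search`, and `fullmatch` is easy to miss, and `sub` also accepts a function as the replacement. The sketch below illustrates both points using the same sample text; `double_price` is a hypothetical helper introduced only for this example.

```python
import re

text = "The price is $42.50"

# match only succeeds at the start of the string
at_start = re.match(r'price', text)       # None: 'price' is not at position 0

# search scans the whole string for the first match
anywhere = re.search(r'price', text)      # a match object

# fullmatch requires the entire string to match the pattern
whole = re.fullmatch(r'The price is \$[\d.]+', text)
partial = re.fullmatch(r'The price', text)  # None: the rest of the string is left over

# sub with a function: the function receives each match object
# and returns the replacement string (hypothetical helper)
def double_price(m):
    return f"${float(m.group(1)) * 2:.2f}"

doubled = re.sub(r'\$([\d.]+)', double_price, text)
print(doubled)  # The price is $85.00
```

A function replacement is worth considering when the new text has to be computed from the matched text rather than just rearranged with backreferences such as `\1`.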

===== Advanced Usage =====

<code python>
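import re

# Named groups: (?P<name>...) lets you fetch a group by name
text = "John Doe, 30"
pattern = r'(?P<name>\w+ \w+), (?P<age>\d+)'
match = re.search(pattern, text)
if match:
    print(match.group('name'))  # John Doe
    print(match.group('age'))   # 30

# Greedy vs. non-greedy
re.search(r'<.*>', '<div>content</div>')     # greedy: matches the whole string
re.search(r'<.*?>', '<div>content</div>')    # non-greedy: matches only <div>

# Multiline mode: ^ and $ also match at the start and end of each line
text = "line1\nline2"
re.search(r'^line2', text, re.MULTILINE)     # matches; without the flag it would not

# Ignore case
re.search(r'the', "The price is $42.50", re.IGNORECASE)  # matches 'The'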
</code>
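The features above can be combined. The following sketch, with made-up sample data, iterates over several records using `finditer` and named groups, compares greedy and non-greedy results directly, and joins two flags with `|`.

```python
import re

# Hypothetical sample data: one record per line
records = "John Doe, 30\nJane Roe, 25"

# MULTILINE lets ^ and $ anchor each line, so every record is matched separately
person = re.compile(r'^(?P<name>\w+ \w+), (?P<age>\d+)$', re.MULTILINE)

# groupdict() returns the named groups of a match as a dictionary
people = [m.groupdict() for m in person.finditer(records)]
print(people)
# [{'name': 'John Doe', 'age': '30'}, {'name': 'Jane Roe', 'age': '25'}]

html = '<div>content</div>'
greedy = re.search(r'<.*>', html).group()    # '<div>content</div>'
lazy = re.search(r'<.*?>', html).group()     # '<div>'
tags = re.findall(r'<.*?>', html)            # ['<div>', '</div>']

# Flags are combined with the bitwise OR operator
hits = re.findall(r'^line\d', "Line1\nline2", re.MULTILINE | re.IGNORECASE)
print(hits)  # ['Line1', 'line2']
```

Captured groups are always strings, so a value such as `age` needs an explicit `int()` conversion before it is used as a number.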

===== Exercises =====

1. Validate the format of an email address
2. Extract all URLs from a piece of text
3. Parse a log file
4. Replace sensitive words

Next chapter: [[python_course:chapter27|Chapter 27: Network Programming]]