A Detailed Comparison of Regular Expression Usage in Python and JavaScript

Posted by 雅嵐 on 2024-5-13 08:28:53

<div id="navCategory"><h5 class="catalogue">Contents</h5><ul class="first_class_ul"><li><a href="#_label0">Preface</a></li><li><a href="#_label1">1 Constructing and using regular expressions</a></li><li><a href="#_label2">2 Regular expression instance methods (JavaScript only)</a></li><ul class="second_class_ul"><li><a href="#_lab2_2_0">2.1. exec()</a></li><li><a href="#_lab2_2_1">2.2. test()</a></li><li><a href="#_lab2_2_2">2.3. compile()</a></li></ul><li><a href="#_label3">3 Regular expression properties (JavaScript only)</a></li><ul class="second_class_ul"><li><a href="#_lab2_3_3">3.1 Instance properties</a></li><li><a href="#_lab2_3_4">3.2 Static properties</a></li></ul><li><a href="#_label4">Summary</a></li><ul class="second_class_ul"></ul></ul></div><p class="maodian"><a name="_label0"></a></p><h2>Preface</h2>
<p>Regular expressions are a powerful tool in both Python and JavaScript for matching, searching, and manipulating strings. The basic syntax is similar in the two languages, but the APIs and flags differ in several places. The main differences in how regular expressions are constructed and used are compared below.</p>
<p class="maodian"><a name="_label1"></a></p><h2>1 Constructing and using regular expressions</h2>
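<table><tbody><tr><th><p>Feature</p></th><th><p>Python</p></th><th><p>JavaScript</p></th></tr><tr><td><p>Importing</p></td><td><p>Uses the re module</p></td><td><p>Built in, no import needed</p></td></tr><tr><td><p>Defining a regular expression</p></td><td><p>re.compile(r"pattern")</p></td><td><p>/pattern/flags or new RegExp("pattern", "flags")</p></td></tr><tr><td><p>Find all matches</p></td><td><p>re.findall(pattern, string)</p></td><td><p>string.match(/pattern/g) or string.matchAll(/pattern/g)<br /><br />matchAll returns an iterator of match objects, including capture groups.</p></td></tr><tr><td><p>Search (find the first match)</p></td><td><p>re.search(pattern, string)</p></td><td><p>string.match(/pattern/) or string.search(/pattern/)<br /><br />search returns only the index of the first match, or -1 if there is none.</p></td></tr><tr><td><p>Replace</p></td><td><p>re.sub(pattern, repl, string)<br /><br />Replaces every match by default.</p></td><td><p>string.replace(/pattern/g, repl)<br /><br />Without the g flag only the first match is replaced.</p></td></tr><tr><td><p>Split a string</p></td><td><p>re.split(pattern, string)</p></td><td><p>string.split(/pattern/)</p></td></tr><tr><td><p>Ignore-case flag</p></td><td><p>re.IGNORECASE (re.I) or inline (?i)</p></td><td><p>i flag</p></td></tr><tr><td><p>Multiline flag</p></td><td><p>re.MULTILINE (re.M) or inline (?m)</p></td><td><p>m flag</p></td></tr><tr><td><p>Dot matches any character (including newlines)</p></td><td><p>re.DOTALL (re.S) or inline (?s)</p></td><td><p>s flag (ES2018 and later); in older engines use [^] or [\s\S] to match any character including newlines</p></td></tr><tr><td><p>Unicode matching</p></td><td><p>On by default for str patterns in Python 3; re.UNICODE (re.U) is kept only for backward compatibility</p></td><td><p>u flag</p></td></tr></tbody></table>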
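<p>As a small illustration of two rows in the table, the sketch below shows the JavaScript s flag next to the [\s\S] fallback, and matchAll returning capture groups. The sample string and variable names are made up for this example, and it assumes an ES2020-capable runtime.</p>

```javascript
// Sample input invented for this sketch.
const multiline = "first line\nsecond line";

// Without the s flag, "." stops at the line break.
const withoutDotAll = /first.*second/.test(multiline);

// With the s flag, "." also matches "\n", similar to re.DOTALL in Python.
const withDotAll = /first.*second/s.test(multiline);

// [\s\S] is a common fallback for engines that lack the s flag.
const withFallback = /first[\s\S]*second/.test(multiline);

// matchAll() requires the g flag and yields one match object per hit,
// so capture groups stay available (match() with g drops them).
const words = [...multiline.matchAll(/(\w+) line/g)].map((m) => m[1]);

console.log(withoutDotAll, withDotAll, withFallback, words);
// prints: false true true [ 'first', 'second' ]
```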
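<p>The Python and JavaScript examples below show common regular expression operations: matching, searching, splitting, replacing, quantifiers, lookahead, lookbehind, grouping, and backreferences.</p>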
<div class="jb51code"><pre class="brush:py;">import re

text = "The quick brown fox jumps over the lazy dog 123. Windows 2000 and XP Windows. test test."

# Match a whole word
match = re.search(r'\bfox\b', text)
if match:
    print("Match found:", match.group())  # prints: Match found: fox

# Search (quantifier plus a capture group): every four-character word
search_result = re.findall(r'(\b\w{4}\b)', text)
print("Search results:", search_result)  # prints: Search results: ['over', 'lazy', '2000', 'test', 'test']

# Split on each whitespace character
split_result = re.split(r'\s', text)
print("Split results:", split_result)

# Replace (group references): swap each pair of adjacent words
replace_result = re.sub(r'(\w+) (\w+)', r'\2 \1', text)
print("Replace results:", replace_result)

# Lookahead and lookbehind
lookahead = re.search(r'Windows(?= 2000)', text)
if lookahead:
    print("Lookahead found:", lookahead.group())  # prints: Lookahead found: Windows

lookbehind = re.search(r'(?<=XP )Windows', text)
if lookbehind:
    print("Lookbehind found:", lookbehind.group())  # prints: Lookbehind found: Windows

# Backreference: the same word twice in a row
repeat_word = re.search(r'(\b\w+\b) \1', text)
if repeat_word:
    print("Repeat word found:", repeat_word.group())  # prints: Repeat word found: test test

# Using capture groups
grouped = re.search(r'(\b\w+\b) over the (\b\w+\b)', text)
if grouped:
    print("Words over:", grouped.groups())  # prints: Words over: ('jumps', 'lazy')
</pre></div>
<div class="jb51code"><pre class="brush:js;">let text = "The quick brown fox jumps over the lazy dog 123. Windows 2000 and XP Windows. test test.";

// Match a whole word; match() returns an array, and index 0 holds the matched text
let match = text.match(/\bfox\b/);
if (match) {
    console.log("Match found:", match[0]); // prints: Match found: fox
}

// Search (quantifier with the g flag): every four-character word
let searchResult = text.match(/\b\w{4}\b/g);
console.log("Search results:", searchResult); // ['over', 'lazy', '2000', 'test', 'test']

// Split on each whitespace character
let splitResult = text.split(/\s/);
console.log("Split results:", splitResult);

// Replace (group references): swap each pair of adjacent words
let replaceResult = text.replace(/(\w+) (\w+)/g, '$2 $1');
console.log("Replace results:", replaceResult);

// Lookahead and lookbehind (lookbehind needs an ES2018-capable engine)
let lookahead = text.match(/Windows(?= 2000)/);
if (lookahead) {
    console.log("Lookahead found:", lookahead[0]); // prints: Lookahead found: Windows
}

let lookbehind = text.match(/(?<=XP )Windows/);
if (lookbehind) {
    console.log("Lookbehind found:", lookbehind[0]); // prints: Lookbehind found: Windows
}

// Backreference: the same word twice in a row
let repeatWord = text.match(/(\b\w+\b) \1/);
if (repeatWord) {
    console.log("Repeat word found:", repeatWord[0]); // prints: Repeat word found: test test
}

// Using capture groups: indexes 1 and 2 hold the two groups
let grouped = text.match(/(\b\w+\b) over the (\b\w+\b)/);
if (grouped) {
    console.log("Words over:", grouped[1], grouped[2]); // prints: Words over: jumps lazy
}
</pre></div>
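<p>One difference is easy to miss when porting the replacement example: Python's re.sub replaces every match by default, while JavaScript's replace only does so when the pattern carries the g flag. A minimal sketch with a made-up input string:</p>

```javascript
// Made-up input for illustration.
const sample = "a-b-c";

// No g flag: only the first match is replaced.
const firstOnly = sample.replace(/-/, "+");

// With g: every match is replaced, like re.sub() in Python.
const everyMatch = sample.replace(/-/g, "+");

// A function can compute each replacement, comparable to passing
// a callable as repl to re.sub().
const upperCased = sample.replace(/[a-z]/g, (ch) => ch.toUpperCase());

console.log(firstOnly, everyMatch, upperCased);
// prints: a+b-c a+b+c A-B-C
```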
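<p class="maodian"><a name="_label2"></a></p><h2>2 Regular expression instance methods (JavaScript only)</h2>
<p class="maodian"><a name="_lab2_2_0"></a></p><h3>2.1. exec()</h3>
<ul><li><p>Description: searches a string for a match and returns a result array or null. If the regular expression has the global flag (g), each call to exec() starts searching for the next match at the position given by the regular expression's lastIndex property.</p></li><li><p>Return value: an array whose element 0 is the full matched string and whose later elements are the capture groups, if any. Returns null when no match is found.</p></li><li><p>Example:</p></li></ul>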
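<div class="jb51code"><pre class="brush:js;">// The optional \s? lets the last word match even without trailing whitespace
const regex = /(\w+)\s?/g;
const text = "hello world";
let match;

while ((match = regex.exec(text)) !== null) {
    // match[1] is the first capture group (the word without the whitespace)
    console.log(`Found ${match[1]}, next starts at ${regex.lastIndex}.`);
    // prints: Found hello, next starts at 6.
    //         Found world, next starts at 11.
}
</pre></div>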
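<p>To make the shape of the exec() result more concrete, the following sketch collects the full match, the capture groups, and the match position for each hit, then checks lastIndex once the loop has finished. The pattern and input are invented for illustration.</p>

```javascript
// Hypothetical pattern: two numbers joined by a dash.
const rangePattern = /(\d+)-(\d+)/g;
const rangeInput = "10-20 and 30-40";

const found = [];
let hit;
while ((hit = rangePattern.exec(rangeInput)) !== null) {
  // hit[0] is the full match, hit[1] and hit[2] are the capture groups,
  // and hit.index is where the match starts in the input.
  found.push({ whole: hit[0], first: hit[1], second: hit[2], index: hit.index });
}

// When exec() finally fails on a g-flagged regex, lastIndex goes back to 0,
// so the same regex object can be reused from the start.
const lastIndexAfterLoop = rangePattern.lastIndex;

console.log(found, lastIndexAfterLoop);
```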
<p class="maodian"><a name="_lab2_2_1"></a></p><h3>2.2. test()</h3>
<ul><li><p>Description: tests whether a string matches the regular expression's pattern.</p></li><li><p>Return value: true if a match is found, otherwise false.</p></li><li><p>Example:</p></li></ul>
<div class="jb51code"><pre class="brush:js;">const regex = /hello/;
const text = "hello world";
const result = regex.test(text); // returns true
console.log(result);
</pre></div>
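<p>Like exec(), test() advances lastIndex when the regular expression has the g flag, so repeated calls on the same object can alternate between true and false. The sketch below, with illustrative variable names, shows that behaviour next to a regex without g.</p>

```javascript
// With g, test() remembers where the previous match ended.
const stateful = /hello/g;
const firstCall = stateful.test("hello world");  // matches at 0, lastIndex becomes 5
const secondCall = stateful.test("hello world"); // searches from 5, fails, lastIndex resets to 0
const thirdCall = stateful.test("hello world");  // matches again from 0

// Without g, every call starts from the beginning.
const stateless = /hello/;
const repeated = [stateless.test("hello world"), stateless.test("hello world")];

console.log(firstCall, secondCall, thirdCall, repeated);
// prints: true false true [ true, true ]
```

<p>For a plain yes-or-no check, leaving out the g flag avoids this surprise.</p>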
<p class="maodian"><a name="_lab2_2_2"></a></p><h3>2.3. compile()</h3>
<ul><li><p>Description: recompiles the regular expression in place. It is best avoided; creating a new regular expression instance is cleaner and safer.</p></li></ul>
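<p>One possible way to follow that advice is to build a fresh instance from the source of an existing one. In the sketch below, withFlags is a hypothetical helper name chosen for this example, not part of any API.</p>

```javascript
// withFlags is a hypothetical helper: it returns a new RegExp with the
// same pattern and different flags, leaving the original untouched.
function withFlags(regex, flags) {
  return new RegExp(regex.source, flags);
}

const original = /foo/;
const updated = withFlags(original, "gi");

// The new instance finds both spellings; the original keeps its empty flags.
const matches = "Foo foo".match(updated);

console.log(updated.flags, JSON.stringify(original.flags), matches);
// prints: gi "" [ 'Foo', 'foo' ]
```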
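<p class="maodian"><a name="_label3"></a></p><h2>3 Regular expression properties (JavaScript only)</h2>
<p class="maodian"><a name="_lab2_3_3"></a></p><h3>3.1 Instance properties</h3>
<p>Instance properties belong to an individual regular expression object. They describe that specific object, and each instance has its own independent values. Common instance properties include:</p>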
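<ul><li><p><strong>source</strong>:</p>
<ul><li>Description: the source text of the regular expression, as a string.</li><li>Use: shows the exact pattern the regular expression was created with.</li></ul></li><li><p><strong>flags</strong>:</p>
<ul><li>Description: the flags set on the regular expression (such as <code>g</code>, <code>i</code>, <code>m</code>).</li><li>Use: a quick way to see which flags apply to the regular expression object.</li></ul></li><li><p><strong>lastIndex</strong>:</p>
<ul><li>Description: the character position where the next match will start. It only takes effect when the regular expression uses the global flag <code>g</code> or the sticky flag <code>y</code>.</li><li>Use: controls or reports the start position of the next match when matching repeatedly.</li></ul></li><li><p><strong>global</strong>, <strong>ignoreCase</strong>, <strong>multiline</strong>, <strong>dotAll</strong>, <strong>unicode</strong>, <strong>sticky</strong>:</p>
<ul><li>Description: boolean properties that report whether the corresponding flag is set on the regular expression.</li><li>Use: a quick way to inspect how the regular expression will behave.</li></ul></li></ul>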
<div class="jb51code"><pre class="brush:js;">// Define a regular expression object with several flags
let regex = new RegExp('foo', 'gim');

// Instance properties
console.log("Source:", regex.source);          // prints: Source: foo
console.log("Flags:", regex.flags);            // prints: Flags: gim
console.log("Global:", regex.global);          // prints: Global: true
console.log("Ignore Case:", regex.ignoreCase); // prints: Ignore Case: true
console.log("Multiline:", regex.multiline);    // prints: Multiline: true

// Match repeatedly with the regular expression
let text = "Foo bar foo";
let match;
while ((match = regex.exec(text)) !== null) {
    // match[0] is the matched text: 'Foo' at index 0, then 'foo' at index 8
    console.log(`Found '${match[0]}' at index ${match.index}`);
    console.log("LastIndex after match:", regex.lastIndex); // 3, then 11
}
</pre></div>
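<p>The example above uses the g flag. As a complement, this sketch looks at lastIndex with the sticky flag y, where a match must begin exactly at lastIndex, and with no flag at all, where lastIndex is ignored. The strings and names are illustrative only.</p>

```javascript
const subject = "barfoo";

// Sticky: the match has to start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 0;
const atZero = sticky.test(subject);  // "foo" does not start at index 0
sticky.lastIndex = 3;
const atThree = sticky.test(subject); // "foo" starts at index 3
const stickyLastIndex = sticky.lastIndex; // moved to the end of the match

// No g or y flag: lastIndex is neither used nor updated by a match.
const plain = /foo/;
plain.lastIndex = 5;
const plainResult = plain.test(subject);
const plainLastIndex = plain.lastIndex;

console.log(atZero, atThree, stickyLastIndex, plainResult, plainLastIndex);
// prints: false true 6 true 5
```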
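<p class="maodian"><a name="_lab2_3_4"></a></p><h3>3.2 Static properties</h3>
<p>Static properties are not tied to a particular <code>RegExp</code> object; they belong to the <code>RegExp</code> constructor itself. They mainly hold global information about the most recent regular expression operation. Their values are updated after each regular expression operation and are shared across separate match and search operations. Common static properties include:</p>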
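<ul><li><p><strong>RegExp.input</strong> (<code>$_</code>):</p>
<ul><li>Description: holds the full string that was most recently matched against.</li><li>Use: a quick way to look at, or reprocess, the string from the last match.</li></ul></li><li><p><strong>RegExp.lastMatch</strong> (<code>$&</code>):</p>
<ul><li>Description: holds the full text of the most recent successful match.</li><li>Use: refers back to the result of the last match.</li></ul></li><li><p><strong>RegExp.lastParen</strong> (<code>$+</code>):</p>
<ul><li>Description: holds the last capture group of the most recent match.</li><li>Use: handy when the last capture group needs to be read dynamically.</li></ul></li><li><p><strong>RegExp.leftContext</strong> (<code>$`</code>) and <strong>RegExp.rightContext</strong> (<code>$'</code>):</p>
<ul><li>Description: hold the parts of the string before and after the most recent match, respectively.</li><li>Use: give access to the context around the match.</li></ul></li><li><p><strong>RegExp.$1</strong> to <strong>RegExp.$9</strong>:</p>
<ul><li>Description: hold capture groups 1 through 9 of the most recent match.</li><li>Use: quick access to a specific capture group from the last match.</li></ul></li></ul>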
<div class="jb51code"><pre class="brush:js;">let text = "Example text with 'term' and another 'term'.";
let regex = /'term'/g; // global search for 'term', quotes included

// Run two matches; the static properties reflect the second one
regex.exec(text);
regex.exec(text);

// Static properties
console.log("Last Match:", RegExp.lastMatch);       // prints: Last Match: 'term'
console.log("Last Paren:", RegExp.lastParen);       // empty string, the pattern has no capture groups
console.log("Left Context:", RegExp.leftContext);   // Example text with 'term' and another (plus a trailing space)
console.log("Right Context:", RegExp.rightContext); // prints: Right Context: .

// The input string is still available after several matches
console.log("Input:", RegExp.input); // prints: Input: Example text with 'term' and another 'term'.
</pre></div>
<p style="text-align:center"><img alt="" src="https://img.jbzj.com/file_images/article/202405/202405130824501.png" /></p>
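<p>These static properties are generally treated as legacy features, and the same information can be read from the match result itself. The sketch below is one way to do that, assuming a single non-global match; the variable names are chosen for illustration.</p>

```javascript
const sentence = "Example text with 'term' and another 'term'.";
const result = /'term'/.exec(sentence);

// Per-match equivalents of the static properties:
const lastMatch = result[0];                                           // like RegExp.lastMatch
const input = result.input;                                            // like RegExp.input
const leftContext = input.slice(0, result.index);                      // like RegExp.leftContext
const rightContext = input.slice(result.index + result[0].length);     // like RegExp.rightContext

console.log(JSON.stringify({ lastMatch, leftContext, rightContext }));
```

<p>Because the values come from the result object, they are not overwritten by later matches elsewhere in the program.</p>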
<p class="maodian"><a name="_label4"></a></p><h2>Summary</h2>
