Regular expressions (1): a guide to character matching

A regular expression is a matching pattern: it matches either characters or positions.
Learning how regular expressions match characters can feel messy, though. There are many metacharacters and no obvious system, so they are hard to remember.
This post summarizes them as follows:

Two kinds of fuzzy matching
Character classes
Quantifiers
Alternation
Case studies

1 Two kinds of fuzzy matching

1.1 Horizontal fuzzy matching

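Definition: horizontal fuzziness means that the length of the strings a regex can match is not fixed; several lengths are possible.
Notation: {m,n} means the preceding item occurs at least m and at most n times in a row.
Example: the regex /ab{2,5}c/ matches a string whose first character is "a", followed by 2 to 5 "b" characters, and ending with the character "c".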
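
The {m,n} quantifier can be tried out with a short sketch. It assumes JavaScript, which the /.../ literal syntax suggests; the g flag, the test string, and the variable names are illustrative and not from the original.

```javascript
// Horizontal fuzzy matching: the number of "b" characters may vary from 2 to 5.
const horizontalRegex = /ab{2,5}c/g;

// Test string with 1 to 6 "b" characters between "a" and "c" (illustrative).
const horizontalInput = "abc abbc abbbc abbbbc abbbbbc abbbbbbc";

// With the g flag, match() returns every matching substring.
const horizontalMatches = horizontalInput.match(horizontalRegex);

console.log(horizontalMatches);
// → [ 'abbc', 'abbbc', 'abbbbc', 'abbbbbc' ]
```

The strings with one "b" and with six "b" characters fall outside the 2 to 5 range, so they are not matched.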

1.2 Vertical fuzzy matching

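Definition: vertical fuzziness means that, at a given position in the matched string, the character need not be one specific character; several are possible.
Notation: [abc] means the character can be any one of "a", "b", or "c".
Example: /a[123]b/ matches exactly three strings: "a1b", "a2b", and "a3b".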
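
A minimal sketch of the same example, again assuming JavaScript; the test string and variable names are illustrative.

```javascript
// Vertical fuzzy matching: the middle character may be "1", "2", or "3".
const verticalRegex = /a[123]b/g;

// "a0b" and "a4b" are included to show what is NOT matched (illustrative input).
const verticalInput = "a0b a1b a2b a3b a4b";

const verticalMatches = verticalInput.match(verticalRegex);

console.log(verticalMatches);
// → [ 'a1b', 'a2b', 'a3b' ]
```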

2 Character classes

Note that although it is called a character class (or character group), it matches only a single character.
For example, [abc] matches one character, which can be "a", "b", or "c".

2.1 Range notation

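When to use: when a character class contains many characters, use range notation. For example, [123456abcdefGHIJKLM] can be written as [1-6a-fG-M].
Notation: the hyphen - abbreviates a run of consecutive characters.
Note: because the hyphen has this special role, a class meant to match any one of "a", "-", or "z" cannot be written as [a-z], since that means any lowercase letter.
It can be written as [-az], [az-], or [a\-z] instead.
In other words, put the hyphen first, put it last, or escape it. Anything that keeps the engine from reading it as a range will do.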
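
The difference between a range and a literal hyphen can be checked with a small sketch. It assumes JavaScript; the input strings and variable names are illustrative.

```javascript
// Range notation: [1-6a-fG-M] is shorthand for [123456abcdefGHIJKLM].
const rangeMatches = "0 3 7 c z H Z".match(/[1-6a-fG-M]/g);
console.log(rangeMatches);
// → [ '3', 'c', 'H' ]

// Three ways to make the hyphen literal: first, last, or escaped.
const literalHyphenForms = [/[-az]/g, /[az-]/g, /[a\-z]/g];
const hyphenInput = "a-m-z";

// Each form matches "a", "-", and "z", but not "m".
const hyphenResults = literalHyphenForms.map((re) => hyphenInput.match(re).join(""));
console.log(hyphenResults);
// → [ 'a--z', 'a--z', 'a--z' ]

// By contrast, [a-z] is a range: it matches "m" and skips the hyphens.
const asRange = hyphenInput.match(/[a-z]/g).join("");
console.log(asRange);
// → amz
```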

2.2 Negated character classes

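Vertical fuzzy matching has one more case: the character at some position can be anything except "a", "b", or "c".
This is what a negated character class is for. For example, [^abc] matches any single character other than "a", "b", and "c". A ^ (caret) placed first in the class means negation.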
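
A short sketch of negation, assuming JavaScript; the inputs and names are illustrative. The second half shows a side point that the original does not spell out: the caret only negates when it is the first character of the class.

```javascript
// Negated class: any single character except "a", "b", or "c".
const negatedMatches = "abcdef".match(/[^abc]/g);
console.log(negatedMatches);
// → [ 'd', 'e', 'f' ]

// A caret that is not in first position is an ordinary character.
const literalCaretMatches = "a^b".match(/[a^]/g);
console.log(literalCaretMatches);
// → [ 'a', '^' ]
```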

2.3 Common shorthand classes

Shorthand   Meaning
\d          Equivalent to [0-9]. Matches one digit.
            Memory aid: d stands for "digit".
\D          Equivalent to [^0-9]. Matches any character except a digit.
\w          Equivalent to [0-9a-zA-Z_]. Matches digits, uppercase and lowercase letters, and the underscore.
            Memory aid: w stands for "word"; these are also called word characters.
\W          Equivalent to [^0-9a-zA-Z_]. Matches any non-word character.
\s          Equivalent to [ \t\v\n\r\f]. Matches a whitespace character: space, horizontal tab, vertical tab, line feed, carriage return, or form feed. (In JavaScript, \s also matches some other Unicode whitespace characters, such as \u00a0 and \ufeff.)
            Memory aid: s stands for "space", as in "white space".
\S          Equivalent to [^ \t\v\n\r\f]. Matches any non-whitespace character.
.           Equivalent to [^\n\r\u2028\u2029]. The wildcard: matches almost any character, except line feed, carriage return, line separator, and paragraph separator.
            Memory aid: think of each dot in an ellipsis (...) as a placeholder that can stand for anything.
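
The shorthand classes can be compared side by side on one small string. This is a sketch assuming JavaScript; the sample string and the property names are illustrative.

```javascript
// Sample containing a letter, a digit, an underscore, a space, a hyphen, and a line feed.
const shorthandSample = "A1_ -\n";

const shorthandResults = {
  digit: shorthandSample.match(/\d/g),         // \d → ["1"]
  nonDigit: shorthandSample.match(/\D/g),      // \D → ["A", "_", " ", "-", "\n"]
  word: shorthandSample.match(/\w/g),          // \w → ["A", "1", "_"]
  nonWord: shorthandSample.match(/\W/g),       // \W → [" ", "-", "\n"]
  space: shorthandSample.match(/\s/g),         // \s → [" ", "\n"]
  nonSpace: shorthandSample.match(/\S/g),      // \S → ["A", "1", "_", "-"]
  dot: shorthandSample.match(/./g),            // .  → ["A", "1", "_", " ", "-"]
};

// JSON.stringify makes the whitespace characters visible in the output.
for (const [name, matches] of Object.entries(shorthandResults)) {
  console.log(name, JSON.stringify(matches));
}
```

Note that the dot matches the space but not the trailing line feed, while \s matches both.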

Origin www.cnblogs.com/xsnow/p/11712462.html