Python Regular Expression

25-04-29 1

本文将介绍PythonRegularExpression的详细情况，。我们将通过案例分析、数据研究等多种方式，帮助您更全面地了解这个主题，同时也将涉及一些关于10.RegularExpressionM

本文将介绍Python Regular Expression的详细情况，。我们将通过案例分析、数据研究等多种方式，帮助您更全面地了解这个主题，同时也将涉及一些关于10. Regular Expression Matching、10. Regular Expression Matching (JAVA)、10. Regular Expression Matching [H] 正则表达式匹配、10. Regular Expression Matching(hard)的知识。

本文目录一览：

Python Regular Expression
10. Regular Expression Matching
10. Regular Expression Matching (JAVA)
10. Regular Expression Matching [H] 正则表达式匹配
10. Regular Expression Matching(hard)

Python Regular Expression

1. References

http://docs.python.org/library/re.html#module-re

http://docs.python.org/library/re.html#raw-string-notation

http://docs.python.org/howto/regex.html#regex-howto

"mastering regular expression"

2. an example in bitbake

''(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?''

This regular expression matches strings which have form ''xxx_append'' or ''xxx_prepend'' or ''xxx_append_yyy'' or ''xxx_prepend_yyy''.

Let''s take a simple analysis on it.

REGEXP = ''(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?'' = ABC? (match AB or ABC)

A = ''(?P<base>.*?)'' matches the same string set with ''.*?'' but the matched substring could be accessed by the identifier base.

B = ''(?P<keyword>_append|_prepend) matches ''_append'' or ''_prepend''.

C = (_(?P<add>.*)) matches string like ''_xxxxxxxxx''

Following the test code for this regular expression the the corresponding output in shell.

#!/usr/bin/env python                                                                                                                                                                                           

# test_regexp.py regexp string                                                                                                                                                                                  
import sys
import re

#pattern = sys.argv[1]                                                                                                                                                                                          
#string = sys.argv[2]                                                                                                                                                                                           

def test(pattern, string):
    result = re.match(pattern, string)
    if result == None:
        print (pattern, string, None)
    else:
        print (pattern, string, result.group(''keyword''), result.group(''add''), result.group(0))

pattern = ''(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?''
test(pattern, ''hello_append'')
test(pattern, ''hello'')
test(pattern, ''hello_prepend'')
test(pattern, ''hello_append_add_package'')
test(pattern, ''hello_append_world_package'')

chenqi@chenqi-OptiPlex-760:~/mypro/python$ ./test_regexp.py
(''(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?'', ''hello_append'', ''_append'', None, ''hello_append'')
(''(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?'', ''hello'', None)
(''(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?'', ''hello_prepend'', ''_prepend'', None, ''hello_prepend'')
(''(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?'', ''hello_append_add_package'', ''_append'', ''add_package'', ''hello_append_add_package'')
(''(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?'', ''hello_append_world_package'', ''_append'', ''world_package'', ''hello_append_world_package'')

3. a complete list of metacharacters

. ^ $ * + ? { } [ ] \ | ( )

4. predefined special sequences

\d: Matches any decimal digit; this is equivalent to the class [0-9].
\D: Matches any non-digit character; this is equivalent to the class [^0-9].
\s: Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].
\S: Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].
\w: Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_].
\W: Matches any non-alphanumeric character; this is equivalent to the class[^a-zA-Z0-9_].

To Be Continue ...

10. Regular Expression Matching

implement regular expression matching with support for ''.'' and ''*''.

''.'' Matches any single character.
''*'' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

public class Solution {
    public boolean isMatch(String s, String p) {
		if (s == null || p == null) {
			return false;
		}
		boolean[][] dp = new boolean[s.length() + 1][p.length() + 1];
		dp[0][0] = true;
		for (int i = 0; i < p.length(); i++) {
			if (p.charAt(i) == ''*'' && dp[0][i - 1]) {
				dp[0][i + 1] = true;
			}
		}    
		for (int i = 0; i < s.length(); i++) {
			for (int j = 0; j < p.length(); j++) {
				if (p.charAt(j) == ''.'') {
					dp[i + 1][j + 1] = dp[i][j];
				}
				if (p.charAt(j) == s.charAt(i)) {
					dp[i + 1][j + 1] = dp[i][j];
				}
				if (p.charAt(j) == ''*'') {
					if (p.charAt(j - 1) != s.charAt(i) && p.charAt(j - 1) != ''.'') { //match 0个的case，前一个字符跟当前不等
						dp[i + 1][j + 1] = dp[i + 1][j - 1];
					} else {
						dp[i + 1][j + 1] = (dp[i + 1][j] || dp[i][j + 1] || dp[i + 1][j - 1]); //match 多个的情况
					}
				}
			}
		}
		return dp[s.length()][p.length()];
    }

}

10. Regular Expression Matching (JAVA)

Given an input string (s) and a pattern (p), implement regular expression matching with support for ''.'' and ''*''.

''.'' Matches any single character.
''*'' Matches zero or more of the preceding element.
The matching should cover the entire input string (not partial).

Note:

s could be empty and contains only lowercase letters a-z.
p could be empty and contains only lowercase letters a-z, and characters like . or *.
Example 1:

Input:
s = "aa"
p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".
Example 2:

Input:
s = "aa"
p = "a*"
Output: true
Explanation: ''*'' means zero or more of the precedeng element, ''a''. Therefore, by repeating ''a'' once, it becomes "aa".
Example 3:

Input:
s = "ab"
p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".
Example 4:

Input:
s = "aab"
p = "c*a*b"
Output: true
Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore it matches "aab".
Example 5:

Input:
s = "mississippi"
p = "mis*is*p*."
Output: false

class Solution {
    public boolean isMatch(String s, String p) {
        return recur(s,p,0,0);
    }
    
    public boolean recur(String s, String p, int sPtr, int pPtr) {
        if(s.length() == sPtr && p.length() == pPtr) return true;
        if(p.length() == pPtr) return false;
        if(s.length() == sPtr){
            if(p.length() > pPtr+1 && p.charAt(pPtr+1)==''*'') return recur(s,p,sPtr,pPtr+2);
            else return false;
        }
        
        if(p.length() > pPtr+1 && p.charAt(pPtr+1)==''*''){ //next bit is *
            if(recur(s,p,sPtr,pPtr+2)) return true; //* match 0 element
            else{
                for(int i = 1; sPtr + i <= s.length() && (p.charAt(pPtr)==s.charAt(sPtr+i-1)|| p.charAt(pPtr)==''.''); i++){
                    if(recur(s,p,sPtr+i,pPtr+2)) return true; //* match i elements
                }
            } 
        }
        else if(p.charAt(pPtr)==''.'' || p.charAt(pPtr)==s.charAt(sPtr)) 
            return recur(s, p, sPtr+1, pPtr+1);
        
        return false;
    }
}

当当前字符之后的那个字符是 * 时，我们需要对当前字符做特别判断，所以没次递归中要判断 p 字符串的下一个字符是否是 *

10. Regular Expression Matching [H] 正则表达式匹配

#题目

Given an input string(s) and a pattern(p), implement regular expression matching with support for ''.'' and ''''. ''.'' Matches any single character. '''' Matches zero or more of the preceding element.

The matching should cover the entire input string(not partial). Note： s could be empty and contains only lowercase letters a-z. p could be empty and contains only lowercase letters a-z, and characters . or * .

Example1： Input：s = "aa" p="a" Output：false Explanation："a" does not match the entire string "a" Example2： Input：s = "aa" p="a*" Output：true Explanation：".*" means zero or more of the precedeng element, ''a''. Therefore, by repeating ''a'' once, it become "a".

Example3： Input：s = "ab" p=".*" Output：true Explanation："." means " zero or more () of any character (.) " . #思路

### 动态规划 #####Step1. 刻画一个最优解的结构特征 $dp [i][j]$ 表示 $s [0,\cdots,i-1]$ 与 $p [0,\cdots,j-1]$ 是否匹配 #####Step2. 递归定义最优解的值 1.$p [j-1] == s [i-1]$，则状态保存，$dp [i][j] = dp [i-1][j-1]$ 2.$p [j-1] ==$ .，. 与任意单个字符匹配，于是状态保存，$dp [i][j] = dp [i-1][j-1]$ 3.$p [j-1] == $*，* 只能以 X* 的形式才能匹配，但是由于 * 究竟作为几个字符匹配不确定，此时有两种情况：

$p [j-2] != s [i-1]$，此时 $s [0,\cdots,i-1]$ 与 $p [0,\cdots,j-3]$ 匹配，即 $dp [i][j] = dp [i][j-2]$
$p [j-2] == s [i-1]$ 或者 $p [j-2] == $ .，此时应分为三种情况： * 作为零个字符，$dp [i][j] = dp [i][j-2]$ * 作为一个字符，$dp [i][j] = dp [i][j-1]$ * 作为多个字符，$dp [i][j] = dp [i-1][j]$

#####Step3. 计算最优解的值根据状态转移表，以及递推公式，计算 dp [i][j]

#Tips

### 数组初始化（python） #####（1）相同的值初始化 (一维数组)

#方法一：list1 = [a a a ]
list1 = [ a for i in range(3)]
#方法二:
list1 = [a] * 3

#####（2）二维数组初始化初始化一个 $4*3$ 每项固定为 0 的数组

list2 = [ [0 for i in range(3)] for j in range(4)]

#C++

class Solution {
public:
    bool isMatch(string s, string p) {
        int m = s.length(),n = p.length();
        bool dp[m+1][n+1];
        dp[0][0] = true;
        //初始化第0行,除了[0][0]全为false，因为空串p只能匹配空串，其他都无能匹配
        for (int i = 1; i <= m; i++)
            dp[i][0] = false;
        //初始化第0列，只有X*能匹配空串
        for (int j = 1; j <= n; j++)
            dp[0][j] = j > 1 && ''*'' == p[j - 1] && dp[0][j - 2];
        for (int i = 1; i <= m; i++)
        {
           for (int j = 1; j <= n; j++)
           {
               if (p[j - 1] == ''*'')
               {
                   dp[i][j] = dp[i][j - 2] || (s[i - 1] == p[j - 2] || p[j - 2] == ''.'') && dp[i - 1][j];
               }
               else //只有当前字符完全匹配，才能传递dp[i-1][j-1] 值
               {
                   dp[i][j] = (p[j - 1] == ''.'' || s[i - 1] == p[j - 1]) && dp[i - 1][j - 1];
               }
           }
        }
        return dp[m][n];
    }
};

#Python

def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
        len_s = len(s)
        len_p = len(p)
        dp = [[False for i in range(len_p+1)]for j in range(len_s+1)]
        dp[0][0] = True
        for i in range(1, len_p + 1):
            dp [0][i] = i>1 and dp[0][i - 2] and p[i-1] == ''*''
        for i in range (1, len_s + 1 ):
            for j in range(1, len_p + 1):
                if p[j - 1] == ''*'':
                    #状态保留
                    dp[i][j] = dp[i][j -2] or (s[i-1] == p[j-2] or p[j-2] == ''.'') and dp[i-1][j]
                else:
                    dp[i][j] = (p[j-1] == ''.'' or s[i-1] == p[j-1]) and dp[i-1][j-1]
        return dp[len_s][len_p]

10. Regular Expression Matching(hard)

10. Regular Expression Matching

题目

Implement regular expression matching with support for ''.'' and ''*''.

''.'' Matches any single character.
''*'' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

解析

动态规划的问题一直都是难点，需要找出状态转移方程！！！

如果“*”不好判断，那我大不了就来个暴力的算法，把“*”的所有可能性都测试一遍看是否有满足的，用两个指针i,j来表明当前s和p的字符。
我们采用从后往前匹配，为什么这么匹配，因为如果我们从前往后匹配，每个字符我们都得判断是否后面跟着“*”，而且还要考虑越界的问题。但是从后往前没这个问题，一旦遇到“*”，前面必然有个字符。

    如果j遇到”*”，我们判断s[i] 和 p[j-1]是否相同，
        如果相同我们可以先尝试匹配掉s的这个字符，i–，然后看之后能不能满足条件，满足条件，太棒了！我们就结束了，如果中间出现了一个不满足的情况，马上回溯到不匹配这个字符的状态。
        不管相同不相同，都不匹配s的这个字符，j-=2 (跳过“*”前面的字符)

if(p[j-1] == ''.'' || p[j-1] == s[i])
    if(myMatch(s,i-1,p,j))
        return true;
    return myMatch(s,i,p,j-2);

    如果j遇到的不是“*”，那么我们就直接看s[i]和p[j]是否相等，不相等就说明错了，返回。

    if(p[j] == ''.'' || p[j] == s[i])
         return myMatch(s,i-1,p,j-1);
    else return false;

    再考虑退出的情况
        如果j已经<0了说明p已经匹配完了，这时候，如果s匹配完了，说明正确，如果s没匹配完，说明错误。
        如果i已经<0了说明s已经匹配完，这时候，s可以没匹配完，只要它还有”*”存在，我们继续执行代码。

// 10. Regular Expression Matching
class Solution_10 {
public:
	/*
	动态规划
	如果 p[j] == str[i] || pattern[j] == ''.'', 此时dp[i][j] = dp[i-1][j-1];
	如果 p[j] == ''*''
	分两种情况:
	1: 如果p[j-1] != str[i] && p[j-1] != ''.'', 此时dp[i][j] = dp[i][j-2] //*前面字符匹配0次
	2: 如果p[j-1] == str[i] || p[j-1] == ''.''
	此时dp[i][j] = dp[i][j-2] // *前面字符匹配0次
	或者 dp[i][j] = dp[i][j-1] // *前面字符匹配1次
	或者 dp[i][j] = dp[i-1][j] // *前面字符匹配多次
	*/
	bool isMatch(string s, string p) { //p去匹配s

		vector<vector<bool> > dp(s.size() + 1, vector<bool>(p.size() + 1, false));

		dp[0][0] = true; // 空串匹配空串
		//第一列空串p去匹配，为false
		//第一行非空串p去匹配空串s;只要p中有*，就可以匹配
		for (int i = 1; i < dp[0].size(); i++)
		{
			if (p[i - 1] == ''*'')
			{
				dp[0][i] = i>1 && dp[0][i - 2];
			}	
		}
		for (int i = 1; i <= s.size();i++)
		{
			for (int j = 1; j <= p.size();j++)
			{
				if (s[i-1]==p[j-1]||p[j-1]==''.'') //直接匹配成功
				{
					dp[i][j] = dp[i - 1][j - 1];
				}
				else if (p[j-1]==''*'')
				{
					if (s[i-1]!=p[j-2]&&p[j-2]!=''.'') //匹配*前面的字符0次,跳过当前p
					{
						dp[i][j] = dp[i][j-2];
					}
					else
					{
					    //*前面字符匹配1次 || *前面字符匹配0次 || *前面字符匹配多次
						dp[i][j] = dp[i][j - 1] || dp[i][j - 2] || dp[i - 1][j];
					}
				}
			}
		}

		return dp[s.size()][p.size()];
	}
};

题目来源

《LeetBook》leetcode 题解 (10): Regular Expression Matching——DP 解决正则匹配
10. Regular Expression Matching

今天的关于Python Regular Expression的分享已经结束，谢谢您的关注，如果想了解更多关于10. Regular Expression Matching、10. Regular Expression Matching (JAVA)、10. Regular Expression Matching [H] 正则表达式匹配、10. Regular Expression Matching(hard)的相关知识，请在本站进行查询。

本文标签：