Same regular expression: why do VBScript Regular Expressions 1.0 and RegularExpressions.Regex give different results?
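Below is code that extracts email addresses with a regular expression. The input text is exactly the same and the regular expression is exactly the same (in both cases "\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"). So why do VBScript Regular Expressions 1.0 and System.Text.RegularExpressions.Regex extract different addresses? System.Text.RegularExpressions.Regex also extracts addresses containing Chinese characters, for example "我的邮箱@163.com" ("my mailbox@163.com"), which obviously cannot be a real email address.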
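I thought that, since VBScript Regular Expressions 1.0 works, I would simply use it. But in VB.NET it means shipping an extra DLL, which is inconvenient, whereas System.Text.RegularExpressions.Regex needs no extra file. Is there any way to stop it from extracting the Chinese ones? Can something be changed in the System.Text.RegularExpressions.RegexOptions.IgnoreCase argument?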
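The difference comes from how each engine defines \w. In VBScript RegExp, \w is equivalent to [A-Za-z0-9_]. In .NET, \w by default matches any Unicode word character (letters, decimal digits, connector punctuation, nonspacing marks), and that includes Chinese characters. Passing RegexOptions.ECMAScript switches .NET to ECMAScript semantics, where \w is [a-zA-Z0-9_] again. The module below is a small sketch that checks this; WordClassDemo, WordCharMatches and Demo are illustrative names, not part of the original code.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module WordClassDemo
    ' True when \w matches the single-character string c under the given options.
    Function WordCharMatches(ByVal c As String, ByVal options As RegexOptions) As Boolean
        Return Regex.IsMatch(c, "^\w$", options)
    End Function

    Sub Demo()
        Dim wo As String = ChrW(&H6211)   ' the Chinese character U+6211
        ' Default .NET semantics: \w includes Unicode letters, so the CJK character matches.
        Console.WriteLine(WordCharMatches(wo, RegexOptions.None))         ' True
        ' ECMAScript semantics: \w is [a-zA-Z0-9_] only, as in VBScript RegExp.
        Console.WriteLine(WordCharMatches(wo, RegexOptions.ECMAScript))   ' False
        Console.WriteLine(WordCharMatches("a", RegexOptions.ECMAScript))  ' True
    End Sub
End Module
```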
Thank you!
Code using System.Text.RegularExpressions.Regex:
- VB.NET code
Function getmail(ByRef htm As Object) As Object
    Dim strRegex As String = "\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"
    '"http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?" 'this is the URL pattern
    Dim RetStr As Object
    Dim r As System.Text.RegularExpressions.Regex
    Dim m As System.Text.RegularExpressions.MatchCollection
    r = New System.Text.RegularExpressions.Regex(strRegex, System.Text.RegularExpressions.RegexOptions.IgnoreCase)
    m = r.Matches(htm)
    Dim i As Integer
    For i = 0 To m.Count - 1
        RetStr = RetStr & m(i).Value & vbCrLf
    Next i
    getmail = RetStr
End Function
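A possible fix for the getmail function above, assuming the only goal is to make \w behave as it does in VBScript: combine RegexOptions.ECMAScript with IgnoreCase. ECMAScript may only be combined with IgnoreCase and Multiline; any other combination results in an exception. MailExtractor and GetMailAscii are hypothetical names, and a StringBuilder replaces the Object concatenation of the original.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module MailExtractor
    ' Same pattern as in the question.
    Private Const MailPattern As String = "\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"

    ' Returns every match followed by vbCrLf, like the original getmail.
    ' ECMAScript restricts \w to [a-zA-Z0-9_], so Chinese characters no longer match.
    Function GetMailAscii(ByVal htm As String) As String
        Dim r As New Regex(MailPattern, RegexOptions.IgnoreCase Or RegexOptions.ECMAScript)
        Dim sb As New StringBuilder()
        For Each m As Match In r.Matches(htm)
            sb.Append(m.Value).Append(vbCrLf)
        Next
        Return sb.ToString()
    End Function
End Module
```

Writing [A-Za-z0-9_] in place of every \w in the pattern has the same effect without changing the options.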
Code using VBScript Regular Expressions 1.0:
- VB.NET code
Function getmail(ByRef htm As Object) As Object
    Dim Match As Object
    Dim Matches, regEx, RetStr As Object
    regEx = New VBScript_RegExp_10.RegExp
    regEx.Pattern = "\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"
    regEx.IgnoreCase = True
    regEx.Global = True
    Matches = regEx.Execute(htm)
    For Each Match In Matches
        RetStr = RetStr & Match.Value & vbCrLf
    Next Match
    getmail = RetStr
End Function
[Solution]
- VB code
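' Check separately, in code, whether the string contains Chinese characters
Dim s As String
Dim i As Long
Dim L As Long
s = "ThisIs我的邮箱@163.com"
L = Len(s)
For i = 1 To L
    ' Asc < 0: a double-byte (e.g. Chinese) character
    If Asc(Mid(s, i, 1)) < 0 Then Exit For
Next
If i <= L Then
    ' s contains Chinese characters
Else
    ' s contains no Chinese characters
End If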
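The same idea as the VB6 check above, sketched in VB.NET and used to discard unwanted matches. Instead of Asc < 0, which depends on the ANSI code page, AscW returns the UTF-16 code unit, and anything above 127 is treated as non-ASCII (this also rejects accented Latin letters). ContainsNonAscii and GetMailFiltered are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module NonAsciiFilter
    ' True if any character of s is outside the ASCII range (code unit > 127).
    Function ContainsNonAscii(ByVal s As String) As Boolean
        For Each c As Char In s
            If AscW(c) > 127 Then Return True
        Next
        Return False
    End Function

    ' Keeps the original \w pattern but drops every match that contains non-ASCII characters.
    Function GetMailFiltered(ByVal htm As String) As List(Of String)
        Dim pattern As String = "\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"
        Dim result As New List(Of String)()
        For Each m As Match In Regex.Matches(htm, pattern, RegexOptions.IgnoreCase)
            If Not ContainsNonAscii(m.Value) Then result.Add(m.Value)
        Next
        Return result
    End Function
End Module
```

Unlike ECMAScript mode, this filter drops a whole match such as "我的邮箱abc@163.com" instead of extracting "abc@163.com" from it; which behaviour is preferable depends on the input.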