
Why a regex fails to match any value

Posted: 2013-07-08 14:13:00 Author: rapoo

string reg = @"(?is)<a[^>]*?href=([''""])(?<url>[^""]*?/([a-z]|(\d+?)|[a-z](\d+?))\.(html|shtml|htm))([''""])[^>]*?>(?<Content>.*?)</a>";


Regex Urlreg = new Regex(reg, RegexOptions.Multiline | RegexOptions.Singleline);
MatchCollection matchUrlList = Urlreg.Matches(htmldata);


<a href='http://sports.sohu.com/20130522/n376775797.shtml' target='_blank'>幻灯:足协杯绿城vs武汉宏兴 村队表现值得关注</a>


Why doesn't this regex match the URL and the title?
[Solution]

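// Fetch the page synchronously through MSXML (the third argument, false, means not async).
XMLHTTP xmlhttp = new XMLHTTPClass();
xmlhttp.open("get", @"http://sports.sohu.com/s2013/2013zuxiebei/", false, null, null);
xmlhttp.send("");
// Wait until the request has completed (readyState 4).
while (xmlhttp.readyState != 4) Thread.Sleep(1);

// Decode the raw response bytes as GBK.
string htmldata = Encoding.GetEncoding("GBK").GetString((byte[])xmlhttp.responseBody);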

string reg = @"(?is)<a[^>]*?href=(['""])(?<url>[^""]*?/([a-z]|(\d+?)|[a-z](\d+?))\.(html|shtml|htm))(['""])[^>]*?>(<(img\s[^<>]+|BR)>)*(?<Content>((?!</a>)[^<>])+)</a>";


Regex Urlreg = new Regex(reg, RegexOptions.Compiled);
MatchCollection matchUrlList = Urlreg.Matches(htmldata);

richTextBox1.Clear();

StringBuilder builder = new StringBuilder();
foreach (Match m in matchUrlList)
{
    builder.AppendLine(m.Groups["Content"].Value);
}

richTextBox1.Text = builder.ToString();


This requires adding a COM reference: Microsoft XML, v2.6
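One way to narrow the problem down is to run the pattern from the question against the sample anchor line on its own, without any downloaded page. The sketch below does that in a small console program. The class name AnchorPatternCheck and the method FirstMatch are hypothetical names used only for this illustration and do not come from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the original post.
static class AnchorPatternCheck
{
    // The pattern from the question, unchanged.
    const string Pattern =
        @"(?is)<a[^>]*?href=([''""])(?<url>[^""]*?/([a-z]|(\d+?)|[a-z](\d+?))\.(html|shtml|htm))([''""])[^>]*?>(?<Content>.*?)</a>";

    // Returns the first match of the pattern in the given HTML fragment.
    public static Match FirstMatch(string html)
    {
        return Regex.Match(html, Pattern);
    }

    static void Main()
    {
        // The sample anchor line quoted in the question.
        string sample = "<a href='http://sports.sohu.com/20130522/n376775797.shtml' target='_blank'>幻灯:足协杯绿城vs武汉宏兴 村队表现值得关注</a>";

        Match m = FirstMatch(sample);
        Console.WriteLine(m.Success);                   // whether the pattern matched
        Console.WriteLine(m.Groups["url"].Value);       // captured link target
        Console.WriteLine(m.Groups["Content"].Value);   // captured link text
    }
}
```

Run in isolation like this, the pattern does match the sample line and captures both the URL and the title. That suggests, though it does not prove, that the failure comes from the contents of htmldata (for example how the page was downloaded and decoded, or extra markup inside the real anchors) rather than from the pattern itself.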
