读书人

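Extracting HTML page content with a regular expression: approach to a solution

Published: 2013-04-20 19:43:01 Author: rapoo

Extracting HTML page content with a regular expression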
<div id='box_9_0'


href="http://baike.baidu.com/view/6590.htm?fromId=756347#3" target="_blank" title="">设计目标
</a> | <a onclick="reportUrl(this,'1','1');st_get(this,'w.2.10.2',1,2,2);"
href="http://baike.baidu.com/view/6590.htm?fromId=756347#4" target="_blank" title="">语言结构
</a></span>
</p>
</div>
<div class="result_summary">
<div class="url">
<cite>内容3 2012-1-12</cite></div>
<div class="sp">
<span class="line">-</span><span class="summaryshare" id="sws_9_0"><span class="yl1"
onfocus="blur();">sss</span></span><span class="line2">-</span><span class="preview"
id="pws_9_0"><span class="iPre" onfocus="blur();"><span class="iPreBox"><em class="iPreArr"></em></span></span></span></div>


</div>
</div>
[Solution]


// Needs: using System; using System.IO; using System.Text; using System.Text.RegularExpressions;
// On .NET Core / .NET 5+, register CodePagesEncodingProvider before requesting "GB2312".
string pattern = @"(?is)<div\s*id='box_9_0'[^>]*?class=""selected boxGoogleList""[^>]*?>.*?<a\s*href=""(?<href>[^""]*?)""\s*class=""tt tu""[^>]*?>(?<txt1>.*?)</a>.*?<p\s*class=""ds"">(?<txt2>.*?)</p>.*?<div\s*class=""url"">\s*<cite>(?<txt3>.*?)</cite>";
string htmlsource = File.ReadAllText(@"C:\1.txt", Encoding.GetEncoding("GB2312"));

// Run the match once and read every named group from the same Match object.
Match m = Regex.Match(htmlsource, pattern);
Console.WriteLine(m.Groups["href"].Value);
Console.WriteLine(m.Groups["txt1"].Value);
Console.WriteLine(m.Groups["txt2"].Value);
Console.WriteLine(m.Groups["txt3"].Value);
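When the pattern does not match, `Match.Success` is false and every `Groups[...].Value` is an empty string, so the code above would print blank lines without signalling a failure. Below is a minimal sketch of a guarded lookup. The class `LinkExtractor`, the method `TryGetFirstLink`, and the shortened pattern are hypothetical names and simplifications for illustration; they are not the full pattern from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class LinkExtractor
{
    // Hypothetical shortened pattern: captures href and inner text of the first <a> element.
    // (?is) = ignore case, and let '.' match newlines, as in the answer's pattern.
    private const string Pattern =
        @"(?is)<a\s[^>]*?href=""(?<href>[^""]*)""[^>]*>(?<txt>.*?)</a>";

    // Returns true and fills href/text when a link is found; otherwise returns false.
    public static bool TryGetFirstLink(string html, out string href, out string text)
    {
        Match m = Regex.Match(html, Pattern);
        href = m.Success ? m.Groups["href"].Value : string.Empty;
        text = m.Success ? m.Groups["txt"].Value.Trim() : string.Empty;
        return m.Success;
    }
}
```

A caller can then branch on the return value, for example printing a "no match" message instead of empty lines.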

[Solution]
No consistent pattern in the markup, no regex.
[Solution]
Quote:
From the 内容3 entry, I only need the date inside it. Thanks.


// Same pattern as before, except that the literal "内容3" is consumed before the txt3 group,
// so txt3 captures only the date that follows it.
string pattern = @"(?is)<div\s*id='box_9_0'[^>]*?class=""selected boxGoogleList""[^>]*?>.*?<a\s*href=""(?<href>[^""]*?)""\s*class=""tt tu""[^>]*?>(?<txt1>.*?)</a>.*?<p\s*class=""ds"">(?<txt2>.*?)</p>.*?<div\s*class=""url"">\s*<cite>内容3\s*(?<txt3>.*?)</cite>";
string htmlsource = File.ReadAllText(@"C:\1.txt", Encoding.GetEncoding("GB2312"));

// Run the match once and read every named group from the same Match object.
Match m = Regex.Match(htmlsource, pattern);
Console.WriteLine(m.Groups["href"].Value);
Console.WriteLine(m.Groups["txt1"].Value);
Console.WriteLine(m.Groups["txt2"].Value);
Console.WriteLine(m.Groups["txt3"].Value);
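If only the date inside the `<cite>` element is wanted, a narrower pattern that targets the date itself may be less fragile than hard-coding the text before it. The following is a minimal sketch, assuming the date always looks like the sample (`2012-1-12`, year-month-day separated by hyphens). The class `CiteDate` and the method `TryExtract` are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class CiteDate
{
    // Assumption: the date is written as yyyy-M-d, as in the sample "<cite>内容3 2012-1-12</cite>".
    // [^<]*? skips whatever text precedes the date without leaving the <cite> element.
    private static readonly Regex DatePattern = new Regex(
        @"<cite>[^<]*?(?<date>\d{4}-\d{1,2}-\d{1,2})\s*</cite>",
        RegexOptions.IgnoreCase);

    // Returns true and sets date when a <cite> containing a date is found; otherwise false.
    public static bool TryExtract(string html, out string date)
    {
        Match m = DatePattern.Match(html);
        date = m.Success ? m.Groups["date"].Value : string.Empty;
        return m.Success;
    }
}
```

Because the pattern does not mention 内容3, it keeps working if the label in front of the date changes, at the cost of accepting any `<cite>` that ends with a date.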
