Extracting web page content with a regular expression
function jumptoie5()
{
if(event.srcElement.className== "menuitems ")
{
if(event.srcElement.target!=null)
window.open(event.srcElement.url)
else
window.location=event.srcElement.url
}
}
</script>
<div id= "ie5menu " class= "skin0 " onMouseover= "highlightie5() " onMouseout= "lowlightie5() " onClick= "jumptoie5() ">
<table width=100% border=0 cellPadding=1 cellSpacing=1>
<tr>
<td class=xt valign=top style= "line-height:180% ">
<div align=left style= "padding-left:15;padding-top:3 " class= "menuitems " url= "javascript:document.location=prevpage "> 上一页 </div>
<div align=left style= "padding-left:15; " class= "menuitems " url= "javascript:document.location=nextpage "> 下一页 </div>
<div align=left style= "padding-left:15; " class= "menuitems " url= "readbook.asp?bl_id=95133 "> 回书目 </div>
<hr>
<div align=left style= "padding-left:15; " class= "menuitems " url= "javascript:location.href=history.go(0) "> 刷新 </div>
<div align=left style= "padding-left:15; " class= "menuitems " url= "javascript:window.print() "> 打印 </div>
<hr>
<div align=left style= "padding-left:15; " class= "menuitems " url= "mybook_addmybooksq.asp?a_id=95133&b_id=2533771 " target= "_blank "> 加入书签书架 </div>
<div align=left style= "padding-left:15; " class= "menuitems " url= "mybook_votebook.asp?a_id=&b_id=95133 " target= "_blank "> 推荐本书 </div>
<div align=left style= "padding-left:15; " class= "menuitems " url= "showbook.asp?bl_id=95133 "> 返回书页 </div>
<div align=left style= "padding-left:15; " class= "menuitems " url= "index.asp "> 返回首页 </div>
<hr>
<div align=left style= "padding-left:15; " class= "menuitems " url= "showbook_review.asp?id=95133&qd=0&name=风流道士闯江湖 " target=_blank> 发表书评 </div>
<div align=left style= "padding-left:15; " class= "menuitems " url= "/newpay/a_vip_pay.asp " target=_blank> 成为VIP会员 </div>
<div align=left style= "padding-left:15; " class= "menuitems " url= "http://author1.cmfu.com/email/onlinemail.asp?bookname=风流道士闯江湖&bookid=95133 " target=_blank> 向朋友推荐本书 </div>
<div align=left style= "padding-left:15; " class= "menuitems " url= "mybook/addremindbook.asp?bookid=95133&bookname=风流道士闯江湖 " target=_blank> 更新提醒 </div>
</td>
</tr>
</table>
</div>
<script language=javascript>
if(document.all&&window.print) // requires IE 5.5 or later
{
// set the style class of the custom menu below
if(menuskin==0)
ie5menu.className= "skin0 "
else
ie5menu.className= "skin1 "
document.oncontextmenu=showmenuie5
document.body.onclick=hidemenuie5
}
</script>
<BODY leftMargin=5 topMargin=0 onLoad= "this.focus(); " bgcolor=#E7F4FE>
<table border=0 cellPadding=0 cellSpacing=0 width=95% valign=top align=center>
<tr>
<td width=180 class=zt style= "padding-top:15;border-bottom:1px green solid "> 读书在起点原创无极限 </td>
<td class=zt align=right style= "padding-top:15;border-bottom:1px green solid "> 『 <a href= "http://down1.cmfu.com/bookall/95133.htm " target=_blank> 全文阅读 </a> <a href= "mybook_addmybooksq.asp?a_id=95133&b_id=2533771 " target=_blank> <font color=red> 加入书架书签 </font> </a> <a href= "mybook_votebook.asp?a_id=&b_id=95133 " target=_blank> 推荐本书 </a> <a href= "mybook_bookcase.asp " target=_blank> 打开书架 </a> <a href= "readbook.asp?bl_id=95133 " target=_top> 返回书目 </a> <a href= "showbook.asp?bl_id=95133 " target=_top> 返回书页 </a> 』 </td>
</tr>
</table>
<table border=0 cellPadding=0 cellSpacing=0 width=95% valign=top align=center>
<tr>
<td align=center width=100%>
<div style= "overflow:hidden;height:65px;width:940px;padding-top:5px ">
<script src= "/Count/ReadPage_up_turn.js "> </script>
</div>
</td>
</tr>
</table>
<br>
<table border=0 cellPadding=0 cellSpacing=0 width=95% valign=top align=center>
<tr>
<td align=center> <br>
<p align=center style= "FONT-SIZE:18pt;color:#990000;font-family:楷体_GB2312 "> <b> 第一卷 莫名其妙 第二章 遭遇怪兽 </b> </p>
<div align=left style='font-size:10.5pt;color:black;line-height:180%;padding-left:10;padding-right:10'><script src='http://newauthor2.cmfu.com/books/95133/2533771.txt'></script>
<br>
How can I extract the string "http://newauthor2.cmfu.com/books/95133/2533771.txt" from this page source?
[Solution]
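// requires: using System.Text.RegularExpressions;
// The named group "url" captures the single-quoted src value of the script tag.
Regex reg = new Regex(@"<script\s+src='(?<url>[^']+)'>\s*</script>", RegexOptions.IgnoreCase);
Match match = reg.Match(yourstring);
string url = match.Groups["url"].Value;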
[Solution]
Try this:
string str = ".....<script src=http://newauthor2.cmfu.com/books/95133/2533771.txt></script>......";
// Lazy +? stops at the first "></script>" instead of running to the last one in the input.
Match m = Regex.Match(str, @"<script\s+src=(?<url>[\S\s]+?)>\s*</script>", RegexOptions.IgnoreCase);
if (m.Success)
{
    Console.WriteLine(m.Groups["url"].Value);
}
Output:
http://newauthor2.cmfu.com/books/95133/2533771.txt
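A note on the quantifier in patterns of this shape: a greedy `[\S\s]+` runs to the last `></script>` in the input, so on a page with more than one script tag it can swallow everything in between, while the lazy `[\S\s]+?` stops at the first one. The sketch below only illustrates that difference. The sample string, the example.com URLs, and the names QuantifierDemo, Greedy, and Lazy are made up for this illustration and do not come from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical sample: two unquoted script tags in one string.
    public const string Sample =
        "<script src=http://example.com/a.js></script><p>x</p>" +
        "<script src=http://example.com/b.js></script>";

    // Greedy: the capture extends to the LAST "></script>" in the input.
    public static string Greedy(string html) =>
        Regex.Match(html, @"<script\s+src=(?<url>[\S\s]+)></script>",
            RegexOptions.IgnoreCase).Groups["url"].Value;

    // Lazy: the capture stops at the FIRST "></script>" after src=.
    public static string Lazy(string html) =>
        Regex.Match(html, @"<script\s+src=(?<url>[\S\s]+?)></script>",
            RegexOptions.IgnoreCase).Groups["url"].Value;

    public static void Main()
    {
        Console.WriteLine(Greedy(Sample)); // spans both tags
        Console.WriteLine(Lazy(Sample));   // http://example.com/a.js
    }
}
```

On a page like the one in the question, which contains several script tags, the lazy form is the safer choice.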
[Solution]
string yourStr = ..............;
string resultStr = Regex.Match(yourStr, @"<script\s+src='([\S\s]+?)'>\s*</script>", RegexOptions.IgnoreCase).Groups[1].Value; // the content to extract
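This does extract what the original poster asked for, but it is not a general-purpose expression and may well work only for this one instance. If you only need this one page, or pages whose markup has exactly the same form, that is fine. But if there are multiple pages and the way <script src='http://newauthor2.cmfu.com/books/95133/2533771.txt'></script> appears differs between them, it would be best to state the constraints.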
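As a rough illustration of what such constraints could look like, here is a minimal sketch. It assumes (these are my assumptions, not the original poster's) that the src value may be single-quoted, double-quoted, or unquoted, that other attributes may precede src, and that only URLs ending in .txt are wanted. The names ScriptSrcExtractor and ExtractTxtScriptUrls are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ScriptSrcExtractor
{
    // Assumed constraints: src may be quoted with ' or ", or unquoted;
    // other attributes may come before src; matching is case-insensitive.
    // .NET allows the same group name in each alternative.
    private static readonly Regex ScriptSrc = new Regex(
        @"<script\b[^>]*?\bsrc\s*=\s*(?:'(?<url>[^']*)'|""(?<url>[^""]*)""|(?<url>[^\s>]+))",
        RegexOptions.IgnoreCase);

    // Returns every script src that ends in ".txt", in document order.
    public static List<string> ExtractTxtScriptUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match m in ScriptSrc.Matches(html))
        {
            string url = m.Groups["url"].Value.Trim();
            if (url.EndsWith(".txt", StringComparison.OrdinalIgnoreCase))
            {
                urls.Add(url);
            }
        }
        return urls;
    }

    public static void Main()
    {
        string html =
            "<script src=\"/Count/ReadPage_up_turn.js\"></script>" +
            "<script src='http://newauthor2.cmfu.com/books/95133/2533771.txt'></script>";
        foreach (string url in ExtractTxtScriptUrls(html))
        {
            Console.WriteLine(url); // only the .txt URL passes the filter
        }
    }
}
```

Filtering on the .txt suffix is only one possible way to tell the chapter text apart from the other scripts on the page. If the real pages differ in some other way, the filter would need to change accordingly, and for heavily varying markup an HTML parser is usually more robust than a regular expression.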