
Looking for regular expressions to strip content from HTML tags

Posted: 2012-01-18 00:23:26 Author: rapoo

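I need regular expressions that clean up the contents of HTML tags.
Problem 1:
<table border="2"> should become: <table>
Problem 2:
<td colspan="2" align="center"> should become: <td colspan="2">
Problem 3:
<col width=72> should become: nothing (the whole tag is removed)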

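Each of the three problems is worth 30 points. The remaining 10 points go to whoever bumps the thread.
Points will be awarded as soon as the problems are solved.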

[Solution]
Problems 1 and 3 can both be handled with a regex:


Regex regex = new Regex( "\<table.*?\>", RegexOptions.IgnoreCase );
str = regex.Replace( str, "<table>" );

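[Solution]
The pattern has to be a verbatim string (@"..."). Without the @, \< and \> are unrecognized escape sequences and the code does not compile:
Regex regex = new Regex( "\<table.*?\>", RegexOptions.IgnoreCase );
==>
Regex regex = new Regex( @"\<table.*?\>", RegexOptions.IgnoreCase );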
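The corrected pattern can also be wrapped in a small helper that rewrites every opening <table ...> tag in a document in one pass. This is only a minimal sketch, not the poster's code: the names TableTagSketch and StripTableAttributes are made up for illustration, and it assumes that no attribute value contains a literal ">".

```csharp
using System;
using System.Text.RegularExpressions;

public static class TableTagSketch
{
    // Matches an opening <table ...> tag. The \b stops longer tag names
    // that merely start with "table" from matching; [^>]* stays inside one tag.
    private static readonly Regex TableTag =
        new Regex(@"<table\b[^>]*>", RegexOptions.IgnoreCase);

    public static string StripTableAttributes(string html)
    {
        // Regex.Replace returns a new string; the input is left untouched.
        return TableTag.Replace(html, "<table>");
    }
}
```

Compared with .*?, the [^>]* class cannot run past the end of the tag, and closing </table> tags are left alone because they start with "</".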

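[Solution]
Re: problem 1. If you only want to remove the border attribute, you can do this:

string str = "<table border=\"2\" align=\"center\">";
Match m = Regex.Match(str, "<\\s*table\\s+(border=\"\\d*\")[\\s\\S]*>");
// Guard against no match: Replace throws if the old value is an empty string.
if (m.Success) str = str.Replace(m.Groups[1].Value, "");
Output: <table  align="center"> (the space that preceded border stays, so two spaces remain)


If you want to remove all the attributes, you can do this:
string str = "<table border=\"2\" align=\"center\">";
Match m = Regex.Match(str, "<\\s*table\\s+(border=\"\\d*\"[\\s\\S]*)>");
if (m.Success) str = str.Replace(m.Groups[1].Value, "");

Output: <table >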
[Solution]
Re: problem 2

string str = "<td colspan=\"2\" align=\"center\">";
Match m = Regex.Match(str, "<\\s*td[\\s\\S]*(align=\"\\w*\")[\\s\\S]*>");
if (m.Success) str = str.Replace(m.Groups[1].Value, "");

Output: <td colspan="2" >
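The Match-then-Replace approach above leaves a stray space behind and only touches the first tag it finds. One possible alternative, sketched here under assumptions, is a single Regex.Replace that captures everything before the unwanted attribute and writes it back. The names AttributeSketch and RemoveAttribute are hypothetical, and the sketch assumes attribute values do not contain ">" and that the attribute name does not also appear as text inside another attribute's value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributeSketch
{
    // Removes one named attribute (with its value) from every opening tag
    // of the given name, together with the whitespace in front of it.
    public static string RemoveAttribute(string html, string tag, string attribute)
    {
        string pattern =
            @"(<" + Regex.Escape(tag) + @"\b[^>]*?)" +      // group 1: tag start and earlier attributes
            @"\s+" + Regex.Escape(attribute) + @"\s*=\s*" +  // the attribute name and "="
            @"(?:""[^""]*""|'[^']*'|[^\s>]+)";               // quoted or unquoted value
        // "$1" puts back everything that came before the removed attribute.
        return Regex.Replace(html, pattern, "$1", RegexOptions.IgnoreCase);
    }
}
```

Called as AttributeSketch.RemoveAttribute(str, "td", "align"), this turns the tag from problem 2 into <td colspan="2">, and the same helper with "table" and "border" covers problem 1.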
[Solution]
For problem 1, a further tweak makes it better: replacing \\s+ with [\\s\\S]* lets border appear anywhere in the tag instead of only as the first attribute.

Match m = Regex.Match(str, "<\\s*table\\s+(border=\"\\d*\")[\\s\\S]*>");
->
Match m = Regex.Match(str, "<\\s*table[\\s\\S]*(border=\"\\d*\")[\\s\\S]*>");


Match m = Regex.Match(str, "<\\s*table\\s+(border=\"\\d*\"[\\s\\S]*)>");
->
Match m = Regex.Match(str, "<\\s*table[\\s\\S]*(border=\"\\d*\"[\\s\\S]*)>");


[Solution]
// Problem 1: drop everything between "<table" and ">"
string a = @"<table border=""2""><table border=""2"">";
a = Regex.Replace(a, @"(?<=<table)\s+?.+?(?=>)", "", RegexOptions.IgnoreCase);
//a = Regex.Replace(a, @"<table[^<]+?>", "<table>", RegexOptions.IgnoreCase);
// Problem 2: keep the first attribute of <td>, drop the rest
a = @"<td colspan=""2"" align=""center""><td colspan=""2"" align=""center"">";
a = Regex.Replace(a, @"(?<=<td\s+?\S+?)\s+?.+?(?=>)", "", RegexOptions.IgnoreCase);
//a = Regex.Replace(a, @"(?<=<td\s+?colspan=\S+?)\s+?.+?(?=>)", "", RegexOptions.IgnoreCase);
// Problem 3: remove <col ...> tags entirely
a = @"<col width=72><col width=72>";
a = Regex.Replace(a, @"<col[^<]+?>", "", RegexOptions.IgnoreCase);


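[Solution]
For problem 3, the requirement is not entirely clear.

If the goal is to remove every <col ...> tag that contains a width attribute,

it can be done like this:

string str = "<col width=72 others..>";
Match m = Regex.Match(str, "<\\s*col[\\s\\S]*(width=\\d*)[\\s\\S]*>");
if (m.Success) str = str.Replace(m.Groups[0].Value, "");

The output is empty.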
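If that reading of problem 3 is the right one, the removal could also be done for a whole document with one Regex.Replace. The following is a rough sketch under that assumption. The names ColTagSketch and RemoveColWithWidth are invented for illustration, and the pattern again assumes that attribute values contain no ">".

```csharp
using System;
using System.Text.RegularExpressions;

public static class ColTagSketch
{
    // Matches a <col ...> tag that has a width attribute somewhere inside it.
    // \b keeps <colgroup> from matching; \s requires width to start an attribute.
    private static readonly Regex ColWithWidth =
        new Regex(@"<col\b[^>]*\swidth\s*=[^>]*>", RegexOptions.IgnoreCase);

    public static string RemoveColWithWidth(string html)
    {
        // Every matching tag is replaced with the empty string.
        return ColWithWidth.Replace(html, "");
    }
}
```

Tags such as <col span=2> without a width attribute, and <colgroup> tags, are left in place by this pattern.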
