
Help needed: solving a problem with a regular expression or another method

Posted: 2012-09-19 13:43:54 Author: rapoo

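Help! I need a regular expression, or some other approach, to solve this problem!
I have a string obtained from elsewhere (contents below), and I need to normalize it and split it into rows: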

<td class="t_H2">综合损益表-全年业绩</td>
<td class="t_H2" id="H">2011/12</td>
<td class="t_H2" id="H">2011/06</td>
<td class="t_H2" id="H">2010/06</td>
<td class="t_H2" id="H">2009/06</td>
<td class="t_H2" id="H">2008/06</td>

<td class="t_T1">营业额</td>
<td class="t_C1_B" id="C">3,361</td>
<td class="t_C1" id="C">5,714</td>
<td class="t_C1" id="C">12,580</td>
<td class="t_C1" id="C">4,696</td>
<td class="t_C1" id="C">10,553</td>
<td class="t_LC" colspan="6"></td>

<td class="t_T1">经营溢利</td>
<td class="t_C1_B" id="C">3,525</td>
<td class="t_C1" id="C">7,842</td>
<td class="t_C1" id="C">30,495</td>
<td class="t_C1" id="C">7,003</td>
<td class="t_C1" id="C">17,468</td>
<td class="t_LC" colspan="6"></td>

<td class="t_T1">非经营/ 特殊项目</td>
<td class="t_C1_B" id="C"><span style="color:green;">-149</span></td>
<td class="t_C1" id="C"><span style="color:green;">-139</span></td>
<td class="t_C1" id="C"><span style="color:green;">-120</span></td>
<td class="t_C1" id="C">0</td>
<td class="t_C1" id="C">0</td>
<td class="t_LC" colspan="6"></td>


After normalization, the result should look like this:


<tr>
<td class="t_H2">综合损益表-全年业绩</td>
<td class="t_H2" id="H">2011/12</td>
<td class="t_H2" id="H">2011/06</td>
<td class="t_H2" id="H">2010/06</td>
<td class="t_H2" id="H">2009/06</td>
<td class="t_H2" id="H">2008/06</td>
</tr>
<tr>
<td class="t_T1">营业额</td>
<td class="t_C1_B" id="C">3,361</td>
<td class="t_C1" id="C">5,714</td>
<td class="t_C1" id="C">12,580</td>
<td class="t_C1" id="C">4,696</td>
<td class="t_C1" id="C">10,553</td>
<td class="t_LC" colspan="6"></td>
</tr>
<tr>
<td class="t_T1">经营溢利</td>
<td class="t_C1_B" id="C">3,525</td>
<td class="t_C1" id="C">7,842</td>
<td class="t_C1" id="C">30,495</td>
<td class="t_C1" id="C">7,003</td>
<td class="t_C1" id="C">17,468</td>
<td class="t_LC" colspan="6"></td>
</tr>
<tr>
<td class="t_T1">非经营/ 特殊项目</td>
<td class="t_C1_B" id="C"><span style="color:green;">-149</span></td>
<td class="t_C1" id="C"><span style="color:green;">-139</span></td>
<td class="t_C1" id="C"><span style="color:green;">-120</span></td>
<td class="t_C1" id="C">0</td>
<td class="t_C1" id="C">0</td>
<td class="t_LC" colspan="6"></td>
</tr>


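How should I do this? My idea is to use a regular expression to match the cells whose content is Chinese text, and then insert <tr> and </tr> before and after each group, but I don't know how to write the regex. Could anyone help? Or is there some other way to solve it?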



[Solution]
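// Assumes one <td> per line and a blank line between rows; \r?\n also accepts Windows line endings.
yourhtml = Regex.Replace(yourhtml, @"(<td[^>]*>.*?</td>\r?\n)+", "<tr>$0</tr>");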
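
A minimal sketch of how the one-line replacement above might be wrapped and tried out. The class name `RowWrapper` and the method name `WrapRuns` are hypothetical, and the sketch assumes what the sample input suggests: one `<td>` per line, with a blank line between rows. Line endings are normalized first so the pattern only has to deal with `\n`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RowWrapper
{
    // Wraps each run of consecutive <td> lines in <tr>...</tr>.
    // Assumption: one <td>...</td> per line, rows separated by a blank line.
    public static string WrapRuns(string html)
    {
        // Normalize Windows line endings so only "\n" has to be matched.
        string normalized = html.Replace("\r\n", "\n");

        // A run is one or more complete <td> lines; a blank line ends the run.
        return Regex.Replace(
            normalized,
            @"(?:<td[^>]*>.*?</td>\n)+",
            "<tr>\n$0</tr>\n");
    }
}
```

Calling `RowWrapper.WrapRuns(yourhtml)` leaves the blank separator lines in place between the generated rows. If the source string has no blank lines between rows, this approach puts everything into a single `<tr>`, so one of the attribute-based patterns further down is a better fit in that case.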
[Solution]
string str = File.ReadAllText("D:\\1.txt", Encoding.Default);
str = Regex.Replace(str, @"<td[^>]*>[^<]*?[\u4e00-\u9fa5]+((?!<td[^>]*>[^<]*?[\u4e00-\u9fa5]+)[\s\S])+", "<tr>$0</tr>", RegexOptions.IgnoreCase);
Console.WriteLine(str);
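
The pattern above decides where a row starts by looking for a cell whose text contains Chinese characters. A small probe, sketched here with the hypothetical names `TitleCellProbe` and `LooksLikeRowStart`, isolates just that row-start part of the pattern so it can be checked against individual cells. The `EBITDA` cell mentioned afterwards is an invented example and does not appear in the original data.

```csharp
using System.Text.RegularExpressions;

public static class TitleCellProbe
{
    // Row-start part of the pattern above: a <td> whose text contains
    // at least one character in the range \u4e00-\u9fa5.
    private static readonly Regex RowStart =
        new Regex(@"<td[^>]*>[^<]*?[\u4e00-\u9fa5]+", RegexOptions.IgnoreCase);

    // Returns true when the cell would be treated as the first cell of a row.
    public static bool LooksLikeRowStart(string cell)
    {
        return RowStart.IsMatch(cell);
    }
}
```

With this probe, `<td class="t_T1">营业额</td>` is recognized as a row start, while a title cell written only in Latin letters, such as `<td class="t_T1">EBITDA</td>`, is not. The approach therefore only works as long as every title cell contains Chinese text and no data cell does.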

[Solution]
Chinese characters cannot be used as the distinguishing marker. The pattern I noticed is that the title column has no id.

C# code
Regex reg = new Regex(@"(?is)(<td class=""[^""]*"">.*?</td>)\s*(?=<td class=""[^""]*"">|$)");
string result = reg.Replace(yourStr, "<tr>\n$1\n</tr>\n");
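
Since the question also asks for a non-regex approach, the same "title cell has no id" rule can be sketched as a plain line-by-line pass. The names `RowGrouper` and `GroupRows` are hypothetical, and the sketch assumes that each `<td>...</td>` sits on its own line and that the trailing `colspan` cell belongs to the row before it.

```csharp
using System;
using System.Text;

public static class RowGrouper
{
    // Starts a new <tr> at every <td> whose opening tag has neither an id
    // nor a colspan attribute (the "title cell" rule described above).
    // Assumption: each <td>...</td> is on its own line.
    public static string GroupRows(string html)
    {
        var sb = new StringBuilder();
        bool rowOpen = false;

        foreach (string raw in html.Split('\n'))
        {
            string line = raw.Trim();          // also drops a trailing '\r'
            if (line.Length == 0) continue;    // skip blank separator lines

            // Inspect only the opening tag, not the cell content.
            int close = line.IndexOf('>');
            string openTag = close >= 0 ? line.Substring(0, close + 1) : line;

            bool isTitle =
                openTag.StartsWith("<td", StringComparison.OrdinalIgnoreCase)
                && openTag.IndexOf(" id=", StringComparison.OrdinalIgnoreCase) < 0
                && openTag.IndexOf(" colspan=", StringComparison.OrdinalIgnoreCase) < 0;

            // Open a row at each title cell (or at the very first cell).
            if (isTitle || !rowOpen)
            {
                if (rowOpen) sb.Append("</tr>\n");
                sb.Append("<tr>\n");
                rowOpen = true;
            }
            sb.Append(line).Append('\n');
        }

        if (rowOpen) sb.Append("</tr>\n");
        return sb.ToString();
    }
}
```

Calling `RowGrouper.GroupRows(yourStr)` does not depend on blank lines between rows or on the language of the title text. It does depend on the one-cell-per-line layout, so a string with several cells on one line would need to be split first.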
[Solution]
Try this:

string str = File.ReadAllText("D:\\1.txt", Encoding.Default);
str = Regex.Replace(str, @"<td((?!id=)[^>])+>[\s\S]*?(?=$|<td((?!(id|colspan))[^>])+>)", "<tr>$0</tr>\n", RegexOptions.IgnoreCase);
Console.WriteLine(str);
[Solution]
You need to find the pattern in the data.

C# code
// Read the txt file
string tempStr = File.ReadAllText(@"C:\Documents and Settings\Administrator\桌面\Test.txt", Encoding.GetEncoding("GB2312"));
string pattern = @"(?i)<td[^>]*?>[\u4e00-\u9fa5]+?[^<>]*?</td>\s*?(\s*?<td[^>]*?(id|colspan)[^>]*?>((?!</td>)[\s\S])*?</td>\s+?)+";
tempStr = Regex.Replace(tempStr, pattern, "<tr>$0</tr>");
