Begging for help! Need a regular expression
I grabbed a table from a web page as a string, as follows:
- HTML code
<td class="t_H2">合现金流量表-全年</td><td class="t_H2" id="H">2011/12</td><td class="t_H2" id="H">2010/12</td><td class="t_H2" id="H">2009/12</td><td class="t_H2" id="H">2008/12</td><td class="t_H2" id="H">2007/12</td><td class="t_T1">经营活动之现金流量</td><td class="t_C1_B" id="C">395,935</td><td class="t_C1" id="C">14,584</td><td class="t_C1" id="C">226,605</td><td class="t_C1" id="C">133,104</td><td class="t_C1" id="C">109,144</td><td class="t_LC" colspan="6"></td><td class="t_T1">投资回报及融资费用之现金流量</td><td class="t_C1_B" id="C"><span style="color:green;">-66,233</span></td><td class="t_C1" id="C">243,749</td><td class="t_C1" id="C">180,372</td><td class="t_C1" id="C">231,016</td><td class="t_C1" id="C">191,366</td><td class="t_LC" colspan="6"></td><td class="t_T1"> 已收利息</td><td class="t_C1_B" id="C">0</td><td class="t_C1" id="C">449,667</td><td class="t_C1" id="C">399,115</td><td class="t_C1" id="C">425,143</td><td class="t_C1" id="C">328,121</td><td class="t_LC" colspan="6"></td><td class="t_T1"> 已付利息</td><td class="t_C1_B" id="C"><span style="color:green;">-3,212</span></td><td class="t_C1" id="C"><span style="color:green;">-149,898</span></td><td class="t_C1" id="C"><span style="color:green;">-164,088</span></td><td class="t_C1" id="C"><span style="color:green;">-150,029</span></td><td class="t_C1" id="C"><span style="color:green;">-120,941</span></td><td class="t_LC" colspan="6"></td><td class="t_T1"> 已收股息</td><td class="t_C1_B" id="C">1,268</td><td class="t_C1" id="C">1,071</td><td class="t_C1" id="C">544</td><td class="t_C1" id="C">652</td><td class="t_C1" id="C">74</td><td class="t_LC" colspan="6"></td><td class="t_T1"> 已付股息</td><td class="t_C1_B" id="C"><span style="color:green;">-64,289</span></td><td class="t_C1" id="C"><span style="color:green;">-57,091</span></td><td class="t_C1" id="C"><span style="color:green;">-55,199</span></td><td class="t_C1" id="C"><span style="color:green;">-44,750</span></td><td class="t_C1" id="C"><span 
style="color:green;">-15,888</span></td><td class="t_LC" colspan="6"></td><td class="t_T1"> 其他</td><td class="t_C1_B" id="C">0</td><td class="t_C1" id="C">0</td><td class="t_C1" id="C">0</td><td class="t_C1" id="C">0</td><td class="t_C1" id="C">0</td><td class="t_LC" colspan="6"></td><td class="t_T1">退回/(已缴)税项</td><td class="t_C1_B" id="C"><span style="color:green;">-47,812</span></td><td class="t_C1" id="C"><span style="color:green;">-38,774</span></td><td class="t_C1" id="C"><span style="color:green;">-58,938</span></td><td class="t_C1" id="C"><span style="color:green;">-38,545</span></td><td class="t_C1" id="C"><span style="color:green;">-21,400</span></td><td class="t_LC" colspan="6"></td><td class="t_T1">投资活动之现金流量</td><td class="t_C1_B" id="C"><span style="color:green;">-58,001</span></td><td class="t_C1" id="C"><span style="color:green;">-160,657</span></td><td class="t_C1" id="C"><span style="color:green;">-585,828</span></td><td class="t_C1" id="C"><span style="color:green;">-10,576</span></td><td class="t_C1" id="C"><span style="color:green;">-244,866</span></td><td class="t_LC" colspan="6"></td><td class="t_T1"> 增添固定资产</td><td class="t_C1_B" id="C"><span style="color:green;">-22,896</span></td><td class="t_C1" id="C"><span style="color:green;">-20,017</span></td><td class="t_C1" id="C"><span style="color:green;">-20,285</span></td><td class="t_C1" id="C"><span style="color:green;">-15,554</span></td><td class="t_C1" id="C"><span style="color:green;">-9,385</span></td><td class="t_LC" colspan="6"></td><td class="t_T1"> 出售固定资产</td><td class="t_C1_B" id="C">1,278</td><td class="t_C1" id="C">666</td><td class="t_C1" id="C">1,407</td><td class="t_C1" id="C">520</td><td class="t_C1" id="C">2,823</td><td class="t_LC" colspan="6"></td>...
The data comes from this URL: http://www.aastocks.com/sc/Stock/CompanyFundamental.aspx?CFType=6&symbol=01398
I now want to normalize it by adding <tr> and </tr> in the right places, so that it becomes:
- HTML code
<tr><td class="t_H2">合现金流量表-全年</td><td class="t_H2" id="H">2011/12</td><td class="t_H2" id="H">2010/12</td><td class="t_H2" id="H">2009/12</td><td class="t_H2" id="H">2008/12</td><td class="t_H2" id="H">2007/12</td></tr><tr><td class="t_T1">经营活动之现金流量</td><td class="t_C1_B" id="C">395,935</td><td class="t_C1" id="C">14,584</td><td class="t_C1" id="C">226,605</td><td class="t_C1" id="C">133,104</td><td class="t_C1" id="C">109,144</td><td class="t_LC" colspan="6"></td></tr>....
Someone experienced previously wrote the regex below. It matches some of the rows but misses others. It may be useful as a starting point:
- C# code
using System.Text.RegularExpressions;

// strhtml is the string shown above
string pattern = @"(?i)<td[^>]*?>[\u4e00-\u9fa5]+?[^<>]*?</td>\s*?(\s*?<td[^>]*?(id|colspan)[^>]*?>((?!</td>)[\s\S])*?</td>\s+?)+";
// wrap each matched row in <tr>...</tr>
strhtml = Regex.Replace(strhtml, pattern, "<tr>$0</tr>");
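A likely reason some rows are missed: the pattern opens with `<td[^>]*?>[\u4e00-\u9fa5]+?`, so the label text must begin immediately with a Chinese character. In the sample, the indented sub-item labels (` 已收利息`, ` 已付利息`, ` 其他`, ` 增添固定资产` and so on) begin with a space, so none of those rows can match. The minimal C# sketch below isolates just that label-cell part of the pattern. `LabelCellCheck` and `IsLabelCell` are hypothetical names, and it assumes the leading character is a plain space, as in the pasted sample.

- C# code
```csharp
using System.Text.RegularExpressions;

// Hypothetical helper that tests only the label-cell part of the original pattern.
static class LabelCellCheck
{
    // Label-cell part of the original pattern: the text right after '>' must
    // START with a CJK character (U+4E00..U+9FA5); no leading space is allowed.
    public const string OriginalLabel = @"(?i)<td[^>]*?>[\u4e00-\u9fa5]+?[^<>]*?</td>";

    // True if the given <td>...</td> string is accepted as a row label.
    public static bool IsLabelCell(string cell) => Regex.IsMatch(cell, OriginalLabel);
}
```

With this check, `<td class="t_T1">经营活动之现金流量</td>` is accepted while `<td class="t_T1"> 已收利息</td>` is rejected, which is the gap the reply below closes by allowing `[^<>]*?` before the Chinese character.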
Related thread:
http://topic.csdn.net/u/20120829/11/b7c8aebc-ad28-4acc-8203-4a2763782691.html
[Solution]
Try
- C# code
string pattern = @"(?i)<td[^>]*?>[^<>]*?[\u4e00-\u9fa5]+?[^<>]*?</td>\s*?(\s*?<td[^>]*?(id|colspan)[^>]*?>((?!</td>)[\s\S])*?</td>\s+?)+";
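Both patterns end each data cell with `\s+?`, which requires at least one whitespace character after every `</td>`. That works if the live page puts line breaks between cells, but in the sample as pasted above the cells are back to back (`</td><td`), so on that single-line string neither pattern matches anything. The sketch below keeps the reply's pattern but relaxes that trailing `\s+?` to `\s*?`; this relaxation is an assumption, not part of the reply. `TableRowWrapper` and `WrapRows` are hypothetical names, and, like both patterns, it assumes every row starts with a label cell containing Chinese text and every data cell carries an `id` or `colspan` attribute.

- C# code
```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: wraps each table row in <tr>...</tr>.
static class TableRowWrapper
{
    // The reply's pattern, with the trailing `\s+?` relaxed to `\s*?` (assumption)
    // so it also matches when `</td>` and the next `<td` have no whitespace between them.
    //  - label cell: a <td> whose plain text contains a CJK character (U+4E00..U+9FA5)
    //  - data cells: one or more <td> whose opening tag contains "id" or "colspan";
    //    their content may include nested tags such as <span>, up to the next </td>
    private const string RowPattern =
        @"(?i)<td[^>]*?>[^<>]*?[\u4e00-\u9fa5]+?[^<>]*?</td>\s*?" +
        @"(\s*?<td[^>]*?(id|colspan)[^>]*?>((?!</td>)[\s\S])*?</td>\s*?)+";

    private static readonly Regex RowRegex = new Regex(RowPattern);

    // $0 is the whole match, i.e. one label cell plus its data cells.
    public static string WrapRows(string html) => RowRegex.Replace(html, "<tr>$0</tr>");
}
```

This remains tied to the page's markup: a label without Chinese characters, or a data cell with neither attribute, ends or breaks a row. Also, `(id|colspan)` is an unanchored substring test, so an attribute name such as `width` would count as a data-cell marker too.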