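How does JSOUP handle escaped characters?
For example, I crawled the HTML below with JSOUP. How do I parse it?
Take the first element: once unescaped, it reads <div class="item-inner clearfix">xxxx</div>. Only in that form can I call Element.select("div[class=item-inner clearfix]"); if the markup is left escaped, the selector cannot find the element. How can I solve this?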
<div class="item-inner clearfix">
  <div class="photo" data-spm="1000248">
    <a target="_blank" href="http://dd.taobao.com/detail.htm?localstoreId=6f97a76d586e4cb383e669bc81923994" >
      <span>
        <img src="http://img3.tbcdn.cn:80/L1/76/600401658/41a0e57755f44c508fa46386342ff4e2_160x160.jpg" alt="一茶一坐(金桥店)">
      </span>
      <span class="index">6</span>
    </a>
  </div>
  <div class="info">
    <div class="clearfix" data-spm="1000256">
      <a target="_blank" href="http://dd.taobao.com/detail.htm?localstoreId=6f97a76d586e4cb383e669bc81923994" class="name">
        一茶一坐(金桥店)
      </a>
      <a href="http://bendi.koubei.com/shanghai/list--q-%D2%BB%B2%E8%D2%BB%D7%F8--isfd-1" class=" branch"><em>分店</em></a>
      <a target="_blank" href="http://dd.taobao.com/detail.htm?localstoreId=6f97a76d586e4cb383e669bc81923994" >
        <img src="http://img03.taobaocdn.com/tps/i3/T1wEaPXq8dXXcKFhzf-39-14.gif">
      </a>
      <a target="_blank" href="http://waimai.taobao.com/shop_detail.htm?shopid=46669&city=310100" >
        <img src="http://img02.taobaocdn.com/tps/i2/T1IZnfXedqXXcVIxzf-39-14.png" alt="">
      </a>
    </div>
    <div class="more-info clearfix">
      <div class="place-tag">
        <div class="pingfen">
          <span><label>服务:</label><em>4</em></span>
          <span><label>口味:</label><em>4</em></span>
          <span><label>环境:</label><em>4</em></span>
          <span><label>性价比:</label><em>4</em></span>
        </div>
        <p><span class="place">地址:</span>浦东新区张杨路3611号金桥国...</p>
        <div class="tags" data-spm="1000249">
          <span class="tag">标签:</span>
          <p>
            <a href="http://bendi.koubei.com/shanghai/list--q-%C8%E2%D4%EF">肉燥</a>
            <a href="http://bendi.koubei.com/shanghai/list--q-%C2%E9%D3%CD%BC%A6%EC%D2">麻油鸡煲</a>
            <a href="http://bendi.koubei.com/shanghai/list--q-%BA%EC%B6%B9%C5%D9%B1%F9">红豆刨冰</a>
            <a href="http://bendi.koubei.com/shanghai/list--q-%CC%BC%C9%D5%D6%ED%BE%B1%C8%E2">碳烧猪颈肉</a>
          </p>
        </div>
        <p data-spm="1000252">
        </p>
      </div>
      <div class="price">
        <span class="g_price g_price-highlight" style="font-size:12px;">
          <span style="color:#FD7320">¥</span>
          <strong style="background:none;font-size:12px;color:#FD7320;padding:0px;">58</strong>
        </span>
      </div>
      <div class="dp">
        <p data-spm="1000250">
          好评:
          <a target="_blank" href="http://detail.koubei.com/store/detail--id-6f97a76d586e4cb383e669bc81923994"><em>100%</em></a>(<a href="http://detail.koubei.com/store/detail--id-6f97a76d586e4cb383e669bc81923994" target="_blank">3</a>)
        </p>
        <!--点菜按钮-->
        <div class="orderDishes_btn"><a href="http://dd.taobao.com/detail.htm?localstoreId=6f97a76d586e4cb383e669bc81923994" target="_blank">点 菜 </a></div>
      </div>
    </div>
  </div>
</div>
[Solution]
import org.apache.commons.lang.StringEscapeUtils;

public class MainClass {
    public static void main(String[] args) {
        String strHTMLInput = "<P>MyName<P>";
        // Escape: markup characters become entities such as &lt; and &gt;
        String strEscapeHTML = StringEscapeUtils.escapeHtml(strHTMLInput);
        // Unescape: entities are turned back into the original characters
        String strUnEscapeHTML = StringEscapeUtils.unescapeHtml(strEscapeHTML);
        System.out.println("Escaped HTML >>> " + strEscapeHTML);
        System.out.println("UnEscaped HTML >>> " + strUnEscapeHTML);
    }
}
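
To make the unescape step concrete for the markup in the question, here is a minimal sketch that uses only the JDK. The class name EntityUnescapeSketch and the method unescapeBasic are hypothetical names made up for this illustration; they are not part of jsoup or Commons Lang. It also assumes that the crawled text only uses a few common named entities plus numeric ones. A real library unescaper covers far more.

```java
// Hypothetical helper for illustration only; not part of jsoup or Commons Lang.
class EntityUnescapeSketch {

    // Decodes a small set of HTML entities in a single left-to-right pass.
    static String unescapeBasic(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // An entity candidate runs from '&' to the next ';'.
            int semi = (c == '&') ? input.indexOf(';', i + 1) : -1;
            String decoded = (semi > i + 1) ? decode(input.substring(i + 1, semi)) : null;
            if (decoded != null) {
                out.append(decoded);
                i = semi + 1;          // skip past the whole entity
            } else {
                out.append(c);         // not a known entity: copy the character as-is
                i++;
            }
        }
        return out.toString();
    }

    // Maps an entity body such as "lt" or "#39" to its text, or returns null if unknown.
    private static String decode(String body) {
        switch (body) {
            case "lt":   return "<";
            case "gt":   return ">";
            case "amp":  return "&";
            case "quot": return "\"";
            case "apos": return "'";
            default:     break;
        }
        if (body.length() > 1 && body.charAt(0) == '#') {
            try {
                boolean hex = body.length() > 2
                        && (body.charAt(1) == 'x' || body.charAt(1) == 'X');
                int codePoint = hex
                        ? Integer.parseInt(body.substring(2), 16)
                        : Integer.parseInt(body.substring(1), 10);
                if (Character.isValidCodePoint(codePoint)) {
                    return new String(Character.toChars(codePoint));
                }
            } catch (NumberFormatException e) {
                return null;           // malformed numeric entity: leave it untouched
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed shape of the crawled text: the first element from the question, entity-escaped.
        String crawled = "&lt;div class=&quot;item-inner clearfix&quot;&gt;xxxx&lt;/div&gt;";
        System.out.println(unescapeBasic(crawled));
    }
}
```

With jsoup on the classpath, the unescaped string could then be handed to Jsoup.parse(...) and queried with a selector such as select("div.item-inner.clearfix"). jsoup also provides Parser.unescapeEntities(String, boolean), which may remove the need for a hand-written helper like this one.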