Using jsoup
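jsoup is an HTML parser for Java that can parse HTML directly from a URL or from a string of HTML text. It provides a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods. See: http://jsoup.org/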
The main features of jsoup are:
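    Parse HTML from a URL, a file, or a string;
    Find and extract data using DOM traversal or CSS selectors;
    Manipulate HTML elements, attributes, and text;
    jsoup is released under the MIT license, so it can be used safely in commercial projects.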
Download and installation:
    Installing with Maven:
    Add the following to the dependencies in pom.xml:
    <dependency>
      <!-- jsoup HTML parser library @ http://jsoup.org/ -->
      <groupId>org.jsoup</groupId>
      <artifactId>jsoup</artifactId>
      <version>1.5.2</version>
    </dependency>
Parsing HTML with jsoup:
    Parsing HTML from a URL, and how absolute URLs are resolved:
// Excerpt from the jsoup source (a method of org.jsoup.nodes.Node).
// It relies on java.net.URL and java.net.MalformedURLException.

/**
 * Get an absolute URL from a URL attribute that may be relative (i.e. an <code>&lt;a href></code> or
 * <code>&lt;img src></code>).
 * <p/>
 * E.g.: <code>String absUrl = linkEl.absUrl("href");</code>
 * <p/>
 * If the attribute value is already absolute (i.e. it starts with a protocol, like
 * <code>http://</code> or <code>https://</code> etc), and it successfully parses as a URL, the attribute is
 * returned directly. Otherwise, it is treated as a URL relative to the element's {@link #baseUri}, and made
 * absolute using that.
 * <p/>
 * As an alternate, you can use the {@link #attr} method with the <code>abs:</code> prefix, e.g.:
 * <code>String absUrl = linkEl.attr("abs:href");</code>
 *
 * @param attributeKey The attribute key
 * @return An absolute URL if one could be made, or an empty string (not null) if the attribute was missing or
 * could not be made successfully into a URL.
 * @see #attr
 * @see java.net.URL#URL(java.net.URL, String)
 */
// By now it should be clear how the absolute URL is obtained.
public String absUrl(String attributeKey) {
    Validate.notEmpty(attributeKey);

    String relUrl = attr(attributeKey);
    if (!hasAttr(attributeKey)) {
        return ""; // nothing to make absolute with
    } else {
        URL base;
        try {
            try {
                base = new URL(baseUri);
            } catch (MalformedURLException e) {
                // the base is unsuitable, but the attribute may be abs on its own, so try that
                URL abs = new URL(relUrl);
                return abs.toExternalForm();
            }
            // workaround: java resolves '//path/file + ?foo' to '//path/?foo', not '//path/file?foo' as desired
            if (relUrl.startsWith("?"))
                relUrl = base.getPath() + relUrl;
            URL abs = new URL(base, relUrl);
            return abs.toExternalForm();
        } catch (MalformedURLException e) {
            return "";
        }
    }
}
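The resolution steps in absUrl can also be tried outside jsoup. The following is a minimal sketch that copies the same logic into a standalone static method using only java.net.URL. The class name AbsUrlSketch, the method name resolve, and the example.com addresses are made up for illustration and are not part of the jsoup API. A null value stands in for a missing attribute here.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, not part of jsoup: it mirrors the steps of absUrl
// so the resolution rules can be tried without an HTML document.
final class AbsUrlSketch {

    /**
     * Resolves a possibly relative URL against a base URI.
     * Returns an empty string (never null) when no absolute URL can be built.
     */
    @SuppressWarnings("deprecation") // the URL constructors are deprecated on recent JDKs but still work
    static String resolve(String baseUri, String relUrl) {
        if (relUrl == null) {
            return ""; // assumption: null plays the role of a missing attribute
        }
        try {
            URL base;
            try {
                base = new URL(baseUri);
            } catch (MalformedURLException e) {
                // The base is unusable, but the value may be absolute on its own.
                return new URL(relUrl).toExternalForm();
            }
            // Same workaround as in the jsoup source: keep the file part of the
            // base path when the relative URL consists of a query string only.
            if (relUrl.startsWith("?")) {
                relUrl = base.getPath() + relUrl;
            }
            return new URL(base, relUrl).toExternalForm();
        } catch (MalformedURLException e) {
            return "";
        }
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html";
        System.out.println(resolve(base, "guide.html"));         // http://example.com/docs/guide.html
        System.out.println(resolve(base, "/about"));             // http://example.com/about
        System.out.println(resolve(base, "https://jsoup.org/")); // https://jsoup.org/
        System.out.println(resolve(base, "?page=2"));            // http://example.com/docs/index.html?page=2
        System.out.println(resolve("", "guide.html").isEmpty()); // true
    }
}
```

The last call shows the failure path: with an unusable base and a relative value, both URL constructions fail and the method returns an empty string, which matches the @return contract documented above.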