读书人

Using jsoup

Published: 2012-12-21 12:03:49 Author: rapoo

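jsoup is an HTML parser for Java that can parse a URL or a string of HTML directly. It provides a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods. See http://jsoup.org/ for details.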

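The main features of jsoup are:

    Parse HTML from a URL, a file, or a string;
    Find and extract data using DOM traversal or CSS selectors;
    Manipulate HTML elements, attributes, and text;
    jsoup is released under the MIT license, so it can safely be used in commercial projects.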
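The features listed above can be sketched roughly as follows. This is a minimal illustration, not code from the original post: the sample markup, the class name `JsoupFeaturesSketch`, and its helper methods are all hypothetical, and it assumes the jsoup jar is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupFeaturesSketch {
    // Hypothetical sample markup, used only for illustration.
    static final String HTML =
            "<html><head><title>Demo</title></head><body>"
            + "<div id=\"news\">"
            + "<a class=\"item\" href=\"/a.html\">First</a>"
            + "<a class=\"item\" href=\"/b.html\">Second</a>"
            + "</div></body></html>";

    // Parse HTML from a string, then find elements with a CSS selector.
    static int countItems(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("div#news a.item").size();
    }

    // Read an attribute from the first matching element.
    static String firstHref(String html) {
        Element first = Jsoup.parse(html).select("a.item").first();
        return first == null ? "" : first.attr("href");
    }

    // Change the text and an attribute of the first match, then read the text back.
    static String renameFirst(String html, String newText) {
        Document doc = Jsoup.parse(html);
        Element first = doc.select("a.item").first();
        if (first == null) {
            return "";
        }
        first.text(newText).attr("rel", "nofollow");
        return doc.select("a[rel=nofollow]").text();
    }

    public static void main(String[] args) {
        Elements links = Jsoup.parse(HTML).select("a.item");
        for (Element link : links) {
            System.out.println(link.text() + " -> " + link.attr("href"));
        }
        System.out.println(renameFirst(HTML, "Updated"));
    }
}
```

The selector syntax (`div#news a.item`, `a[rel=nofollow]`) is the same CSS syntax used in stylesheets, which is what makes the API feel similar to jQuery.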

Download and installation:

    Installing with Maven:

    Add the following to pom.xml:

    <dependency>
      <!-- jsoup HTML parser library @ http://jsoup.org/ -->
      <groupId>org.jsoup</groupId>
      <artifactId>jsoup</artifactId>
      <version>1.5.2</version>
    </dependency>

HTML can be parsed with jsoup as follows:

    Parsing HTML from a URL
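As a rough sketch of what parsing from a URL might look like, the example below uses `Jsoup.connect(url).get()` for a live page and `Jsoup.parse(html, baseUri)` for markup that is already in memory. The class name `ParseFromUrlSketch`, the helper names, and the 5000 ms timeout are assumptions made for this illustration, and the jsoup jar is assumed to be on the classpath.

```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParseFromUrlSketch {
    // Fetch and parse a live page. Requires network access.
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(url)
                .timeout(5000) // milliseconds; an arbitrary value chosen for this sketch
                .get();
    }

    // Offline variant: parse markup and tell jsoup which URL it came from,
    // so that relative links can later be resolved against that base URI.
    static String firstLinkAbsolute(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.select("a[href]").first();
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Only touch the network when a URL is passed explicitly.
            Document doc = fetch(args[0]);
            System.out.println(doc.title());
            for (Element link : doc.select("a[href]")) {
                System.out.println(link.absUrl("href"));
            }
        } else {
            System.out.println(
                    firstLinkAbsolute("<a href=\"/download\">Download</a>", "http://jsoup.org/"));
        }
    }
}
```

When a document is fetched with `connect(...).get()`, jsoup records the page's address as the base URI, which is what `absUrl` relies on to turn relative links into absolute ones.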


    /**
     * Get an absolute URL from a URL attribute that may be relative (i.e. an <code><a href></code> or
     * <code><img src></code>).
     * <p/>
     * E.g.: <code>String absUrl = linkEl.absUrl("href");</code>
     * <p/>
     * If the attribute value is already absolute (i.e. it starts with a protocol, like
     * <code>http://</code> or <code>https://</code> etc), and it successfully parses as a URL, the attribute is
     * returned directly. Otherwise, it is treated as a URL relative to the element's {@link #baseUri}, and made
     * absolute using that.
     * <p/>
     * As an alternate, you can use the {@link #attr} method with the <code>abs:</code> prefix, e.g.:
     * <code>String absUrl = linkEl.attr("abs:href");</code>
     *
     * @param attributeKey The attribute key
     * @return An absolute URL if one could be made, or an empty string (not null) if the attribute was missing or
     * could not be made successfully into a URL.
     * @see #attr
     * @see java.net.URL#URL(java.net.URL, String)
     */
    // By now it should be clear how the absolute URL is obtained.
    public String absUrl(String attributeKey) {
        Validate.notEmpty(attributeKey);

        String relUrl = attr(attributeKey);
        if (!hasAttr(attributeKey)) {
            return ""; // nothing to make absolute with
        } else {
            URL base;
            try {
                try {
                    base = new URL(baseUri);
                } catch (MalformedURLException e) {
                    // the base is unsuitable, but the attribute may be abs on its own, so try that
                    URL abs = new URL(relUrl);
                    return abs.toExternalForm();
                }
                // workaround: java resolves '//path/file + ?foo' to '//path/?foo', not '//path/file?foo' as desired
                if (relUrl.startsWith("?"))
                    relUrl = base.getPath() + relUrl;
                URL abs = new URL(base, relUrl);
                return abs.toExternalForm();
            } catch (MalformedURLException e) {
                return "";
            }
        }
    }
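To see the resolution rules in isolation, the same steps can be reproduced with nothing but `java.net.URL`. The sketch below is a standalone approximation of the logic in the method above, not jsoup's own code: the class name `AbsUrlSketch`, the `resolve` helper, and the example.com addresses are hypothetical. Note that the `URL` constructors used here are deprecated in recent JDKs (20 and later) but still work.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class AbsUrlSketch {
    // Resolve relUrl against baseUri, returning "" when no absolute URL can be built.
    static String resolve(String baseUri, String relUrl) {
        try {
            URL base;
            try {
                base = new URL(baseUri);
            } catch (MalformedURLException e) {
                // The base is unusable, but the value may already be absolute on its own.
                return new URL(relUrl).toExternalForm();
            }
            // A bare query string should keep the file part of the base path.
            if (relUrl.startsWith("?")) {
                relUrl = base.getPath() + relUrl;
            }
            return new URL(base, relUrl).toExternalForm();
        } catch (MalformedURLException e) {
            return "";
        }
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html";
        System.out.println(resolve(base, "page.html"));
        System.out.println(resolve(base, "/img/logo.png"));
        System.out.println(resolve(base, "?q=1"));
        System.out.println(resolve(base, "https://jsoup.org/"));
        System.out.println(resolve("", "page.html").isEmpty());
    }
}
```

The nested `try` mirrors the original: a malformed base is not fatal as long as the attribute value is itself a complete URL, and only when both fail does the method fall back to an empty string.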