读书人

Published: 2012-10-30 16:13:36 Author: rapoo

Link sharing implemented in Java (fetching the title and description)

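I originally wanted to analyze any given link directly: if the page is an article, extract only the article body; if not, extract only the title and description. I went through a lot of related material, but my ability is limited, and the algorithms written up by various so-called experts turned out to be mostly nonsense. So I settled for simply fetching the title and the description. This is very simple and I had not planned to post it, but I may need it again later, so I am recording it here to make it easier to find: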

package com.jyeba.core.html;

// Simple holder for the title and description scraped from a page.
public class HtmlInfo {
    private String title;
    private String desc;

    public void setTitle(String title) {
        this.title = title;
    }

    public String getTitle() {
        return title;
    }

    public void setDesc(String desc) {
        this.desc = desc;
    }

    public String getDesc() {
        return desc;
    }
}

Scraping utility class:

package com.jyeba.core.html;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class HtmlTools {

    // Fetches the page at the given URL and extracts its <title> text
    // and the content of its description <meta> tag, if present.
    public static HtmlInfo getHtmlInfo(String url) throws IOException {
        HtmlInfo html = new HtmlInfo();
        Document doc = Jsoup.connect(url)
                .userAgent("Mozilla")
                .timeout(6000) // milliseconds
                .get();

        Elements e = doc.select("title");
        if (e.size() > 0) {
            System.out.println(e.first().text());
            html.setTitle(e.first().text());
        }

        e = doc.select("meta[name=Description]");
        if (e.size() > 0) {
            System.out.println(e.get(0).attr("content"));
            html.setDesc(e.get(0).attr("content"));
        }
        return html;
    }

    public static void main(String[] args) throws IOException {
        // getHtmlInfo prints the title and description as it finds them.
        HtmlInfo info = HtmlTools.getHtmlInfo("http://news.qq.com/a/20111017/000091.htm");
    }
}
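If pulling in jsoup is not an option, the same two fields can be roughly extracted from an already downloaded HTML string with java.util.regex alone. The sketch below is not from the original code: the class name RegexMetaSketch and its method names are hypothetical, and regular expressions are a swapped-in technique that is far more fragile than a real HTML parser (no entity decoding, no unquoted attributes, no handling of comments). Treat it as an illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post: extracts the title and
// the meta description from an HTML string using regular expressions only.
class RegexMetaSketch {

    // Text between <title ...> and </title>; DOTALL lets it span several lines.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Any single <meta ...> tag.
    private static final Pattern META = Pattern.compile(
            "<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // name="description" in either quote style, any letter case.
    private static final Pattern NAME_DESC = Pattern.compile(
            "\\bname\\s*=\\s*[\"']description[\"']", Pattern.CASE_INSENSITIVE);

    // content="..." or content='...'; group 1 or group 2 holds the value.
    private static final Pattern CONTENT = Pattern.compile(
            "\\bcontent\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
            Pattern.CASE_INSENSITIVE);

    // Returns the trimmed title text, or null if there is no <title> element.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    // Returns the content of the first description meta tag, or null if none.
    static String extractDescription(String html) {
        Matcher tags = META.matcher(html);
        while (tags.find()) {
            String tag = tags.group();
            if (NAME_DESC.matcher(tag).find()) {
                Matcher c = CONTENT.matcher(tag);
                if (c.find()) {
                    return c.group(1) != null ? c.group(1) : c.group(2);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<html><head><title> Demo page </title>"
                + "<meta name=\"Description\" content=\"A short summary.\">"
                + "</head><body></body></html>";
        System.out.println(extractTitle(html));
        System.out.println(extractDescription(html));
    }
}
```

Both methods return null when the field is missing, which mirrors how HtmlInfo is left unset when a page has no title or description.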