
How to parse HTML

Published: 2011-12-15 23:41:24  Author: rapoo

How do I parse this HTML?
The content is as follows:

GET www.baidu.com/pub/WWW/ HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword, */*
Accept-Language: zh-cn
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
Host: www.baidu.com
Connection: Keep-Alive
Referer: http://www.ntop.org/
Pragma:no-cache
Content-length:244
<html>
<head>
<title> xinxi </title>
<result> </result>
</head>
<body>
<accNo> 12312 </accNO>
</body>
</html>
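
Note that the posted text is not pure HTML: it is an HTTP request line and headers with an HTML document appended directly after them. Before any HTML parser sees it, the two parts probably need to be separated. The following is only a minimal sketch of that step, assuming the markup starts at the first `<html` tag; the class name RawRequestSplitter and its methods are hypothetical and not part of any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: separates the header lines from the HTML in a raw dump.
class RawRequestSplitter {

    private static final Pattern HTML_OPEN = Pattern.compile("<html", Pattern.CASE_INSENSITIVE);

    // Index of the first "<html" (any letter case), or -1 if there is none.
    static int htmlStart(String raw) {
        Matcher m = HTML_OPEN.matcher(raw);
        return m.find() ? m.start() : -1;
    }

    // Everything from the first "<html" onward; empty string if no HTML is found.
    static String htmlPart(String raw) {
        int start = htmlStart(raw);
        return start < 0 ? "" : raw.substring(start);
    }

    // Collects "Name: value" lines that precede the HTML, in their original order.
    static Map<String, String> headers(String raw) {
        int start = htmlStart(raw);
        String head = start < 0 ? raw : raw.substring(0, start);
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : head.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            // Skip blank lines, wrapped continuation lines, and the request line
            // (a header name never contains a space).
            if (colon <= 0 || line.substring(0, colon).contains(" ")) continue;
            result.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = "GET www.baidu.com/pub/WWW/ HTTP/1.1\n"
                + "Host: www.baidu.com\n"
                + "Content-length:244\n"
                + "<html>\n<body>\n<accNo> 12312 </accNO>\n</body>\n</html>\n";
        System.out.println(headers(raw));
        System.out.println(htmlPart(raw));
    }
}
```

The string returned by htmlPart can then be passed to whichever HTML parser is used; the header map is kept only in case values such as Content-length are needed.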


[Solution]
Grammar (syntax) analysis; this looks like textbook material.
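
To make the "grammar analysis" idea concrete, here is a rough sketch of the lexing half only: a hand-written scanner that splits markup into tag tokens and text tokens, then returns the text that follows a given opening tag. It is an illustration, not a real HTML parser: it assumes opening tags without attributes and ignores comments, entities, and nesting. The class name TinyTagScanner and its methods are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, deliberately tiny scanner for simple markup.
class TinyTagScanner {

    // Splits markup into tokens: each "<...>" is one token, each non-blank
    // run of text between tags is another (trimmed) token.
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            if (html.charAt(pos) == '<') {
                int end = html.indexOf('>', pos);
                if (end < 0) end = html.length() - 1;   // unterminated tag: take the rest
                tokens.add(html.substring(pos, end + 1));
                pos = end + 1;
            } else {
                int next = html.indexOf('<', pos);
                if (next < 0) next = html.length();
                String text = html.substring(pos, next).trim();
                if (!text.isEmpty()) tokens.add(text);
                pos = next;
            }
        }
        return tokens;
    }

    // Returns the text token right after the first <name> opening tag
    // (tag names compared case-insensitively), or null if there is none.
    static String textOf(String html, String name) {
        List<String> tokens = tokenize(html);
        String open = "<" + name + ">";
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equalsIgnoreCase(open) && !tokens.get(i + 1).startsWith("<")) {
                return tokens.get(i + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<body>\n<accNo> 12312 </accNO>\n</body>";
        System.out.println(tokenize(html));
        System.out.println(textOf(html, "accNo"));
    }
}
```

Comparing tag names case-insensitively matters for the posted sample, where `<accNo>` is closed by `</accNO>`. For anything beyond markup this simple, a real parser library is the safer choice.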
[Solution]
// Uses the Jericho HTML Parser (package au.id.jericho.lib.html).
import au.id.jericho.lib.html.*;
import java.util.*;
import java.io.*;
import java.net.*;

public class DisplayAllElements {
    public static void main(String[] args) throws Exception {
        // Written as a file: URL, because a bare drive path such as c:\test.html
        // contains ':' and would be read by java.net.URL as the scheme "c".
        String sourceUrlString = "file:///c:/test.html";
        // A path with no scheme at all gets the "file:" prefix.
        if (sourceUrlString.indexOf(':') == -1) sourceUrlString = "file:" + sourceUrlString;
        URL sourceUrl = new URL(sourceUrlString);
        String htmlText = Util.getString(new InputStreamReader(sourceUrl.openStream()));
        Source source = new Source(htmlText);
        // Send log messages to stderr.
        source.setLogWriter(new OutputStreamWriter(System.err));
        // Print the debug info and the source text of every element found.
        for (Iterator i = source.findAllElements().iterator(); i.hasNext(); ) {
            Element element = (Element) i.next();
            System.out.println(element.getDebugInfo());
            System.out.println(element);
        }
    }
}
