Parsing and traversing a Document

To parse an HTML document:

String html = "<html><head><title>First parse</title></head>"
  + "<body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);

(See parsing a document from a string for more info.)
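The snippet above is a fragment. A minimal self-contained version might look like the following; it assumes the jsoup library is on the classpath, and the class name `FirstParse` and helper `parseExample` are hypothetical names chosen for illustration. It reads back the title and body text to show that the string was turned into a navigable Document.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FirstParse {
    // Hypothetical helper: parses the example HTML string into a Document.
    static Document parseExample() {
        String html = "<html><head><title>First parse</title></head>"
            + "<body><p>Parsed HTML into a doc.</p></body></html>";
        return Jsoup.parse(html);
    }

    public static void main(String[] args) {
        Document doc = parseExample();
        // title() returns the text of the <title> element in the head.
        System.out.println(doc.title());
        // body() returns the <body> element; text() gives its combined text.
        System.out.println(doc.body().text());
    }
}
```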

The parser will make every attempt to create a clean parse from the HTML you provide, regardless of whether the HTML is well-formed or not. It handles:

Unclosed tags (e.g. <p>Lorem <p>Ipsum parses to <p>Lorem</p> <p>Ipsum</p>)
Implicit tags (e.g. a naked <td>Table data</td> is wrapped into a <table><tr><td>...)
Reliably creating the document structure (html containing a head and body, and only appropriate elements within the head)
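Two of these behaviours can be checked directly. The sketch below, assuming jsoup is on the classpath, counts the paragraphs produced from unclosed `<p>` tags and confirms that a `<title>` ends up in the generated head rather than the body. The class `LenientParse` and its methods are hypothetical names used only for this example; the implicit-table case is not exercised here.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LenientParse {
    // Unclosed tags: each new <p> implicitly closes the previous one,
    // so "<p>Lorem <p>Ipsum" yields two separate paragraph elements.
    static int paragraphCount(String html) {
        return Jsoup.parse(html).select("p").size();
    }

    // Document structure: jsoup always builds <html>, <head> and <body>,
    // and places head-only elements such as <title> inside the head.
    static boolean titleInHead(String html) {
        Document doc = Jsoup.parse(html);
        return doc.head().select("title").size() == 1
            && doc.body().select("title").isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(paragraphCount("<p>Lorem <p>Ipsum"));
        System.out.println(titleInHead("<title>Hi</title><p>Body text"));
        // Print the repaired tree to see the generated html/head/body wrappers.
        System.out.println(Jsoup.parse("<title>Hi</title><p>Body text").html());
    }
}
```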
The object model of a document
Documents consist of Elements and TextNodes (and a couple of other miscellaneous nodes: see the nodes package tree).
The inheritance chain is: Document extends Element extends Node. TextNode extends Node.
An Element contains a list of child Nodes and has one parent Element. Elements also provide a filtered list containing only their child Elements.
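To make the object model concrete, the sketch below parses a small paragraph with mixed text and markup, then compares `childNodes()` (every child Node, including TextNodes) with `children()` (only child Elements) and follows `parent()` back up. It assumes jsoup is on the classpath; `ObjectModel` and `firstParagraph` are hypothetical names for this illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class ObjectModel {
    // Hypothetical helper: returns the <p> from a small mixed-content document.
    static Element firstParagraph() {
        Document doc = Jsoup.parse("<p>One <b>two</b> three</p>");
        return doc.select("p").first();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<p>One <b>two</b> three</p>");

        // Inheritance chain: a Document is an Element, and every Element is a Node,
        // so both assignments compile without casts.
        Element docAsElement = doc;
        Node docAsNode = doc;

        Element p = doc.select("p").first();

        // childNodes() lists every child Node: TextNode, Element <b>, TextNode.
        for (Node child : p.childNodes()) {
            String kind = (child instanceof TextNode) ? "TextNode" : "Element <" + child.nodeName() + ">";
            System.out.println(kind);
        }

        // children() is the filtered list containing only child Elements.
        System.out.println(p.children().size());

        // Each Element has exactly one parent Element.
        Element b = p.children().first();
        System.out.println(b.parent() == p);
    }
}
```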
See also
Extracting data: DOM navigation
Extracting data: Selector syntax
