XULRunner with Java: JavaXPCOM Tutorial 3

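6 W3C DOM access to the loaded page

6.1 The mozdom4java library

Accessing the W3C DOM tree is preferable to accessing Mozilla's own DOM tree, because the W3C DOM is the standard for dynamically accessing the DOM tree of HTML and XML documents. To do this we use a Java bridge from the Mozilla DOM to the W3C DOM, provided by a project called mozdom4java: http://mozdom4java.mozdev.org/index.html.

After downloading the package, we put its jar on the classpath. As an example, we add a button that extracts all the links in the HTML document.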

    // When the button is pressed, we obtain the HTML document corresponding to
    // the URL loaded in the browser. Next, we extract all its nodes with the 'a'
    // tag name and print their content.
    final ToolItem anchorItem = new ToolItem(toolbar, SWT.PUSH);
    anchorItem.setImage(getImage("resources/anchors.png"));
    anchorItem.addSelectionListener(new SelectionAdapter() {
        public void widgetSelected(SelectionEvent event) {

            // First, we obtain a Mozilla DOM Document representation
            nsIDOMDocument doc = browser.getDocument();

            // Get all anchors from the loaded HTML document
            nsIDOMNodeList nodeList = doc.getElementsByTagName("a");
            for (int i = 0; i < nodeList.getLength(); i++) {

                // Get Mozilla DOM node
                nsIDOMNode mozNode = nodeList.item(i);

                // Get the appropriate interface
                nsIDOMHTMLAnchorElement mozAnchor =
                        (nsIDOMHTMLAnchorElement) mozNode.queryInterface(
                                nsIDOMHTMLAnchorElement.NS_IDOMHTMLANCHORELEMENT_IID);

                // Get the corresponding W3C DOM node
                HTMLAnchorElement a = (HTMLAnchorElement)
                        HTMLAnchorElementImpl.getDOMInstance(mozAnchor);

                // Test the HTML element
                System.out.println("Tag Name: " + a.getNodeName()
                        + " -- Text: " + a.getTextContent()
                        + " -- Href: " + a.getHref());
            }
        }
    });
    ...

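6.2 Patching mozdom4java to convert the Mozilla DOM tree into a W3C DOM tree

If we always want to work with the W3C DOM tree, converting nodes one at a time becomes tedious, so we suggest modifying mozdom4java. In our view these changes simplify the code, because we can forget about Mozilla DOM nodes altogether. Later, when we discuss XPath, the evaluator will return a list of nodes, and W3C elements are more convenient to work with than Mozilla nodes. In other words, our goal is to build a usable web browser that can be driven in a standard way, without any knowledge of the Mozilla implementation.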
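To make that goal concrete, here is a minimal sketch of code written purely against the standard org.w3c.dom interfaces. It uses the JDK's built-in XML parser as a stand-in for the browser, so the class name W3cAnchorSketch, the sample markup and the example.org URLs are all assumptions made for illustration. In principle the same collectHrefs method could be handed a Document produced by the patched mozdom4java, but that is not shown or tested here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of mozdom4java.
class W3cAnchorSketch {

    // Collects the href attribute of every <a> element using W3C DOM interfaces only.
    static List<String> collectHrefs(Document doc) {
        List<String> hrefs = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            hrefs.add(a.getAttribute("href"));
        }
        return hrefs;
    }

    // Parses a well-formed (XHTML-style) snippet with the JDK's own parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        // Sample markup invented for this sketch.
        String page = "<html><body>"
                + "<a href=\"http://example.org/one\">One</a>"
                + "<p><a href=\"http://example.org/two\">Two</a></p>"
                + "</body></html>";
        for (String href : collectHrefs(parse(page))) {
            System.out.println("Href: " + href);
        }
    }
}
```

Nothing in collectHrefs refers to a Mozilla type, which is the property the patch described in this section is meant to give the rest of the application.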

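First, we need the Java Language Binding for the DOM Level 2 specification. The easiest approach is to download the jars from the mozdom4java project, http://www.mozdev.org/source/browse/mozdom4java/src/jars/, because they contain all the required files, including the hand-written extensions, so there is nothing else to take care of. We also need the Mozilla interfaces. The required files are:

- w3chtml.jar, which contains the W3C DOM HTML Level 2 interfaces, split into two packages: org.w3c.dom.html and org.w3c.dom.html2
- w3cextension.jar, which contains the KeyEvent class in the org.w3c.dom.events package
- MozillaInterfaces.jar
- MozillaGlue.jar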
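Once these jars are on the classpath, mozdom4java should compile cleanly (no errors, possibly a few warnings). Next we modify the mozdom4java source code, explaining the changes file by file. Of course, you can also simply download the already patched jar.

To patch the library by hand, follow the steps below.

We start by creating a factory for HTML elements: a class that converts a Mozilla DOM element node into the corresponding W3C DOM element node. The class below does exactly that and is heavily commented. It uses Java reflection for the conversion, so callers do not need to know anything about Mozilla DOM nodes.

Note: the code is long but simple; it just uses reflection to call the getDOMInstance method shown earlier.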
    package es.ladyr.dom;

    import java.lang.reflect.Field;
    import java.lang.reflect.Method;
    import java.util.HashMap;
    import java.util.Map;

    import org.mozilla.interfaces.*;
    import org.w3c.dom.html.HTMLElement;

    public class HTMLElementFactory {

        private static HTMLElementFactory instance;
        private Map<String, String> corresp;

        private HTMLElementFactory() {
            initCorrespondence();
        }

        public static HTMLElementFactory getInstance() {
            if (instance == null) {
                instance = new HTMLElementFactory();
            }
            return instance;
        }

        public static HTMLElement getHTMLElement(nsIDOMNode nsNode) {
            return getInstance().getConcreteNode(nsNode);
        }

        // Maps lowercase tag names to the name fragment used by both the
        // Mozilla interfaces and the W3C implementation classes
        private void initCorrespondence() {
            corresp = new HashMap<String, String>();
            corresp.put("a", "Anchor");
            corresp.put("applet", "Applet");
            corresp.put("area", "Area");
            corresp.put("base", "Base");
            corresp.put("basefont", "BaseFont");
            corresp.put("body", "Body");
            corresp.put("br", "BR");
            corresp.put("button", "Button");
            corresp.put("dir", "Directory");
            corresp.put("div", "Div");
            corresp.put("dl", "DList");
            corresp.put("fieldset", "FieldSet");
            corresp.put("font", "Font");
            corresp.put("form", "Form");
            corresp.put("frame", "Frame");
            corresp.put("frameset", "FrameSet");
            corresp.put("head", "Head");
            corresp.put("h1", "Heading");
            corresp.put("h2", "Heading");
            corresp.put("h3", "Heading");
            corresp.put("h4", "Heading");
            corresp.put("h5", "Heading");
            corresp.put("h6", "Heading");
            corresp.put("hr", "HR");
            corresp.put("html", "Html");
            corresp.put("iframe", "IFrame");
            corresp.put("img", "Image");
            corresp.put("input", "Input");
            corresp.put("isindex", "IsIndex");
            corresp.put("label", "Label");
            corresp.put("legend", "Legend");
            corresp.put("li", "LI");
            corresp.put("link", "Link");
            corresp.put("map", "Map");
            corresp.put("menu", "Menu");
            corresp.put("meta", "Meta");
            corresp.put("ins", "Mod");
            corresp.put("del", "Mod");
            corresp.put("object", "Object");
            corresp.put("ol", "OList");
            corresp.put("optgroup", "OptGroup");
            corresp.put("option", "Option");
            corresp.put("p", "Paragraph");
            corresp.put("param", "Param");
            corresp.put("pre", "Pre");
            corresp.put("q", "Quote");
            corresp.put("script", "Script");
            corresp.put("select", "Select");
            corresp.put("style", "Style");
            corresp.put("caption", "TableCaption");
            corresp.put("td", "TableCell");
            corresp.put("col", "TableCol");
            corresp.put("table", "Table");
            corresp.put("tr", "TableRow");
            corresp.put("thead", "TableSection");
            corresp.put("tfoot", "TableSection");
            corresp.put("tbody", "TableSection");
            corresp.put("textarea", "TextArea");
            corresp.put("title", "Title");
            corresp.put("ul", "UList");
        }

        /**
         * Try to convert a Mozilla DOM node into a W3C DOM element.
         *
         * @param nsNode node to convert into a W3C DOM element.
         * @return W3C HTML element corresponding to the Mozilla DOM node.
         */
        public HTMLElement getConcreteNode(nsIDOMNode nsNode) {
            // Only converts element nodes. If the Mozilla node
            // isn't a Mozilla DOM element, we cannot convert it into
            // a W3C DOM element
            if (nsNode.getNodeType() == nsIDOMNode.ELEMENT_NODE) {
                // We use a hashmap to obtain element names from node names
                String htmlElementType = corresp.get(nsNode.getNodeName().toLowerCase());
                // If we don't know the element type, we cannot transform
                // that node into a W3C DOM element
                if (htmlElementType == null) {
                    return null;
                }
                // Compose the class name for the Mozilla DOM element
                String nsClassName = "org.mozilla.interfaces.nsIDOMHTML"
                        + htmlElementType + "Element";
                // Compose the field name for the element IID
                String nsFieldInterfaceName = "NS_IDOMHTML"
                        + htmlElementType.toUpperCase() + "ELEMENT_IID";
                try {
                    // Once we have their names, obtain the class and the field
                    Class<?> nsClass = Class.forName(nsClassName);
                    Field field = nsClass.getField(nsFieldInterfaceName);
                    // Get the field value (it is a static field, so the argument is ignored)
                    String iid = (String) field.get(null);
                    // Get the appropriate node interface
                    Object nsElement = nsNode.queryInterface(iid);
                    // Build the W3C DOM Element implementation class name
                    // (the package org.mozilla.dom.html contains concrete implementations
                    // for the W3C HTML element interfaces)
                    String w3cClassName = "org.mozilla.dom.html.HTML"
                            + htmlElementType + "ElementImpl";
                    // Obtain the class for the corresponding W3C DOM Element implementation
                    Class<?> w3cClass = Class.forName(w3cClassName);
                    // Extract the method that must be invoked to transform the element
                    Method creationMethod = w3cClass.getMethod("getDOMInstance", nsClass);
                    // Invoke the getDOMInstance method of the corresponding W3C HTML
                    // element class, which returns an instance of that element
                    HTMLElement node = (HTMLElement) creationMethod.invoke(null, nsElement);
                    return node;
                } catch (Exception e) {
                    throw new Error(e);
                }
            }
            return null;
        }
    }
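The reflective steps the factory relies on (compose a class name from a lookup table, read a static field, invoke a static factory method) can be tried in isolation, without the Mozilla jars. The following is only a sketch of that pattern using JDK classes as stand-ins: ReflectionLookupSketch, its table and its method names are hypothetical, with java.lang.Integer.valueOf playing the role of getDOMInstance and MAX_VALUE the role of the IID constant.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the factory; it targets java.lang classes only.
class ReflectionLookupSketch {

    // Short name -> class-name fragment, playing the role of the tag-name table.
    private static final Map<String, String> FRAGMENTS = new HashMap<>();
    static {
        FRAGMENTS.put("int", "Integer");
        FRAGMENTS.put("long", "Long");
    }

    // Composes the class name from the table; returns null for unknown names.
    private static Class<?> resolve(String shortName) throws ClassNotFoundException {
        String fragment = FRAGMENTS.get(shortName);
        return fragment == null ? null : Class.forName("java.lang." + fragment);
    }

    // Reads a public static field of the resolved class.
    static Object readStaticField(String shortName, String fieldName) {
        try {
            Class<?> type = resolve(shortName);
            if (type == null) {
                return null;
            }
            Field field = type.getField(fieldName);
            return field.get(null); // static field, so the instance argument is ignored
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes the static factory method valueOf(String) of the resolved class.
    static Object create(String shortName, String text) {
        try {
            Class<?> type = resolve(shortName);
            if (type == null) {
                return null; // unknown name: mirror the factory's null result
            }
            Method factory = type.getMethod("valueOf", String.class);
            return factory.invoke(null, text);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create("int", "42"));
        System.out.println(readStaticField("long", "MAX_VALUE"));
        System.out.println(create("float", "1.5"));
    }
}
```

The trade-off is the same as in the factory: one small table replaces dozens of hand-written conversions, but a wrong name is only detected at run time, so reflective failures are wrapped in an unchecked exception.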

Using our HTMLElementFactory class, we now modify the NodeFactory class. After the change, calling org.w3c.dom.Node getNodeInstance(nsIDOMNode node) on a node of type nsIDOMNode.ELEMENT_NODE returns the corresponding W3C DOM HTML element. The modified code is:

    ...
    // Import our factory to create W3C HTML elements from Mozilla DOM elements
    import es.ladyr.dom.HTMLElementFactory;
    ...
    public static Node getNodeInstance( nsIDOMNode node )
    {
        if (node == null) {
            return null;
        }
        switch ( node.getNodeType() ) {
            case nsIDOMNode.ELEMENT_NODE:
                // Use our factory to obtain a W3C HTML DOM element
                Node htmlElement = HTMLElementFactory.getHTMLElement(node);
                if (htmlElement != null) {
                    return htmlElement;
                } else {
                    // If the factory cannot convert the concrete node (for instance,
                    // the type is unknown to our factory implementation), then
                    // return a generic W3C DOM element
                    return ElementImpl.getDOMInstance((nsIDOMElement) node
                            .queryInterface(nsIDOMElement.NS_IDOMELEMENT_IID));
                }
            ...

The complete code of the NodeFactory class follows:

    /* ***** BEGIN LICENSE BLOCK *****
     * Version: MPL 1.1/GPL 2.0/LGPL 2.1
     *
     * The contents of this file are subject to the Mozilla Public License Version
     * 1.1 (the "License"); you may not use this file except in compliance with
     * the License. You may obtain a copy of the License at
     * http://www.mozilla.org/MPL/
     *
     * Software distributed under the License is distributed on an "AS IS" basis,
     * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
     * for the specific language governing rights and limitations under the
     * License.
     *
     * The Original Code is mozdom4java
     *
     * The Initial Developer of the Original Code is
     * Peter Szinek, Lixto Software GmbH, http://www.lixto.com.
     * Portions created by the Initial Developer are Copyright (C) 2005-2006
     * the Initial Developer. All Rights Reserved.
     *
     * Contributor(s):
     *  Peter Szinek (peter@rubyrailways.com)
     *  Michal Ceresna (michal.ceresna@gmail.com)
     *
     * Alternatively, the contents of this file may be used under the terms of
     * either the GNU General Public License Version 2 or later (the "GPL"), or
     * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
     * in which case the provisions of the GPL or the LGPL are applicable instead
     * of those above. If you wish to allow use of your version of this file only
     * under the terms of either the GPL or the LGPL, and not to allow others to
     * use your version of this file under the terms of the MPL, indicate your
     * decision by deleting the provisions above and replace them with the notice
     * and other provisions required by the GPL or the LGPL. If you do not delete
     * the provisions above, a recipient may use your version of this file under
     * the terms of any one of the MPL, the GPL or the LGPL.
     *
     * ***** END LICENSE BLOCK ***** */

    import org.w3c.dom.Node;
    import org.mozilla.dom.*;
    import org.mozilla.interfaces.*;

    // Import our factory to create W3C HTML elements from Mozilla DOM elements
    import es.ladyr.dom.HTMLElementFactory;

    public class NodeFactory
    {
        private NodeFactory() {}

        public static Node getNodeInstance( nsIDOMEventTarget eventTarget )
        {
            if (eventTarget == null ) {
                return null;
            }
            nsIDOMNode node = (nsIDOMNode) eventTarget.queryInterface(nsIDOMNode.NS_IDOMNODE_IID);
            return getNodeInstance(node);
        }

        public static Node getNodeInstance( nsIDOMNode node )
        {
            if (node == null) {
                return null;
            }
            switch ( node.getNodeType() ) {
                case nsIDOMNode.ELEMENT_NODE:
                    // Use our factory to obtain a W3C HTML DOM element
                    Node htmlElement = HTMLElementFactory.getHTMLElement(node);
                    if (htmlElement != null) {
                        return htmlElement;
                    } else {
                        // If the factory cannot convert the concrete node (for instance,
                        // the type is unknown to our factory implementation), then
                        // return a generic W3C DOM element
                        return ElementImpl.getDOMInstance((nsIDOMElement) node
                                .queryInterface(nsIDOMElement.NS_IDOMELEMENT_IID));
                    }
                case nsIDOMNode.ATTRIBUTE_NODE:
                    return AttrImpl.getDOMInstance((nsIDOMAttr) node
                            .queryInterface(nsIDOMAttr.NS_IDOMATTR_IID));
                case nsIDOMNode.TEXT_NODE:
                    return TextImpl.getDOMInstance((nsIDOMText) node
                            .queryInterface(nsIDOMText.NS_IDOMTEXT_IID));
                case nsIDOMNode.CDATA_SECTION_NODE:
                    return CDATASectionImpl.getDOMInstance((nsIDOMCDATASection) node
                            .queryInterface(nsIDOMCDATASection.NS_IDOMCDATASECTION_IID));
                case nsIDOMNode.ENTITY_REFERENCE_NODE:
                    return EntityReferenceImpl.getDOMInstance((nsIDOMEntityReference) node
                            .queryInterface(nsIDOMEntityReference.NS_IDOMENTITYREFERENCE_IID));
                case nsIDOMNode.ENTITY_NODE:
                    return EntityImpl.getDOMInstance((nsIDOMEntity) node
                            .queryInterface(nsIDOMEntity.NS_IDOMENTITY_IID));
                case nsIDOMNode.PROCESSING_INSTRUCTION_NODE:
                    return ProcessingInstructionImpl.getDOMInstance((nsIDOMProcessingInstruction) node
                            .queryInterface(nsIDOMProcessingInstruction.NS_IDOMPROCESSINGINSTRUCTION_IID));
                case nsIDOMNode.COMMENT_NODE:
                    return CommentImpl.getDOMInstance((nsIDOMComment) node
                            .queryInterface(nsIDOMComment.NS_IDOMCOMMENT_IID));
                case nsIDOMNode.DOCUMENT_NODE:
                    return DocumentImpl.getDOMInstance((nsIDOMDocument) node
                            .queryInterface(nsIDOMDocument.NS_IDOMDOCUMENT_IID));
                case nsIDOMNode.DOCUMENT_TYPE_NODE:
                    return DocumentTypeImpl.getDOMInstance((nsIDOMDocumentType) node
                            .queryInterface(nsIDOMDocumentType.NS_IDOMDOCUMENTTYPE_IID));
                case nsIDOMNode.DOCUMENT_FRAGMENT_NODE:
                    return DocumentFragmentImpl.getDOMInstance((nsIDOMDocumentFragment) node
                            .queryInterface(nsIDOMDocumentFragment.NS_IDOMDOCUMENTFRAGMENT_IID));
                case nsIDOMNode.NOTATION_NODE:
                    return NotationImpl.getDOMInstance((nsIDOMNotation) node
                            .queryInterface(nsIDOMNotation.NS_IDOMNOTATION_IID));
                default:
                    return NodeImpl.getDOMInstance(node);
            }
        }

        public static nsIDOMNode getnsIDOMNode( Node node )
        {
            if (node instanceof NodeImpl) {
                NodeImpl ni = (NodeImpl) node;
                return ni.getInstance();
            } else {
                return null;
            }
        }

        private static boolean toLower = true;

        public static boolean getConvertNodeNamesToLowerCase()
        {
            return toLower;
        }

        public static void setConvertNodeNamesToLowerCase(boolean convert)
        {
            toLower = convert;
        }

        private static boolean expandFrames = false;

        public static boolean getExpandFrames()
        {
            return expandFrames;
        }

        public static void setExpandFrames(boolean expand)
        {
            expandFrames = expand;
        }
    }

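Finally, we need to modify the ElementImpl class. Two of its methods, public String getAttribute(String name) and public String getTagName(), end by calling toLowerCase to convert their result to lowercase. This can cause problems: for example, an anchor may have an onclick attribute whose value contains JavaScript code, and if we need to execute that code, the lowercased copy may no longer work. So we modify the ElementImpl.java file as follows: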

    ...
    public String getAttribute(final String name)
    {
        //METHOD-BODY-START - autogenerated code
        Callable<String> c = new Callable<String>() { public String call() {
            String result = getInstanceAsnsIDOMElement().getAttribute(name);
            return result;
        }};
        return ThreadProxy.getSingleton().syncExec(c);
        //METHOD-BODY-END - autogenerated code
    }

    ...
    public String getTagName()
    {
        //METHOD-BODY-START - autogenerated code
        Callable<String> c = new Callable<String>() { public String call() {
            String result = getInstanceAsnsIDOMElement().getTagName();
            return result;
        }};
        return ThreadProxy.getSingleton().syncExec(c);
        //METHOD-BODY-END - autogenerated code
    }

    ...
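A small sketch of why lowercasing an attribute value is harmful: JavaScript identifiers and string literals are case-sensitive, so a lowercased onclick handler refers to different names. The class AttributeCaseSketch, the handler text and the openWindow function name are invented for illustration; the lowercased method only imitates the behaviour described above and is not the mozdom4java code.

```java
import java.util.Locale;

// Hypothetical demonstration class, not part of mozdom4java.
class AttributeCaseSketch {

    // Imitates an accessor that lowercases the attribute value before returning it.
    static String lowercased(String attributeValue) {
        return attributeValue.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Sample onclick value invented for this sketch.
        String onclick = "openWindow('Page.html'); return false;";
        System.out.println("original:   " + onclick);
        // The function name and the file name both change case, so the
        // script would call an undefined function with a different path.
        System.out.println("lowercased: " + lowercased(onclick));
    }
}
```

Returning the value untouched, as the modified methods do, leaves any case normalization to the caller, who knows whether it is safe.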

With the patched library, the link-extraction button from section 6.1 can be written without querying any Mozilla-specific element interface:

    ...
    import org.mozilla.dom.NodeFactory;
    import org.mozilla.interfaces.*;
    import org.w3c.dom.html.HTMLAnchorElement;
    import org.w3c.dom.html.HTMLElement;
    ...
    final ToolItem anchorItem = new ToolItem(toolbar, SWT.PUSH);
    anchorItem.setImage(getImage("resources/anchors.png"));
    anchorItem.addSelectionListener(new SelectionAdapter() {
        public void widgetSelected(SelectionEvent event) {
            // First, we obtain a Mozilla DOM Document representation
            nsIWebBrowser webBrowser = (nsIWebBrowser) browser.getWebBrowser();
            if (webBrowser == null) {
                System.out.println("Could not get the nsIWebBrowser from the Browser widget");
                // Without a browser instance there is nothing to analyze
                return;
            }
            nsIDOMWindow window = webBrowser.getContentDOMWindow();
            nsIDOMDocument doc = window.getDocument();
            System.out.println(doc);
            // Get all anchors from the loaded HTML document
            nsIDOMNodeList nodeList = doc.getElementsByTagName("a");
            analyzeAnchors(nodeList);
        }

        private void analyzeAnchors(nsIDOMNodeList nodeList) {
            for (int i = 0; i < nodeList.getLength(); i++) {
                // Get Mozilla DOM node
                nsIDOMNode mozNode = nodeList.item(i);
                // We assume that the NodeList contains only HTMLElements,
                // because we only call this method on HTML nodes
                // (NodeFactory.getNodeInstance could return other Node
                // subtypes, depending on the input Mozilla DOM node)
                HTMLElement htmlElement = (HTMLElement) NodeFactory.getNodeInstance(mozNode);
                // We are only interested in anchors
                if (htmlElement instanceof HTMLAnchorElement) {
                    HTMLAnchorElement a = (HTMLAnchorElement) htmlElement;
                    // Test the HTML element
                    System.out.println("Tag Name: " + a.getNodeName()
                            + " -- Text: " + a.getTextContent()
                            + " -- Href: " + a.getHref());
                }
            }
        }
    });
