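Some notes on parsing XML with SAX
I'm a beginner and this is just a casual write-up; have a look if you're interested!
Parsing XML with SAX is very simple, but sorting the parsed data into separate collections is not quite so easy! My XML source file looks like this: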
- XML code
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <Vms total_Vms="5">
    <Vm>
      <Vm_ID>0</Vm_ID>
      <Vm_mips>278</Vm_mips>
    </Vm>
    <Vm>
      <Vm_ID>1</Vm_ID>
      <Vm_mips>289</Vm_mips>
    </Vm>
    <Vm>
      <Vm_ID>2</Vm_ID>
      <Vm_mips>132</Vm_mips>
    </Vm>
    <Vm>
      <Vm_ID>3</Vm_ID>
      <Vm_mips>209</Vm_mips>
    </Vm>
    <Vm>
      <Vm_ID>4</Vm_ID>
      <Vm_mips>286</Vm_mips>
    </Vm>
  </Vms>
  <Cloudlets total_Cloudlets="10">
    <Cloudlet>
      <Cloudlet_ID>0</Cloudlet_ID>
      <Cloudlet_length>19365</Cloudlet_length>
    </Cloudlet>
    <Cloudlet>
      <Cloudlet_ID>1</Cloudlet_ID>
      <Cloudlet_length>49809</Cloudlet_length>
    </Cloudlet>
    <Cloudlet>
      <Cloudlet_ID>2</Cloudlet_ID>
      <Cloudlet_length>30218</Cloudlet_length>
    </Cloudlet>
    <Cloudlet>
      <Cloudlet_ID>3</Cloudlet_ID>
      <Cloudlet_length>44157</Cloudlet_length>
    </Cloudlet>
    <Cloudlet>
      <Cloudlet_ID>4</Cloudlet_ID>
      <Cloudlet_length>16754</Cloudlet_length>
    </Cloudlet>
    <Cloudlet>
      <Cloudlet_ID>5</Cloudlet_ID>
      <Cloudlet_length>18336</Cloudlet_length>
    </Cloudlet>
    <Cloudlet>
      <Cloudlet_ID>6</Cloudlet_ID>
      <Cloudlet_length>20045</Cloudlet_length>
    </Cloudlet>
    <Cloudlet>
      <Cloudlet_ID>7</Cloudlet_ID>
      <Cloudlet_length>31493</Cloudlet_length>
    </Cloudlet>
    <Cloudlet>
      <Cloudlet_ID>8</Cloudlet_ID>
      <Cloudlet_length>30727</Cloudlet_length>
    </Cloudlet>
    <Cloudlet>
      <Cloudlet_ID>9</Cloudlet_ID>
      <Cloudlet_length>31017</Cloudlet_length>
    </Cloudlet>
  </Cloudlets>
</config>
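I used four Lists (vmId, vmMips, cloudletId, cloudletLength) to hold the parsed data. SAXParser reads the InputStream sequentially and reports what it finds as events (it does not really work line by line). Its core method is parse(InputStream in, DefaultHandler dh), and that is the only one we need here. The methods in DefaultHandler are all empty, so we write our own handler that extends it and overrides the methods that matter. My code overrides five of them; the last three (startElement, endElement, characters) are the ones you cannot do without!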
- Java code
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParseUtil {
    static List<String> vmId = new ArrayList<String>();
    static List<String> vmMips = new ArrayList<String>();
    static List<String> cloudletId = new ArrayList<String>();
    static List<String> cloudletLength = new ArrayList<String>();

    private class MyHandle extends DefaultHandler {
        // Name of the element whose text is being collected; null when outside one.
        String qqName = null;
        // Text collected so far for that element.
        StringBuilder temp = new StringBuilder();

        @Override
        public void startDocument() throws SAXException {
            System.out.println("开始解析!"); // "Parsing started!"
        }

        @Override
        public void endDocument() throws SAXException {
            System.out.println("解析结束!"); // "Parsing finished!"
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) throws SAXException {
            // Only these four leaf elements carry data we want to keep.
            if (qName.equals("Vm_ID") || qName.equals("Vm_mips")
                    || qName.equals("Cloudlet_ID") || qName.equals("Cloudlet_length")) {
                qqName = qName;
                temp.setLength(0);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (!qName.equals(qqName)) {
                return; // not the end of an element we are collecting
            }
            String value = temp.toString();
            // Strings are compared with equals(); == only compares references.
            if (qName.equals("Vm_ID")) {
                vmId.add(value);
            } else if (qName.equals("Vm_mips")) {
                vmMips.add(value);
            } else if (qName.equals("Cloudlet_ID")) {
                cloudletId.add(value);
            } else if (qName.equals("Cloudlet_length")) {
                cloudletLength.add(value);
            }
            System.out.println(qName + ":" + value);
            qqName = null; // ignore the whitespace that follows the closing tag
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // The parser may deliver one text node in several pieces, so append.
            if (qqName != null) {
                temp.append(ch, start, length);
            }
        }
    }

    public void parseXML(String fileName) {
        // try-with-resources closes the stream even if parsing fails.
        try (InputStream in = new FileInputStream(fileName)) {
            SAXParserFactory saxfac = SAXParserFactory.newInstance();
            SAXParser parse = saxfac.newSAXParser();
            parse.parse(in, new MyHandle());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        new SAXParseUtil().parseXML("/Users/apple/documents/configuration.xml");
        System.out.println();
        System.out.print("vmId:");
        for (String s : vmId) System.out.print(s + " ");
        System.out.println();
        System.out.print("vmMips:");
        for (String s : vmMips) System.out.print(s + " ");
        System.out.println();
        System.out.print("cloudletId:");
        for (String s : cloudletId) System.out.print(s + " ");
        System.out.println();
        System.out.print("cloudletLength:");
        for (String s : cloudletLength) System.out.print(s + " ");
        System.out.println();
        System.out.println();
        System.out.println("The size of vmId is:" + vmId.size() + "\n"
                + "The size of vmMips is:" + vmMips.size() + "\n"
                + "The size of CloudletId is:" + cloudletId.size() + "\n"
                + "The size of CloudletLength is:" + cloudletLength.size());
    }
}
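The data has to be saved in endElement(). The parser also calls characters() for the whitespace between tags, so if you save anywhere else the stored data gets distorted (I tested it: a lot of whitespace ends up in the lists).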
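To make the whitespace point concrete, here is a minimal sketch that just records every chunk handed to characters(). The class name CharactersDemo, the helper collectChunks, and the small in-memory XML string are made up for illustration and are not part of the program above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.helpers.DefaultHandler;

public class CharactersDemo {
    /** Returns every chunk of text the parser reports, whitespace included. */
    static List<String> collectChunks(String xml) throws Exception {
        final List<String> chunks = new ArrayList<String>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        chunks.add(new String(ch, start, length));
                    }
                });
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample: one <Vm> entry with line breaks and indentation.
        String xml = "<Vm>\n  <Vm_ID>0</Vm_ID>\n  <Vm_mips>278</Vm_mips>\n</Vm>";
        for (String chunk : collectChunks(xml)) {
            // Label whitespace-only chunks so they are visible in the output.
            System.out.println(chunk.trim().isEmpty() ? "(whitespace only)" : chunk);
        }
    }
}
```

The whitespace-only chunks come from the line breaks and indentation between tags. How many chunks a parser produces can vary, which is one more reason to collect text in a buffer and only store it in endElement().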
Here is the output of the program:
开始解析!
Vm_ID:0
Vm_mips:278
Vm_ID:1
Vm_mips:289
Vm_ID:2
Vm_mips:132
Vm_ID:3
Vm_mips:209
Vm_ID:4
Vm_mips:286
Cloudlet_ID:0
Cloudlet_length:19365
Cloudlet_ID:1
Cloudlet_length:49809
Cloudlet_ID:2
Cloudlet_length:30218
Cloudlet_ID:3
Cloudlet_length:44157
Cloudlet_ID:4
Cloudlet_length:16754
Cloudlet_ID:5
Cloudlet_length:18336
Cloudlet_ID:6
Cloudlet_length:20045
Cloudlet_ID:7
Cloudlet_length:31493
Cloudlet_ID:8
Cloudlet_length:30727
Cloudlet_ID:9
Cloudlet_length:31017
解析结束!
vmId:0 1 2 3 4
vmMips:278 289 132 209 286
cloudletId:0 1 2 3 4 5 6 7 8 9
cloudletLength:19365 49809 30218 44157 16754 18336 20045 31493 30727 31017
The size of vmId is:5
The size of vmMips is:5
The size of CloudletId is:10
The size of CloudletLength is:10
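As an aside on the "sorting into collections" problem: instead of parallel lists, each <Vm> could be gathered into one small object when its closing tag arrives. This is only a sketch of that alternative, not what the program above does; the Vm class, VmHandler, parseVms, and the in-memory sample are all hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class VmRecordDemo {
    /** Hypothetical holder for one <Vm> entry. */
    static class Vm {
        int id;
        int mips;
    }

    static class VmHandler extends DefaultHandler {
        final List<Vm> vms = new ArrayList<Vm>();
        private final StringBuilder text = new StringBuilder();
        private Vm current; // the <Vm> being filled in, or null outside one

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text afresh for this element
            if (qName.equals("Vm")) {
                current = new Vm();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return;
            }
            if (qName.equals("Vm_ID")) {
                current.id = Integer.parseInt(text.toString().trim());
            } else if (qName.equals("Vm_mips")) {
                current.mips = Integer.parseInt(text.toString().trim());
            } else if (qName.equals("Vm")) {
                vms.add(current); // the record is complete at </Vm>
                current = null;
            }
        }
    }

    static List<Vm> parseVms(String xml) throws Exception {
        VmHandler handler = new VmHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.vms;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config><Vms total_Vms=\"2\">"
                + "<Vm><Vm_ID>0</Vm_ID><Vm_mips>278</Vm_mips></Vm>"
                + "<Vm><Vm_ID>1</Vm_ID><Vm_mips>289</Vm_mips></Vm>"
                + "</Vms></config>";
        for (Vm vm : parseVms(xml)) {
            System.out.println("Vm " + vm.id + " mips=" + vm.mips);
        }
    }
}
```

The trade-off is a little more state in the handler in exchange for keeping each ID next to its mips value.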
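The advantage of SAX is that it is simple and straightforward, and because it is event-based it does not keep the whole XML tree in memory (that is DOM, which is the memory-hungry one). The price is that you have to keep track of the parsing state yourself in the callbacks. There is another good event-based streaming approach, StAX; I won't go into detail here and will just give a simple example that parses Vm_ID: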
- Java code
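import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Method fragment: place it inside any class.
public String[] getVmsID(String fileName) throws XMLStreamException, IOException {
    XMLInputFactory xmlif = XMLInputFactory.newInstance();
    List<String> vmIds = new ArrayList<String>();
    StringBuilder text = new StringBuilder();
    boolean inVmId = false; // true while we are between <Vm_ID> and </Vm_ID>
    try (InputStream in = new FileInputStream(fileName)) {
        XMLEventReader xmler = xmlif.createXMLEventReader(in);
        while (xmler.hasNext()) {
            XMLEvent event = xmler.nextEvent();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals("Vm_ID")) {
                inVmId = true;
                text.setLength(0);
            } else if (inVmId && event.isCharacters()) {
                // Keep only character data that belongs to a Vm_ID element.
                text.append(event.asCharacters().getData());
            } else if (event.isEndElement()
                    && event.asEndElement().getName().getLocalPart().equals("Vm_ID")) {
                // The closing tag means the ID is complete, so store it.
                vmIds.add(text.toString().trim());
                inVmId = false;
            }
        }
        xmler.close();
    }
    return vmIds.toArray(new String[0]);
}
[Solution]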
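The JDK basically offers three ways of working with XML: DOM, SAX, and StAX.
SAX is a push model: the parser calls your handler.
StAX is a pull model: your code asks the parser for the next event.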
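A small sketch of what "pull" looks like with the StAX cursor API (XMLStreamReader), as a contrast to the SAX callbacks. The class name PullDemo, the method totalMips, and the in-memory sample are assumptions made up for this illustration.

```java
import java.io.StringReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullDemo {
    /** Sums the text of every <Vm_mips> element by pulling events one at a time. */
    static int totalMips(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int total = 0;
        while (reader.hasNext()) {
            // Our loop asks for the next event; nothing is called back.
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals("Vm_mips")) {
                // getElementText() reads up to the matching end tag.
                total += Integer.parseInt(reader.getElementText().trim());
            }
        }
        reader.close();
        return total;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Vms>"
                + "<Vm><Vm_ID>0</Vm_ID><Vm_mips>278</Vm_mips></Vm>"
                + "<Vm><Vm_ID>1</Vm_ID><Vm_mips>289</Vm_mips></Vm>"
                + "</Vms>";
        System.out.println(totalMips(xml)); // prints 567
    }
}
```

Because the loop is ordinary code, it can stop early or skip ahead whenever it likes, which is harder to arrange from inside SAX callbacks.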
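SAX can only read, whereas StAX can both read and write; it is also said to process faster than SAX, heh, though that will depend on the implementation and the document.
StAX originally came from Java EE and was added to Java SE starting with JDK 6. The classes live under javax.xml.stream.*; have a look if you're interested.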
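For the writing side, here is a rough sketch using XMLStreamWriter from javax.xml.stream to produce a fragment shaped like the Vms section of the file above. The class name StaxWriteDemo and the method writeVms are hypothetical, and the exact form of the XML declaration in the output depends on the StAX implementation.

```java
import java.io.StringWriter;

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxWriteDemo {
    /** Writes one <Vm> entry per (id, mips) pair and returns the XML as a string. */
    static String writeVms(int[] ids, int[] mips) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("Vms");
        writer.writeAttribute("total_Vms", String.valueOf(ids.length));
        for (int i = 0; i < ids.length; i++) {
            writer.writeStartElement("Vm");
            writer.writeStartElement("Vm_ID");
            writer.writeCharacters(String.valueOf(ids[i]));
            writer.writeEndElement(); // </Vm_ID>
            writer.writeStartElement("Vm_mips");
            writer.writeCharacters(String.valueOf(mips[i]));
            writer.writeEndElement(); // </Vm_mips>
            writer.writeEndElement(); // </Vm>
        }
        writer.writeEndElement(); // </Vms>
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeVms(new int[] {0, 1}, new int[] {278, 289}));
    }
}
```

The writer emits everything on one line; it does not indent on its own.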