Parsing XML with SAX in Java

<?xml version='1.0' encoding='UTF-8'?>
<samples>
<search_results><query id="7015">the raven</query><engine status="OK" timestamp="2014-05-15 13:43:06" name="CiteSeerX" id="FW14-e004"/><snippets><snippet id="FW14-e004-7015-01"><link cache="FW14-topics-docs/e004/7015_01.html" timestamp="2014-05-15 13:43:07">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.51.7167&amp;rank=1</link><title>The Raven System</title><description>The Raven System

by Donald Acton,Terry Coatta,Gerald Neufeld,1992

"... The Raven System 1 Donald Acton,Terry Coatta and Gerald Neufeld Technical Report TR 92-15 August ..."

Abstract - Cited by 7 (4 self) - Add to MetaCart

This report describes the distributed object-oriented system,&lt;em&gt;Raven&lt;/em&gt;. &lt;em&gt;Raven&lt;/em&gt; is both a distributed</description></snippet><snippet id="FW14-e004-7015-02"><link cache="FW14-topics-docs/e004/7015_02.html" timestamp="2014-05-15 13:43:08">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.35.4932&amp;rank=2</link><title>The Raven System</title><description>The Raven System

by Donald Acton Terry,Terry Coatta and Gerald Neufeld Technical Report TR 92-15 August ..."

Abstract - Add to MetaCart

This report describes the distributed object-oriented system,&lt;em&gt;Raven&lt;/em&gt;. &lt;em&gt;Raven&lt;/em&gt; is both a distributed</description></snippet><snippet id="FW14-e004-7015-03"><link cache="FW14-topics-docs/e004/7015_03.html" timestamp="2014-05-15 13:43:08">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.276.8949&amp;rank=3</link><title>book In the Company of Crows and Ravens</title><description>book In the Company of Crows and Ravens

by Marzluff Jm,John Marzluff,Tony Angell,Quote Reverend Henry Ward Beecher’s

"... Book Reviews/Science in the Media Living with the Trickster: Crows,Ravens,and Human Culture ..."

Abstract - Add to MetaCart

Few groups of wild animals inspire such extreme opinions in the humans who observe them than</description></snippet><snippet id="FW14-e004-7015-04"><link cache="FW14-topics-docs/e004/7015_04.html" timestamp="2014-05-15 13:43:09">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.395.6124&amp;rank=4</link><title>Design by Raven Design</title><description>Design by Raven Design

by Third-level Programmes,Edited Irene Sheridan,Dr Margaret Linehan

"... by Raven Design Printed by City Print Ltd © CIT Press 2011 ISBN 978-1-906953-07-2 The toolkit includes ..."

Abstract - Add to MetaCart

Work Placement in Third-Level Programmes is one of a number of significant outputs of the Roadmap for Employment–Academic Partnerships (REAP) Project. This report draws together for the first time perspectives on placement from all of the key stakeholders. In addition to providing a unique overview of the placement experience the project team have used the information gathered to develop a useful,transferable toolkit for placement. Publication Information Although every effort has been made to ensure the accuracy of the material contained in this publication,complete accuracy cannot be guaranteed. All or part of this publication may be reproduced without further</description></snippet><snippet id="FW14-e004-7015-05"><link cache="FW14-topics-docs/e004/7015_05.html" timestamp="2014-05-15 13:43:09">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.149.3392&amp;rank=5</link><title>Basic objects in natural categories</title><description>Basic objects in natural categories

by Eleanor Rosch,Carolyn B. Mervis,Wayne D. Gray,David M,Penny Boyes-braem - Cognitive Psychology,1976

"...,&amp; Raven,1966); and finally,the location of natural groupings at a particular level of abstraction ..."

Abstract - Cited by 487 (1 self) - Add to MetaCart

Categorizations which humans make of the concrete world are not arbitrary but highly determined. In taxonomies of concrete objects,there is one level of abstraction at which the most basic category cuts are made. Basic categories are those which carry the most information,possess the highest category cue validity,and are,thus,the most differentiated from one another. The four experiments of Part I define basic objects by demonstrating that in taxonomies of common concrete nouns in English based on class inclusion,basic objects are the most inclusive categories whose members: (a) possess significant numbers of attributes in common,(b) have motor programs which are similar to one another,(c) have similar shapes,and (d) can be identified from averaged shapes of members of the class. The eight experiments of Part II explore implications of the structure of categories. Basic objects are shown to be the most inclusive categories for which a concrete image of the category as a whole can be formed,to be the first categorizations made during perception of the environment,to be the earliest categories sorted and earliest named by children,and to be the categories</description></snippet><snippet id="FW14-e004-7015-06"><link cache="FW14-topics-docs/e004/7015_06.html" timestamp="2014-05-15 13:43:10">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.300.2871&amp;rank=6</link><title>A Bayesian Model of Rule Induction in Raven’s Progressive Matrices</title><description>A Bayesian Model of Rule Induction in Raven’s Progressive Matrices

by Daniel R. Little,Stephan Lewandowsky,Crawley Wa,Thomas L. Griffiths (tom

"... A Bayesian Model of Rule Induction in Raven’s Progressive Matrices Daniel R. Little (daniel ..."

Abstract - Cited by 1 (0 self) - Add to MetaCart

&lt;em&gt;Raven’s&lt;/em&gt; Progressive Matrices (&lt;em&gt;Raven&lt;/em&gt;,&lt;em&gt;Raven&lt;/em&gt;,&amp; Court,1998) is one of the most prevalent assays</description></snippet><snippet id="FW14-e004-7015-07"><link cache="FW14-topics-docs/e004/7015_07.html" timestamp="2014-05-15 13:43:11">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.225.3291&amp;rank=7</link><title>A Structure-Mapping Model of Raven’s Progressive Matrices</title><description>A Structure-Mapping Model of Raven’s Progressive Matrices

by Andrew Lovett,Kenneth Forbus,Jeffrey Usher

"... A Structure-Mapping Model of Raven’s Progressive Matrices Andrew Lovett (andrew ..."

Abstract - Cited by 5 (2 self) - Add to MetaCart

We present a computational model for solving &lt;em&gt;Raven’s&lt;/em&gt; Progressive Matrices. This model combines</description></snippet><snippet id="FW14-e004-7015-08"><link cache="FW14-topics-docs/e004/7015_08.html" timestamp="2014-05-15 13:43:12">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.231.3664&amp;rank=8</link><title>RAVEN – Active Learning of Link Specifications</title><description>RAVEN – Active Learning of Link Specifications

by Axel-cyrille Ngonga Ngomo,Jens Lehmann,Sören Auer,Konrad Höffner

"... RAVEN – Active Learning of Link Specifications Axel-Cyrille Ngonga Ngomo,Sören Auer ..."

Abstract - Cited by 7 (1 self) - Add to MetaCart

for a link discovery problem is a tedious task that must still be carried out manually. We present &lt;em&gt;RAVEN&lt;/em&gt;</description></snippet><snippet id="FW14-e004-7015-09"><link cache="FW14-topics-docs/e004/7015_09.html" timestamp="2014-05-15 13:43:13">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.103.8446&amp;rank=9</link><title>RAVEN: Real-Time Analyzing and Verification Environment</title><description>RAVEN: Real-Time Analyzing and Verification Environment

by Jürgen Ruf - Journal on Universal Computer Science (J.UCS),Springer,2001

"... RAVEN: Real-Time Analyzing and Verification Environment Jürgen Ruf (University of Tübingen ..."

Abstract - Cited by 16 (3 self) - Add to MetaCart

Abstract: In this paper we present the real-time verification and analysis tool &lt;em&gt;RAVEN&lt;/em&gt;. &lt;em&gt;RAVEN&lt;/em&gt;</description></snippet><snippet id="FW14-e004-7015-10"><link cache="FW14-topics-docs/e004/7015_10.html" timestamp="2014-05-15 13:43:13">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.39.9827&amp;rank=10</link><title>The Advantages of Evolutionary Computation</title><description>The Advantages of Evolutionary Computation

by David B. Fogel,1997

"... variants. Others (Atmar,1979; Raven and Johnson,1986,pp. 400-401) have suggested that it is more ..."

Abstract - Cited by 396 (5 self) - Add to MetaCart

Evolutionary computation is becoming common in the solution of difficult,realworld problems in industry,medicine,and defense. This paper reviews some of the practical advantages to using evolutionary algorithms as compared with classic methods of optimization or artificial intelligence. Specific advantages include the flexibility of the procedures,as well as the ability to self-adapt the search for optimum solutions on the fly. As desktop computers increase in speed,the application of evolutionary algorithms will become routine. 1 Introduction Darwinian evolution is intrinsically a robust search and optimization mechanism. Evolved biota demonstrate optimized complex behavior at every level: the cell,the organ,the individual,and the population. The problems that biological species have solved are typified by chaos,chance,temporality,and nonlinear interactivities. These are also characteristics of problems that have proved to be especially intractable to classic methods of o...</description></snippet></snippets></search_results></samples>
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.Map.Entry;
import java.util.Vector;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.lucene.document.*;



public class SAXXMLDocument extends DefaultHandler{

    private StringBuilder elementBuffer = new StringBuilder();
    private Map<String,String> attributeMap = new HashMap<String,String>();

    private HashMap<String,String> vertical = new HashMap<String,String>();
    private HashMap<String,String> urls = new HashMap<String,String>();

    private Vector<Document> docs;
    private Document doc;

    public Document getDocument(InputStream is) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();

        try {
            SAXParser parser = spf.newSAXParser();
            parser.parse(is,this);
        } catch (Exception e) {
            throw new Exception("Cannot parse XML document",e);
        }
        return doc;
    }

    public void startDocument() {
        //doc = new Document();
    }

    private String queryid,querytext,engineid,enginename,verticalid,snippetid,engineurl;

    public void startElement(String uri,String localName,String qName,Attributes atts) {
        elementBuffer.setLength(0);
        attributeMap.clear();
        int numAtts = atts.getLength();
        if(numAtts > 0) {
            for(int i=0; i<numAtts; i++) {
                attributeMap.put(atts.getQName(i),atts.getValue(i));
            }
        }
        if(qName.equals("snippet")) {
            //doc = new Document();
            snippetid = attributeMap.get("id");
        }
    }

    public void characters(char[] text,int start,int length) {
        elementBuffer.append(text,start,length);
    }



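    // SAX calls the three-argument endElement; a two-argument method is never invoked.
    @Override
    public void endElement(String uri,String localName,String qName) {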
        if(qName.equals("query")) {
            /* for(Entry<String,String> attribute : attributeMap.entrySet()) { String attName = attribute.getKey(); } */
            queryid = attributeMap.get("id");
            querytext = elementBuffer.toString();


            System.out.println(attributeMap.get("id"));
            System.out.println(elementBuffer.toString());
        }
        else if(qName.equals("engine")) {
            engineid = attributeMap.get("id");
            engineurl = urls.get(engineid);
            enginename = attributeMap.get("name"); // attribute names are case-sensitive
            verticalid = vertical.get(engineid);

            System.out.println(attributeMap.get("id"));
            System.out.println(elementBuffer.toString());
            System.out.println("v: "+ engineid + vertical.get(engineid));

        }
        else if(qName.equals("link")) {
            System.out.println("link: " + elementBuffer.toString());
        }
        else if(qName.equals("title")) {
            System.out.println("title: " + elementBuffer.toString());
        }
        else if(qName.equals("description")) {
            System.out.println("description: " + elementBuffer.toString());
        }
        else if(qName.equals("snippet")) {  
            //docs.add(doc);
            // end of this snippet (one result document)
            System.out.println("________________________________________________");
            //System.out.println("snippet" + elementBuffer.toString());
            System.out.println("________________________________________________");
        }

    }

    public static void main(String[] args) throws Exception {

        SAXXMLDocument handler = new SAXXMLDocument();
        handler.initResourceInfo();
        String input_file = "E:\\FW14-topics-search\\e004\\7015.xml";
        // Close the input stream once parsing has finished.
        try (InputStream is = new FileInputStream(new File(input_file))) {
            // getDocument() returns null for now: the lines that create the
            // Lucene Document are still commented out in the handler.
            Document doc = handler.getDocument(is);
            System.out.println(doc);
        }

    }

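    // Reads the tab-separated resource list: column 0 is the engine id,
    // column 2 is stored in urls, column 3 is the engine's vertical.
    public void initResourceInfo() throws FileNotFoundException {

        try (Scanner cin = new Scanner(new File("E:\\resources_fedweb2014.txt"))) {
            if(cin.hasNextLine()) {
                cin.nextLine(); // skip the first line (assumed to be a header row)
            }
            while(cin.hasNextLine()) {
                String line = cin.nextLine();
                String[] s = line.split("\t");
                if(s.length < 4) {
                    continue; // skip blank or incomplete lines
                }
                String engineid = s[0];
                String urlid = s[2];
                String engineVertical = s[3];
                //System.out.println(engineid + " # " + engineVertical);
                vertical.put(engineid,engineVertical);
                urls.put(engineid,urlid);
            }
        }
    }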
}
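
The callback flow used above can be tried without the FedWeb files or Lucene. The following is only a minimal sketch under assumptions: the class `SnippetTitleHandler` and its method `parseTitles` are hypothetical names that do not appear in the original code, and the XML is a shortened in-memory stand-in for the sample document. It shows the three callbacks working together: `startElement` resets the text buffer and remembers the snippet id, `characters` accumulates text, and the three-argument `endElement` reads the buffer once the element is complete.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of the original class): collects "snippetId: title" strings.
class SnippetTitleHandler extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();
    private final List<String> titles = new ArrayList<>();
    private String snippetId;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the new element
        if (qName.equals("snippet")) {
            snippetId = atts.getValue("id"); // remember the id for the child elements
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title")) {
            titles.add(snippetId + ": " + text.toString().trim());
        }
    }

    // Parses an XML string held in memory and returns the collected titles.
    static List<String> parseTitles(String xml) throws Exception {
        SnippetTitleHandler handler = new SnippetTitleHandler();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        // Shortened stand-in for the sample document.
        String xml = "<snippets><snippet id=\"s1\"><title>The Raven System</title></snippet></snippets>";
        System.out.println(parseTitles(xml));
    }
}
```

Marking each callback with `@Override` lets the compiler reject a signature that does not match `DefaultHandler`, which would otherwise be silently ignored by the parser.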
