j-users@xerces.apache.org

Re: Accessing xml and doctype declaration via SAX

From: Michael Glavassevich
Date: Thu, 13 Mar 2008 21:27:37 -0400

Hi Daniel,

This is not a bug. The documentation for setDocumentLocator() [1] says:
"Note that the locator will return correct information only during the
invocation SAX event callbacks after startDocument returns and before
endDocument is called. The application should not attempt to use it at any
other time." You should never call methods on the Locator within
startDocument() or endDocument(). Try calling them later, for instance in
the startElement() call for the root element:

    public void startElement(String uri, String local, String raw,
                             Attributes attrs) throws SAXException {
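        // Root element: startDocument() has already returned here, so
        // the locator can be queried safely.
        if (elementDepth++ == 0) {
            if (locator != null) {
                if (locator instanceof Locator2) {
                    Locator2 loc = (Locator2) locator;
                    loc.getXMLVersion();  // version from the XML declaration
                    loc.getEncoding();    // character encoding of the entity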
                }
            }
            ...
        }
        ...
    }
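
For a self-contained illustration, here is a minimal sketch of the same
idea using only the JAXP/SAX classes that ship with the JDK. The names
LocatorDemo, DeclHandler and readDeclaration are hypothetical, made up
for this example, and it assumes the parser in use hands out a locator
that implements Locator2 (otherwise both values simply stay null):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

class LocatorDemo {

    /** Hypothetical handler: records the declaration info at the root element. */
    static class DeclHandler extends DefaultHandler {
        private Locator locator;
        private int elementDepth = 0;
        String xmlVersion;  // stays null if the locator is not a Locator2
        String encoding;    // stays null if the locator is not a Locator2

        @Override
        public void setDocumentLocator(Locator locator) {
            // Only store the reference here; do not query it yet.
            this.locator = locator;
        }

        @Override
        public void startElement(String uri, String local, String raw,
                                 Attributes attrs) {
            // Root element: startDocument() has already returned.
            if (elementDepth++ == 0 && locator instanceof Locator2) {
                Locator2 loc = (Locator2) locator;
                xmlVersion = loc.getXMLVersion();
                encoding = loc.getEncoding();
            }
        }

        @Override
        public void endElement(String uri, String local, String raw) {
            elementDepth--;
        }
    }

    /** Parses the stream and returns { xmlVersion, encoding }. */
    static String[] readDeclaration(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        DeclHandler handler = new DeclHandler();
        reader.setContentHandler(handler);
        // A byte stream is used on purpose, so the parser itself
        // determines the encoding.
        reader.parse(new InputSource(in));
        return new String[] { handler.xmlVersion, handler.encoding };
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                   + "<root><child/></root>";
        String[] info = readDeclaration(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("version=" + info[0] + ", encoding=" + info[1]);
    }
}
```

The handler only stores the locator in setDocumentLocator() and defers
every query to the first startElement() call, which falls inside the
window the documentation describes.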

Thanks.

[1]
http://xerces.apache.org/xerces2-j/javadocs/api/org/xml/sax/ContentHandler.html#setDocumentLocator(org.xml.sax.Locator)

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@xxxxxxxxxx

"Daniel Yokomizo" <daniel.yokomizo@xxxxxxxxx> wrote on 03/13/2008 08:54:24 PM:

> On Thu, Mar 13, 2008 at 6:51 PM, Stanimir Stamenkov
> <s7an10@xxxxxxxxxxxx> wrote:
> > Wed, 12 Mar 2008 16:25:59 -0300, /Daniel Yokomizo/:
> >
> >
> >  > The only issue I still have
> >  > is getting the xml declaration info (e.g. version, encoding) but
> >  > right now I can just ignore it.
> >
> >  That info you should be able to obtain through the Locator2 [1]
> >  interface.  For example, in your ContentHandler implementation:
> >
> >      Locator locator;
> >
> >      public void setDocumentLocator(Locator locator) {
> >          this.locator = locator;
> >      }
> >
> >      public void startDocument() {
> >          if (locator instanceof Locator2) {
> >               Locator2 loc = (Locator2) locator;
> >               loc.getXMLVersion();
> >               loc.getEncoding();
> >          }
> >      }
> >
> >  [1]
> >  http://xerces.apache.org/xerces2-j/javadocs/api/org/xml/sax/ext/Locator2.html
> >
> >  --
> >  Stanimir
>
> Thank you, that solved my problem. I ran into some odd behavior,
> which I think is a bug, but I'm not certain. I created the
> InputSource using a Reader, didn't set the encoding property of the
> InputSource, and tried to parse. Even if the document has an XML
> declaration with an explicit encoding, locator.getEncoding() returned
> null. Creating the InputSource with an InputStream worked, because the
> parser tried to discover the encoding based on the first bytes of the
> stream. I think this is a bug because the document has the encoding
> information and there are no other places with this information
> (either explicit, like in the InputSource, or implicit, like in the
> InputStream case) that could possibly conflict, so the locator should
> have this info. Should I open a bug report (assuming that this isn't a
> known bug; I searched JIRA but couldn't find anything)? Either way,
> I changed my code to use an InputStream and everything worked OK.
>
> Best regards,
> Daniel Yokomizo.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscribe@xxxxxxxxxxxxxxxxx
> For additional commands, e-mail: j-users-help@xxxxxxxxxxxxxxxxx
