j-users@xerces.apache.org
Re: How to disable attribute normalization

Subject: Re: How to disable attribute normalization
From: "Daniel Yokomizo"
Date: Sun, 30 Mar 2008 23:05:30 +0000

Hi Michael,

On Sun, Mar 30, 2008 at 5:35 PM, Michael Glavassevich
<mrglavas@xxxxxxxxxx> wrote:
> Hi Daniel,
>
>  "Daniel Yokomizo" <daniel.yokomizo@xxxxxxxxx> wrote on 03/29/2008 04:45:24 PM:
>
>  > Hi,
>  >
>  >     I'm parsing (disabling validation) a document that declared a DTD
>  > but I would like to get the raw attribute values instead of the
>  > normalized values. In particular I need to keep entity references as
>  > they were written. I came up with this FAQ
>  > (http://xerces.apache.org/xerces-j/faq-write.html#faq-7) that seems to
>  > declare that it is impossible (i.e. attribute normalization happens if
>  > there's a DTD present) and I found the XMLScanner class that, via the
>  > method scanAttributeValue, does the attribute normalization. I noticed
>  > that we have a getNonNormalizedValue() method but the SAX parser layer
>  > uses AttributesProxy which hides the getNonNormalizedValue() method.
>
>  That method is part of XNI [1]. If you really need the non-normalized text
>  you'd need to change your application so that it uses XNI directly (rather
>  than SAX).

Thanks for your help (again). I was hoping to use the SAX interface
and not depend explicitly on Xerces, because I'm developing a library
which will be (hopefully) independent of the SAX implementation.
There's a hack I can do to "trick" Xerces, which will work with any
parser too, and I'll probably do it: I'll decorate the reader I'm
giving to the parser so that every & becomes &amp;. When the parser
resolves that reference it becomes & again, so &amp; in the source
becomes &amp;amp; on the way in, which the parser transforms back into
&amp;.
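
A minimal sketch of that decorator, assuming plain java.io and nothing
Xerces-specific. The class name AmpersandEscapingReader is made up for
illustration; it simply emits "&amp;" for every '&' it reads from the
wrapped reader, so any SAX parser downstream hands back the reference
text instead of the resolved character.

```java
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical decorator (not part of SAX or Xerces): rewrites every '&'
 * coming from the wrapped reader as "&amp;" before the parser sees it.
 */
final class AmpersandEscapingReader extends Reader {
    private static final String TAIL = "amp;";

    private final Reader in;
    // Index of the next TAIL character to emit; TAIL.length() means none pending.
    private int pending = TAIL.length();

    AmpersandEscapingReader(Reader in) {
        this.in = in;
    }

    /** Returns the next output character, or -1 at end of stream. */
    private int next() throws IOException {
        if (pending < TAIL.length()) {
            return TAIL.charAt(pending++);
        }
        int c = in.read();
        if (c == '&') {
            pending = 0; // the '&' goes out now, "amp;" follows on later calls
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = next();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

It would be handed to the parser as
new InputSource(new AmpersandEscapingReader(originalReader)). Two
caveats with this sketch: the escaping is blind, so an '&' inside a
CDATA section, a comment, or an entity declaration in the internal DTD
subset gets rewritten too, and character data in element content also
arrives with its references unresolved, so the application has to
resolve them itself wherever it wants the real characters.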

>  >     Is there any way to configure Xerces to not normalize attribute
>  > values even when the DTD is declared?
>
>  Whether your document has a DTD or not is irrelevant. The FAQ (on the
>  Xerces 1.x site) you read is wrong. Normalization [2] is required for every
>  attribute value. You cannot disable this behaviour.
>
>  >     Best regards,
>  >     Daniel Yokomizo
>  >
>
>  Thanks.
>
>  [1] http://xerces.apache.org/xerces2-j/javadocs/xni/index.html
>  [2] http://www.w3.org/TR/2006/REC-xml-20060816/#AVNormalize
>
>  Michael Glavassevich
>  XML Parser Development
>  IBM Toronto Lab
>  E-mail: mrglavas@xxxxxxxxxx

Best regards,
Daniel Yokomizo.
