j-dev@xerces.apache.org
Subject: [jira] Commented: (XERCESJ-1066) Restriction+choice+substitutionGroup error
From: "Sandy Gao (JIRA)"
Date: Thu, 28 Sep 2006 10:44:57 -0700 PDT
    [ http://issues.apache.org/jira/browse/XERCESJ-1066?page=comments#action_12438500 ]

Sandy Gao commented on XERCESJ-1066:
------------------------------------

[Problem analysis]

First of all, this bug is *not* a duplicate of 1032. After applying the patch 
provided in 1032, the 1032 test schema passes, but the schema attached to 1066 
still fails.

There are 2 problems in Xerces' current implementation. The first one is, as 
Lucian correctly pointed out in 1032, that the order of sub-group-expansion is 
not specified (more a problem in the spec, as I mentioned in the first comment 
to this bug). 

The second problem (what's really causing 1066) is that "pointless particle 
removal" happens *before* sub-group-expansion, as opposed to *after*, as 
specified in the spec.

To be more specific about point 2 (and to correct what I said in the first 
comment): for the schema attached above, after removal and expansion:
Base = ((X|X1|X2|X3)|Y)*
Restriction = (X1|X2|Y)*

Note that Base has nested choice groups and Restriction doesn't. Now when the 
"RecurseLax" rule is invoked, the 3 particles in Restriction need to map to 2 
particles in the Base. That is never possible, hence the restriction is rejected.
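
To make the counting argument concrete, here is a toy sketch of an 
order-preserving mapping check. It is not Xerces code: the particle model 
(strings for elements, tuples for choice groups) and the names restricts and 
recurse_lax are assumptions made up for illustration only.

```python
# Toy model (assumption): an element particle is a string, a choice is a tuple.
def restricts(r, b):
    """True if element r is accepted by base particle b (element or choice)."""
    members = b if isinstance(b, tuple) else (b,)
    return r in members

def recurse_lax(restriction, base):
    """Order-preserving mapping; each base particle is used at most once."""
    j = 0
    for r in restriction:
        # skip base particles that r cannot map to
        while j < len(base) and not restricts(r, base[j]):
            j += 1
        if j == len(base):
            return False  # ran out of base particles
        j += 1  # consume the matched base particle
    return True

nested_base = [("X", "X1", "X2", "X3"), "Y"]   # ((X|X1|X2|X3)|Y)
flat_base = ["X", "X1", "X2", "X3", "Y"]       # (X|X1|X2|X3|Y)
restriction = ["X1", "X2", "Y"]                # (X1|X2|Y)

nested_ok = recurse_lax(restriction, nested_base)  # 3 particles onto 2
flat_ok = recurse_lax(restriction, flat_base)      # 3 particles onto 5
```

With the nested base, X1 consumes the inner choice and X2 has nothing left to 
map to; with the flattened base every particle finds its own target.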

So to me, the right fix needs to contain 2 parts:
1. expand sub-groups *before* pointless particle removal
2. disregard ordering for choices resulting from sub-group expansion.
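
A rough sketch of why the order in part 1 matters, under a simplified model 
that is my own assumption (choices as Python lists, nested choices as tuples 
with occurrence 1..1, a hypothetical SUB_GROUPS table). It only illustrates the 
two orderings and says nothing about how Xerces stores particles.

```python
# Hypothetical substitution-group table: head -> members.
SUB_GROUPS = {"X": ["X1", "X2", "X3"]}

def expand(choice):
    """Replace each sub-group head with a nested choice of head + members."""
    return [tuple([p] + SUB_GROUPS[p]) if p in SUB_GROUPS else p
            for p in choice]

def remove_pointless(choice):
    """Splice nested choices (assumed min=max=1) into the parent choice."""
    flat = []
    for p in choice:
        flat.extend(p if isinstance(p, tuple) else [p])
    return flat

base = ["X", "Y"]  # (X|Y)

# Removal first: nothing to flatten yet, so the nested choice survives.
removal_first = expand(remove_pointless(base))
# Expansion first: the nested choice is created, then spliced away.
expansion_first = remove_pointless(expand(base))
```

Here removal_first keeps the nested ((X|X1|X2|X3)|Y) shape, while 
expansion_first yields the flat (X|X1|X2|X3|Y).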

[Patch analysis - 1032]

The 1032 patch sorts the particles resulting from sub-group expansion. This 
partially fixes the ordering problem, but not completely. It works when both 
base and restriction use sub-groups: after both are sorted, the "complete 
mapping" rule can be applied. But it doesn't work when one of the types uses a 
sub-group and the other doesn't, because the type that doesn't use a sub-group 
may have elements in arbitrary order.
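
A small illustration of that limitation. The sort key is an assumption (plain 
sorting by element name; the actual key used by the patch isn't stated here), 
and maps_in_order is a made-up helper, not a Xerces method.

```python
def maps_in_order(restriction, base):
    """True if restriction is an in-order subsequence of base."""
    remaining = iter(base)
    # 'r in remaining' advances the iterator past the match, preserving order
    return all(r in remaining for r in restriction)

# Base choice came from a sub-group, so the patch sorts it (by name, assumed).
expanded_base = sorted(["X", "X2", "X1", "X3"])

# Restriction also came from a sub-group: it gets sorted too.
sorted_restriction = sorted(["X2", "X1"])

# Restriction written by hand as a plain choice: author's order is kept.
handwritten_restriction = ["X2", "X1"]

both_sorted_ok = maps_in_order(sorted_restriction, expanded_base)
handwritten_ok = maps_in_order(handwritten_restriction, expanded_base)
```

The sorted restriction maps cleanly, but the hand-written (X2|X1) fails against 
the sorted base even though it accepts a subset.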

Knowing the sorting strategy may help schema designers: when writing choices, 
try to sort them. This may or may not be appropriate for certain schema 
authors/designs, and may or may not work for different languages.

Overall, 1032 is a safe fix. It improves things, though it doesn't fix the 
problem entirely. I'm willing to apply it unless a better/more complete 
solution is found.

[Patch analysis - 1066]

On the surface, Ignacio's patch works perfectly: both schemas from 1032 and 
1066 are now accepted. But a careful look at the details reveals some rather 
serious problems.

For the 1032 schema, this patch works because both sub-groups are turned into 
this special MODELGROUP_SUBSTITUTIONGROUP and are handled specially (without 
worrying about the order).

For the 1066 schema, it works because it treats X1 (the element) as restricting 
the sub-group (X|X1|X2|X3) and X2 as restricting (X|X1|X2|X3) again. I would 
consider this as "works by luck". :-)

The reason it's "luck" is that there are some schemas (valid and invalid) 
for which this patch gives the wrong answer.

Case 1:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:restrict" xmlns="urn:restrict"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">

  <xsd:element name="X"/>
  <xsd:element name="X1" substitutionGroup="X"/>

  <xsd:complexType name="base">
    <xsd:sequence>
      <xsd:element ref="X" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="restriction">
    <xsd:complexContent>
      <xsd:restriction base="base">
        <xsd:choice minOccurs="0">
          <xsd:element ref="X1"/>
        </xsd:choice>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>

</xsd:schema>

After expansion and removal,
Base = (X|X1|)?
Restriction = (X1|)?

(The last '|' is just to indicate it's a choice.)

The spec is clear that this is a valid restriction (RecurseLax). But the 1066 
patch would reject it, because now dType=choice and bType=subgroup, a 
combination that is not handled by the big switch.

Case 2:

Similar to Case 1, but with the <choice> in "restriction" changed to 
<sequence>. It should still be valid (MapAndSum). But again, the 1066 patch 
rejects it, for the same reason.

Case 3:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:restrict" xmlns="urn:restrict"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">

  <xsd:element name="X"/>
  <xsd:element name="Y"/>
  <xsd:element name="Z"/>

  <xsd:complexType name="base">
    <xsd:choice>
      <xsd:choice minOccurs="0">
        <xsd:element ref="X"/>
        <xsd:element ref="Y"/>
      </xsd:choice>
      <xsd:element ref="Z"/>
    </xsd:choice>
  </xsd:complexType>

  <xsd:complexType name="restriction">
    <xsd:complexContent>
      <xsd:restriction base="base">
        <xsd:choice>
          <xsd:choice>
            <xsd:element ref="X"/>
            <xsd:element ref="Y"/>
          </xsd:choice>
          <xsd:element ref="Z"/>
        </xsd:choice>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>

</xsd:schema>

However valid this looks (I just changed something from optional to 
mandatory), it is actually invalid in schema 1.0. (Note that schema 1.1 [1] 
plans to replace the entire Particle Valid (Restriction) rule with a supposedly 
simple statement: it's a restriction as long as it accepts a subset.)

Base = ((X|Y|)?|Z)
Restriction = (X|Y|Z)

"X" and "Y" in restriction can not both map to (X|Y) in base, because 
RecurseLax requires an *order-preserving* mapping. So this should be invalid, 
but the 1066 patch says it's valid. This wrong result has the same cause as the 
change that was introduced to make the original 1066 test case happy, namely 
the change in the method "checkRecurseLax" to reuse the base particle.
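
The difference can be sketched with a toy mapping check. This is my own 
simplified reading of "reuse the base particle", expressed as a hypothetical 
reuse_base flag; it is not the patch's code and it ignores occurrence 
constraints entirely.

```python
def restricts(r, b):
    """True if element r is accepted by base particle b (element or choice)."""
    members = b if isinstance(b, tuple) else (b,)
    return r in members

def recurse_lax(restriction, base, reuse_base=False):
    """Order-preserving mapping of restriction particles onto base particles."""
    j = 0
    for r in restriction:
        while j < len(base) and not restricts(r, base[j]):
            j += 1
        if j == len(base):
            return False
        if not reuse_base:
            j += 1  # strict rule: a base particle is matched only once
        # with reuse_base=True the same base particle may match again
    return True

base = [("X", "Y"), "Z"]         # ((X|Y|)?|Z), occurrence ignored in this toy
restriction = ["X", "Y", "Z"]    # (X|Y|Z)

strict = recurse_lax(restriction, base)
lenient = recurse_lax(restriction, base, reuse_base=True)
```

The strict variant rejects Case 3 because Y has no base particle left to map 
to; the variant that reuses the base particle accepts it.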

Though it doesn't work like a charm, this patch actually involves some creative 
thinking and is somewhat similar to some of my thoughts back in 2005 when 1066 
was first opened (see below). Thanks for the effort and do keep trying. I 
sincerely hope that you beat me in finding the *perfect* solution. :-)

[My Attempts]

My first attempt in 2005 was similar to your approaches in different aspects. I 
marked sub-group choices as special, and had a special method to handle the 
RecurseLax case when either choice came from a sub-group. This does a better 
job than the 1032 patch, because the special method discards order entirely, 
instead of using a specific order.

This attempt would have fixed 1032, but my focus was 1066 and it didn't work 
for 1066, for the reason I mentioned earlier: expansion happened after 
removal.

My second attempt was to move the expansion to happen before removal, but I 
encountered a big problem: expansion and removal don't seem to work together 
happily. Consider a choice

(A|B|C|D)

where B has X in its sub-group and D has Y in its sub-group. After 
expansion/removal, it becomes

(A|B|X|C|D|Y)

Now we have to remember that the order between B and X doesn't matter, neither 
does that between D and Y. But the order does matter between A and B/X and so 
on.
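
One way to picture the bookkeeping this would need is an ordered list of 
unordered groups. The sketch below is purely hypothetical (the representation 
and the name maps_with_groups are invented for illustration); it is not 
something that was implemented or proposed as the fix.

```python
def maps_with_groups(restriction, base_groups):
    """Order matters between groups, but not among members of one group."""
    g = 0          # index of the current base group
    used = set()   # members already mapped to
    for r in restriction:
        if r in used:
            return False  # each base member can be a target only once
        # move forward (never backward) until a group contains r
        while g < len(base_groups) and r not in base_groups[g]:
            g += 1
        if g == len(base_groups):
            return False
        used.add(r)  # stay on group g: a later particle may share it
    return True

# (A|B|X|C|D|Y) with B/X and D/Y remembered as unordered pairs
base_groups = [{"A"}, {"B", "X"}, {"C"}, {"D", "Y"}]

swapped_inside = maps_with_groups(["A", "X", "B", "D"], base_groups)
swapped_across = maps_with_groups(["B", "A"], base_groups)
```

Swapping B and X is tolerated because they share a group, while putting B 
before A is not.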

This is where I stopped (it seemed too difficult to solve when no one was 
pressing :p).


Ouch, it took almost an entire day to analyze the problems (again), look at 
the patches, re-gather my thoughts from last year, and of course, write 
this long comment. I'm glad that I'm writing things down this time so that I 
don't have to go through the same process again in the future. I will 
definitely give it some more thought. The least we can do is to commit 
Lucian's patch (or my attempt 1), or to make expansion happen before removal + 
Lucian's patch. Though not complete, the latter should make both test cases 
from 1032 and 1066 happy.

[1] http://www.w3.org/TR/xmlschema11-1/#cos-content-act-restrict

> Restriction+choice+substitutionGroup error
> ------------------------------------------
>
>                 Key: XERCESJ-1066
>                 URL: http://issues.apache.org/jira/browse/XERCESJ-1066
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XML Schema Structures
>    Affects Versions: 2.6.2
>         Environment: N/A
>            Reporter: Martin Thomson
>         Assigned To: Sandy Gao
>         Attachments: patch1.txt, patch2.txt
>
>
> When using a substitution group head in a choice, the head of the 
> substitution group is not correctly treated as a choice.
> Given a choice of X and Y where X is the head of a group with the members X1, 
> X2 and X3, the following SHOULD be true:
> Base = (X|Y)*
> ...according to clause 2.1 of Schema Component Constraint: Particle Valid 
> (Restriction) <http://www.w3.org/TR/xmlschema-1/#cos-particle-restrict> this 
> should be interpreted as:
> Base = ((X|X1|X2|X3)|Y)*
> Therefore the following should be a valid restriction, but Xerces does not 
> allow it:
> Restriction = ((X1|X2)|Y)*
> I am aware that some simplification of the choices is required by clause 2.2 
> of the above section, but this should not have the effect that it is.
> The following schema document demonstrates this:
> -----------------------------------------
> <?xml version="1.0"?>
> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>             targetNamespace="urn:restrict" xmlns="urn:restrict"
>             elementFormDefault="qualified"
>             attributeFormDefault="unqualified">
>   <xsd:complexType name="base">
>     <xsd:complexContent>
>       <xsd:restriction base="xsd:anyType">
>         <xsd:choice minOccurs="0" maxOccurs="unbounded">
>           <xsd:element ref="X"/>
>           <xsd:element ref="Y"/>
>         </xsd:choice>
>       </xsd:restriction>
>     </xsd:complexContent>
>   </xsd:complexType>
>   <xsd:element name="X"/>
>   <xsd:element name="Y"/>
>   <xsd:complexType name="restriction">
>     <xsd:complexContent>
>       <xsd:restriction base="base">
>         <xsd:choice minOccurs="0" maxOccurs="unbounded">
>           <xsd:choice>
>             <xsd:element ref="X1"/>
>             <xsd:element ref="X2"/>
>           </xsd:choice>
>           <xsd:element ref="Y"/>
>         </xsd:choice>
>       </xsd:restriction>
>     </xsd:complexContent>
>   </xsd:complexType>
>   <xsd:element name="X1" substitutionGroup="X"/>
>   <xsd:element name="X2" substitutionGroup="X"/>
> </xsd:schema>
