Japan - JISC

Japan disapproves the proposed DIS 29500 (OOXML), but may change its vote to "approval" if the following comments are resolved satisfactorily.

 

DIS 29500 appears to compete with, and to be incompatible with, the OASIS-sourced ISO/IEC 26300, "Open Document Format for Office Applications."

 

In order to secure interoperability between the International Standards, Japan believes that JTC 1 should play a leading role in the future in collaboration with Ecma and OASIS. Thus Japan would like to confirm Ecma and its core members' collaborative stance with JTC 1 for the maintenance and the future evolution of DIS 29500 upon its approval under the JTC 1 process. In particular, JTC 1 rather than Ecma should maintain OOXML.

 

 

Apart from MSXML, we find no validators that can handle the W3C XML Schema version of the OOXML schemas. One reason appears to be the import feature of W3C XML Schema. MSXML allows different schema modules to be imported for the same namespace depending on the context, while Xerces-J and MSV do not.

 

Reorganize the schemas so that they can be handled by Xerces-J and MSV at least.

 

 

The "pack" URI scheme is not endorsed by IETF yet.

 

Drop the "pack" URI scheme unless it is endorsed by IETF.

 

 

Part names are restricted to US-ASCII. Although Annex A describes the conversion from Unicode strings to US-ASCII part names, it is not clear which program provides this conversion. The OPC engine rather than user programs should support the conversion.

 

Allow Non-ASCII characters (or IRIs) as part names. Note that non-ASCII characters can be converted to %HH before constructing logical item names or physical package item names.
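
The %HH conversion mentioned above can be sketched as follows. This is only an illustration, not the Annex A algorithm: the part name is hypothetical, and Python's `urllib.parse.quote` stands in for whatever an OPC engine would actually do.

```python
from urllib.parse import quote, unquote

# Hypothetical part name containing Japanese characters (not taken from DIS 29500).
iri_part_name = "/文書/報告書.xml"

# Percent-encode each non-ASCII character as its UTF-8 octets (%HH), keeping "/" as-is.
ascii_part_name = quote(iri_part_name, safe="/")

# The encoded name is pure US-ASCII and decodes back to the original without loss.
print(ascii_part_name.isascii())
print(unquote(ascii_part_name) == iri_part_name)
```

If the OPC engine performed this step, user programs could work with the non-ASCII name throughout.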

 

 

Custom schemas can be written only in W3C XML Schema, which is merely one of the several schema languages. SC34 has standardized RELAX NG, Schematron, and NVDL among others.

 

Allow schema languages other than W3C XML Schema for the validation of Custom XML and Structured Document Tags.

 

 

Behaviours of application programs are often described using “shall” or “must”. However, application conformance is defined to be purely syntactical in Section 2.4 of Part 1.

From the 6367 occurrences of the word "shall" in Part 4, here we quote some examples of inappropriate use.

- Line 2, Page 27, Part 4, "This background shall be displayed on all 3 pages of the document, behind all other document content".

- The third para in the description for "themeColor", Page 28, Part 4: "its value shall be ignored"

- Line 34, Page 34, Part 4: "This element specifies whether the right indent shall be automatically adjusted".

 

Use “shall” only for describing constraints on documents or data. Do not use “shall” for describing behaviours of application programs. It might be necessary to use “shall” for describing the behaviours of OPC engines, which should not be confused with application programs.

 

 

Often “must” is used as an alternative for “shall”. For example, in Part 4 Clause 2.15.3.6, "To faithfully replicate this behaviour, applications must imitate the behaviour of that application…"

 

Do not use this word. See Annex H of ISO/IEC Directives Part 2. Moreover, do not mechanically replace “must” with “shall”. See JP6.

 

 

Sometimes "may not" is used to express a prohibition.

 

Do not use this phrase. See Annex H of ISO/IEC directives Part 2.

 

 

Mandatory specifications such as XML, namespaces, Dublin Core Element Set, and RFCs for URIs and IRIs are only shown in the bibliography.

 

They should be added to the list of normative references. Note that ISO/IEC 15836:2003 Information and documentation - The Dublin Core metadata element set is equivalent to Dublin Core Metadata Element Set V1.1.

 

 

The phrase "element type" is used differently from W3C XML 1.0. Here is an example of confusion caused by the abuse of this phrase: in Clause 8 of Part 1, an element type is a local name rather than a qualified name, and is thus different from element types in XML 1.0.

 

Do not use the phrase "element type", which is a string in XML 1.0.

 

 

"Schema" and "XML schema" are ambiguous, since W3C XML Schema is merely one of the several XML schema languages.

 

Always write "W3C XML Schema" when this particular schema language is meant. "Schema" and "XML schema" refer to any schema language such as RELAX NG.

 

 

"name of an XML element" is not defined.

 

Use "the tag name of an XML element". The phrase "tag name" has been used since W3C DOM Level 1.

 

 

"XML element attribute value" is not defined.

 

Use "XML attribute value" or "attribute value".

 

 

"XML element attribute" is not defined.

 

Use "XML attribute" or "attribute".

 

 

"XML element type name" is extremely confusing. Those who have carefully read the W3C XML 1.0 specification will think that an "XML element type name" is a tag name.

 

If this phrase refers to the name of a simple datatype, use "datatype name of an XML element".

 

 

"valid" is used in an ambiguous manner. For example, what does "valid" in the third paragraph of 2.5.2.6 of Part 4 mean?

 

Do not use the word "valid" when it does not mention validity against some schema. When it does mention validity, always make clear which schema is in question.

 

 

The phrase "tag type" is undefined.

 

Do not use "tag type".

 

 

Positioning of this document is not clear. Part 1 Fundamentals does not list the document on page xi. It is unclear whether the document is normative or informative.

 

Clarify the positioning and the status of this document. Section 1 Introduction says, “This white paper summarizes OpenXML.” If it is not part of DIS 29500, it should not be included in DIS 29500.

 

 

Unfortunately, there are at least three different definitions of types (XML, W3C XML Schema, and MIME) already. OOXML (vaguely) introduces yet another definition of types. The confusion is demonstrated by the example at the bottom of Page 25 of Part 1.

The relationship in question specifies "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" as the "type" of document.xml. However, we already have three types (shown below) for the root element of document.xml.

  • The element type (as defined by W3C XML) of this element is "w:document".

Note: Surprisingly enough, element types are never clearly defined by the XML recommendation. Here we follow the interpretation of one of the editors of the XML recommendation and assume that element types are strings.

  • The complex type of this element is "CT_Document", which is specified in wml.xsd.
  • The content type of this part is application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml.
 

Do not introduce yet another definition of the word “type”.

 

 

The phrase "Application Conformance" is misleading, since people will think that behaviours of applications are standardized.

 

Use "Application Syntactical Conformance" instead.

 

 

Document conformance as defined in Part 1 looks too general and has nothing specific to WordprocessingML. However, for a document to conform to WordprocessingML, it has to satisfy requirements stated in Part1 Clause 11, Part 2, and Part 4 Clause 2, at the very least. It is not clear whether requirements stated in other places also have to be satisfied.

 

Define document conformance for Word-Processing in the part defining WordprocessingML. The same change request applies to document conformance for SpreadsheetML and PresentationML.

 

 

Normative statements do not exist in those clauses although clause 7 says, “Clauses 1-5, 7 and 9-14 form a normative part of this Part.” ISO/IEC Directives Part 2, 3.8 says that normative elements are those that describe the scope of the document and that set out provisions; 3.9 says that informative elements are those that identify the document, introduce its content and explain its background, its development and its relationship with other documents.

 

Change the status of clauses 2.1 and 2.2 to informative based on ISO/IEC Directives Part 2.

 

Line 22

 

The line specifies conformance to the Unicode standard and ISO/IEC 10646. There is discrepancy between the Unicode standards required by XML 1.0 standard and the Unicode standard version 4.0 specified in Part 1 Annex A.

 

Follow the generic reference as described in W3C recommendation “Referencing the Unicode Standards and ISO/IEC 10646” (http://www.w3.org/TR/charmod/#sec-RefUnicode) as the versions do not have to be restricted in DIS 29500, and change line 22 as follows: “The document character set shall conform to the Unicode Standard and ISO/IEC 10646, with either the UTF-8 or UTF-16 encoding form.” Please note that “-1” should be removed from ISO/IEC 10646-1 as the part number is no longer assigned for ISO/IEC 10646, and that “as required by the XML 1.0 standard” should be removed since the way that the Unicode Standard is referred to in XML 1.0 standard does not follow the W3C recommendation. Also, change lines 19-20 in Annex A as follows:

Unicode

The Unicode Consortium, The Unicode Standard, Version 5, ISBN 0321480910, as updated from time to time by the publication of new versions. (See http://www.unicode.org/unicode/standard/versions for the latest version and additional information on versions of the standard and of the Unicode Character Database).

ISO/IEC 10646

ISO/IEC 10646:2003, Information technology - Universal Multiple-Octet Coded Character Set (UCS), as, from time to time, amended, replaced by a new edition or expanded by the addition of new parts. (See http://www.iso.org/iso/en/ISOOnline.openerpage for the latest version.)

 

Lines 33-34

 

It says, "For the guidelines to be meaningful, a software application should be accompanied by publicly available documentation that describes what subset of this Standard it supports..." Requiring “publicly available” documentation is too strict for applications to follow: a company cannot always make its documentation public, since business needs sometimes do not allow it.

 

Remove “publicly available” from the sentence.

 

Line 13 on page 6

 

The line says, “behavior, unspecified - Behavior where this Standard imposes no requirements,” which would imply that DIS 29500 specifies application behaviours to some extent. However, clause 2.3 says, “… it is not intended to predefine application behavior” meaning that DIS 29500 does not define application behaviors. Thus, the definition conflicts with clause 2.3.

 

Clarify the definition. It probably means behavior for which this Standard makes no recommendation.

 

 

SGML and XML already define “document type”, but OOXML defines it differently.

 

Do not use this phrase.

 

 

WordprocessingML is scattered across Parts 1, 3, and 4.

 

Publish WordprocessingML as a single part of this standard.

 

 

SpreadsheetML is scattered across Parts 1, 3, and 4.

 

Publish SpreadsheetML as a single part of this standard.

 

 

PresentationML is scattered across Parts 1, 3, and 4.

 

Publish PresentationML as a single part of this standard.

 

 

Since VML is restricted to legacy features of one particular company (as demonstrated by the namespace of VML and the browser names), it should not be published normatively.

 

Publish VML as a technical report, which is a part of this multi-part standard. Note that a multi-part standard in JTC1 can contain a technical report.

 

Line 15 on page 16

 

The sentence “…, where H is hexadecimal digit.]” ends with a closing bracket, but no opening bracket can be found.

 

Remove the closing bracket.

 

Line 10

 

The line says, “… where h represents a hexadecimal value.” However, h is a hexadecimal digit as it is specified in 9.1.1.

 

Correct the line by changing value to digit.

 

Line 8 on page 18

 

The line says, “Certain relationships shall be explicit..” A period has been duplicated.

 

Remove the last period.

 

Line 20

 

The line says, “Clause 12 of Part 5 specifies the ability for a markup language to define additional constructs for extensibility of a specific markup language” while clause 12 of Part 5 specifies, “Preprocessing Model for Markup Consumption”. The two are inconsistent.

 

Point to the correct clause.

 

Line 9

 

The reference to the clause of Thumbnail is incorrect. Currently, §15.2.14 is specified while the correct one is §15.2.15. The other incorrect references can be found in the following pages: line 32 on page 47, line 13 on page 60, line 7 on page 94, line 25 on page 98, line 24 on page 105, line 6 on page 107, line 25 on page 108, line 26 on page 110, line 10 on page 113, line 1 on page 115, line 18 on page 116, and in the table on page 137.

 

Correct the clause number referred from 15.2.14 to 15.2.15.

 

 

XSLT transformation is applied only on save ("an XSL Transformation which might be applied on" in 11.9 of Part 1). Why is it not applied on read?

 

 

 

Are arbitrary binary data allowed as custom property parts? If so, why is the content type fixed?

 

Allow any content type.

 

 

The section does not discuss the size of Thumbnails. It lacks interoperability with ODF applications. ISO/IEC 26300, 17.6 says, “The required size for the thumbnails is 128x128 pixel.”

 

Clarify what the size of thumbnails should be.

 

 

Since OPC is technically independent from WordprocessingML, PresentationML, SpreadsheetML, VML, DrawingML, and Shared MLs, it should become an independent standard.

 

Publish OPC as an independent standard.

 

 

Conformance to OPC is not defined.

 

Define conformance to OPC. It probably shouldn’t be purely syntactical. Moreover, it might be necessary to separate OPC engines and application programs that rely on OPC engines.

 

 

It is not clear whether the restrictions in 8.1.4 (e.g., Unicode only and no DTDs) apply only to those parts defined in OPC or also to parts designed by format designers. Can users create a Shift_JIS XML document containing a DTD and incorporate it in a package?

 

If the restrictions apply only to those parts defined in OPC, explicitly state that format designers can design parts freely.

 

 

Assuming that part names are not restricted to US-ASCII, case-insensitive comparison does not work for non-ASCII characters.

 

Use case-sensitive comparison.
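
The difficulty can be illustrated with a short sketch. The part names below are hypothetical, and Python's `str.lower` and `str.casefold` are used only as two plausible definitions of "case-insensitive"; DIS 29500 is not known to prescribe either.

```python
# Two hypothetical part names that differ only in case under German conventions.
name_a = "/straße.xml"
name_b = "/STRASSE.XML"

# Simple lowercasing leaves "ß" untouched, so the names compare as different.
equal_by_lower = name_a.lower() == name_b.lower()

# Full Unicode case folding maps "ß" to "ss", so the names compare as equal.
equal_by_casefold = name_a.casefold() == name_b.casefold()

# Case-sensitive comparison gives a single, unambiguous answer.
equal_case_sensitive = name_a == name_b

print(equal_by_lower, equal_by_casefold, equal_case_sensitive)  # False True False
```

Two implementations that pick different folding rules would disagree on whether these names collide, which is the interoperability risk that case-sensitive comparison avoids.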

 

 

The element name "Types" is very confusing.

 

Use "ContentTypes". If this change is not possible, there should be some notes about this inappropriate name at the very least.

 

 

Why are "contentTypes" as summarized in Table 10-1 different from MIME content types?

 

If it is really different, choose a different name.

 

 

Some normative things are copied from the W3C Recommendations "XML-Signature Syntax and Processing" and even modified.

 

Do not copy or restate normative things but merely reference them, since they may be updated in the future. There should be no modifications other than subsetting.

 

 

This informative annex cannot define conformance requirements. Moreover, it is not clear what "conformance requirements" on format designers mean.

 

Change "conformance requirements" to "guidelines", for example.

 

Line 1 on page 1

 

The usage of “part” in the clause title “Part Overview” is ambiguous: it could mean Part 4 or each part of elements such as WordprocessingML, SpreadsheetML, etc.

 

Remove “part” from the clause title to read “Overview.” Change each part of the elements to subpart.

 

Line 8 and line 12 on page 27

 

Line 8 says, “… accent5 theme color” while line 12 shows the attribute value accent3, which is inconsistent. In addition, the table on page 1835 for accent3 does not include a sample of accent3 theme color. It cannot be understood what accent3 should be.

 

Correct the example in 2.2.1 on page 27 and include theme color samples in the table on page 1835.

 

Line 16 on page 1158

 

The line says, “Truncate the password to 15 characters.” If the password is shorter than 15 characters, what processing (e.g., zero padding) is applied?

 

Clarify the process when the password is shorter than 15 characters.

 

Line 17 on page 1158

 

The line says, “Construct a new NULL-terminated string consisting of single-byte characters.” Do the single-byte characters mean UTF-8 strings which are equivalent to ASCII? Does it imply that Japanese passwords are not allowed? If the restriction exists, it should be clearly documented.

 

Clarify the meaning of “single-byte characters” and the implication.
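
The ambiguity can be made concrete with a small sketch. The password below is hypothetical, and the two encodings are only candidate readings of “single-byte characters”; the comment does not say which one, if any, DIS 29500 intends.

```python
password = "あいう"  # hypothetical Japanese password (three hiragana characters)

# Reading 1: UTF-8. Each of these characters occupies three bytes, not one.
utf8_length = len(password.encode("utf-8"))

# Reading 2: a genuinely single-byte encoding such as ISO 8859-1.
# Hiragana cannot be represented at all, so the password would be rejected or damaged.
try:
    password.encode("latin-1")
    representable_in_latin1 = True
except UnicodeEncodeError:
    representable_in_latin1 = False

print(len(password), utf8_length, representable_in_latin1)  # 3 9 False
```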

 

cryptAlgorithmSid in the table on page 1167

 

The example says, “The cryptAlgorithmSid attribute value of 1 specifies that the SHA-1 hashing algorithm shall be used to generate a hash from the user-defined password.” However, the attribute value of 1 means MD2 and 4 means SHA-1 in the table above the example.

 

Change the attribute value from 1 to 4.

 

 

The layout caused by autoSpaceLikeWord95 (among others) is not clear.

 

Add an example illustrating the document layout, or add a note stating that this is for bug-for-bug compatibility.

 

THAILETTER on page 1505

 

The example shown for THAILETTER is not consistent with the thaiLetters example shown on page 1777.

 

Change the example in 2.16.4.3 as shown in the example on page 1777.

 

 

The section does not define how a picture is named. For example, is IRI allowed?

 

Define how a picture is named.

 

 

The section does not define how text is named, and there exist only a couple of examples. For example, is IRI allowed?

 

Define how text is named.

 

Lines 28 through 30

 

The example of USERINITIALS shows USERNAME instead of USERINITIALS.

 

Correct the sample to show USERINITIALS.

 

earth1 and earth2 on page 1650

 

The images of earth1 and earth2 do not include Asian regions such as Japan, Korea, Vietnam, etc. The section does not provide means for Asian users to customize those images. Also, it is not clear whether those images are normative or informative. A similar comment applies to the pattern images defined in 2.18.85.

 

Provide means to customize the images. Clarify whether the images defined in clause 2.18.4 and clause 2.18.85 are normative or informative.

 

Line 6 on page 1738

 

The line says, “This simple type’s contents must have a length of exactly 3 characters.” The hexBinary datatype defined in W3C XML Schema is based on octets. One octet consists of 2 characters. “3 characters” is confusing. Similar errors can be found in the following sections: Part 4, 2.18.52, line 8 on page 1754; Part 4, 2.18.57, line 5 on page 1760; Part 4, 2.18.106, line 13 on page 1837.

 

Clarify the meaning of “character.” Change “3 characters” to “3 octets” or to “6 characters” in the specific clause 2.18.45. Make the similar corrections in the other clauses as well.
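
The octet/character relationship can be checked with a short sketch. The helper name `is_hex_binary` and the sample values are hypothetical; the regular expression reflects the lexical form of hexBinary, in which every octet is written as exactly two hexadecimal digits.

```python
import re

# Lexical form of xs:hexBinary: zero or more pairs of hexadecimal digits.
HEX_BINARY = re.compile(r"(?:[0-9a-fA-F]{2})*")

def is_hex_binary(text: str) -> bool:
    """Return True if text is a lexically valid hexBinary value."""
    return HEX_BINARY.fullmatch(text) is not None

three_chars = "FFF"     # 3 characters: an odd number of digits, not valid hexBinary
six_chars = "FF00AA"    # 6 characters: encodes exactly 3 octets

print(is_hex_binary(three_chars))     # False
print(is_hex_binary(six_chars))       # True
print(len(bytes.fromhex(six_chars)))  # 3
```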

 

Line 22 on page 1747

 

The double quotation marks surrounding en-CA are different from the double quotation marks used in other examples, e.g. line 2 on page 1744.

 

Correct the double quotation marks.

 

Line 2 on page 1748

 

The line says, “its contents will contain a two hexadecimal language code…” Two hexadecimal digits mean 8 bits, which allow values up to 255 in decimal. The attribute actually allows values above 255.

 

Clarify the meaning of hexadecimal. Change it to read “a four hexadecimal language code.”
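
The arithmetic behind this comment is shown below. The sample code 0411 is an assumption made for illustration: it is the Windows language identifier commonly used for Japanese, and the comment itself does not name any particular value.

```python
# Largest value that two hexadecimal digits can express.
max_two_digits = int("FF", 16)

# Largest value that four hexadecimal digits can express.
max_four_digits = int("FFFF", 16)

# Assumed sample language code (Windows identifier for Japanese); needs four digits.
sample_code = int("0411", 16)

print(max_two_digits, max_four_digits, sample_code)  # 255 65535 1041
print(sample_code > max_two_digits)                  # True
```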

 

aiueo in the table on page 1771

 

The example shows half-width katakana characters while the description says hiragana characters that must be distinguished from half-width katakana.

 

Replace the half-width katakana characters with hiragana characters.

 

aiueoFullWidth in the table on page 1771

 

The example shows full-width katakana characters while the description says full-width hiragana characters.

 

Replace the full-width katakana characters with full-width hiragana characters.

 

decimalFullWidth, decimalFullWidth2, decimalHalfWidth on page 1773

 

The expressions “double-byte” and “single-byte” are confusing, since the characters in question are encoded in 3 bytes in UTF-8. For example, the Arabic 1 used in the example is encoded as 0xEFBC91 in UTF-8.

 

“double-byte” should be changed to “full-width” and “single-byte” should be changed to “half-width,” respectively, as Arabic 1 is defined as FULLWIDTH DIGIT ONE in Unicode standard.
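
The byte counts can be verified with Python's standard `unicodedata` module. This is only a check of the Unicode facts cited above, not part of the proposed text.

```python
import unicodedata

digit = "\uFF11"  # the full-width "1" used in the example

char_name = unicodedata.name(digit)               # official Unicode character name
utf8_hex = digit.encode("utf-8").hex()            # UTF-8 octets as hexadecimal
utf8_len = len(digit.encode("utf-8"))             # number of bytes in UTF-8
utf16_len = len(digit.encode("utf-16-be"))        # number of bytes in UTF-16
width_class = unicodedata.east_asian_width(digit) # "F" means full-width

print(char_name)                       # FULLWIDTH DIGIT ONE
print(utf8_hex, utf8_len, utf16_len)   # efbc91 3 2
print(width_class)                     # F
```

The character is "double-byte" only in UTF-16 and in legacy code pages, whereas "full-width" holds regardless of encoding.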

 

aiueo, aiueoFullWidth on page 1771

 

It is not clearly defined how to proceed with the numbering after the last alphabetical character.

 

Specify how to proceed with the numbering after the last alphabetical character.

 

numberInDash on page 1776

 

There are several dash characters in the Unicode standard whose shapes are similar to each other. For example, the Unicode standard defines HYPHEN-MINUS (U+002D), HYPHEN (U+2010), NON-BREAKING HYPHEN (U+2011), FIGURE DASH (U+2012), EN DASH (U+2013), HORIZONTAL BAR (or QUOTATION DASH, U+2015), and MINUS SIGN (U+2212). It is not clear whether all dash characters are allowed or whether there are restrictions.

 

Clarify the Unicode code points that are allowed for numberInDash value in ST_NumberFormat.
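
The candidates listed above can be enumerated with Python's `unicodedata` module. The sketch only confirms that these are distinct code points with distinct names; which of them numberInDash permits is exactly what the comment asks DIS 29500 to state.

```python
import unicodedata

# Code points named in the comment above.
dash_code_points = [0x002D, 0x2010, 0x2011, 0x2012, 0x2013, 0x2015, 0x2212]

# Look up the official Unicode name of each candidate.
dash_names = {cp: unicodedata.name(chr(cp)) for cp in dash_code_points}

for cp, name in dash_names.items():
    print(f"U+{cp:04X} {name}")

# Visually similar strings built from different dashes are not equal.
print("-1-" == "\u20131\u2013")  # False
```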

 

Line 10 and line 21

 

Line 10 says “10 hexadecimal digits” while the example shows 20 hexadecimal digits in line 14. Line 21 says “10 characters” while the example shows 20 characters in line 14. They conflict with the hexBinary datatype defined in W3C XML Schema. Similar errors can be found in the following sections: 3.18.86 on page 2930, 3.18.87 on page 2930, 5.1.12.28 line 7 on page 3700.

 

Change “10 hexadecimal digits” to “20 hexadecimal digits” in line 10, and change “10 characters” to “20 characters” in line 21, respectively. Make similar corrections in other clauses.

 

Line 1 and line 5 on page 1809

 

Line 1 says “two hexadecimal octets” which means HHHH as one octet is represented as HH in hexadecimal. It is 4 characters while line 5 says “2 characters.” The definition is confusing.

 

Clarify the definitions. Change “two hexadecimal octets” in line 1 to “four hexadecimal digits” as other sections, e.g. 2.18.106, specify. Change “2 characters” in line 5 to “4 characters.”

 

revisionsPassword attribute on page 1916, workbookPassword on page 1922

 

The specification describes a pre-processing step in which UTF-16 input code points are converted to an “ansi” single- or double-byte code page. For example, code page 932 is used for Japanese. Converting input characters from UTF-16 to code page 932 loses characters, since code page 932 supports only JIS X 0208 and not the level 3 and 4 Japanese characters defined in JIS X 0213. Moreover, for historical reasons, conversion from UTF-16 to code page 932 differs among vendors. The pre-processing therefore harms interoperability. A similar problem arises for Chinese when UTF-16 characters are converted to code page 936, since code page 936 supports only GB 2312 and not GB 18030. Also, the expression “16-bit Unicode” is confusing: it is not clear whether it means that only the BMP is supported and characters encoded as surrogate pairs are not. To support JIS X 0213 and GB 18030 characters, surrogate pairs must be supported. A similar pre-processing step is defined in 3.3.1.69 on page 2003.

 

Allow all UTF-16 characters to be used.
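
The character loss can be reproduced with a short sketch. The sample character U+20B9F is chosen here as an assumed example of a JIS X 0213 character outside the BMP, and Python's `cp932` codec stands in for one vendor's conversion table; other vendors' tables may behave differently.

```python
# Assumed sample: U+20B9F, a kanji in JIS X 0213 that lies outside the BMP.
kanji = "\U00020B9F"

# In UTF-16 the character needs two 16-bit code units (a surrogate pair).
utf16_code_units = len(kanji.encode("utf-16-le")) // 2

# Converting to code page 932 cannot represent it; "replace" substitutes "?".
converted = kanji.encode("cp932", errors="replace")

# Converting back does not recover the original character.
round_trip_ok = converted.decode("cp932") == kanji

print(utf16_code_units, converted, round_trip_ok)  # 2 b'?' False
```

Any password hash computed after such a conversion would treat all unrepresentable characters alike.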

 

 

Validation against the VML schemas cannot be separated from that against the rest of OOXML. In other words, a WordprocessingML document cannot be validated without performing validation against the VML schemas.

 

Specify "skip" as the value of "processContents" attribute.

 

 

The RELAX NG version of the OOXML schemas is syntactically incorrect.

 

Replace it by the upcoming proposal from the Japanese member body for SC34.

 

 

Since W3C XML Schema very frequently exhibits interoperability problems, it is not safe to use it as the only normative schema language.

 

Use both RELAX NG and W3C XML Schema normatively.

 

 

While two-letter ISO 639-1 language codes are allowed, three-letter ISO 639-2 language codes are disallowed.

 

Use RFC 4646 for defining language codes.

 

 

The differences between smart tags, custom XML markups, and structured document tags are unclear. In particular, 2.5.1 (Custom XML and Smart Tags) and 2.5.2 (Structured Document Tags) of Part 4 are almost the same.

 

Clarify the differences. Some tutorial (e.g., as a non-normative appendix) would be very helpful.

 

 

The Custom XML Mappings Part of the SpreadsheetML is restricted to W3C XML Schema.

 

Allow the use of RELAX NG.

 

 

The import and include relationships among the DrawingML schemas are extremely complicated. (See http://www.asahi-net.or.jp/~eb2m-mrt/ooxml/dependencies.html.)

 

To make the relationships more understandable, it might make sense to divide DrawingML into sublanguages or introduce some text and diagrams on the relationships of DrawingML schemas.