MEG0002 XML 1.0 Profile
Version: 1.03
Last Updated: 12/17/2008
Status: Ready for Review
Content
C1. Content
C1.1 Summary of Guidelines
| Ref |
Guideline |
| 2.1 |
All messages attributed to be MISMO compliant MUST conform to the MISMO profile. |
| 2.1.1 |
MUST conform to XML 1.0 |
| 2.1.2 |
MUST be limited to UTF-8, UTF-16 or ISO-8859-1 encoding |
| 2.1.3 |
MUST NOT use C0 control characters other than TAB, CR, LF |
| 2.1.4 |
SHOULD NOT use character refs for characters inside the ASCII range |
| 2.1.5 |
SHOULD use uppercase hexadecimal numeric character references for characters outside those of ISO 8859-1 |
| 2.1.6 |
If used, Namespace Declarations MUST be in the root node (V2). Namespace Declarations MUST be in the root node (V3). |
| 2.1.7 |
MUST NOT use unparsed entity references |
| 2.1.8 |
MUST NOT use external entity references |
| 2.1.9 |
MUST NOT use xml:base |
| 2.1.10 |
MUST NOT use relative URIs in namespace declarations |
| 2.1.11 |
MUST NOT require infoset augmentation. |
| 2.1.12 |
MUST NOT require annotation of the infoset by any validation process. |
| 2.2 |
Services that accept messages that are MISMO interoperable MUST be capable of processing MISMO Canonical Form (MCF) (V3 Only) |
| 2.2.1 |
MUST comply with the MISMO Profile |
| 2.2.2 |
MUST conform to W3C Canonical Form XML-C14N except for the use of uppercase hexadecimal character references |
| 2.2.3 |
MUST NOT contain white space before or after the root element |
| 2.2.4 |
MUST use only ASCII characters in markup |
| 2.2.5 |
MUST NOT include Processing Instructions |
| 2.2.6 |
MUST NOT include Comments |
| 2.2.7 |
MUST use lower case for xml:lang attribute values |
| 2.2.8 |
MUST NOT use numeric character references |
| 2.2.9 |
MUST NOT use parsed entity references |
| 2.3 |
Services MAY send messages in MISMO Canonical Form (MCF) (V3 Only) |
C1.2 Services can use local profiles behind service boundaries
This section describes the guidelines for conformance to the MISMO XML Profile. The guidelines are ordered to correspond broadly to the structure of the W3C XML 1.0 specification.
The fundamental nature of the exchange of MISMO interoperable messages is that of loosely coupled services that communicate using XML documents (messages). Services may potentially be used by many consumers each based on different technologies and XML applications. Underlying many of the following guidelines is the need for each message to be self-contained. This means that each message must contain within itself all the information required by a service provider, in a form that can be processed without recourse to the use of extended XML features.
As highlighted in Figure 2 MEG files and XML profile, the XML profile also encompasses XML Namespaces and XML Schema Languages, the guidelines for which are described in MEG0003 and MEG0004 respectively, and which should be read in conjunction with this document.
C1.3 Guidelines
| Ref |
Guideline |
| 2.1 |
All messages attributed to be MISMO interoperable MUST comply with the MISMO profile as defined in guidelines 2.1.1 to 2.1.12
The MISMO Profile is described in guidelines 2.1.1 to 2.1.12 of this document. It represents a profile of the relevant XML standards which does not unduly constrain developers, while excluding certain characters and XML constructs that could cause interoperability difficulties. It also excludes characteristics that would not survive conversion to MISMO Canonical Form in a predictable, lossless way. In the interests of ongoing interoperability across a wide range of services, and to enable conversion to MCF, compliance with the MISMO Profile is mandatory for all messages designed to be MISMO interoperable. |
| 2.1.1 |
MUST conform to XML 1.0
The foundation of the MISMO Profile is the W3C XML 1.0 Specification, which defines the syntax for creating “well-formed” XML documents. XML 1.1 MUST NOT be used, on the basis that XML 1.0 is more compatible with existing XML applications, is robust, and provides all of the features of XML 1.1 that are required. Non-compliance with XML 1.0 MAY result in a message from the receiving party that the message has been rejected. |
| 2.1.2 |
MUST be limited to UTF-8, UTF-16 or ISO-8859-1 encoding
Services SHOULD use UTF-8, the most widely used encoding for XML markup; however, UTF-16 or ISO-8859-1 encoding MAY be used. If these alternative encodings are used, note that they WILL be converted to UTF-8 by any canonicalization process used prior to XML Signature or XML Encryption, to comply with the requirements of Canonical Form. Developers considering the use of ISO-8859-1 or UTF-16 MUST discuss their requirements with the trading partners of their transaction. MISMO standards and example files will ALWAYS be in UTF-8. Note: Many documents labeled as ISO-8859-1 are actually in a platform-specific variant of ISO-8859-1 known as Microsoft CP-1252 (code page 1252). See MEG0005 for further information. |
| 2.1.3 |
MUST NOT use C0 control characters other than TAB (U+0009), CR (U+000D) and LF (U+000A)
The first 32 Unicode characters, with code points 0-31, are known as the C0 controls. They are a historical legacy defined to control dumb terminals and, apart from the three exceptions named above, MUST NOT be used in XML documents since they may cause unpredictable behavior by XML processing applications. Non-compliance SHOULD result in the message being rejected and in a message to that effect from the receiving service. |
| 2.1.4 |
SHOULD NOT use character references for characters inside the ASCII range
To preserve readability, character references SHOULD NOT be used for standard ASCII characters. |
| 2.1.5 |
SHOULD use uppercase hexadecimal numeric character references for characters outside those of ISO 8859-1
There are two types of character references: a numeric Character Reference, which specifies a Unicode code position, and a character Parsed Entity reference, which is a defined reference using a more meaningful symbolic name equated to a numeric Character Reference. For example, the Latin small letter a with tilde can be represented as either “&#xE3;” (numeric Character Reference) or “&atilde;” (character Parsed Entity reference). Numeric Character References SHOULD be used to minimize interoperability issues with systems that may not be able to handle the literal characters. Numeric character references are a character encoding-independent mechanism for entering any Unicode character, typically those which hardware or software configurations do not allow users to input or display directly. Character Parsed Entity references other than those defined by the XML specification SHOULD be avoided unless the context provides assurance that the reference can be resolved. They have to be defined in a Document Type Declaration if they are not pre-defined in XML. When using non-standard encodings for XML there is a risk of mistranslation when a message is converted to UTF-8 as part of a canonicalization process prior to XML Signature or XML Encryption. All non-ASCII characters (not in the range “ ”-“}”) SHOULD use a numeric Character Reference when the XML encoding is not UTF-8 or UTF-16. |
| 2.1.6 |
If used, Namespace Declarations MUST be in the root node (V2). Namespace Declarations MUST be in the root node (V3).
The presence of namespace declarations anywhere other than the root node indicates that the message is not in MCF and cannot be placed in MCF unambiguously. Note: This refers to the root node of the message contained within the body of a MISMO Envelope. It does NOT refer to the root node of the MISMO Envelope itself. See the Namespaces & the MISMO Envelope section of MEG0003 for further information. |
| 2.1.7 |
MUST NOT use unparsed entity references
Parsed entities store data that is parsed by an XML application, and therefore can contain only text. They are merged with the contents of the document when they are processed. Parsed entity references MAY be used, but will be converted as part of the transformation to MCF or another canonicalization process prior to application of an XML Digital Signature or XML Encryption. Unparsed entities are not parsed and therefore can be either text or binary data. They cannot be processed by XML applications. Instead, notations have to be used to identify the entity type to the XML application so that it can call the appropriate helper application – for instance an image viewer for a JPEG image entity. There is no truly interoperable way to predictably handle unparsed entities, or to convert them to MCF or another canonical form, so they MUST NOT be used. MISMO specification MEG0034 – Profile for use of EMBEDDED_FILE specifies that, where one might consider using an unparsed entity reference, the EMBEDDED_FILE structure is used instead to place the material inside the document. |
| 2.1.8 |
MUST NOT use external entity references
External entities are stored outside of the XML document that references them (and can be parsed or unparsed). Typically they are used for binary files such as images. Many XML processors cannot resolve external entity references. Also, many applications will not have access to external networks to resolve external entities. In accordance with the requirement for messages to be self-contained, non-compliance SHOULD result in the message being rejected and in a message indicating the failure. The MISMO specifications «??need meg number» require that, where one might consider using an external entity reference, the EMBEDDED_FILE structure be used to place the material inside the document. Binary material in a file is encoded so that it can be carried in an XML document. |
| 2.1.9 |
MUST NOT use xml:base
The XML Base specification W3C BASE defines an attribute “xml:base” that is intended to resolve the “base” to be used for relative URI processing. Support for xml:base cannot be guaranteed by XML processing applications, and it MUST NOT be used. The presence of xml:base MAY result in a message from the receiving system that it has rejected the message. |
| 2.1.10 |
MUST NOT use relative URIs in namespace declarations
All namespace declarations MUST be fully qualified, and their processing MUST be independent of relative location or context. This is to ensure that all messages are “self-contained” and that namespace declarations are unique across the real estate finance community. Services that use relative URIs locally MUST resolve them prior to sending messages to other services. The processing SHOULD create a new document in which relative URIs have been converted to absolute URIs. In turn, a URI MUST refer or map to an Internet-resolvable URL. The MISMO namespaces and namespaces used in extensions MUST be of the URL format and not of the URN format. It is NOT required that a resource be available at the URL used in the namespace. It is, however, highly recommended that an RDDL file be available at the namespace URL to provide documentation and links about the content of the namespace. The presence of a relative URI in a namespace declaration MAY result in the message failing and being rejected by the recipient. |
| 2.1.11 |
MUST NOT require infoset augmentation.
The XML Information Set allows applications to extend the Infoset. However, in accordance with the requirement for messages to be self-contained, it must be assumed that all messages will be read/written on the integration framework by an XML application that is treating the document as a well-formed XML instance with the standalone document declaration set to “yes”. This indicates that there are no external markup declarations that affect the information passed from the XML processor to the application. |
| 2.1.12 |
MUST NOT require annotation of the infoset by any validation process.
The inclusion of information required for validation of a message within annotations of the XML infoset is also prohibited in MISMO interoperable messages. This is on the basis that interpretation of such information would require non-standard bilateral agreement between communicating parties. That is not to say that two communicating parties cannot choose whatever means they require to accomplish their business goals; rather, in so doing the implementation cannot claim to be using MISMO interoperable messages. |
| 2.2 |
Services that accept messages that are MISMO interoperable MUST be capable of processing MISMO Canonical Form (MCF) (V3 Only)
It is possible for logically equivalent XML documents to differ in their physical representation – e.g. structure, attribute ordering, insignificant white space. This means that equivalence testing cannot be done at the byte level. The W3C has defined the Canonical XML specification XML-C14N, which aims to introduce a notion of equivalence between XML documents. Two logically equivalent documents that have been converted to canonical form will have the same byte-for-byte representation. MCF is the form of a MISMO Profile message after it has been processed into a canonical form; it is a profile of W3C Canonical Form. The XML files used by MISMO to publish and explain the standards will be in MCF. Any transaction that uses XML Digital Signature or XML Encryption first goes through an XML-C14N based canonicalization process. Therefore this guideline also provides guidance to implementers of V2 transactions that use those technologies. It is strongly recommended that all services that accept MISMO interoperable messages accept them in MCF. MCF is described in guidelines 2.2.1 to 2.2.9 of this document. |
| 2.2.1 |
MUST comply with the MISMO Profile
Messages that are not compliant with the MISMO XML profile cannot be placed into MCF. |
| 2.2.2 |
MUST conform to W3C Canonical Form XML-C14N except for the use of uppercase hexadecimal character references
All the requirements of W3C Canonical Form apply to MCF except for the use of uppercase hexadecimal character references. A summary of these recommendations is included in section C1.4, W3C Canonical Form. Note that character references will be resolved by processes that use the W3C Canonical Form transformation. |
| 2.2.3 |
MUST NOT contain white space before or after the root element
XML defines white space as the Unicode characters space (0x20), carriage return (0x0D), linefeed (0x0A) and tab (0x09). Although white space in the prolog or epilog is ignored by parsers, its use before or after the root element is handled less predictably. The first character MUST be a ’<’ and the last character MUST be a ’>’. When the XML document is carried by some other protocol, for example MIME, there should be no additional white space between the end of the message separator and the first ’<’ that begins the XML document. |
| 2.2.4 |
MUST use only ASCII characters in markup
The inclusion of non-ASCII characters within markup (for example element names) may result in a message being rejected by a receiving service. That is, only ASCII characters should be used in names of elements and attributes. MEG0011 - Names of Elements and Attributes contains additional information about the construction of MISMO names. The current LDD (Logical Data Dictionary) contains the data point names defined in the MISMO namespace. The current RM (Reference Model) contains the data point (attribute) and container (element) names. All MISMO names are composed only of characters defined in the ASCII set. If a service extended the MISMO definitions by adding its own data point and container names, and these extensions used non-ASCII characters, the resulting messages would NOT be MISMO interoperable and would probably have interoperability issues when canonicalization is required. |
| 2.2.5 |
MUST NOT include Processing Instructions
Processing Instructions (PIs) MUST be removed as part of the conversion to MCF. None of the MISMO transaction standards use the PI entity type. If PI entity types are included in an extension to the MISMO standards there is a strong possibility that the canonicalization process used to form XML signatures or XML encryptions will have interoperability issues. Messages that contain PI entities will not be considered MISMO interoperable. |
| 2.2.6 |
MUST NOT include Comments
Comments MUST be removed as part of the conversion to MCF. There is inconsistent support for comments when various XML platforms process XML instance documents. Obtaining successful canonicalization for use in XML signatures and XML encryption requires removal of comments. MISMO will use comments in XSD and XSLT files distributed as part of the documentation of standards. If MISMO chooses to distribute an encrypted version of an XSD or XSLT, then the comments would be removed prior to the canonicalization step. If MISMO chooses to distribute an MD5 hash value for files that are part of a standard distribution, then the hash value would be calculated with the comments included, because a file-level hash value does not make assumptions about the content. |
| 2.2.7 |
MUST use lower case for xml:lang attribute values
The specifications for the attribute xml:lang allow, for example, “en” or “EN” to represent English. Two documents with the same semantic meaning but different use of case in xml:lang would not produce the same XML Signature or XML Encryption value. Therefore, to meet our interoperability goals, we must restrict xml:lang to only the lower case form. |
| 2.2.8 |
MUST NOT use numeric character references
Character references refer to Unicode code points and are independent of the encoding form. Typically they are used to refer to a character that is not directly accessible from available input devices. Numeric character references MUST be replaced with their corresponding character as part of the conversion to MCF. This is typically part of the canonicalization process available in XML platforms. For example:
<?xml version="1.0" encoding="UTF-8"?> <x:sample xmlns:x="http://www.x.com/sample/"> <section>&#xA5;</section> </x:sample>
Becomes <x:sample xmlns:x="http://www.x.com/sample/"> <section>¥</section> </x:sample> |
| 2.2.9 |
MUST NOT use parsed entity references
Since Canonical Form does not allow Document Type Declarations, it is not possible to declare entities in MCF, so a parser's well-formedness test will fail on any document that still contains parsed entity references. All parsed entity references MUST be replaced as part of the conversion to MCF. Typically this occurs as part of the canonicalization process provided by most platforms. For example:
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE sample [<!ENTITY VAT "Value Added Tax">]> <x:sample xmlns:x="http://www.x.com/sample/"> <section> &VAT; </section> </x:sample>
Becomes <x:sample xmlns:x="http://www.x.com/sample/"> <section> Value Added Tax </section> </x:sample> |
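Several of the MISMO Profile guidelines above can be checked mechanically. The following is a minimal sketch, not a MISMO-provided tool: it uses Python's standard `xml.parsers.expat` module to look for violations of guidelines 2.1.1, 2.1.6, 2.1.7, 2.1.8, 2.1.9 and 2.1.10. The function name `check_profile` and the wording of the messages are hypothetical. As a simplifying assumption, it flags the declaration of an unparsed or external entity rather than each reference to one, and it treats any namespace URI without a scheme as relative.

```python
import xml.parsers.expat
from urllib.parse import urlparse

XML_NS = "http://www.w3.org/XML/1998/namespace"


def check_profile(document: bytes) -> list:
    """Return a list of profile violations found in an XML message (hypothetical helper)."""
    problems = []
    depth = 0  # number of elements already open when a handler fires

    def entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        if notation is not None:
            problems.append(f"2.1.7: unparsed entity {name!r} declared")
        elif value is None:
            problems.append(f"2.1.8: external entity {name!r} declared")

    def start_namespace(prefix, uri):
        # expat reports a declaration before the start of the element carrying it
        if depth > 0:
            problems.append(f"2.1.6: namespace {uri!r} declared below the root")
        if uri and not urlparse(uri).scheme:
            problems.append(f"2.1.10: relative namespace URI {uri!r}")

    def start_element(name, attrs):
        nonlocal depth
        depth += 1
        if f"{XML_NS} base" in attrs:
            problems.append("2.1.9: xml:base attribute used")

    def end_element(name):
        nonlocal depth
        depth -= 1

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.EntityDeclHandler = entity_decl
    parser.StartNamespaceDeclHandler = start_namespace
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError as exc:
        problems.append(f"2.1.1: not well-formed XML 1.0 ({exc})")
    return problems


if __name__ == "__main__":
    sample = b'<a><b xmlns:y="relative/ns" xml:base="http://example.com/"/></a>'
    for problem in check_profile(sample):
        print(problem)
```

Note that XML 1.0 itself excludes the C0 controls other than TAB, CR and LF, so a message that breaks guideline 2.1.3 is reported by this sketch as not well-formed rather than under its own guideline number.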
C1.4 W3C Canonical Form
The canonical form of an XML document is the physical representation of the document produced by the method described in the W3C Canonical XML specification. The following table summarizes the main aspects of the W3C Canonical Form Recommendations. It does NOT include all aspects of W3C Canonical Form – for full details see XML-C14N.
| Ref |
Guideline |
| A.1 |
MUST be UTF-8 encoded |
| A.2 |
MUST be a well formed XML document compliant with XML 1.0 |
| A.3 |
MUST have line breaks normalized to U+000A |
| A.4 |
MUST NOT contain CDATA sections |
| A.5 |
MUST NOT include a document type definition |
| A.6 |
MUST NOT have an XML declaration |
| A.7 |
MUST use end-tags for empty elements |
| A.8 |
White space in start- and end-tags MUST be normalized per the rules of XML 1.0 |
| A.9 |
MUST use double quotes only to delimit attribute values. Literal double quote characters must be escaped with the built-in “quot” entity: “&quot;” |
| A.10 |
MUST specify attributes in ascending lexicographic order. |
| A.11 |
Namespace declarations MUST also be in ascending lexicographic order and must precede any attributes. |
| A.12 |
MUST NOT contain character or parsed entity references |
| A.13 |
MUST NOT contain superfluous namespace declarations |
| A.14 |
All occurrences of literal “>” MUST be escaped with “&gt;” |
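The effect of several rows in the table above can be observed with a small sketch. It assumes Python 3.8 or later, whose standard library function `xml.etree.ElementTree.canonicalize` implements Canonical XML 2.0, not the Canonical XML 1.0 recommendation referred to here as XML-C14N; for the simple features shown the two are expected to give the same result, but this is an illustration rather than a conformant MCF converter. The two sample documents are made up for the example. They differ in the XML declaration, a comment, attribute order, quote style, a character reference and empty-element syntax, yet canonicalize to the same text.

```python
from xml.etree.ElementTree import canonicalize  # available from Python 3.8

# Two physically different but logically equivalent documents (illustrative data)
doc_a = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<!-- a comment in the prolog -->'
    '<x:sample xmlns:x="http://www.x.com/sample/" b=\'2\' a="1">'
    '<section>&#xA5;</section><empty/></x:sample>'
)
doc_b = (
    "<x:sample   a='1' b='2' xmlns:x='http://www.x.com/sample/'>"
    "<section>\u00a5</section><empty></empty></x:sample>"
)

canonical_a = canonicalize(doc_a)
canonical_b = canonicalize(doc_b)

print(canonical_a)
print("equivalent:", canonical_a == canonical_b)
```

Because the canonical text is identical for both inputs, a byte-level comparison or a digest computed over the UTF-8 encoding of that text would also be identical, which is the property that XML Signature relies on.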
Metadata
M1. Metadata
| Element |
Description |
| Title |
XML Profile |
| Identifier |
MEG0002 |
| Category |
Foundation |
| Date Created |
10/19/2006 8:47:00 AM |
| Last Modified |
12/17/2008 06:34:00 PM |
| Publisher |
MISMO |
| Copyright |
©2008 MISMO. All Rights Reserved. |
M1.1 Release History
| Release Date |
Version No. |
Comments |
| 10/19/2006 |
0.01 |
Initial Version |
| 11/02/2003 |
1.0 |
First Reading - Approved |
| 11/15/2006 |
1.0.1 |
Typographical error correction |
| 12/21/2006 |
1.0.2 |
Leendert Bijnagte - 2.1.5, 2.2.2, and 2.2.8 to permit numeric character references in MCF and 2.2.4 to correct the ASCII characters permitted in XML names |
| 12/17/2008 |
1.0.3 |
Global mods to MEG format; Disclaimer tab re-named “Terms” and contents updated. |
M1.2 Known Issues or Omissions
A report was received of several typographical issues that were corrected.
M1.3 Contacts
|
Name |
Organization |
Contact Details |
| <ENTER NAME> |
MISMO |
<ENTER CONTACT DETAILS> |
M1.4 References
Purpose
P1. Introduction
P1.1 Context
The eXtensible Markup Language (XML) is an open, platform-independent language for structuring data that can be used for any number of purposes. The standards associated with XML can be interpreted and implemented in different ways, which can cause interoperability issues. The MISMO approach to ensuring interoperability amongst its constituent participants is to define profiles of the relevant de facto and de jure XML standards. The profiles retain full compliance with the standards while excluding those aspects that could cause interoperability issues within the context of reviewing MISMO standards and exchanging XML instance documents that use or extend those standards.
P1.2 Purpose
This MEG describes the specification of the MISMO XML Profile.
P1.3 Scope
This MEG is concerned both with the XML that flows between services and the representation used to define the MISMO transaction standards. It is not concerned with what goes on behind service boundaries, and places no restrictions on what XML features or technologies are used within the confines of a service.
P1.4 Audience
Software developers working with the MISMO transaction specification who are familiar with XML and the MISMO Interoperability Principles described in rig0012 MISMO Interoperability Principles
P1.5 Terminology
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119. See «add reference to MISMO glossary for terminology used in this MEG».
P1.6 Assumptions
The information supplied in this document reflects the MISMO interoperability principles at the time of writing. It is a living document, which will be updated as required to reflect the evolving nature of XML technologies and the service requirements identified by the MISMO constituency. Comments on this document should be sent to the MISMO designated contact identified in the document preface.
P2. Rationale
Similar to other “standards”, those associated with XML can be interpreted and used in different ways, and it is possible to construct XML messages that are not interoperable. The approach used by MISMO to ensure interoperability among the constituent services that are implemented by its constituency is to define profiles of de facto and de jure standards. By profiling, MISMO seeks to get the benefits of conformance to standards without the interoperability headaches that accrue from less constrained standards adoption methods. MISMO XML profiling can be conceptualized as the “layers” shown in Figure 1 and described below.

- Data Definition & Validation Layer. Services exchanging information must share a common understanding of its structure and meaning. A mutually agreed contract is required that the sender can use when constructing a message, and the receiver when validating it. Within XML, a schema can be considered as the contract that is published and used by the sender and receiver. There are multiple published schema standards, each of which can be implemented in a variety of different ways. Profiling this layer seeks to ensure that services using XML messages read and interpret them in the same way.
- Syntax Layer. An XML message and its associated schema (contract) must be written in a language that all services using the message understand. The language of MISMO is XML. Profiling this layer seeks to ensure that services use the same syntax rules to “read and write” XML messages – analogous to grammar rules such as “all sentences must end with a full stop” in English.
- Characterset Layer. A language is comprised of characters. Profiling this layer defines how each character within an XML message is represented to the underlying computer systems handling the message.
- Encoding Layer. Finally, the character representations must be translated to / from the actual byte-level data stream used by the underlying computer systems. Profiling this layer defines how the character representations are encoded.
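The difference between the Characterset Layer and the Encoding Layer can be made concrete with a short sketch. It takes one character, the Latin small letter a with tilde, and shows its Unicode code point, its uppercase hexadecimal numeric character reference, and the bytes it becomes under each of the three encodings permitted by guideline 2.1.2. The helper name `describe` is hypothetical, and the byte order chosen for UTF-16 (big-endian, no byte order mark) is an assumption made only to keep the output short.

```python
def describe(char: str) -> dict:
    """Show one character at the character-set layer and at the encoding layer."""
    return {
        "code_point": f"U+{ord(char):04X}",        # character-set layer
        "char_ref": f"&#x{ord(char):X};",          # uppercase hex character reference
        "utf-8": char.encode("utf-8").hex(" ").upper(),          # encoding layer
        "utf-16-be": char.encode("utf-16-be").hex(" ").upper(),
        "iso-8859-1": char.encode("iso-8859-1").hex(" ").upper(),
    }


info = describe("\u00e3")  # LATIN SMALL LETTER A WITH TILDE
for layer, value in info.items():
    print(f"{layer:>10}: {value}")

# The same byte read under two different encoding labels: this is why a file
# labeled ISO-8859-1 but actually written as CP-1252 can be mistranslated.
euro_byte = b"\x80"
print(repr(euro_byte.decode("cp1252")), repr(euro_byte.decode("iso-8859-1")))
```

The code point is the same in every case; only the byte sequence changes with the encoding, which is why the encoding label on a message has to be accurate.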
Of the many XML-related standards, three are of primary importance within the context of the MISMO Profile. Between them, they encompass the aforementioned layers:
- W3C XML 1.0. Underpins the entire XML standards family, defining the strict rules for text format and thereby encompassing the “Syntax Layer”. Furthermore, XML 1.0 is built on Unicode for character handling (“Characterset Layer”) and also specifies the allowed mechanisms for encoding Unicode characters (“Encoding Layer”). This document, MEG0002, describes the MISMO Profile of XML 1.0. For clarity, profiling of Unicode Characters is described separately in MEG0005.
- W3C Namespaces in XML. In isolation, the XML 1.0 standard allows multiple vocabularies to be created independently, each of which is structurally valid, but which may “clash” when used within a shared system context. The Namespaces in XML standard provides a mechanism for universal naming of elements and attributes in XML documents. The profiling of the Namespaces in XML standard is described separately in MEG0003.
- XML Schema. Schemas provide a means to define and validate the structure, content and semantics of XML documents, and thus form the “Data Definition and Validation Layer”. Profiling of schemas is described in MEG0004.
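The kind of vocabulary clash that Namespaces in XML resolves can be sketched as follows. The document and both namespace URIs are invented for the illustration (they are not MISMO namespaces): two vocabularies each define an element called `title`, and a namespace-aware parser, here Python's standard `xml.etree.ElementTree`, reports them under distinct universal names.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies that both define "title"
doc = (
    '<order xmlns:p="http://example.com/person" xmlns:b="http://example.com/book">'
    '<p:title>Dr</p:title><b:title>XML Basics</b:title>'
    '</order>'
)

root = ET.fromstring(doc)
tags = [child.tag for child in root]
for tag in tags:
    print(tag)
```

ElementTree writes each universal name as the namespace URI in braces followed by the local name, so the two `title` elements remain distinguishable regardless of the prefixes chosen by the sender.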
MEG files defining the MISMO XML profile are collectively termed “Baseline MEGs”. The relationship between these MEGs and the conceptual layers from figure 1 is shown diagrammatically in Fig. 2.

The family of XML standards encompasses several other areas, notably those describing how data within XML documents can be accessed, transformed and formatted for presentation purposes. Since these functions primarily occur behind service boundaries they are not included within the MISMO Profile.
Terms
T1. Terms and Conditions
T1.1 Disclaimer
MISMO accepts no liability for the accuracy, adequacy, or completeness of the information contained in this MISMO Engineering Guideline (MEG).
T1.2 Reproduction
Material in this MEG may be reproduced free of charge without obtaining explicit permission from MISMO, provided that the source is acknowledged, the document title given, and the material used in context.
T1.3 Copyright
©2008 MISMO. All material in this MEG is the property of MISMO. All Rights Reserved