Class XMLScanner

  • All Implemented Interfaces:
    XMLComponent
    Direct Known Subclasses:
    XMLDocumentFragmentScannerImpl, XMLDTDScannerImpl

    public abstract class XMLScanner
    extends Object
    implements XMLComponent
    This class is responsible for holding scanning methods common to scanning the XML document structure and content as well as the DTD structure and content. Both XMLDocumentScanner and XMLDTDScanner inherit from this base class.

    This component requires the following features and properties from the component manager that uses it:

    • http://xml.org/sax/features/validation
    • http://xml.org/sax/features/namespaces
    • http://apache.org/xml/features/scanner/notify-char-refs
    • http://apache.org/xml/properties/internal/symbol-table
    • http://apache.org/xml/properties/internal/error-reporter
    • http://apache.org/xml/properties/internal/entity-manager
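The first two identifiers in the list above are standard SAX2 feature names, so an application can usually set them through the public JAXP API rather than through an XNI component manager. The following is a minimal sketch under that assumption; the class `ScannerFeatureSketch` and its methods are hypothetical helpers, and the Apache-specific feature and the internal properties are deliberately left out because support for them depends on the parser in use.

```java
import javax.xml.parsers.SAXParserFactory;

/** Hypothetical helper; not part of the scanner API documented here. */
final class ScannerFeatureSketch {
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    /** Returns a JAXP factory with the two SAX features set to the given states. */
    static SAXParserFactory configure(boolean validation, boolean namespaces) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setFeature(VALIDATION, validation);
            factory.setFeature(NAMESPACES, namespaces);
            return factory;
        } catch (Exception e) {
            // The parser rejected the feature or could not be configured.
            throw new IllegalStateException("feature not supported by this parser", e);
        }
    }

    /** Reads a feature state back, wrapping the checked exceptions. */
    static boolean isEnabled(SAXParserFactory factory, String featureId) {
        try {
            return factory.getFeature(featureId);
        } catch (Exception e) {
            throw new IllegalStateException("feature not readable: " + featureId, e);
        }
    }
}
```

Inside the parser, the component manager forwards these states to the scanner, which caches them in fields such as fValidation and fNamespaces.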
    Version:
    $Id$
    Author:
    Andy Clark, IBM, Arnaud Le Hors, IBM, Eric Ye, IBM
    • Field Detail

      • NOTIFY_CHAR_REFS

        protected static final String NOTIFY_CHAR_REFS
        Feature identifier: notify character references.
        See Also:
        Constant Field Values
      • DEBUG_ATTR_NORMALIZATION

        protected static final boolean DEBUG_ATTR_NORMALIZATION
        Debug attribute normalization.
        See Also:
        Constant Field Values
      • fValidation

        protected boolean fValidation
        Validation. This feature identifier is: http://xml.org/sax/features/validation
      • fNamespaces

        protected boolean fNamespaces
        Namespaces.
      • fNotifyCharRefs

        protected boolean fNotifyCharRefs
        Character references notification.
      • fParserSettings

        protected boolean fParserSettings
        Internal parser-settings feature
      • fSymbolTable

        protected SymbolTable fSymbolTable
        Symbol table.
      • fEntityDepth

        protected int fEntityDepth
        Entity depth.
      • fCharRefLiteral

        protected String fCharRefLiteral
        Literal value of the last character reference scanned.
      • fScanningAttribute

        protected boolean fScanningAttribute
        Scanning attribute.
      • fReportEntity

        protected boolean fReportEntity
        Report entity boundary.
      • fVersionSymbol

        protected static final String fVersionSymbol
        Symbol: "version".
      • fEncodingSymbol

        protected static final String fEncodingSymbol
        Symbol: "encoding".
      • fStandaloneSymbol

        protected static final String fStandaloneSymbol
        Symbol: "standalone".
      • fAmpSymbol

        protected static final String fAmpSymbol
        Symbol: "amp".
      • fLtSymbol

        protected static final String fLtSymbol
        Symbol: "lt".
      • fGtSymbol

        protected static final String fGtSymbol
        Symbol: "gt".
      • fQuotSymbol

        protected static final String fQuotSymbol
        Symbol: "quot".
      • fAposSymbol

        protected static final String fAposSymbol
        Symbol: "apos".
    • Constructor Detail

      • XMLScanner

        public XMLScanner()
    • Method Detail

      • reset

        public void reset​(XMLComponentManager componentManager)
                   throws XMLConfigurationException
        Description copied from interface: XMLComponent
        Resets the component. The component can query the component manager about any features and properties that affect the operation of the component.
        Specified by:
        reset in interface XMLComponent
        Parameters:
        componentManager - The component manager.
        Throws:
        XMLConfigurationException - Thrown if required features and properties cannot be found.
      • setFeature

        public void setFeature​(String featureId,
                               boolean value)
                        throws XMLConfigurationException
        Description copied from interface: XMLComponent
        Sets the state of a feature. This method is called by the component manager any time after reset when a feature changes state.

        Note: Components should silently ignore features that do not affect the operation of the component.

        Specified by:
        setFeature in interface XMLComponent
        Parameters:
        featureId - The feature identifier.
        value - The state of the feature.
        Throws:
        XMLConfigurationException - Thrown for configuration error. In general, components should only throw this exception if it is really a critical error.
      • reset

        protected void reset()
      • scanXMLDeclOrTextDecl

        protected void scanXMLDeclOrTextDecl​(boolean scanningTextDecl,
                                             String[] pseudoAttributeValues)
                                      throws IOException,
                                             XNIException
        Scans an XML or text declaration.

         [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
         [24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
         [80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' |  "'" EncName "'" )
         [81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
         [32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'")
                         | ('"' ('yes' | 'no') '"'))
        
         [77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
         
        Parameters:
        scanningTextDecl - True if a text declaration is to be scanned instead of an XML declaration.
        pseudoAttributeValues - An array of size 3 to return the version, encoding and standalone pseudo attribute values (in that order).
        Note: This method uses fString, anything in it at the time of calling is lost.
        Throws:
        IOException
        XNIException
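To illustrate the size-3 result array described above, here is a minimal standalone sketch that pulls the pseudo attribute values out of a declaration string. It is not the scanner's implementation: the class `XmlDeclSketch` is hypothetical, it uses a regular expression instead of the entity scanner, and it is lenient about attribute order and about which attributes are required.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; a lenient illustration, not a conforming scanner. */
final class XmlDeclSketch {
    // One pseudo attribute: S name Eq, then a value in single or double quotes.
    private static final Pattern PSEUDO_ATTR = Pattern.compile(
        "\\s+(version|encoding|standalone)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    /** Returns {version, encoding, standalone}; absent values stay null. */
    static String[] parse(String decl) {
        if (!decl.startsWith("<?xml") || !decl.endsWith("?>")) {
            throw new IllegalArgumentException("not an XML or text declaration");
        }
        String body = decl.substring("<?xml".length(), decl.length() - "?>".length());
        String[] values = new String[3];
        Matcher m = PSEUDO_ATTR.matcher(body);
        while (m.find()) {
            // Group 2 holds a double-quoted value, group 3 a single-quoted one.
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            String name = m.group(1);
            if (name.equals("version")) {
                values[0] = value;
            } else if (name.equals("encoding")) {
                values[1] = value;
            } else {
                values[2] = value;
            }
        }
        return values;
    }
}
```

A real scanner additionally enforces the order and the required parts of productions [23] and [77], and reports a different error depending on whether an XML declaration or a text declaration is being scanned.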
      • scanPseudoAttribute

        public String scanPseudoAttribute​(boolean scanningTextDecl,
                                          XMLString value)
                                   throws IOException,
                                          XNIException
        Scans a pseudo attribute.
        Parameters:
        scanningTextDecl - True if scanning this pseudo-attribute for a TextDecl; false if scanning XMLDecl. This flag is needed to report the correct type of error.
        value - The string to fill in with the attribute value.
        Returns:
        The name of the attribute. Note: This method uses fStringBuffer2, anything in it at the time of calling is lost.
        Throws:
        IOException
        XNIException
      • scanPI

        protected void scanPI()
                       throws IOException,
                              XNIException
        Scans a processing instruction.

         [16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
         [17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
         
        Note: This method uses fString, anything in it at the time of calling is lost.
        Throws:
        IOException
        XNIException
      • scanPIData

        protected void scanPIData​(String target,
                                  XMLString data)
                           throws IOException,
                                  XNIException
        Scans processing instruction data. This is needed to handle the situation where a document starts with a processing instruction whose target name starts with "xml" (e.g. xmlfoo). Note: This method uses fStringBuffer, anything in it at the time of calling is lost.
        Parameters:
        target - The PI target
        data - The string to fill in with the data
        Throws:
        IOException
        XNIException
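Production [17] excludes only a target that is exactly "xml" in any letter case, so a target such as xmlfoo remains legal. The sketch below shows that check together with a simple split of target and data; the class `PISketch` is hypothetical and works on the text between '<?' and '?>' rather than on the scanner's buffers.

```java
/** Hypothetical helper; operates on the text between "<?" and "?>". */
final class PISketch {
    /** True for "xml" in any letter case, which production [17] excludes. */
    static boolean isReservedTarget(String target) {
        return target.equalsIgnoreCase("xml");
    }

    private static boolean isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    /** Splits PI content into {target, data}; data is empty when absent. */
    static String[] split(String content) {
        int i = 0;
        while (i < content.length() && !isSpace(content.charAt(i))) {
            i++;
        }
        String target = content.substring(0, i);
        // Skip the white space that separates the target from the data.
        while (i < content.length() && isSpace(content.charAt(i))) {
            i++;
        }
        return new String[] { target, content.substring(i) };
    }
}
```

For example, splitting `xml-stylesheet href="a.xsl"` yields the target `xml-stylesheet`, which is not reserved, and the data `href="a.xsl"`.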
      • scanComment

        protected void scanComment​(XMLStringBuffer text)
                            throws IOException,
                                   XNIException
        Scans a comment.

         [15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
         

        Note: Called after scanning past '<!--'.

        Note: This method uses fString, anything in it at the time of calling is lost.

        Parameters:
        text - The buffer to fill in with the text.
        Throws:
        IOException
        XNIException
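Production [15] has two practical consequences for the text between '<!--' and '-->': it may not contain "--", and it may not end with '-'. A minimal sketch of that check follows; the class `CommentSketch` is hypothetical and does not test the Char production itself.

```java
/** Hypothetical helper; checks only the hyphen rules of production [15]. */
final class CommentSketch {
    /** True if the comment text has no "--" and does not end with '-'. */
    static boolean isValidBody(String body) {
        return !body.contains("--") && !body.endsWith("-");
    }
}
```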
      • scanAttributeValue

        protected boolean scanAttributeValue​(XMLString value,
                                             XMLString nonNormalizedValue,
                                             String atName,
                                             boolean checkEntities,
                                             String eleName)
                                      throws IOException,
                                             XNIException
        Scans an attribute value and normalizes whitespace converting all whitespace characters to space characters.

         [10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
        Parameters:
        value - The XMLString to fill in with the value.
        nonNormalizedValue - The XMLString to fill in with the non-normalized value.
        atName - The name of the attribute being parsed (for error msgs).
        checkEntities - true if undeclared entities should be reported as VC violation, false if undeclared entities should be reported as WFC violation.
        eleName - The name of element to which this attribute belongs.
        Returns:
        true if the non-normalized and normalized value are the same. Note: This method uses fStringBuffer2, anything in it at the time of calling is lost.
        Throws:
        IOException
        XNIException
      • scanExternalID

        protected void scanExternalID​(String[] identifiers,
                                      boolean optionalSystemId)
                               throws IOException,
                                      XNIException
        Scans an external ID and returns the public and system IDs.
        Parameters:
        identifiers - An array of size 2 to return the system id, and public id (in that order).
        optionalSystemId - Specifies whether the system id is optional.
        Note: This method uses fString and fStringBuffer, anything in them at the time of calling is lost.
        Throws:
        IOException
        XNIException
      • scanPubidLiteral

        protected boolean scanPubidLiteral​(XMLString literal)
                                    throws IOException,
                                           XNIException
        Scans public ID literal.

         [12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
         [13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
         
        The returned string is normalized according to the following rule, from http://www.w3.org/TR/REC-xml#dt-pubid:

        Before a match is attempted, all strings of white space in the public identifier must be normalized to single space characters (#x20), and leading and trailing white space must be removed.
        Parameters:
        literal - The string to fill in with the public ID literal.
        Returns:
        True on success. Note: This method uses fStringBuffer, anything in it at the time of calling is lost.
        Throws:
        IOException
        XNIException
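The normalization rule quoted above can be sketched on a plain string as follows. This is an illustration under stated assumptions, not the scanner's code: the class `PubidSketch` is hypothetical, it takes the literal without its surrounding quotes, and it signals an illegal character by returning null instead of reporting an error.

```java
/** Hypothetical helper; takes the literal without its surrounding quotes. */
final class PubidSketch {
    /** Production [13]: the characters allowed in a public identifier. */
    static boolean isPubidChar(char c) {
        return c == 0x20 || c == 0xD || c == 0xA
            || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || "-'()+,./:=?;!*#@$_%".indexOf(c) >= 0;
    }

    /** Collapses white space runs to one space and trims; null if invalid. */
    static String normalize(String literal) {
        StringBuilder out = new StringBuilder();
        boolean pendingSpace = false;
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (!isPubidChar(c)) {
                return null;
            }
            if (c == 0x20 || c == 0xD || c == 0xA) {
                // Remember the run, but never emit leading white space.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        // A pending space at the end is trailing white space and is dropped.
        return out.toString();
    }
}
```

Note that tab is not a PubidChar, so a literal containing a tab is rejected rather than normalized.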
      • normalizeWhitespace

        protected void normalizeWhitespace​(XMLString value)
        Normalize whitespace in an XMLString converting all whitespace characters to space characters.
      • normalizeWhitespace

        protected void normalizeWhitespace​(XMLString value,
                                           int fromIndex)
        Normalize whitespace in an XMLString converting all whitespace characters to space characters.
      • isUnchangedByNormalization

        protected int isUnchangedByNormalization​(XMLString value)
        Checks whether this string would be unchanged by normalization.
        Returns:
        -1 if the value would be unchanged by normalization, otherwise the index of the first whitespace character which would be transformed.
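The relationship between the check and the two normalization overloads can be sketched on a plain character array. The class `WhitespaceSketch` is hypothetical, and two points are assumptions rather than facts from this page: that the characters converted are tab, line feed and carriage return, and that fromIndex is relative to the start of the value.

```java
/** Hypothetical helper working on a (ch, offset, length) character range. */
final class WhitespaceSketch {
    /** Assumption: tab, LF and CR are the characters replaced by a space. */
    static boolean isConvertible(char c) {
        return c == '\t' || c == '\n' || c == '\r';
    }

    /** Returns -1 if nothing would change, else the index relative to offset. */
    static int firstChangedIndex(char[] ch, int offset, int length) {
        for (int i = 0; i < length; i++) {
            if (isConvertible(ch[offset + i])) {
                return i;
            }
        }
        return -1;
    }

    /** Replaces convertible characters in place, starting at fromIndex. */
    static void normalize(char[] ch, int offset, int length, int fromIndex) {
        for (int i = fromIndex; i < length; i++) {
            if (isConvertible(ch[offset + i])) {
                ch[offset + i] = ' ';
            }
        }
    }
}
```

Passing the index returned by the check as fromIndex lets the normalization skip the prefix that is already known to be unchanged.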
      • startEntity

        public void startEntity​(String name,
                                XMLResourceIdentifier identifier,
                                String encoding,
                                Augmentations augs)
                         throws XNIException
        This method notifies of the start of an entity. The document entity has the pseudo-name "[xml]", the DTD has the pseudo-name "[dtd]", parameter entity names start with '%', and general entities are specified by their name alone.
        Parameters:
        name - The name of the entity.
        identifier - The resource identifier.
        encoding - The auto-detected IANA encoding name of the entity stream. This value will be null in those situations where the entity encoding is not auto-detected (e.g. internal entities or a document entity that is parsed from a java.io.Reader).
        augs - Additional information that may include infoset augmentations
        Throws:
        XNIException - Thrown by handler to signal an error.
      • endEntity

        public void endEntity​(String name,
                              Augmentations augs)
                       throws XNIException
        This method notifies of the end of an entity. The document entity has the pseudo-name "[xml]", the DTD has the pseudo-name "[dtd]", parameter entity names start with '%', and general entities are specified by their name alone.
        Parameters:
        name - The name of the entity.
        augs - Additional information that may include infoset augmentations
        Throws:
        XNIException - Thrown by handler to signal an error.
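The naming convention shared by startEntity and endEntity can be sketched as a small classifier. The class `EntityNameSketch` and its `Kind` enum are hypothetical names introduced for illustration only.

```java
/** Hypothetical helper; maps an entity name to the kind it denotes. */
final class EntityNameSketch {
    enum Kind { DOCUMENT, DTD, PARAMETER, GENERAL }

    static Kind classify(String name) {
        if ("[xml]".equals(name)) {
            return Kind.DOCUMENT;   // pseudo-name of the document entity
        }
        if ("[dtd]".equals(name)) {
            return Kind.DTD;        // pseudo-name of the DTD
        }
        if (name.startsWith("%")) {
            return Kind.PARAMETER;  // parameter entity names start with '%'
        }
        return Kind.GENERAL;        // general entities use the bare name
    }
}
```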
      • scanCharReferenceValue

        protected int scanCharReferenceValue​(XMLStringBuffer buf,
                                             XMLStringBuffer buf2)
                                      throws IOException,
                                             XNIException
        Scans a character reference and appends the corresponding chars to the specified buffer.

         [66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
         
        Note: This method uses fStringBuffer, anything in it at the time of calling is lost.
        Parameters:
        buf - the character buffer to append chars to
        buf2 - the character buffer to append non-normalized chars to
        Returns:
        the character value or (-1) on conversion failure
        Throws:
        IOException
        XNIException
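As an illustration of production [66] and the -1 return convention, the sketch below decodes a complete character reference held in a string. The class `CharRefSketch` is hypothetical, it uses a StringBuilder in place of XMLStringBuffer, and it assumes that a value outside the XML 1.0 Char range counts as a conversion failure.

```java
/** Hypothetical helper; decodes a whole reference such as "&#65;". */
final class CharRefSketch {
    /** XML 1.0 Char production. */
    static boolean isXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0x10FFFF);
    }

    /** Appends the referenced character to buf; returns its value or -1. */
    static int parse(String ref, StringBuilder buf) {
        if (!ref.startsWith("&#") || !ref.endsWith(";")) {
            return -1;
        }
        boolean hex = ref.startsWith("&#x");
        int radix = hex ? 16 : 10;
        String digits = ref.substring(hex ? 3 : 2, ref.length() - 1);
        if (digits.isEmpty()) {
            return -1;
        }
        int value = 0;
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            int d;
            if (c >= '0' && c <= '9') {
                d = c - '0';
            } else if (hex && c >= 'a' && c <= 'f') {
                d = c - 'a' + 10;
            } else if (hex && c >= 'A' && c <= 'F') {
                d = c - 'A' + 10;
            } else {
                return -1;
            }
            value = value * radix + d;
            if (value > 0x10FFFF) {
                return -1; // beyond Unicode; stop before int overflow
            }
        }
        if (!isXmlChar(value)) {
            return -1;
        }
        buf.appendCodePoint(value); // emits a surrogate pair above #xFFFF
        return value;
    }
}
```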
      • isInvalid

        protected boolean isInvalid​(int value)
      • isInvalidLiteral

        protected boolean isInvalidLiteral​(int value)
      • isValidNameChar

        protected boolean isValidNameChar​(int value)
      • isValidNameStartChar

        protected boolean isValidNameStartChar​(int value)
      • isValidNCName

        protected boolean isValidNCName​(int value)
      • isValidNameStartHighSurrogate

        protected boolean isValidNameStartHighSurrogate​(int value)
      • versionSupported

        protected boolean versionSupported​(String version)
      • getVersionNotSupportedKey

        protected String getVersionNotSupportedKey()
      • scanSurrogates

        protected boolean scanSurrogates​(XMLStringBuffer buf)
                                  throws IOException,
                                         XNIException
        Scans surrogates and appends them to the specified buffer.

        Note: This assumes the current char has already been identified as a high surrogate.

        Parameters:
        buf - The XMLStringBuffer to append the read surrogates to.
        Returns:
        True if it succeeded.
        Throws:
        IOException
        XNIException
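The pairing logic can be sketched with the standard java.lang.Character helpers. The class `SurrogateSketch` is hypothetical; it takes both characters as arguments instead of reading the low surrogate from the entity scanner, and it uses a StringBuilder in place of XMLStringBuffer.

```java
/** Hypothetical helper; appends a surrogate pair only if it is well formed. */
final class SurrogateSketch {
    static boolean appendPair(char high, char low, StringBuilder buf) {
        if (!Character.isHighSurrogate(high) || !Character.isLowSurrogate(low)) {
            return false; // unpaired surrogate: nothing is appended
        }
        // A well-formed pair always decodes to a supplementary code point,
        // which lies inside the XML 1.0 Char range #x10000-#x10FFFF.
        int codePoint = Character.toCodePoint(high, low);
        buf.appendCodePoint(codePoint);
        return true;
    }
}
```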