4020 Week 3
◆ XML Examples
SGML
HTML - Disadvantages
◆ Limited tag set
◆ Can’t separate the presentation from content
◆ Can’t define structure of contents
XHTML
XHTML Basics
◆ Very few real changes from HTML
◆ But more strict
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p>Welcome to XHTML!</p>
</body>
</html>
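XHTML is "more strict" because it is XML, so every document must be well formed. The minimal sketch below uses Python's standard `xml.etree.ElementTree` parser (an illustrative choice; the slides do not use Python) to show an HTML-style fragment being rejected while the XHTML-style one is accepted. The helper name `is_well_formed` and the sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if markup parses as well-formed XML, otherwise False."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML tolerates an unclosed <p> and a bare <br>; XML (and so XHTML) does not.
html_style = "<body><p>Welcome<br></body>"
xhtml_style = "<body><p>Welcome<br/></p></body>"

print(is_well_formed(html_style))        # False
print(is_well_formed(xhtml_style))       # True
# Tag names are case-sensitive, so <P> cannot be closed by </p>.
print(is_well_formed("<P>Welcome</p>"))  # False
```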
Images
The value of the src attribute
XML Introduction
◆ The Extensible Markup Language (XML) is a document
processing standard proposed by the World Wide Web
Consortium (W3C), which is related to Standard
Generalised Markup Language (SGML).
◆ Possible to search, sort, manipulate and render XML
using Extensible Stylesheet Language (XSL).
◆ Highly portable
◆ Files end in the .xml extension.
XML & W3C
• XML's roots go back to the 1960s through its parent, SGML (Standard
Generalized Markup Language), which is also the parent of HTML
- www.w3.org/xml
- www.xml.com/axml/axml.html (annotated version)
XML-related Technologies
◆ DTD (Document Type Definition) and XML Schemas are
used to define legal XML tags and their attributes for
particular purposes
<passport_details>
  <last_name>Smith</last_name>
  <first_name>Jo</first_name>
  <first_name>Stephen</first_name>
  <address>
    <street>1 Great Street</street>
    <city>GreatCity</city>
    <state>GreatState</state>
    <postal_code>1234</postal_code>
    <country>GreatLand</country>
    <email>jo@theworldaccordingtojo.com</email>
  </address>
</passport_details>
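Because the tags describe the data, a program can pull fields out of the passport_details example by name. A minimal sketch with Python's standard `xml.etree.ElementTree` (an illustrative choice, not part of the slides); note that `first_name` appears twice, so it is collected as a list.

```python
import xml.etree.ElementTree as ET

# The passport_details example from the slide, as a string.
PASSPORT = """
<passport_details>
  <last_name>Smith</last_name>
  <first_name>Jo</first_name>
  <first_name>Stephen</first_name>
  <address>
    <street>1 Great Street</street>
    <city>GreatCity</city>
    <state>GreatState</state>
    <postal_code>1234</postal_code>
    <country>GreatLand</country>
    <email>jo@theworldaccordingtojo.com</email>
  </address>
</passport_details>
"""

root = ET.fromstring(PASSPORT.strip())
last_name = root.findtext("last_name")
# first_name occurs twice, so gather every occurrence in document order.
first_names = [e.text for e in root.findall("first_name")]
# A path reaches into the nested address element.
city = root.findtext("address/city")
print(last_name, first_names, city)  # Smith ['Jo', 'Stephen'] GreatCity
```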
XML Examples
◆ XML Source File
➢ http://www.yorku.ca/jhuang/xml/04.adhoc.topics.xml
https://www.w3schools.com/xml/xml_parser.asp
B2C
• Business-to-Consumer involves sending XML directly to the client
• Data sent directly to the client needs a style (XSL) applied
• Applying style is best accomplished on the server side
Document Structure
• Three distinct parts
- Prolog <?xml version="1.0" encoding="UTF-8"?>
- Root Element
- Miscellaneous Section
[Figure: tree of an XML document: the root element contains child elements, which may in turn contain their own child elements]
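The three parts and the element tree can be seen by parsing a small document. In this sketch (Python's standard `xml.etree.ElementTree`; the document and the `walk` helper are hypothetical), the parser consumes the prolog, returns the root element, and the trailing comment in the miscellaneous section is accepted but, by ElementTree's default, not kept.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: prolog, root element with nested children,
# then a comment in the miscellaneous section after the root.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child><grandchild/></child>
  <child/>
</root>
<!-- miscellaneous section: comments and processing instructions -->
"""


def walk(elem, depth=0, out=None):
    """Collect tag names depth-first, indented by nesting level."""
    if out is None:
        out = []
    out.append("  " * depth + elem.tag)
    for child in elem:
        walk(child, depth + 1, out)
    return out


root = ET.fromstring(DOC)  # returns the root element; the prolog is consumed
print("\n".join(walk(root)))
```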
XML Elements
- have the same overall structure
- can contain sub-elements
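[Figure: anatomy of an element: the ELEMENT NAME appears in the start and end tags, and the content between them is PCDATA (Parsed Character Data)]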
Element vs. Attribute based XML
1. Element based:
<student>
  <id> 9906789 </id>
  <name>Adam</name>
  <email>adam@unl.ac.uk</email>
</student>
2. id as an attribute:
<student id="9906789">
  <name>Adam</name>
  <email>adam@unl.ac.uk</email>
</student>
3. Attribute based:
<student id="9906789" name="Adam" email="adam@yorku.ca"> </student>
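The choice between the forms changes how a program reads the data: child elements are read as element text, attributes as name/value pairs on the element. A minimal sketch with Python's `xml.etree.ElementTree` (an illustrative choice), using trimmed-down versions of forms 1 and 3.

```python
import xml.etree.ElementTree as ET

# Form 1: data stored in child elements (spacing kept as on the slide).
element_based = ET.fromstring(
    "<student><id> 9906789 </id><name>Adam</name></student>")
# Form 3: the same data stored as attributes.
attribute_based = ET.fromstring('<student id="9906789" name="Adam"/>')

# Element content keeps the spaces written around it, so strip() it.
elem_id = element_based.findtext("id").strip()
elem_name = element_based.findtext("name")

# Attributes are read with get() on the element itself.
attr_id = attribute_based.get("id")
attr_name = attribute_based.get("name")

print(elem_id, elem_name)  # 9906789 Adam
print(attr_id, attr_name)  # 9906789 Adam
```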
• The author grammar indicates that it is made up of four elements defined as below:
<!ELEMENT date (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT fname (#PCDATA)>
<!ELEMENT lname (#PCDATA)>
• Each element may have attributes that contain information about its content,
e.g. <date month="February">2000</date>
• An element's attribute list is declared with the ATTLIST tag (an attribute of type CDATA holds non-parsed character data):
syntax: <!ATTLIST element_name attribute_name type default_value>
Example (inLineDtdExample.xml):
<books>
  <author>
    <date>1995</date>
    <title> Introduction to Computer Graphics </title>
    <fname>James</fname>
    <lname>Foley</lname>
  </author>
  …..
</books>
DTDs - Disadvantages
• Notoriously hard to read
• Difficult to create (written in non-XML syntax; uses an EBNF - Extended Backus-Naur Form - grammar)
• No support for namespaces etc.
• Limited data types (PCDATA, CDATA)
Also study: ANY, EMPTY, Mixed Content
• First, create an XML document that contains character data marked up with XML tags.
• Second, build a Document Type Definition (DTD). The DTD specifies rules
such as the ordering of elements, default values, and so on.
• Third, use an XML parser that checks the XML document against the DTD and
then splits the document up into markup regions and character-data regions.
• After processing by the XML parser, the data is in a structured format
and can be processed by any XML application.
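Python's standard-library parsers are non-validating, so they check well-formedness but not DTD rules. To show what the "check against the DTD" step adds, the sketch below hand-codes one content-model rule. It assumes the author grammar corresponds to `<!ELEMENT author (date, title, fname, lname)>`, an ordered sequence of the four declared elements; the helper `matches_sequence` is hypothetical and is not a DTD validator.

```python
import xml.etree.ElementTree as ET

# Assumed content model for author: exactly these children, in this order.
AUTHOR_MODEL = ["date", "title", "fname", "lname"]


def matches_sequence(elem, model):
    """True if elem's child tags are exactly `model`, in order."""
    return [child.tag for child in elem] == model


good = ET.fromstring(
    "<author><date>1995</date><title>Introduction to Computer Graphics</title>"
    "<fname>James</fname><lname>Foley</lname></author>")
# Well formed (it parses) but invalid: date and title are missing.
bad = ET.fromstring("<author><fname>James</fname><lname>Foley</lname></author>")

print(matches_sequence(good, AUTHOR_MODEL))  # True
print(matches_sequence(bad, AUTHOR_MODEL))   # False
```

The `bad` document illustrates the difference between well formed (accepted by any XML parser) and valid (also conforming to the DTD).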
XML Parsers (or Processors)
• One of the most important layers of an XML-aware application (e.g. Firefox, IE 5+)
• Input: a raw XML document
• Parses the document to ensure that it is well formed and/or valid (if a DTD exists),
reports errors and allows programmatic access to the document contents
• Output: a data structure (the XML document is transformed)
<books>
  <author>
    <date>1995</date>
    <title> Web IR </title>
    <fname>Jimmy</fname>
    <lname>Huang</lname>
  </author>
</books>
[Figure: the parser's output tree: books → author → date, title, fname, lname, holding the character data 1995, Web IR, Jimmy, Huang]
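The parser's input and output can be seen directly: raw text goes in, a tree of element objects comes out, and the character data sits in the leaf elements. A minimal sketch with Python's `xml.etree.ElementTree` (an illustrative choice), which raises `ET.ParseError` if the input is not well formed.

```python
import xml.etree.ElementTree as ET

SOURCE = """<books>
  <author>
    <date>1995</date>
    <title> Web IR </title>
    <fname>Jimmy</fname>
    <lname>Huang</lname>
  </author>
</books>"""

root = ET.fromstring(SOURCE)  # raises ET.ParseError if not well formed

# Leaf elements (no children) carry the character data of the tree.
leaves = [e.text.strip() for e in root.iter() if len(e) == 0]
print(root.tag, [child.tag for child in root])  # books ['author']
print(" ".join(leaves))                         # 1995 Web IR Jimmy Huang
```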
Parsing XML Documents
• Parsers can support the Document Object Model (DOM) and Simple API
for XML (SAX) for accessing document’s content programmatically using
languages such as Java, C, C++, Python etc.
• A SAX-based parser processes the document and generates events (i.e.
notifications to the application) when tags, comments etc. are
encountered. These events return data from the XML document.
SAX is used only to read XML documents; it is attractive for handling
large documents because the entire document does not have to be loaded
into memory.
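The event model can be observed with Python's standard `xml.sax` module (an illustrative choice): a `ContentHandler` subclass receives a callback for each start tag, end tag and run of character data, and no tree is built. The class name `EventLogger` and the sample document are hypothetical.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record the events a SAX parser reports instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Character data may arrive in several chunks; skip pure whitespace.
        if content.strip():
            self.events.append(("text", content.strip()))


handler = EventLogger()
xml.sax.parseString(
    b"<author><fname>Jimmy</fname><lname>Huang</lname></author>", handler)
for event in handler.events:
    print(event)
```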
DOM (Document Object Model)
• A DOM-based parser exposes a programmatic library called the DOM
API that allows data in an XML document to be accessed and modified by
manipulating the nodes in a DOM tree. DOM API is available in many
languages e.g. JavaScript.
• Data can be accessed quickly as all the document’s data is in memory.
• The DOM interfaces for creating and manipulating XML documents are
platform and language independent. DOM parsers exist for Java, C, C++,
Python and Perl.
• JDOM provides a higher-level API than the W3C DOM for working with
XML documents in Java. See www.jdom.org
- provides full tree representation of the XML document
- allows random access to any node
- provides a variety of output formats
- less memory intensive than DOM API
• In order to use DOM API, programming experience is required.
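Python's standard `xml.dom.minidom` is one DOM implementation (an illustrative choice; the slides mention JavaScript and Java). The sketch loads the whole document into memory, jumps straight to a node, edits a text node, and appends a new element; the sample data is hypothetical.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<author><fname>Jimmy</fname><lname>Huang</lname></author>")

# Random access: go straight to any node in the in-memory tree.
lname = doc.getElementsByTagName("lname")[0]
print(lname.firstChild.data)  # Huang

# Modify the tree: change a text node, then append a new element.
lname.firstChild.data = "HUANG"
date = doc.createElement("date")
date.appendChild(doc.createTextNode("1995"))
doc.documentElement.appendChild(date)

print(doc.documentElement.toxml())
# <author><fname>Jimmy</fname><lname>HUANG</lname><date>1995</date></author>
```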
SAX (Simple API for XML)
• Developed by the members of the XML-DEV mailing list
• Released in May 1998
• SAX and DOM are totally different APIs for accessing information in
XML documents.
• SAX based parsers invoke methods when markup (e.g. a start tag,
end tag etc.) is encountered. With this event based model, no tree
structure is created to store data. Instead, data is passed to the
application from the XML document as it is found.
=> greater performance and less memory overhead than with DOM
• Many DOM parsers use a SAX parser to retrieve data for building the
DOM tree.
• SAX parsers are typically used for reading documents that will not be
modified.
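The point that many DOM parsers sit on top of a SAX parser can be sketched by building a small tree from SAX events. This is a simplified illustration with nested dictionaries, not how any particular DOM library is implemented; the class name `TreeBuilder` and the sample document are hypothetical.

```python
import xml.sax


class TreeBuilder(xml.sax.ContentHandler):
    """Build a nested-dict tree from SAX events."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # path from the root to the element being built

    def startElement(self, name, attrs):
        node = {"tag": name, "attrs": dict(attrs.items()),
                "text": "", "children": []}
        if self.stack:
            self.stack[-1]["children"].append(node)
        else:
            self.root = node
        self.stack.append(node)

    def characters(self, content):
        if self.stack:
            self.stack[-1]["text"] += content  # chunks are concatenated

    def endElement(self, name):
        node = self.stack.pop()
        node["text"] = node["text"].strip()


builder = TreeBuilder()
xml.sax.parseString(
    b'<books><author><date month="February">2000</date>'
    b'<fname>Jimmy</fname></author></books>', builder)
author = builder.root["children"][0]
print([c["tag"] for c in author["children"]])  # ['date', 'fname']
print(author["children"][0]["attrs"])          # {'month': 'February'}
```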
Parsing (msxml) and rendering XML with IE
• An XML document contains data, NOT formatting information.
• When an XML document is loaded into IE5+, the document is
parsed by msxml.
• If the document is well formed, the parser makes the
document's data available to the application (i.e. IE5).
• The application can format and render the data and also
perform other processing.
• IE5 renders the data by applying a default stylesheet that formats and
colours the markup, so the document is displayed with the same structure
as the original source.
• Notice the - sign: it indicates that the element's children are visible.
Clicking it changes it to +, hiding the children.
• This behaviour is similar to viewing disk directory structure
using a program such as Windows Explorer.
Using XML:
How does a browser read XML?
◆ XML parser: A tool for reading XML documents.
◆ To manipulate an XML document, you need an XML
parser. The parser loads the document into your
computer's memory. Once the document is loaded,
its data can be manipulated using the DOM. The
DOM treats the XML document as a tree.
◆ Once you have installed Internet Explorer 5.0, the
Microsoft XML parser is available.
◆ http://www.w3schools.com/xml/xml_parser.asp
◆ https://developer.mozilla.org/en-US/docs/Archive/Mozilla/XML_in_Mozilla (XML in
Mozilla)
Using XML: Presenting Data
➢ XML (data only): <lastname>Smith</lastname>
➢ HTML (presentation): <b>Smith</b>, displayed in the browser as a bold Smith
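Presenting the data means mapping each data element to presentation markup. Python's standard library has no XSL processor, so this sketch hand-codes the single rule "lastname becomes b" with ElementTree; the function name `lastname_to_html`, the wrapping `<p>` and the `person` element are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET


def lastname_to_html(xml_text):
    """Render every <lastname> as HTML <b>, mimicking one presentation rule."""
    src = ET.fromstring(xml_text)
    out = ET.Element("p")  # hypothetical container for the HTML output
    for elem in src.iter("lastname"):
        b = ET.SubElement(out, "b")
        b.text = elem.text
    return ET.tostring(out, encoding="unicode")


print(lastname_to_html("<person><lastname>Smith</lastname></person>"))
# <p><b>Smith</b></p>
```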
Extensible Stylesheet Language (XSL)
• XML is just data - no presentation information
• To present the data on the screen or paper or any media - apply appropriate style
• Style sheets contain rules that instruct the processor how to present elements
• Two style languages: CSS (Cascading Style Sheets) and XSL
• XSL is more powerful than CSS and an excellent solution for controlling the presentation of
data, but it is
- resource intensive: memory and processing power
- complex to write
• XSL transforms and translates XML data from one format into another, e.g. when the
same document needs to be displayed in HTML, PDF and PostScript form
CSS and XSL
◆ CSS - Cascading Style Sheets
➢ can predefine HTML display (fonts etc.)
➢ these are shared and reused
<xsl:template match="EmployeeRecord/Name">
  <Bold>
    <xsl:apply-templates select="FirstName"/>
  </Bold>
</xsl:template>
The template matches the Name element contained in EmployeeRecord; within it, templates are
applied only to the FirstName element, and the result is wrapped in <Bold>.
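Python's standard library has no XSLT processor, so the sketch below only emulates what this one template does, using an ElementTree path in place of the match pattern. The sample records (including the hypothetical `Manager` element, whose Name is not matched) and the function name `apply_name_template` are made up for illustration; a real XSLT processor would also apply its built-in rules to the rest of the document, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

RECORDS = """<Employees>
  <EmployeeRecord>
    <Name><FirstName>Jo</FirstName><LastName>Smith</LastName></Name>
  </EmployeeRecord>
  <Manager>
    <Name><FirstName>Lee</FirstName></Name>
  </Manager>
</Employees>"""


def apply_name_template(root):
    """Emulate match="EmployeeRecord/Name" with FirstName wrapped in <Bold>."""
    results = []
    # Only Name elements that are children of EmployeeRecord are matched.
    for name in root.findall(".//EmployeeRecord/Name"):
        bold = ET.Element("Bold")
        # apply-templates select="FirstName": built-in rules output its text.
        bold.text = "".join(
            "".join(fn.itertext()) for fn in name.findall("FirstName"))
        results.append(ET.tostring(bold, encoding="unicode"))
    return results


print(apply_name_template(ET.fromstring(RECORDS)))  # ['<Bold>Jo</Bold>']
```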
Options for Displaying XML
[Figure: XML data passes through an XSL transformation, defined by an XSL transformation spec, to produce an HTML document that is displayed in a Web browser]
Boeing
<student_list>
  <student>
    <id> 9906789 </id>
    <name>Adam</name>
    <email>adam@unl.ac.uk</email>
    <bsc level="final">yes</bsc>
  </student>
  <student>
    <id> 9806791 </id>
    <name>Adrian</name>
    <email>adrian@unl.ac.uk</email>
    <bsc>no</bsc>
  </student>
</student_list>
• Only data
• Data is self-describing
• Custom tags describe content (define your own tags)
• Easy to locate data (e.g. all BSc students)
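The "easy to locate data" point in practice: because the `<bsc>` tag names its content, finding all BSc students is a short query. A minimal sketch with Python's `xml.etree.ElementTree` (an illustrative choice) over the student_list example.

```python
import xml.etree.ElementTree as ET

STUDENTS = """<student_list>
  <student>
    <id> 9906789 </id>
    <name>Adam</name>
    <email>adam@unl.ac.uk</email>
    <bsc level="final">yes</bsc>
  </student>
  <student>
    <id> 9806791 </id>
    <name>Adrian</name>
    <email>adrian@unl.ac.uk</email>
    <bsc>no</bsc>
  </student>
</student_list>"""

root = ET.fromstring(STUDENTS)
# The self-describing <bsc> tag makes the query a simple filter.
bsc_students = [s.findtext("name") for s in root.findall("student")
                if s.findtext("bsc") == "yes"]
print(bsc_students)  # ['Adam']
```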
The Framework of WWW
[Figure: The framework of the WWW. Labels: Web Designer & Publisher (HTML, authoring tools/editors); Web Programmer (external applications, non-HTTP objects); Web Browser (client, end user); Internet (global reach, broad range); Web Server (Web Master).]
Technologies shown in the figure:
• JAVA Servlet
• CGI (Perl)
• ASP & ASP.NET
• Java Server Pages
• Java Applet
• JavaScript
Why Build Pages Dynamically?
◆ The Web page is based on data submitted by the user
➢ E.g., results page from search engines and order-
confirmation pages at on-line stores
◆ The Web page is derived from data that changes
frequently
➢ E.g., a weather report or news headlines page
◆ The Web page uses information from databases or
other server-side sources
➢ E.g., an e-commerce site could use a servlet to build a
Web page that lists the current price and availability of
each item that is for sale