CH4 WEB Lecture2
Part II
Document Object Model (DOM)
Introduction
• Web sites can include formatting and images that are
updated without the user having to reload the Web
page from the server
Introduction (Cont.)
Creating Dynamic Web Pages
• Dynamic:
Creating Dynamic Web Pages
(Cont.)
Creating Dynamic Web Pages
(Cont.)
The Document Object Model (DOM)
• Is at the core of DHTML
• Represents the Web page displayed in a window
• Each element on a Web page is represented in the
DOM by its own object
• This makes it possible for a JavaScript program to:
• Access individual elements on a Web page
• Change elements individually, without having to reload
the page from the server
…
• An object-based, language-neutral API for
XML and HTML documents
• Allows programs and scripts to build, access, and modify
documents
• Supports building querying, filtering, transformation, and
formatting applications on top of DOM implementations
• Mnemonic contrast with SAX: instead of “Serial Access XML”,
think of DOM as “Directly Obtainable in Memory”
DOM structure model
• Based on object-oriented concepts:
• objects (encapsulation of data and methods)
• methods (to access or change object’s state)
• interfaces (declaration of a set of methods)
• Somewhat similar to the XPath data model (to be
discussed with XSLT and XQuery): roughly a syntax tree
• The tree structure is implied by the abstract relationships
defined by the API; the data structures of an
implementation may differ
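DOM structure model
XML source:
<invoice form="00" type="estimated">
  <addressdata>
    <name>John Doe</name>
    <address>
      <streetaddress>Pyynpolku 1</streetaddress>
      <postoffice>70460 KUOPIO</postoffice>
    </address>
  </addressdata>
  ...
[Figure: the corresponding DOM tree. The Document node contains the Element
invoice, whose attributes form="00" and type="estimated" are held in a
NamedNodeMap. invoice contains addressdata, which contains name (Text
"John Doe") and address; address contains streetaddress (Text "Pyynpolku 1")
and postoffice (Text "70460 KUOPIO"). Legend: Document, Element, Text,
NamedNodeMap.]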
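The mapping from XML text to a node tree can be sketched with the JAXP parser that ships with the JDK. This is an illustrative sketch, not part of the original slides: the class name InvoiceTree and its helper methods are made up for the example, and the slide's fragment is closed with </invoice> (its "..." dropped) so that it parses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class InvoiceTree {
    // Assumption: the slide's fragment without "...", closed with </invoice>.
    static final String XML =
        "<invoice form=\"00\" type=\"estimated\"><addressdata>"
        + "<name>John Doe</name><address>"
        + "<streetaddress>Pyynpolku 1</streetaddress>"
        + "<postoffice>70460 KUOPIO</postoffice>"
        + "</address></addressdata></invoice>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Appends one line per node, indented by its depth in the tree.
    static void dump(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: {
                out.append("Document");
                break;
            }
            case Node.ELEMENT_NODE: {
                out.append("Element ").append(node.getNodeName());
                // Attributes are not children; they live in a NamedNodeMap.
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    out.append(' ').append(a.getNodeName())
                       .append("=\"").append(a.getNodeValue()).append('"');
                }
                break;
            }
            case Node.TEXT_NODE: {
                out.append("Text \"").append(node.getNodeValue()).append('"');
                break;
            }
            default: {
                out.append(node.getNodeName());
            }
        }
        out.append('\n');
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            dump(c, depth + 1, out);
        }
    }

    static String render() {
        StringBuilder out = new StringBuilder();
        dump(parse(XML), 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render());
    }
}
```

The XML string is written without whitespace between elements on purpose: with pretty-printed input a non-validating parser would also report whitespace-only Text nodes.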
Structure of DOM Level 1
DOM Level 2
• Level 1: basic representation and manipulation of
document structure and content
(No access to the contents of a DTD)
• DOM Level 2 adds
• support for namespaces
• Document.getElementById("id_val"),
to access elements by ID attr values
• optional features (we’ll skip these)
• interfaces to document views and style sheets
• an event model (for user actions on elements)
• methods for traversing the document tree and manipulating
regions of document (e.g., selected in an editor)
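As a small illustration of the namespace support added in Level 2, the following sketch looks up an element by namespace URI and local name. The namespace URI urn:example:invoice, the inv prefix and the class name NamespaceLookup are hypothetical; only the JAXP and org.w3c.dom calls are standard.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class NamespaceLookup {
    static final String NS = "urn:example:invoice"; // hypothetical namespace URI

    static Element findName() {
        String xml = "<inv:invoice xmlns:inv=\"" + NS + "\">"
                   + "<inv:name>John Doe</inv:name></inv:invoice>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // JAXP factories are not namespace-aware unless asked to be.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // DOM Level 2: match on (namespace URI, local name), not on the prefix.
            return (Element) doc.getElementsByTagNameNS(NS, "name").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element name = findName();
        System.out.println(name.getNodeName());     // qualified name, with prefix
        System.out.println(name.getLocalName());    // name without prefix
        System.out.println(name.getNamespaceURI()); // the URI bound to the prefix
    }
}
```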
DOM Language Bindings
• Language-independence:
• DOM interfaces are defined using OMG Interface
Definition Language (IDL).
• Language bindings (implementations of interfaces)
are defined in the Recommendation for
• Java (See the Java API doc) and
• ECMAScript (standardised JavaScript)
Core Interfaces: Node & its variants
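[Figure: inheritance hierarchy of the core interfaces below Node; the labels
that survive extraction are CharacterData, EntityReference and
ProcessingInstruction]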
DOM interfaces: Node
Node
  getNodeType, getNodeName, getNodeValue
  getOwnerDocument
  getParentNode
  hasChildNodes, getChildNodes
  getFirstChild, getLastChild
  getPreviousSibling, getNextSibling
  hasAttributes, getAttributes
  appendChild(newChild)
  insertBefore(newChild,refChild)
  replaceChild(newChild,oldChild)
  removeChild(oldChild)
[Figure: the invoice example tree (invoice with attributes form="00" and
type="estimatedbill", addressdata, name with text "John Doe", address with
streetaddress and postoffice; further content elided as "..."). Legend:
Document, Element, NamedNodeMap.]
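The structural methods of Node can be tried out on a small in-memory tree. The sketch below is only an illustration: the class NodeEditing and its helpers are invented for the example, and the city element is hypothetical (it does not occur in the invoice example).

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class NodeEditing {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    // Walks the children with getFirstChild/getNextSibling and joins their names.
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(c.getNodeName());
        }
        return sb.toString();
    }

    // Returns the child list of <address> after each modification.
    static List<String> demo() {
        Document doc = newDocument();
        Element address = doc.createElement("address");
        doc.appendChild(address);
        List<String> states = new ArrayList<>();

        Element postoffice = doc.createElement("postoffice");
        address.appendChild(postoffice);
        states.add(childNames(address));          // postoffice

        Element street = doc.createElement("streetaddress");
        address.insertBefore(street, postoffice);
        states.add(childNames(address));          // streetaddress,postoffice

        Element city = doc.createElement("city"); // hypothetical element
        address.replaceChild(city, postoffice);
        states.add(childNames(address));          // streetaddress,city

        address.removeChild(street);
        states.add(childNames(address));          // city
        return states;
    }

    public static void main(String[] args) {
        for (String state : demo()) {
            System.out.println(state);
        }
    }
}
```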
Type and Name of a Node
• node.getNodeType():
short int constants 1, 2, …, 12 for
Node.ELEMENT_NODE,
Node.ATTRIBUTE_NODE,
Node.TEXT_NODE, …
• node.getNodeName()
• for an Element = element.getTagName()
• for an Attr: the name of the attribute.
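A minimal sketch of how the type constants and node names behave for the three most common node kinds. The class NodeKinds and the sample values are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class NodeKinds {
    // Dispatches on the numeric node type, as the constants are meant to be used.
    static String describe(Node n) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE:   return "ELEMENT " + n.getNodeName();
            case Node.ATTRIBUTE_NODE: return "ATTRIBUTE " + n.getNodeName();
            case Node.TEXT_NODE:      return "TEXT " + n.getNodeName();
            default:                  return "OTHER " + n.getNodeName();
        }
    }

    static String[] demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element invoice = doc.createElement("invoice");
        invoice.setAttribute("type", "estimated");
        return new String[] {
            describe(invoice),                           // name = tag name
            describe(invoice.getAttributeNode("type")),  // name = attribute name
            describe(doc.createTextNode("John Doe"))     // text nodes are named "#text"
        };
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```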
The Value of a Node
• node.getNodeValue()
• content of a text node,
value of attribute, …;
null for an Element (note this!)
• (Cf. XPath, where a node’s value is its full textual
content)
• DOM 3 provides the full text content with the method
node.getTextContent()
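The difference between the node value and the text content can be seen in a short sketch. The class name NodeValues is invented; the sample markup reuses the name element of the registration list shown later in these slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NodeValues {
    static String[] demo() {
        String xml = "<name><given>Juho</given><family>Ahopelto</family></name>";
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        Element name = doc.getDocumentElement();
        Node given = name.getFirstChild();
        return new String[] {
            name.getNodeValue(),                  // null: an Element has no node value
            given.getFirstChild().getNodeValue(), // value of the Text child of <given>
            name.getTextContent()                 // DOM 3: all descendant text, concatenated
        };
    }

    public static void main(String[] args) {
        for (String v : demo()) {
            System.out.println(v);
        }
    }
}
```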
Object Creation in DOM
DOM interfaces: Document
Node
  Document
    getDocumentElement
    getElementById(IdVal)
    getElementsByTagName(tagName)
    createElement(tagName)
    createTextNode(data)
[Figure: the invoice example tree (invoice with attributes form="00" and
type="estimated", addressdata, name with text "John Doe", address with
streetaddress "Pyynpolku 1" and postoffice "70460 KUOPIO"; further content
elided as "..."). Legend: Document, Element, Text, NamedNodeMap.]
DOM interfaces: Element
Node
  Element
    getTagName()
    hasAttribute(name)
    getAttribute(name)
    setAttribute(attrName, value)
    removeAttribute(name)
    getElementsByTagName(name)
[Figure: example DOM tree with the elements invoice, invoicepage (shown with
the attributes form="00" and type="estimatedbill"), addressee, addressdata,
name (text "John Doe"), and address with streetaddress and postoffice.
Legend: Document, Element, Text.]
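The attribute methods of Element can be sketched as follows. The class ElementAttributes is illustrative only; the attribute names and values are taken from the invoice example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class ElementAttributes {
    static String[] demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element invoice = doc.createElement("invoice");
        doc.appendChild(invoice);

        invoice.setAttribute("form", "00");
        invoice.setAttribute("type", "estimated");
        String before = invoice.getAttribute("type");

        // Setting an existing attribute overwrites its value.
        invoice.setAttribute("type", "estimatedbill");
        String after = invoice.getAttribute("type");

        invoice.removeAttribute("form");
        boolean stillThere = invoice.hasAttribute("form");
        // getAttribute returns the empty string, not null, for a missing attribute.
        String missing = invoice.getAttribute("form");

        return new String[] { before, after, String.valueOf(stillThere), missing };
    }

    public static void main(String[] args) {
        for (String s : demo()) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Because a missing attribute yields an empty string, hasAttribute is the reliable way to tell "absent" from "present but empty".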
DOM CharacterData
C.substringData(6, 5) = ?
C.substringData(0, C.getLength()) = ?
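The text held by C is not preserved in these notes, so the two questions cannot be answered for the original data. As a sketch under an assumed value, the following uses a Text node (Text inherits from CharacterData) containing the hypothetical string "Hello World"; the class name SubstringDemo is likewise made up.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Text;

final class SubstringDemo {
    // Assumption: sample content, not the text used on the original slide.
    static Text sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            return doc.createTextNode("Hello World");
        } catch (Exception e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Text c = sample();
        // substringData(offset, count): count characters starting at a 0-based offset.
        System.out.println(c.substringData(6, 5));             // World
        System.out.println(c.substringData(0, c.getLength())); // Hello World
    }
}
```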
Interfaces to node collections (1)
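[Figure: an example tree with nodes numbered 1 to 6 and tag names E (1),
E (2), A (3), E (4), A (5), E (6); the slide asks for the result of
calling getElementsByTagName("E") on node 1. The parent-child edges and
the answer are not preserved in this text.]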
Typical child-node access pattern
Accessing specific nodes, or iterating over a
NodeList:
– to process all children of node:
NodeList children = node.getChildNodes();
for (int i = 0; i < children.getLength(); i++)
    process(children.item(i));
Interfaces to node collections (2)
NodeLists are “live”
[Figure: the NodeList cList, holding the child nodes A B C D, shown at
loop steps i=0, i=1 and i=2]
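Because a NodeList reflects changes to the tree immediately, removing nodes while indexing forward through the list skips every other node. The sketch below reproduces that situation with four children A, B, C and D; the class LiveNodeList and its helper methods are invented for the illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class LiveNodeList {
    static Element parentWithChildren(String... names) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element parent = doc.createElement("parent");
        doc.appendChild(parent);
        for (String name : names) {
            parent.appendChild(doc.createElement(name));
        }
        return parent;
    }

    static String childNames(Element parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(c.getNodeName());
        }
        return sb.toString();
    }

    // Pitfall: the list shrinks under the loop, so B and D are never visited.
    static String removeForward() {
        Element parent = parentWithChildren("A", "B", "C", "D");
        NodeList cList = parent.getChildNodes();
        for (int i = 0; i < cList.getLength(); i++) {
            parent.removeChild(cList.item(i));
        }
        return childNames(parent);
    }

    // One safe alternative: always remove the current first child.
    static String removeAll() {
        Element parent = parentWithChildren("A", "B", "C", "D");
        while (parent.hasChildNodes()) {
            parent.removeChild(parent.getFirstChild());
        }
        return childNames(parent);
    }

    public static void main(String[] args) {
        System.out.println("forward loop leaves: " + removeForward()); // B,D
        System.out.println("safe loop leaves: [" + removeAll() + "]"); // []
    }
}
```

In the forward loop, i=0 removes A (the list becomes B C D), i=1 removes C (the list becomes B D), and at i=2 the list length is already 2, so the loop stops.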
DOM: XML Implementations
• Java-based parsers
e.g. Apache Xerces, Apache Crimson, …
• In MS IE browser: COM programming interfaces for
C/C++ and Visual Basic; ActiveX object programming
interfaces for script languages
A Java-DOM Example
• Command-line tool RegListMgr for
maintaining a course registration list
• with single-letter commands for listing, adding,
updating and deleting student records
• Example:
Registration list: the XML file
<?xml version="1.0" ?>
<!DOCTYPE reglist SYSTEM "reglist.dtd">
<reglist lastID="41">
<student id="RDK1">
<name><given>Juho</given>
<family>Ahopelto</family></name>
<branchAndYear>TKT4</branchAndYear>
<email>juho@fake.addr.fi</email>
<group>2</group>
</student>
<!-- … and the other students … -->
</reglist>
Listing student records (1)
NodeList students =
    doc.getElementsByTagName("student");
for (int i=0; i<students.getLength(); i++)
    showStudent((Element) students.item(i));

private void showStudent(Element student) {
    // Collect relevant sub-elements.
    // Note: getNextSibling() reaches the next element only if the
    // parser has dropped the whitespace text nodes between elements.
    Node given =
        student.getElementsByTagName("given").item(0);
    Node family = given.getNextSibling();
    Node bAndY = student.
        getElementsByTagName("branchAndYear").item(0);
    Node email = bAndY.getNextSibling();
    Node group = email.getNextSibling();
Listing student records (2)
    System.out.print(
        student.getAttribute("id").substring(3));
    System.out.print(": " +
        given.getFirstChild().getNodeValue() );
    // or given.getTextContent() with DOM3
    // .. similarly access and display the
    // value of family, bAndY, email, and group
    // …
} // showStudent
Lessons of accessing DOM
Adding New Records
• Example:
add students
> a
First name (or <return> to finish): Antti
Last name: Ahkera
Branch&year: tkt3
email: antti@fake.addr.fi
group: 2
First name (or <return> to finish):
Finished adding records
> l
…
41: heli viinikainen, tkt5, heli@fake.addr.fi, 1
42: Antti Ahkera, tkt3, antti@fake.addr.fi, 2
Implementing addition of records (1)
Implementing addition of records (2)
        Element newStudent =
            newStudent(doc, ID, firstName, lastName,
                       bAndY, email, group);
        rootElem.appendChild(newStudent);
        System.out.print(
            "First name (or <return> to finish): ");
        firstName = terminalReader.readLine().trim();
    } // while firstName.length() > 0
    // Update the last ID used:
    String newLastID =
        java.lang.Integer.toString(lastIDnum);
    rootElem.setAttribute("lastID", newLastID);
    System.out.println("Finished adding records");
Creating new student records (1)
private Element
newStudent(Document doc, String ID,
           String fName, String lName, String bAndY,
           String email, String grp) {
    Element stu = doc.createElement("student");
    stu.setAttribute("id", ID);
    Element newName = doc.createElement("name");
    Element newGiven = doc.createElement("given");
    newGiven.appendChild(doc.createTextNode(fName));
    Element newFamily = doc.createElement("family");
    newFamily.appendChild(doc.createTextNode(lName));
    newName.appendChild(newGiven);
    newName.appendChild(newFamily);
    stu.appendChild(newName);
Creating new student records (2)
Lessons of modifying DOM
• Each node must be created with
• Document.create...("nameOrValue")
• Attributes of an element are set more easily with
setAttribute("name", "value")
• ... and then connected to the structure
• Normally with parent.appendChild(newChild)
SAX
A parser for XML Documents
XML Parsers
• What is an XML parser?
• Software that reads and parses XML
• Passes data to the invoking application
• The application does something useful
with the data
XML Parsers
• Why is this a good thing?
• Since XML is a standard, we can write
generic programs to parse XML data
• Frees the programmer from writing a
new parser each time a new data
format comes along
XML Parsers
• Two types of parser
• SAX (Simple API for XML)
• Event driven API
• Sends events to the application as the
document is read
• DOM (Document Object Model)
• Reads the entire document into
memory in a tree structure
Simple API for XML
SAX Parser
• When should I use it?
• Large documents
• Memory constrained devices
• When should I use something else?
• If you need to modify the document
• SAX doesn’t remember previous events unless
you write explicit code to do so.
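To make the event-driven style concrete, here is a minimal sketch that records the callbacks a SAX parser delivers while reading a document. It uses the JAXP SAXParserFactory and the SAX 2 DefaultHandler from the JDK; the class name SaxEvents and the event log format are assumptions made for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEvents {
    // Parses the XML and returns the sequence of events as readable strings.
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                log.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver text in several chunks; each is logged.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text " + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events("<name><given>Juho</given></name>")) {
            System.out.println(event);
        }
    }
}
```

Nothing is retained by the parser itself: the list exists only because the handler chooses to store each event, which is the "explicit code" mentioned above.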
SAX Parser
• Which languages are supported?
• Java
• Perl
• C++
• Python
SAX Parser
• Versions
• SAX 1 introduced in May 1998
• SAX 2.0 introduced in May 2000 and
adds support for
• namespaces
• filter chains
• querying and setting properties in the
parser
SAX Parser
• Some popular SAX parsers
• Apache XML Project Xerces Java Parser
http://xml.apache.org/xerces-j/index.html
• IBM’s XML for Java (XML4J)
http://www.alphaworks.ibm.com/formula/xml
• For a complete list, see
http://www.megginson.com/SAX
SAX Implementation in Java
import org.xml.sax.*;
import org.xml.sax.helpers.ParserFactory;