Lecture 09

The document discusses XML syntax, semantics, and its use as semistructured data. XML is presented as a flexible syntax for data that can be used for configuration files, document markup, and data exchange. Key XML concepts discussed include elements, attributes, and its tree-like structure.

Lecture 9

XML/XPath/XQuery

Tuesday, May 26, 2009

1
XML Outline
•  XML
–  Syntax
–  Semistructured data
–  DTDs
•  XPath
•  XQuery

2
Additional Readings on XML
•  http://www.w3.org/XML/
–  Main source on XML, but hard to read

•  http://www.w3.org/TR/xquery/
–  Authority on XQuery

•  http://www.galaxquery.org/
–  An easy to use, complete XQuery
implementation

Note: XML/XQuery is NOT covered in the textbook
3


XML
•  A flexible syntax for data
•  Used in:
–  Configuration files, e.g. Web.Config
–  Replacement for binary formats (MS Word)
–  Document markup: e.g. XHTML
–  Data: data exchange, semistructured data
•  Roots: SGML - a very nasty language

We will study only XML as data
4


XML as Semistructured Data
•  Relational databases have rigid schema
–  Schema evolution is costly
•  XML is flexible: semistructured data
–  Store data in XML
•  Warning: not in normal form ! Not even 1NF

5
From HTML to XML

HTML describes the presentation


6
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
7
XML Syntax
<bibliography>
<book> <title> Foundations… </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year>
</book>

</bibliography>

XML describes the content
8


XML Terminology
•  tags: book, title, author, …
•  start tag: <book>, end tag: </book>
•  elements: <book>…</book>,<author>…</author>
•  elements are nested
•  empty element: <red></red> abbrv. <red/>
•  an XML document: single root element

well formed XML document: if it has matching tags
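The well-formedness rule above (matching tags, a single root element) can be checked with any XML parser. A minimal sketch using Python's standard library; the helper name `is_well_formed` and the sample strings are illustrative assumptions, not part of the lecture:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as an XML document with one root element."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<book><title>Foundations</title><red/></book>"
mismatched = "<book><title>Foundations</book></title>"  # tags closed in the wrong order
two_roots = "<book/><book/>"                             # more than one root element

for sample in (good, mismatched, two_roots):
    print(sample, "->", is_well_formed(sample))
```

The parser only checks syntax here; whether a document is valid against a DTD is a separate question.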


9
More XML: Attributes
<book price="55" currency="USD">
<title> Foundations of Databases </title>
<author> Abiteboul </author>

<year> 1995 </year>
</book>

10
Attributes v.s. Elements
Using attributes:
<book price="55" currency="USD">
<title> Foundations of DBs </title>
<author> Abiteboul </author>
…
<year> 1995 </year>
</book>

Using elements:
<book>
<title> Foundations of DBs </title>
<author> Abiteboul </author>
…
<year> 1995 </year>
<price> 55 </price>
<currency> USD </currency>
</book>

attributes are alternative ways to represent data


11
Comparison

Elements            Attributes
Ordered             Unordered
May be repeated     Must be unique
May be nested       Must be atomic
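The comparison can be observed directly in a parser. The sketch below, using Python's `xml.etree.ElementTree`, assumes a small hypothetical `<book>` document: repeated `<author>` elements come back in document order, attributes come back as a name-to-string mapping, and a repeated attribute is rejected as not well formed.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book price="55" currency="USD">'
    "<author>Abiteboul</author><author>Hull</author><author>Vianu</author>"
    "</book>"
)

# Elements: ordered and repeatable.
authors = [a.text for a in book.findall("author")]

# Attributes: one atomic (string) value per name.
attributes = dict(book.attrib)

def parses(text: str) -> bool:
    """Return True if the parser accepts `text`."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Attributes must be unique: the parser refuses a duplicate.
duplicate_attr_ok = parses('<book price="55" price="60"/>')

print(authors, attributes, duplicate_attr_ok)
```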

12
XML v.s. HTML
•  What are the differences between XML
and HTML ?

In class

13
That’s All !
•  That’s all you ever need to know about
XML syntax
–  Optional type information can be given in
the DTD or XSchema (later)
–  We’ll discuss some additional syntax in the
next few slides, but that’s not essential
•  What is important for you to know:
XML’s semantics
14
More Syntax: Oids and References
<person id="o555">
<name> Jane </name>
</person>

<person id="o456">
<name> Mary </name>
<mother idref="o555"/>
</person>

These are just keys/foreign keys, designed by someone who didn't take 444.
Don't use them: use your own foreign keys instead.

oids and references in XML are just syntax


15
More Syntax: CDATA Section
•  Syntax: <![CDATA[ .....any text here...]]>

•  Example:

<example>
<![CDATA[ some text here </notAtag> <>]]>
</example>

16
More Syntax: Entity References
•  Syntax: &entityname;
•  Example:
<element> this is less than &lt; </element>
•  Some entities:
&lt;    <
&gt;    >
&amp;   &
&apos;  '
&quot;  "
&#38;   Unicode char
17
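Entity references are what let reserved characters appear inside text content. As a rough illustration in Python (the sample string is made up), `xml.sax.saxutils.escape` produces the references when writing, and the parser turns them back into characters when reading:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'this is less than < and "quoted" & more'

# escape() always handles &, < and >; the extra mapping adds the two quote entities.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
document = f"<element>{escaped}</element>"

# Parsing resolves the entity references back to the original characters.
round_trip = ET.fromstring(document).text

# A numeric character reference names a character by its code point: &#38; is '&'.
numeric = ET.fromstring("<element>&#38;</element>").text

print(escaped)
print(round_trip == raw, numeric)
```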
More Syntax: Comments
•  Syntax <!-- .... Comment text... -->

•  Yes, they are part of the data model !!!

18
XML Namespaces
•  name ::= [prefix:]localpart
(the namespace URI bound to the prefix is just a unique name)

<book xmlns:bookStandard="www.isbn-org.org/def">
<bookStandard:title> … </bookStandard:title>
<bookStandard:publisher> . . .</bookStandard:publisher>

</book>

19
XML Semantics: a Tree !
<data>
<person id="o555">
<name> Mary </name>
<address>
<street>Maple</street>
<no> 345 </no>
<city> Seattle </city>
</address>
</person>
<person>
<name> John </name>
<address>Thailand
</address>
<phone>23456</phone>
</person>
</data>

[Figure: the document drawn as a tree. Element nodes: data, person, name, address, street, no, city, phone. Attribute node: id = o555. Text nodes: Mary, Maple, 345, Seattle, John, Thailand, 23456.]

Order matters !!!
20
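To make the tree reading concrete, here is a small sketch that parses the same `<data>` document with Python's `xml.etree.ElementTree` and lists element, attribute, and text nodes in document order. The `walk` helper is a hypothetical name, and the sketch ignores text that follows a child element (ElementTree's `tail`), which this sample does not use.

```python
import xml.etree.ElementTree as ET

DATA = """<data>
  <person id="o555">
    <name>Mary</name>
    <address><street>Maple</street><no>345</no><city>Seattle</city></address>
  </person>
  <person>
    <name>John</name>
    <address>Thailand</address>
    <phone>23456</phone>
  </person>
</data>"""

def walk(elem, depth=0):
    """Yield (depth, kind, label) for each node, in document order."""
    yield depth, "element", elem.tag
    for name, value in elem.attrib.items():
        yield depth + 1, "attribute", f"{name}={value}"
    if elem.text and elem.text.strip():
        yield depth + 1, "text", elem.text.strip()
    for child in elem:
        yield from walk(child, depth + 1)

nodes = list(walk(ET.fromstring(DATA)))
for depth, kind, label in nodes:
    print("  " * depth + f"{kind}: {label}")
```

The two `<person>` subtrees have different shapes (one has an attribute and a nested address, the other a phone), yet both fit the same tree model.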
XML as Data
•  XML is self-describing
•  Schema elements become part of the data
–  Relational schema: persons(name,phone)
–  In XML <persons>, <name>, <phone> are part of
the data, and are repeated many times
•  Consequence: XML is much more flexible
•  XML = semistructured data

21
Mapping Relational Data to XML
The canonical mapping:

Persons
Name    Phone
John    3634
Sue     6343
Dick    6363

XML:
<persons>
<row> <name>John</name>
<phone> 3634</phone></row>
<row> <name>Sue</name>
<phone> 6343</phone></row>
<row> <name>Dick</name>
<phone> 6363</phone></row>
</persons>

[Figure: tree with root persons and three row children; each row has a name child and a phone child: ("John", 3634), ("Sue", 6343), ("Dick", 6363).]
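The canonical mapping is mechanical: one root element per table, one `<row>` per tuple, one child element per column. A minimal sketch in Python, where `table_to_xml` is a hypothetical helper name chosen for illustration:

```python
import xml.etree.ElementTree as ET

def table_to_xml(table_name, columns, rows):
    """Build <table_name><row><col>value</col>...</row>...</table_name>."""
    root = ET.Element(table_name)
    for row in rows:
        row_elem = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            ET.SubElement(row_elem, column).text = str(value)
    return root

persons = table_to_xml(
    "persons",
    ["name", "phone"],
    [("John", 3634), ("Sue", 6343), ("Dick", 6363)],
)
xml_text = ET.tostring(persons, encoding="unicode")
print(xml_text)
```

Note how the column names, which live once in the relational schema, are repeated in every row of the XML.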
22
Mapping Relational Data to XML
Natural mapping

Persons
Name    Phone
John    3634
Sue     6343

Orders
PersonName    Date    Product
John          2002    Gizmo
John          2004    Gadget
Sue           2002    Gadget

XML
<persons>
<person>
<name> John </name>
<phone> 3634 </phone>
<order> <date> 2002 </date>
<product> Gizmo </product>
</order>
<order> <date> 2004 </date>
<product> Gadget </product>
</order>
</person>
<person>
<name> Sue </name>
<phone> 6343 </phone>
<order> <date> 2002 </date>
<product> Gadget </product>
</order>
</person>
</persons>
23
XML is Semi-structured Data
•  Missing attributes:
<person> <name> John</name>
<phone>1234</phone>
</person>
<person> <name>Joe</name>
</person>
(no phone !)

•  Could represent in a table with nulls:
name    phone
John    1234
Joe     -
24
XML is Semi-structured Data
•  Repeated attributes
<person> <name> Mary</name>
<phone>2345</phone>
<phone>3456</phone>
</person>
(two phones !)

•  Impossible in tables:
name    phone
Mary    2345  3456  ???
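Missing and repeated sub-elements are easy to handle when the data is read as a tree rather than forced into a fixed-width row. A short sketch, assuming a hypothetical `<persons>` document that combines the John, Joe, and Mary examples:

```python
import xml.etree.ElementTree as ET

people = ET.fromstring("""<persons>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
</persons>""")

# findall returns zero, one, or many <phone> children; no NULLs or extra columns needed.
phones = {
    person.findtext("name"): [phone.text for phone in person.findall("phone")]
    for person in people.findall("person")
}
print(phones)
```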

25
XML is Semi-structured Data
•  Attributes with different types in different objects
<person> <name> <first> John </first>
<last> Smith </last>
</name>
<phone>1234</phone>
</person>
(structured name !)

•  Nested collections (no 1NF)


•  Heterogeneous collections:
–  <db> contains both <book>s and <publisher>s

26
Document Type Definitions
DTD
•  part of the original XML specification
•  an XML document may have a DTD
•  XML document:
Well-formed = if tags are correctly closed
Valid = if it has a DTD and conforms to it
•  validation is useful in data exchange

27
DTD
Goals:
•  Define what tags and attributes are
allowed
•  Define how they are nested
•  Define how they are ordered

Superseded by XML Schema
•  XML Schema is very complex: DTDs are still used widely
28
Very Simple DTD
<!DOCTYPE company [
<!ELEMENT company ((person|product)*)>
<!ELEMENT person (ssn, name, office, phone?)>
<!ELEMENT ssn (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT office (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT product (pid, name, description?)>
<!ELEMENT pid (#PCDATA)>
<!ELEMENT description (#PCDATA)>
]>
29
Very Simple DTD
Example of valid XML document:
<company>
<person> <ssn> 123456789 </ssn>
<name> John </name>
<office> B432 </office>
<phone> 1234 </phone>
</person>
<person> <ssn> 987654321 </ssn>
<name> Jim </name>
<office> B123 </office>
</person>
<product> ... </product>
...
</company>
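Python's standard library does not validate against DTDs, so the following is only a sketch of the idea: each content model from the DTD above is hand-translated into a regular expression over the sequence of child tag names. The names `CONTENT_MODELS`, `PCDATA_ELEMENTS`, and `conforms` are assumptions made for this illustration, and the check ignores text between child elements.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the DTD: each child tag is followed by one space.
CONTENT_MODELS = {
    "company": r"(person |product )*",
    "person": r"ssn name office (phone )?",
    "product": r"pid name (description )?",
}
PCDATA_ELEMENTS = {"ssn", "name", "office", "phone", "pid", "description"}

def conforms(elem) -> bool:
    """Check `elem` and its descendants against the content models above."""
    if elem.tag in PCDATA_ELEMENTS:
        return len(elem) == 0          # text only, no child elements
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return False                   # element not declared in the DTD
    child_tags = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, child_tags) is None:
        return False
    return all(conforms(child) for child in elem)

valid = ET.fromstring(
    "<company>"
    "<person><ssn>123456789</ssn><name>John</name><office>B432</office><phone>1234</phone></person>"
    "<person><ssn>987654321</ssn><name>Jim</name><office>B123</office></person>"
    "<product><pid>1</pid><name>Gizmo</name></product>"
    "</company>"
)
invalid = ET.fromstring(
    "<company><person><ssn>1</ssn><name>Jo</name></person></company>"  # office is missing
)
print(conforms(valid), conforms(invalid))
```

A real validator would read the DTD itself; the point here is only that a content model is a regular expression over child elements.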
30
DTD: The Content Model
<!ELEMENT tag (CONTENT)>
(CONTENT is the content model)
•  Content model:
–  Complex = a regular expression over other elements
–  Text-only = #PCDATA
–  Empty = EMPTY
–  Any = ANY
–  Mixed content = (#PCDATA | A | B | C)*

31
DTD: Regular Expressions
sequence
DTD: <!ELEMENT name (firstName, lastName)>
XML: <name>
<firstName> . . . . . </firstName>
<lastName> . . . . . </lastName>
</name>

optional
DTD: <!ELEMENT name (firstName?, lastName)>

Kleene star
DTD: <!ELEMENT person (name, phone*)>
XML: <person>
<name> . . . . . </name>
<phone> . . . . . </phone>
<phone> . . . . . </phone>
<phone> . . . . . </phone>
......
</person>

alternation
DTD: <!ELEMENT person (name, (phone|email))>
32
SKIPPED MATERIAL:
XSchema
•  Generalizes DTDs

•  Uses XML syntax

•  Two parts: structure and datatypes

•  Very complex
–  criticized
–  alternative proposals: Relax NG
33
DTD v.s. XML Schemas
DTD:
<!ELEMENT paper (title,author*,year, (journal|conference))>
XML Schema:
<xs:element name="paper" type="paperType"/>
<xs:complexType name="paperType">
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="year"/>
<xs:choice> <xs:element name="journal"/>
<xs:element name="conference"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
34
Example

A valid XML Document:

<paper>
<title> The Essence of XML </title>
<author> Simeon</author>
<author> Wadler</author>
<year>2003</year>
<conference> POPL</conference>
</paper>

35
Elements v.s. Types

Local type:
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>

Named type:
<xs:element name="person" type="ttt"/>
<xs:complexType name="ttt">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
</xs:sequence>
</xs:complexType>

Both say the same thing; in DTD:

<!ELEMENT person (name,address)>
36


•  Types:
–  Simple types (integers, strings, ...)
–  Complex types (regular expressions, like in DTDs)

•  Element-type Alternation:
–  An element has a type
–  A type is a regular expression of elements

37
Local v.s. Global Types
•  Local type:
<xs:element name=“person”>
[define locally the person’s type]
</xs:element>
•  Global type:
<xs:element name=“person” type=“ttt”/>

<xs:complexType name=“ttt”>
[define here the type ttt]
</xs:complexType>
Global types: can be reused in other elements
38
Local v.s. Global Elements
•  Local element:
<xs:complexType name=“ttt”>
<xs:sequence>
<xs:element name=“address” type=“...”/>...
</xs:sequence>
</xs:complexType>
•  Global element:
<xs:element name=“address” type=“...”/>

<xs:complexType name=“ttt”>
<xs:sequence>
<xs:element ref=“address”/> ...
</xs:sequence>
</xs:complexType>

Global elements: like in DTDs


39
Regular Expressions
Recall the element-type-element alternation:
<xs:complexType name=“....”>
[regular expression on elements]
</xs:complexType>
Regular expressions:
•  <xs:sequence> A B C </...> =ABC
•  <xs:choice> A B C </...> =A|B|C
•  <xs:group> A B C </...> = (A B C)
•  <xs:... minOccurs=“0” maxOccurs=“unbounded”> ..</...> = (...)*
•  <xs:... minOccurs=“0” maxOccurs=“1”> ..</...> = (...)?

40
Local Names
<xs:element name="person">
<xs:complexType>
. . . . .
<xs:element name="name">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
. . . .
</xs:complexType>
</xs:element>

<xs:element name="product">
<xs:complexType>
. . . . .
<xs:element name="name" type="xs:string"/>

</xs:complexType>
</xs:element>

name has different meanings in person and in product
41
Subtle Use of Local Names
<xs:element name="A" type="oneB"/>

<xs:complexType name="onlyAs">
<xs:choice>
<xs:sequence>
<xs:element name="A" type="onlyAs"/>
<xs:element name="A" type="onlyAs"/>
</xs:sequence>
<xs:element name="A" type="xs:string"/>
</xs:choice>
</xs:complexType>

<xs:complexType name="oneB">
<xs:choice>
<xs:element name="B" type="xs:string"/>
<xs:sequence>
<xs:element name="A" type="onlyAs"/>
<xs:element name="A" type="oneB"/>
</xs:sequence>
<xs:sequence>
<xs:element name="A" type="oneB"/>
<xs:element name="A" type="onlyAs"/>
</xs:sequence>
</xs:choice>
</xs:complexType>

Arbitrary deep binary tree with A elements, and a single B element

Note: this example is not legal in XML Schema (why ?)

Hence they cannot express all regular tree languages
42
Attributes in XML Schema
<xs:element name="paper" type="papertype"/>
<xs:complexType name="papertype">
<xs:sequence>
<xs:element name="title" type="xs:string"/>
......
</xs:sequence>
<xs:attribute name="language" type="xs:NMTOKEN" fixed="English"/>
</xs:complexType>

Attributes are associated to the type, not to the element


Only to complex types; more trouble if we want to add attributes
to simple types.
43
“Mixed” Content, “Any” Type
<xs:complexType mixed="true">
. . . .
•  Better than in DTDs: can still enforce the type, but
now may have text between any elements

<xs:element name="anything" type="xs:anyType"/>


....
•  Means anything is permitted there

44
“All” Group
<xs:complexType name="PurchaseOrderType">
<xs:all> <xs:element name="shipTo" type="USAddress"/>
<xs:element name="billTo" type="USAddress"/>
<xs:element ref="comment" minOccurs="0"/>
<xs:element name="items" type="Items"/>
</xs:all>
<xs:attribute name="orderDate" type="xs:date"/>
</xs:complexType>

•  A restricted form of & in SGML


•  Restrictions:
–  Only at top level
–  Has only elements
–  Each element occurs at most once
•  E.g. “comment” occurs 0 or 1 times

45
Derived Types by Extensions
<complexType name="Address">
<sequence> <element name="street" type="string"/>
<element name="city" type="string"/>
</sequence>
</complexType>

<complexType name="USAddress">
<complexContent>
<extension base="ipo:Address">
<sequence> <element name="state" type="ipo:USState"/>
<element name="zip" type="positiveInteger"/>
</sequence>
</extension>
</complexContent>
</complexType>

Corresponds to inheritance
46
Derived Types by Restrictions

<complexContent>
<restriction base="ipo:Items">
… [rewrite the entire content, with restrictions]...
</restriction>
</complexContent>

•  (*): may restrict cardinalities, e.g. (0,infty) to (1,1); may restrict choices; other restrictions…
Corresponds to set inclusion
47
Simple Types
•  string
•  token
•  byte
•  unsignedByte
•  integer
•  positiveInteger
•  int (32-bit; a restriction of integer)
•  unsignedInt
•  long
•  short
•  time
•  dateTime
•  duration
•  date
•  ID
•  IDREF
•  IDREFS
•  ...

48
Facets of Simple Types
Facets = additional properties restricting a simple type
15 facets defined by XML Schema

Examples
•  length
•  minLength
•  maxLength
•  pattern
•  enumeration
•  whiteSpace
•  maxInclusive
•  maxExclusive
•  minInclusive
•  minExclusive
•  totalDigits
•  fractionDigits

49
Facets of Simple Types
•  Can further restrict a simple type by
changing some facets
•  Restriction = subset

50
Not so Simple Types
•  List types:
<xs:simpleType name="listOfMyIntType">
<xs:list itemType="myInteger"/>
</xs:simpleType>

<listOfMyInt>20003 15037 95977 95945</listOfMyInt>

•  Union types
•  Restriction types
51
END OF SKIPPED MATERIAL
Discussion 1
What kinds of applications might use
XML ?

52
Discussion 1
What kinds of applications might use
XML ?
•  Data exchange
–  Take the data, don’t worry about schema
•  Property lists
–  Many attributes, most are NULL
•  Evolving schema
–  Add quickly a new attribute
53
Discussion 2
How is XML processed ?

54
Discussion 2
How is XML processed ?
•  Via API
–  Called DOM
–  Navigate, update the XML arbitrarily
–  BUT: memory bound
•  Via some query language:
–  XPath or XQuery
–  Stand-alone processor OR embedded in SQL
55
Querying XML Data
Will discuss next:

•  XPath = simple navigation on the tree

•  XQuery = “the SQL of XML”

56
Sample Data for Queries
<bib>
<book> <publisher> Addison-Wesley </publisher>
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<title> Foundations of Databases </title>
<year> 1995 </year>
</book>
<book price=“55”>
<publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
</bib> 57
Data Model for XPath
The root

bib The root element

book book

publisher author . . . .

Addison-Wesley Serge Abiteboul


58
XPath: Simple Expressions
/bib/book/year

Result: <year> 1995 </year>


<year> 1998 </year>

/bib/paper/year

Result: empty (there were no papers)

What's the difference ?
/bib
/
59
XPath: Restricted Kleene Closure
//author
Result: <author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<author> Jeffrey D. Ullman </author>

/bib//first-name
Result: <first-name> Rick </first-name>

60
XPath: Attribute Nodes
/bib/book/@price
Result: “55”

@price means that price has to be an attribute

61
XPath: Wildcard
//author/*
Result: <first-name> Rick </first-name>
<last-name> Hull </last-name>

* Matches any element


@* Matches any attribute

62
XPath: Text Nodes
/bib/book/author/text()
Result: Serge Abiteboul
Victor Vianu
Jeffrey D. Ullman

Rick Hull doesn’t appear because he has firstname, lastname

Functions in XPath:
–  text() = matches the text value
–  node() = matches any node (= * or @* or text())
–  name() = returns the name of the current tag

63
XPath: Predicates
/bib/book/author[first-name]
Result: <author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
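Several of the path expressions so far can be tried with Python's `xml.etree.ElementTree`, which implements only a limited XPath subset (no `text()` step and no comparisons such as `@price < 60`). In this sketch the variable `bib` is the root element, so each pattern is written relative to `/bib`; the corresponding XPath from the slides is given in the comments.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

years = [y.text for y in bib.findall("book/year")]              # /bib/book/year
papers = bib.findall("paper/year")                                # /bib/paper/year
author_count = len(bib.findall(".//author"))                     # //author
first_names = [f.text for f in bib.findall(".//first-name")]     # /bib//first-name
prices = [b.get("price") for b in bib.findall("book[@price]")]   # /bib/book/@price
author_children = [c.tag for c in bib.findall(".//author/*")]    # //author/*
structured = bib.findall("book/author[first-name]")              # /bib/book/author[first-name]

print(years, papers, author_count, first_names, prices, author_children)
print([a.findtext("last-name") for a in structured])
```

A full XPath engine would evaluate the slide expressions verbatim; the subset here is enough to see which nodes each step selects.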

64
XPath: More Predicates
/bib/book/author[first-name][address[.//zip][city]]/last-name

Explain how this is evaluated !

65
XPath: More Predicates
/bib/book/author[first-name][address[.//zip][city]]/last-name
Result: <last-name> … </last-name>
<last-name> … </last-name>

How do we read this ?


First remove all qualifiers (predicates):
/bib/book/author/last-name

Then add them one by one:


/bib/book/author[first-name][address]/last-name 66
XPath: More Predicates

/bib/book[@price < 60]

/bib/book[author/@age < 25]

/bib/book[author/text()]

67
XPath: More Axes

. means current node /bib/book[.//review]

/bib/book[./review] Same as /bib/book[review]

/bib/book/./author Same as /bib/book/author


68
XPath: More Axes

.. means parent node

/bib/book/author/../author Same as

/bib/book/author

/bib/book[.//first-name/../last-name] Same as

/bib/book[.//*[first-name][last-name]]
69
XPath: Brief Summary
bib matches a bib element
* matches any element
/ matches the root (the node above the root element)
/bib matches a bib element under root
bib/paper matches a paper in bib
bib//paper matches a paper in bib, at any depth
//paper matches a paper at any depth
paper|book matches a paper or a book
@price matches a price attribute
bib/book/@price matches price attribute in book, in bib
bib/book[@price<“55”]/author/lastname matches…

70
XQuery
•  Based on Quilt, which is based on XML-QL

•  Uses XPath to express more complex queries

71
FLWR (“Flower”) Expressions

FOR ...
LET...
WHERE...
RETURN...

72
FOR-WHERE-RETURN
Find all book titles published after 1995:
for $x in document("bib.xml")/bib/book
where $x/year/text() > 1995
return $x/title

Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
73
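For readers more used to a general-purpose language, the FOR-WHERE-RETURN query reads much like a list comprehension. The sketch below mirrors it in Python over a made-up three-book `bib` document; it is an analogy, not how an XQuery engine is implemented.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book><title>abc</title><year>1995</year></book>
  <book><title>def</title><year>2002</year></book>
  <book><title>ghi</title><year>1998</year></book>
</bib>""")

titles = [
    x.find("title")                      # return $x/title
    for x in bib.findall("book")         # for $x in .../bib/book
    if int(x.findtext("year")) > 1995    # where $x/year/text() > 1995
]

# Serialize each selected <title>; strip() drops the whitespace that follows it.
result = [ET.tostring(t, encoding="unicode").strip() for t in titles]
print(result)
```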
FOR-WHERE-RETURN
Equivalently (perhaps more geekish)

for $x in document("bib.xml")/bib/book[year/text() > 1995]/title
return $x

And even shorter:

document("bib.xml")/bib/book[year/text() > 1995]/title


74
FOR-WHERE-RETURN
•  Find all book titles and the year when
they were published:
for $x in document("bib.xml")/ bib/book
return <answer>
<title> { $x/title/text() } </title>
<year>{ $x/year/text() } </year>
</answer>

Result:
<answer> <title> abc </title> <year> 1995 </year > </answer>
<answer> <title> def </title> <year> 2002 </year > </answer>
<answer> <title> ghk </title> <year> 1980 </year > </answer>
75
FOR-WHERE-RETURN
•  Notice the use of “{“ and “}”
•  What is the result without them ?
for $x in document("bib.xml")/ bib/book
return <answer>
<title> $x/title/text() </title>
<year> $x/year/text() </year>
</answer>

76
FOR-WHERE-RETURN
•  Notice the use of “{“ and “}”
•  What is the result without them ?
for $x in document("bib.xml")/bib/book
return <answer>
<title> $x/title/text() </title>
<year> $x/year/text() </year>
</answer>

<answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>


<answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>
<answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>
77
Nesting
For each author of a book published in
1995, list all books she published:
for $b in document("bib.xml")/bib,
$a in $b/book[year/text()=1995]/author
return <result>
{ $a,
for $t in $b/book[author/text()=$a/text()]/title
return $t
}
</result>

In the RETURN clause comma concatenates XML fragments


78
Result
<result>
<author>Jones</author>
<title> abc </title>
<title> def </title>
</result>
<result>
<author> Smith </author>
<title> ghi </title>
</result>
79
Aggregates
Find all books with more than 3 authors:

for $x in document("bib.xml")/bib/book
where count($x/author)>3
return $x

count = a function that counts


avg = computes the average
sum = computes the sum
distinct-values = eliminates duplicates
80
Aggregates
Same thing:

for $x in document("bib.xml")/bib/book[count(author)>3]
return $x

81
Aggregates
Print all authors who published more than
3 books

for $b in document("bib.xml")/bib,
$a in distinct-values($b/book/author/text())
where count($b/book[author/text()=$a])>3
return <author> { $a } </author>

82
Flattening
•  “Flatten” the authors, i.e. return a list of
(author, title) pairs
for $b in document("bib.xml")/bib/book,
$x in $b/title/text(),
$y in $b/author/text()
return <answer>
<title> { $x } </title>
<author> { $y } </author>
</answer>

Result:
<answer>
<title> abc </title>
<author> efg </author>
</answer>
<answer>
<title> abc </title>
<author> hkj </author>
</answer>
83
Re-grouping
•  For each author, return all titles of her/his books
for $b in document("bib.xml")/bib
let $a:=distinct-values($b/book/author/text())
for $x in $a
return
<answer>
<author> { $x } </author>
{ for $y in $b/book[author/text()=$x]/title
return $y }
</answer>

Result:
<answer>
<author> efg </author>
<title> abc </title>
<title> klm </title>
....
</answer>
84
Re-grouping
•  Same thing:
for $b in document("bib.xml")/bib,
$x in distinct-values($b/book/author/text())
return
<answer>
<author> { $x } </author>
{ for $y in $b/book[author/text()=$x]/title
return $y }
</answer>
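The re-grouping query has a direct procedural reading: collect the distinct author values, then for each one gather the titles of the books that list that author. A sketch in Python over a hypothetical two-book document; it keeps authors in first-seen order, which XQuery's distinct-values does not promise.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book><title>abc</title><author>efg</author><author>hkj</author></book>
  <book><title>klm</title><author>efg</author></book>
</bib>""")

# distinct-values($b/book/author/text())
authors = []
for author in bib.findall("book/author"):
    if author.text not in authors:
        authors.append(author.text)

answers = []
for x in authors:
    answer = ET.Element("answer")
    ET.SubElement(answer, "author").text = x
    # for $y in $b/book[author/text()=$x]/title return $y
    for book in bib.findall("book"):
        if any(a.text == x for a in book.findall("author")):
            ET.SubElement(answer, "title").text = book.findtext("title")
    answers.append(ET.tostring(answer, encoding="unicode"))

for line in answers:
    print(line)
```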

85
SQL and XQuery Side-by-side
Product(pid, name, maker, price)
Find all product names, prices, sort by price

SQL:
SELECT x.name, x.price
FROM Product x
ORDER BY x.price

XQuery:
for $x in document("db.xml")/db/product/row
order by $x/price/text()
return <answer>
{ $x/name, $x/price }
</answer>
86
XQuery’s Answer
<answer>
<name> abc </name>
<price> 7 </price>
</answer>
<answer>
<name> def </name>
<price> 23 </price>
</answer>
....

87
SQL and XQuery Side-by-side
Product(pid, name, maker, price)
Company(cid, name, city, revenues)
Find all products made in Seattle

SQL:
SELECT x.name
FROM Product x, Company y
WHERE x.maker=y.cid
and y.city="Seattle"

XQuery:
for $r in document("db.xml")/db,
$x in $r/product/row,
$y in $r/company/row
where
$x/maker/text()=$y/cid/text()
and $y/city/text() = "Seattle"
return { $x/name }

Cool XQuery:
for $y in /db/company/row[city/text()="Seattle"],
$x in /db/product/row[maker/text()=$y/cid/text()]
return { $x/name }
88
<product>
<row> <pid> 123 </pid>
<name> abc </name>
<maker> efg </maker>
</row>
<row> …. </row>

</product>
<product>
...
</product>
....

89
SQL and XQuery Side-by-side
For each company with revenues < 1M, count how many
products with price > $100 they make
SELECT y.name, count(*)
FROM Product x, Company y
WHERE x.price > 100 and x.maker=y.cid and y.revenue < 1000000
GROUP BY y.cid, y.name
for $r in document("db.xml")/db,
$y in $r/company/row[revenue/text()<1000000]
return
<proudcompany>
<companyname> { $y/name/text() } </companyname>
<numberofexpensiveproducts>
{count($r/product/row[maker/text()=$y/cid/text()][price/text()>100])}
</numberofexpensiveproducts>
</proudcompany>
90
SQL and XQuery Side-by-side
Find companies with more than 30 products, and their average price
SELECT y.name, avg(x.price)
FROM Product x, Company y
WHERE x.maker=y.cid
GROUP BY y.cid, y.name
HAVING count(*) > 30

for $r in document("db.xml")/db,
$y in $r/company/row
let $p := $r/product/row[maker/text()=$y/cid/text()]
where count($p) > 30
return
<thecompany>
<companyname> { $y/name/text() }
</companyname>
<avgprice> { avg($p/price/text()) } </avgprice>
</thecompany>

(for binds $r and $y to one element at a time; let binds $p to a collection)
91
FOR v.s. LET

FOR
•  Binds node variables → iteration

LET
•  Binds collection variables → one value

92
FOR v.s. LET

for $x in /bib/book
return <result> { $x } </result>

Returns:
<result> <book>...</book></result>
<result> <book>...</book></result>
<result> <book>...</book></result>
...

let $x := /bib/book
return <result> { $x } </result>

Returns:
<result> <book>...</book>
<book>...</book>
<book>...</book>
...
</result>
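The FOR/LET distinction can be mimicked in Python: iterating over the books corresponds to FOR (one result per book), while binding the whole list to a name corresponds to LET (a single result holding every book). A sketch with a hypothetical three-book document and a hypothetical `wrap` helper:

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("<bib><book>a</book><book>b</book><book>c</book></bib>")
books = bib.findall("book")

def wrap(children):
    """Serialize <result> containing the given elements."""
    result = ET.Element("result")
    result.extend(children)
    return ET.tostring(result, encoding="unicode")

# for $x in /bib/book return <result>{ $x }</result>   -> one <result> per book
for_results = [wrap([x]) for x in books]

# let $x := /bib/book return <result>{ $x }</result>   -> one <result> in total
let_results = [wrap(books)]

print(for_results)
print(let_results)
```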
93
XQuery
Summary:
•  FOR-LET-WHERE-RETURN = FLWR

[Figure: processing pipeline. FOR/LET Clauses → List of tuples → WHERE Clause → List of tuples → RETURN Clause → Instance of XQuery data model]
94
XML in SQL Server 2005
•  Create tables with attributes of type XML

•  Use XQuery in SQL queries

•  Rest of the slides are from:


Shankar Pal et al., Indexing XML data stored in
a relational database, VLDB’2004

95
CREATE TABLE DOCS (
ID int primary key,
XDOC xml)

SELECT ID, XDOC.query('
for $s in /BOOK[@ISBN= "1-55860-438-3"]//SECTION
return <topic>{data($s/TITLE)} </topic>')
FROM DOCS

96
XML Methods in SQL
•  query() = returns XML data type
•  value() = extracts scalar values
•  exist() = checks conditions on XML nodes
•  nodes() = returns a rowset of XML nodes that the XQuery expression evaluates to

97
Examples
•  From here:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/sql2k5xml.asp

98
XML Type

CREATE TABLE docs (


pk INT PRIMARY KEY,
xCol XML not null
)

99
Inserting an XML Value

INSERT INTO docs VALUES (2,


'<doc id="123">
<sections>
<section num="1"><title>XML Schema</title></section>
<section num="3"><title>Benefits</title></section>
<section num="4"><title>Features</title></section>
</sections>
</doc>')

100
Query( )

SELECT pk, xCol.query('/doc[@id = 123]//section')


FROM docs

101
Exist( )

SELECT xCol.query('/doc[@id = 123]//section')


FROM docs
WHERE xCol.exist ('/doc[@id = 123]') = 1

102
Value( )

SELECT xCol.value(
'data((/doc//section[@num = 3]/title)[1])', 'nvarchar(max)')
FROM docs

103
Nodes( )

SELECT nref.value('first-name[1]', 'nvarchar(50)')


AS FirstName,
nref.value('last-name[1]', 'nvarchar(50)')
AS LastName
FROM @xVar.nodes('//author') AS R(nref)
WHERE nref.exist('.[first-name != "David"]') = 1

104
Nodes( )

SELECT nref.value('@genre', 'varchar(max)') LastName


FROM docs CROSS APPLY xCol.nodes('//book') AS R(nref)

105
Internal Storage
•  XML is “shredded” as a table
•  A few important ideas:
–  Dewey decimal numbering of nodes; store in clustered B-tree index
–  Use only odd numbers to allow insertions
–  Reverse PATH-ID encoding, for efficient processing of
postfix expressions like //a/b/c
–  Add more indexes, e.g. on data values
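The Dewey-style numbering idea can be sketched in a few lines. This is a simplified illustration, not SQL Server's actual ORDPATH encoding: each node gets the label of its parent plus an odd ordinal, so the parent is recovered by dropping the last component, ancestry is a prefix test, and sorting the labels gives document order. The function names and the tuple representation are assumptions for this sketch, and insertion between existing siblings (the reason for skipping even numbers) is not implemented.

```python
import xml.etree.ElementTree as ET

def label(elem, path=(1,)):
    """Yield (path, tag) pairs; children receive odd ordinals 1, 3, 5, ..."""
    yield path, elem.tag
    for i, child in enumerate(elem):
        yield from label(child, path + (2 * i + 1,))

def parent(path):
    """The parent's label is the label without its last component."""
    return path[:-1]

def is_ancestor(a, d):
    """`a` is an ancestor of `d` when `a` is a proper prefix of `d`."""
    return len(a) < len(d) and d[: len(a)] == a

book = ET.fromstring(
    "<BOOK><SECTION><TITLE/><FIGURE/></SECTION>"
    "<SECTION><TITLE/><BOLD/></SECTION></BOOK>"
)
labels = dict(label(book))

for path in sorted(labels):  # tuple order coincides with document order
    print(".".join(map(str, path)), labels[path])
```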

106
<BOOK ISBN="1-55860-438-3">
<SECTION>
<TITLE>Bad Bugs</TITLE>
Nobody loves bad bugs.
<FIGURE CAPTION="Sample bug"/>
</SECTION>
<SECTION>
<TITLE>Tree Frogs</TITLE>
All right-thinking people
<BOLD> love </BOLD>
tree frogs.
</SECTION>
</BOOK>
107
108
109
Infoset Table
/BOOK[@ISBN = "1-55860-438-3"]/SECTION

SELECT SerializeXML (N2.ID, N2.ORDPATH)


FROM infosettab N1 JOIN infosettab N2 ON (N1.ID = N2.ID)
WHERE N1.PATH_ID = PATH_ID(/BOOK/@ISBN)
AND N1.VALUE = '1-55860-438-3'
AND N2.PATH_ID = PATH_ID(BOOK/SECTION)
AND Parent (N1.ORDPATH) = Parent (N2.ORDPATH)

110
