Lecture 17 XML and XPATH and XQUERY
XPath
XQuery
Slides from the textbook webpage:
http://infolab.stanford.edu/~ullman/dscb.html
1. Document nodes represent entire documents.
2. Elements are pieces of a document consisting of some opening tag, its matching closing tag (if any), and everything in between.
3. Attributes are names that are given values inside opening tags.
Document Nodes
Formed by doc(URL) or document(URL).
Example: doc("/usr/class/cs475/bars.xml")
All XPath (and XQuery) queries refer to a doc node, either explicitly or implicitly.
Example: key definitions in XML Schema have XPath expressions that refer to the document described by the schema.
Example Document
<BARS>
  <BAR name = "JoesBar">
    <PRICE theBeer = "Bud">2.50</PRICE>
    <PRICE theBeer = "Miller">3.00</PRICE>
  </BAR> ...
  <BEER name = "Bud" soldBy = "JoesBar SuesBar" /> ...
</BARS>
An element node: a tag pair such as <BAR> ... </BAR> together with everything between.
An attribute node: a name = value pair inside an opening tag, such as name = "JoesBar".
The document node is all of this, plus the header (<?xml version ... ?>).
[Figure: the example document drawn as a tree of nodes, colored by kind: rose = document, green = element, gold = attribute, purple = primitive value.]
Path Expressions
Simple path expressions are sequences
of slashes (/) and tags, starting with /.
Example: /BARS/BAR/PRICE
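As a rough, runnable illustration, Python's standard xml.etree.ElementTree module understands a small subset of XPath. The sketch below evaluates /BARS/BAR/PRICE on a sample modeled on the example document; the second bar, SuesBar, is hypothetical data added so the result spans more than one bar. ElementTree is not a full XPath engine, so treat this as an analogy rather than XPath itself.

```python
import xml.etree.ElementTree as ET

# Sample modeled on the lecture's BARS document.
# SuesBar and its Bud price are hypothetical extra data.
BARS_XML = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BAR name="SuesBar">
    <PRICE theBeer="Bud">2.75</PRICE>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar SuesBar"/>
</BARS>"""

root = ET.fromstring(BARS_XML)  # root is the BARS element itself

# /BARS/BAR/PRICE: ElementTree paths are relative to the element they are
# called on, so starting at BARS the remaining steps are BAR/PRICE.
prices = root.findall("BAR/PRICE")

# Results come back in document order.
print([(p.get("theBeer"), p.text) for p in prices])
# [('Bud', '2.50'), ('Miller', '3.00'), ('Bud', '2.75')]
```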
Example: /BARS
Result on the example document: one item, the BARS element.
Example: /BARS/BAR
Result on the example document: the BAR element for JoesBar, followed by all the other BAR elements.
Example: /BARS/BAR/PRICE
Result on the example document: the two PRICE elements of JoesBar, followed by the PRICE elements of all the other bars.
Attributes in Paths
Instead of going to subelements with a
given tag, you can go to an attribute of
the elements you already have.
An attribute is indicated by putting @ in
front of its name.
Example: /BARS/BAR/PRICE/@theBeer
Result on the example document: the attributes theBeer = "Bud" and theBeer = "Miller" contribute Bud and Miller to the result, followed by the theBeer values of all the other PRICE elements.
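A small sketch of the same query with Python's xml.etree.ElementTree (an analogy, not an XPath engine). ElementTree's path syntax has no attribute step like @theBeer at the end of a path, so the sketch selects the PRICE elements and reads the attribute from each one.

```python
import xml.etree.ElementTree as ET

BARS_XML = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar SuesBar"/>
</BARS>"""

root = ET.fromstring(BARS_XML)

# Select the PRICE elements, then take the theBeer attribute of each,
# mirroring /BARS/BAR/PRICE/@theBeer.
the_beers = [p.get("theBeer") for p in root.findall("BAR/PRICE")]
print(the_beers)  # ['Bud', 'Miller']
```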
Example: //PRICE
Result on the example document: the PRICE elements of JoesBar and any other PRICE elements in the entire document.
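To see the difference between a one-level step and //, here is a sketch with Python's xml.etree.ElementTree, where a leading ".//" plays the role of // (searching the whole subtree below the element the call starts from).

```python
import xml.etree.ElementTree as ET

BARS_XML = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar SuesBar"/>
</BARS>"""

root = ET.fromstring(BARS_XML)

# A plain step only looks at children: BARS has no PRICE children.
direct = root.findall("PRICE")
# ".//PRICE" searches every level below BARS, like //PRICE.
anywhere = root.findall(".//PRICE")

print(len(direct), [p.text for p in anywhere])  # 0 ['2.50', '3.00']
```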
Wild-Card *
A star (*) in place of a tag represents any one tag.
Example: /*/*/PRICE represents all PRICE elements at the third level of nesting.
Example: /BARS/*
Result on the example document: the BAR element for JoesBar, all other BAR elements, the BEER element for Bud, and all other BEER elements.
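The wildcard works the same way in Python's xml.etree.ElementTree, which the sketch below uses as a stand-in. Because the calls start from the BARS element (level 1), /BARS/* becomes "*" and /*/*/PRICE becomes "*/PRICE".

```python
import xml.etree.ElementTree as ET

BARS_XML = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar SuesBar"/>
</BARS>"""

root = ET.fromstring(BARS_XML)  # level 1: BARS

# /BARS/*: every child of BARS, whatever its tag.
children = [e.tag for e in root.findall("*")]
# /*/*/PRICE: level 1 is BARS (root), level 2 any child, level 3 a PRICE.
third_level = [p.text for p in root.findall("*/PRICE")]

print(children)     # ['BAR', 'BEER']
print(third_level)  # ['2.50', '3.00']
```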
Selection Conditions
A condition inside [] may follow a tag.
If so, then only paths that have that
tag and also satisfy the condition are
included in the result of a path
expression.
Example: /BARS/BAR/PRICE[. < 2.75]
Here . denotes the current element, i.e., the PRICE being tested.
The condition that the PRICE be < $2.75 makes the Bud price (2.50), but not the Miller price (3.00), part of the result.
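A sketch of selection conditions using Python's xml.etree.ElementTree as a stand-in. Its predicates support equality tests such as [@theBeer='Miller'] but not numeric comparisons, so the [. < 2.75] condition is applied in Python after the path step; this is a limitation of ElementTree, not of XPath.

```python
import xml.etree.ElementTree as ET

BARS_XML = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar SuesBar"/>
</BARS>"""

root = ET.fromstring(BARS_XML)

# /BARS/BAR/PRICE[. < 2.75]: test the current element's value in Python.
cheap = [(p.get("theBeer"), p.text)
         for p in root.findall("BAR/PRICE") if float(p.text) < 2.75]

# /BARS/BAR/PRICE[@theBeer = "Miller"]: attribute equality stays in the path.
miller = [p.text for p in root.findall("BAR/PRICE[@theBeer='Miller']")]

print(cheap)   # [('Bud', '2.50')]
print(miller)  # ['3.00']
```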
Axes
In general, path expressions allow us to
start at the root and execute steps to
find a sequence of nodes at each step.
At each step, we may follow any one of
several axes.
The default axis is child:: --- go to all the
children of the current set of nodes.
Example: Axes
/BARS/BEER is really shorthand for /BARS/child::BEER .
@ is really shorthand for the attribute:: axis.
Thus, /BARS/BEER[@name = "Bud"] is shorthand for /BARS/BEER[attribute::name = "Bud"]
More Axes
Some other useful axes are:
1. parent:: = parent(s) of the current
node(s).
2. descendant-or-self:: = the current
node(s) and all descendants.
Note: // is really shorthand for this axis.
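Python's xml.etree.ElementTree has no named axes, but two of its features behave like the axes above, as this sketch shows: ".." steps to the parent (it cannot climb above the element the search starts from), and iter() walks the element itself plus all its descendants in document order, which is what descendant-or-self::* selects.

```python
import xml.etree.ElementTree as ET

BARS_XML = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar SuesBar"/>
</BARS>"""

root = ET.fromstring(BARS_XML)

# parent:: -- both PRICE elements share one parent, returned only once.
parents = [b.get("name") for b in root.findall("BAR/PRICE/..")]

# descendant-or-self:: -- the starting element first, then its descendants.
tags = [e.tag for e in root.iter()]

print(parents)  # ['JoesBar']
print(tags)     # ['BARS', 'BAR', 'PRICE', 'PRICE', 'BEER']
```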
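XPath Syntax
Expression    Result
users         the users elements that are children of the current node
/users        the root element, if it is a users element
users/user    the user children of the users children of the current node
//users       all users elements anywhere in the document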
In this case, only the first part of the XPath condition needs to be true.
The password part becomes irrelevant, and the UserName part matches ALL users because of the "1=1" condition.
This injection allows the attacker to bypass the authentication system.
Note that the big difference between XML files and SQL databases is the lack of access control.
XPath does not have any restrictions when querying the XML file, so it is possible to retrieve data from the entire document.
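To make the attack concrete, here is a sketch assuming a login check that builds an XPath string by pasting in user input; the users document, element names, and passwords are all hypothetical. With the injected name, the condition reads '' or 1=1 or ('a'='a' and password/text()='anything'), because and binds tighter than or, so it is true for every user. The safer variant never splices input into a query and compares the values in code instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical user store; names and passwords are illustrative only.
USERS_XML = """<users>
  <user><name>alice</name><password>s3cret</password></user>
  <user><name>bob</name><password>hunter2</password></user>
</users>"""

def build_login_query(name, password):
    # UNSAFE: user input becomes part of the XPath text.
    return ("//user[name/text()='" + name +
            "' and password/text()='" + password + "']")

def check_login(root, name, password):
    # Safer: treat the input purely as data and compare it in code.
    return any(u.findtext("name") == name and u.findtext("password") == password
               for u in root.iter("user"))

root = ET.fromstring(USERS_XML)
attack = "' or 1=1 or 'a'='a"

print(build_login_query(attack, "anything"))
print(check_login(root, attack, "anything"))  # False
print(check_login(root, "alice", "s3cret"))   # True
```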
Summary
- What is XPath?
- XPath Syntax
- XPath Injection
Exercise
We want to export this data into an XML file. Write a DTD describing the following structure for the XML file:
- there is one root element called stores
- the stores element contains a sequence of store subelements, one for each store in the database
- each store element contains one name and one phone subelement, and a sequence of product subelements, one for each product that the store sells. Also, it has an attribute sid of type ID.
- each product element contains one name, one price, one description, and one markup element, plus an attribute pid of type ID.
<!DOCTYPE stores [
<!ELEMENT stores (store*)>
<!ELEMENT store (name, phone, product+)>
<!ELEMENT product (name, price, description, markup)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT markup (#PCDATA)>
<!ATTLIST product pid ID #REQUIRED>
<!ATTLIST store sid ID #REQUIRED>
]>
XQuery Motivation
XPath expressivity is insufficient:
- no join queries
- no changes to the XML structure possible
- no quantifiers
- no aggregation and functions
XQuery
XQuery extends XPath to a query
language that has power similar to SQL.
Uses the same sequence-of-items data
model.
XQuery is an expression language.
Like relational algebra --- any XQuery
expression can be an argument of any other
XQuery expression.
FLWR Expressions
1. One or more for and/or let clauses.
2. Then an optional where clause.
3. A return clause.
FOR Clauses
for <variable> in <expression>, . . .
Variables begin with $.
A for-variable takes on each item in the
sequence denoted by the expression, in
turn.
Whatever follows this for is executed
once for each value of the variable.
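Example: FOR
for $beer in document("bars.xml")/BARS/BEER/@name
return <BEERNAME> {data($beer)} </BEERNAME>
document("bars.xml") is our example BARS document, and $beer ranges over the name attributes of all beers in it.
data() extracts the attribute's value; without it, the attribute node itself would be copied into the new element as an attribute (<BEERNAME name = "Bud"/>).
Result is a sequence of BEERNAME elements:
<BEERNAME>Bud</BEERNAME>
<BEERNAME>Miller</BEERNAME> . . .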
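The iteration of a for clause corresponds to an ordinary loop. This Python sketch builds one BEERNAME element per name attribute, assuming a trimmed sample in which a second BEER element (Miller) is hypothetical; reading the attribute with get() gives its value, as data($beer) does.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the Miller BEER element is hypothetical.
BARS_XML = """<BARS>
  <BEER name="Bud" soldBy="JoesBar SuesBar"/>
  <BEER name="Miller" soldBy="JoesBar"/>
</BARS>"""

root = ET.fromstring(BARS_XML)

result = []
# Like a for clause: the body runs once per item of the sequence.
for name in (b.get("name") for b in root.findall("BEER")):
    elem = ET.Element("BEERNAME")
    elem.text = name  # the attribute's value, not the attribute node
    result.append(ET.tostring(elem, encoding="unicode"))

print(result)  # ['<BEERNAME>Bud</BEERNAME>', '<BEERNAME>Miller</BEERNAME>']
```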
Use of Braces
When a variable name like $x, or an
expression, could be text, we need to
surround it by braces to avoid having it
interpreted literally.
Example: <A>$x</A> is an A-element
with value $x, just like <A>foo</A> is
an A-element with foo as value.
LET Clauses
let <variable> := <expression>, . . .
Value of the variable becomes the
sequence of items defined by the
expression.
Note let does not cause iteration; for
does.
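Example: LET
let $d := document("bars.xml")
let $beers := $d/BARS/BEER/@name
return <BEERNAMES> {data($beers)} </BEERNAMES>
Returns one element with all the names of the beers, like:
<BEERNAMES>Bud Miller</BEERNAMES>
(Without data(), the name attributes would be copied into BEERNAMES as several attributes with the same name, which is an error.)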
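By contrast with for, let binds the whole sequence once. In this Python sketch (same hypothetical two-beer sample), the names are collected into one list and placed in a single element, joined by single spaces the way XQuery separates adjacent atomic values in element content.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the Miller BEER element is hypothetical.
BARS_XML = """<BARS>
  <BEER name="Bud" soldBy="JoesBar SuesBar"/>
  <BEER name="Miller" soldBy="JoesBar"/>
</BARS>"""

root = ET.fromstring(BARS_XML)

# Like a let clause: one binding to the whole sequence, no iteration.
beers = [b.get("name") for b in root.findall("BEER")]

elem = ET.Element("BEERNAMES")
elem.text = " ".join(beers)
out = ET.tostring(elem, encoding="unicode")
print(out)  # <BEERNAMES>Bud Miller</BEERNAMES>
```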
Order-By Clauses
FLWR is really FLWOR: an order-by clause
can precede the return.
Form: order by <expression>
With optional ascending or descending.
Example: Order-By
List all prices for Bud, lowest first.
let $d := document("bars.xml")
for $p in $d/BARS/BAR/PRICE[@theBeer = "Bud"]
order by number($p)
return $p
The for clause generates bindings for $p to PRICE elements.
order by sorts those bindings by the values inside the elements; number() makes the sort numeric, since an untyped value would otherwise be compared as a string.
Each binding is evaluated for the output. The result is a sequence of PRICE elements.
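Why convert before ordering? This Python sketch sorts the same PRICE elements two ways; the SuesBar price of 10.00 is hypothetical, chosen so that string order and numeric order disagree.

```python
import xml.etree.ElementTree as ET

# SuesBar and its 10.00 price are hypothetical.
BARS_XML = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BAR name="SuesBar">
    <PRICE theBeer="Bud">10.00</PRICE>
  </BAR>
</BARS>"""

root = ET.fromstring(BARS_XML)
buds = root.findall("BAR/PRICE[@theBeer='Bud']")

# Sorting the raw text compares strings character by character.
as_text = [p.text for p in sorted(buds, key=lambda p: p.text)]
# Converting to a number first gives "lowest first" numerically.
as_number = [p.text for p in sorted(buds, key=lambda p: float(p.text))]

print(as_text)    # ['10.00', '2.50']
print(as_number)  # ['2.50', '10.00']
```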
Predicates
Normally, conditions imply existential quantification.
Example: /BARS/BAR[@name] means all the bars that have a name.
Example: /BARS/BEER[@soldBy = "JoesBar"] gives the set of beers that are sold at Joe's Bar.
Example: Comparisons
Let us produce the PRICE elements (from all bars) for the beers that are sold by Joe's Bar.
Strategy
1. Create a triple for-loop, with variables
ranging over all BEER elements, all BAR
elements, and all PRICE elements within
those BAR elements.
2. Check that the beer is sold at Joe's Bar and that the name of the beer and theBeer in the PRICE element match.
3. Construct the output element.
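The Query
let $bars := doc("bars.xml")/BARS
for $beer in $bars/BEER
for $bar in $bars/BAR
for $price in $bar/PRICE
where $beer/@soldBy = "JoesBar" and
      $price/@theBeer = $beer/@name
return <BBP bar = "{$bar/@name}" beer = "{$beer/@name}">{$price}</BBP>
The comparison $beer/@soldBy = "JoesBar" is true if JoesBar appears anywhere in the sequence of names (this assumes soldBy is typed as a list, such as IDREFS, so that its value is a sequence).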
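The strategy of nested loops plus a filter can be sketched directly in Python. This is an illustration, not how an XQuery engine evaluates the query; the second bar, the Coors price, and the Miller and Coors BEER elements are hypothetical, and splitting soldBy on spaces stands in for treating it as a sequence of names.

```python
import xml.etree.ElementTree as ET

# Sample modeled on the BARS document; SuesBar, Coors and the extra
# BEER elements are hypothetical.
BARS_XML = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BAR name="SuesBar">
    <PRICE theBeer="Bud">2.75</PRICE>
    <PRICE theBeer="Coors">3.25</PRICE>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar SuesBar"/>
  <BEER name="Miller" soldBy="JoesBar"/>
  <BEER name="Coors" soldBy="SuesBar"/>
</BARS>"""

root = ET.fromstring(BARS_XML)

results = []
for beer in root.findall("BEER"):                # for $beer in $bars/BEER
    for bar in root.findall("BAR"):              # for $bar in $bars/BAR
        for price in bar.findall("PRICE"):       # for $price in $bar/PRICE
            # where: "JoesBar" is any item of soldBy, and the names match
            if ("JoesBar" in beer.get("soldBy").split()
                    and price.get("theBeer") == beer.get("name")):
                results.append((bar.get("name"), beer.get("name"), price.text))

print(results)
# [('JoesBar', 'Bud', '2.50'), ('SuesBar', 'Bud', '2.75'), ('JoesBar', 'Miller', '3.00')]
```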
Strict Comparisons
To require that the things being
compared are sequences of only one
element, use the Fortran comparison
operators:
eq, ne, lt, le, gt, ge.
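The difference between the general comparison = and the strict eq can be modeled in a few lines of Python. This is a simplified sketch over Python values (no type casting rules), with sequences represented as lists: = is existential over all pairs of items, while eq needs single items, yields the empty sequence (None here) when an operand is empty, and is an error for longer sequences.

```python
def general_eq(left, right):
    """Sketch of '=': true if SOME pair of items compares equal."""
    return any(a == b for a in left for b in right)

def value_eq(left, right):
    """Sketch of 'eq': each operand must hold at most one item."""
    if not left or not right:
        return None  # an empty operand gives the empty sequence
    if len(left) > 1 or len(right) > 1:
        raise TypeError("eq requires single-item operands")
    return left[0] == right[0]

sold_by = ["JoesBar", "SuesBar"]
print(general_eq(sold_by, ["JoesBar"]))    # True
print(value_eq(["JoesBar"], ["JoesBar"]))  # True
try:
    value_eq(sold_by, ["JoesBar"])
except TypeError as err:
    print("error:", err)  # error: eq requires single-item operands
```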
Example: data()
Suppose we want to modify the return clause of the query that finds the prices of beers at bars that sell a beer Joe sells, so that it produces an empty BBP element with the price as one of its attributes.
Previous Query
let $bars := doc("bars.xml")/BARS
for $beer in $bars/BEER
for $bar in $bars/BAR
for $price in $bar/PRICE
where $beer/@soldBy = "JoesBar" and
      $price/@theBeer = $beer/@name
return <BBP bar = "{$bar/@name}" beer = "{$beer/@name}">{$price}</BBP>
Modified Query
let $bars := doc("bars.xml")/BARS
for $beer in $bars/BEER
for $bar in $bars/BAR
for $price in $bar/PRICE
where $beer/@soldBy = "JoesBar" and
      $price/@theBeer = $beer/@name
return <BBP bar = "{$bar/@name}" beer = "{$beer/@name}"
            price = "{data($price)}" />
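In Python's xml.etree.ElementTree, an element's text plays roughly the role of data($price): the value without its tags, which can therefore be stored in an attribute. A minimal sketch on the example document, building empty BBP elements with a price attribute:

```python
import xml.etree.ElementTree as ET

BARS_XML = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar SuesBar"/>
</BARS>"""

root = ET.fromstring(BARS_XML)

bbps = []
for beer in root.findall("BEER"):
    for bar in root.findall("BAR"):
        for price in bar.findall("PRICE"):
            if ("JoesBar" in beer.get("soldBy").split()
                    and price.get("theBeer") == beer.get("name")):
                # price.text is the element's value without tags,
                # so it can be used as an attribute value.
                bbps.append(ET.Element("BBP", bar=bar.get("name"),
                                       beer=beer.get("name"), price=price.text))

for e in bbps:
    print(ET.tostring(e, encoding="unicode"))
```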
Eliminating Duplicates
Use function distinct-values applied to a sequence.
Subtlety: this function strips tags away from elements and compares the string values.
But it doesn't restore the tags in the result.
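The subtlety can be seen in a Python sketch of the same idea: reduce elements to their string values first, then drop repeats. The SuesBar price is hypothetical and repeats JoesBar's 2.50. One difference to keep in mind: this sketch keeps first-seen order, whereas XQuery leaves the order of distinct-values results implementation-dependent.

```python
import xml.etree.ElementTree as ET

# SuesBar is hypothetical; its Bud price repeats JoesBar's 2.50.
BARS_XML = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BAR name="SuesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
  </BAR>
</BARS>"""

root = ET.fromstring(BARS_XML)

def distinct_values(values):
    """Keep the first occurrence of each value (dicts keep insertion order)."""
    return list(dict.fromkeys(values))

# The PRICE elements become plain strings, so the two 2.50 prices
# collapse and no PRICE tags survive in the result.
prices = distinct_values(p.text for p in root.iter("PRICE"))
beers = distinct_values(p.get("theBeer") for p in root.iter("PRICE"))

print(prices)  # ['2.50', '3.00']
print(beers)   # ['Bud', 'Miller']
```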
Solutions
1.
let $d := document("stores.xml")
for $x in $d//store[./product/price > 50]/@sid
return data($x)
XQuery Variables
FOR $x in expr -- binds $x to each
value in the list expr
LET $x := expr -- binds $x to the
entire list expr
Useful for common subexpressions and for
aggregations
Basic FLWR
Find all book titles published after 1995:
for $x in document("bib.xml")/bib/book
where $x/year > 1995
return $x/title
Result Structuring
Find all book titles and the year when they were published:
for $x in document("bib.xml")/bib/book
return <answer>
  {$x/title}
  {$x/year}
</answer>
Result Structuring
Notice the use of { and }.
What is the result without them?
for $x in document("bib.xml")/bib/book
return <answer>
  $x/title
  $x/year
</answer>
for $x in document("bib.xml")/bib/book
return <result> {$x} </result>
Returns:
<result> <book>...</book></result>
<result> <book>...</book></result>
<result> <book>...</book></result>
...
let $x := document("bib.xml")/bib/book
return <result> {$x} </result>
Returns:
<result> <book>...</book>
         <book>...</book>
         <book>...</book>
         ...
</result>
Aggregates
Find all books with more than 3 authors:
for $x in document("bib.xml")/bib/book
where count($x/author) > 3
return $x
count = a function that counts
avg = computes the average
sum = computes the sum
distinct-values = eliminates duplicates
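The aggregate functions have direct Python counterparts. This sketch assumes a tiny hypothetical bib.xml (titles, authors, and prices are made up) and applies the where count($x/author) > 3 filter, then computes a sum and an average over the price values.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml content; all values are made up.
BIB_XML = """<bib>
  <book><title>Book One</title>
    <author>A</author><author>B</author><author>C</author><author>D</author>
    <price>40</price></book>
  <book><title>Book Two</title>
    <author>E</author>
    <price>25</price></book>
</bib>"""

root = ET.fromstring(BIB_XML)
books = root.findall("book")

# where count($x/author) > 3
many_authors = [b.findtext("title") for b in books
                if len(b.findall("author")) > 3]

# sum(...) and avg(...) over the price values
prices = [float(b.findtext("price")) for b in books]
total, average = sum(prices), sum(prices) / len(prices)

print(many_authors)    # ['Book One']
print(total, average)  # 65.0 32.5
```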
LET
Find all publishers that published more than 100 books:
for $p in distinct-values(//publisher)
let $b := /db/book[./publisher = $p]
where count($b) > 100
return <publisher> {$p} </publisher>
$b is a collection of elements, not a single element
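The same for/let/where shape in Python, as a sketch on hypothetical data. The threshold is a parameter so that a four-book sample can produce a result; the query above uses 100. Note that b is bound to a whole list of books, which is why count() (len here) applies to it.

```python
import xml.etree.ElementTree as ET

# Hypothetical db content.
DB_XML = """<db>
  <book><title>T1</title><publisher>PubA</publisher></book>
  <book><title>T2</title><publisher>PubA</publisher></book>
  <book><title>T3</title><publisher>PubA</publisher></book>
  <book><title>T4</title><publisher>PubB</publisher></book>
</db>"""

def busy_publishers(root, threshold):
    result = []
    # for $p in distinct-values(//publisher)
    for p in dict.fromkeys(e.text for e in root.iter("publisher")):
        # let $b := /db/book[./publisher = $p]  (a list, not one book)
        b = [bk for bk in root.findall("book")
             if any(pub.text == p for pub in bk.findall("publisher"))]
        # where count($b) > threshold
        if len(b) > threshold:
            result.append(p)
    return result

root = ET.fromstring(DB_XML)
print(busy_publishers(root, 2))  # ['PubA']
```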
Branching Expressions
if (E1) then E2 else E3 is evaluated by computing the effective boolean value (EBV) of E1: if it is TRUE, the result is E2; otherwise it is E3.
EBV Examples
1. @name = "JoesBar" has EBV TRUE or FALSE, depending on whether the name attribute is "JoesBar".
2. /BARS/BAR[@name = "GoldenRail"] has EBV TRUE if some bar is named the Golden Rail, and FALSE if there is no such bar.
Boolean Operators
E1 and E2, E1 or E2, not(E ), apply to
any expressions.
Take EBVs of the expressions first.
Example: not(3 eq 5 or 0) has value
TRUE.
Also: true() and false() are functions
that return values TRUE and FALSE.
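The EBV rules can be written down as a small Python function, sketched here with sequences as lists and nodes as ElementTree elements: an empty sequence is FALSE, a sequence starting with a node is TRUE, a single boolean is itself, a single string is TRUE if non-empty, a single number is TRUE unless it is zero or NaN, and anything else has no EBV. The function name ebv is hypothetical.

```python
import math
import xml.etree.ElementTree as ET

def ebv(seq):
    """Effective boolean value of a sequence (a Python list)."""
    if not seq:
        return False                      # empty sequence
    if isinstance(seq[0], ET.Element):
        return True                       # sequence starting with a node
    if len(seq) > 1:
        raise TypeError("no EBV for several atomic values")
    x = seq[0]
    if isinstance(x, bool):               # test bool before int
        return x
    if isinstance(x, str):
        return x != ""
    if isinstance(x, (int, float)):
        return not (x == 0 or math.isnan(x))
    raise TypeError("no EBV for this type")

root = ET.fromstring('<BARS><BAR name="JoesBar"/></BARS>')

# /BARS/BAR[@name="GoldenRail"] selects nothing here, so its EBV is FALSE.
print(ebv(root.findall("BAR[@name='GoldenRail']")))  # False
print(ebv(root.findall("BAR[@name='JoesBar']")))     # True

# not(3 eq 5 or 0): both operands have EBV FALSE, so the result is TRUE.
print(not (ebv([3 == 5]) or ebv([0])))                # True
```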
Quantifier Expressions
some $x in E1 satisfies E2
1. Evaluate the sequence E1.
2. Let $x (any variable) be each item in
the sequence, and evaluate E2.
3. Return TRUE if E2 has EBV TRUE for at
least one $x.
Analogously:
every $x in E1 satisfies E2
Example: Some
The bars that sell at least one beer for less than $2.
for $bar in doc("bars.xml")/BARS/BAR
where some $p in $bar/PRICE
      satisfies $p < 2.00
return $bar/@name
Example: Every
The bars that sell no beer for more than $5.
for $bar in doc("bars.xml")/BARS/BAR
where every $p in $bar/PRICE
      satisfies $p <= 5.00
return $bar/@name
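Python's any() and all() behave like some and every, including the edge case worth noticing: every over an empty sequence is TRUE, so a bar with no PRICE elements satisfies the second query. In this sketch, SuesBar and NewBar (which lists no prices) are hypothetical.

```python
import xml.etree.ElementTree as ET

# SuesBar and NewBar (no PRICE elements) are hypothetical.
BARS_XML = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BAR name="SuesBar">
    <PRICE theBeer="Bud">1.75</PRICE>
    <PRICE theBeer="Miller">5.50</PRICE>
  </BAR>
  <BAR name="NewBar"/>
</BARS>"""

root = ET.fromstring(BARS_XML)
bars = root.findall("BAR")

# some $p in $bar/PRICE satisfies $p < 2.00
some_cheap = [b.get("name") for b in bars
              if any(float(p.text) < 2.00 for p in b.findall("PRICE"))]

# every $p in $bar/PRICE satisfies $p <= 5.00 (vacuously true for NewBar)
none_over_5 = [b.get("name") for b in bars
               if all(float(p.text) <= 5.00 for p in b.findall("PRICE"))]

print(some_cheap)   # ['SuesBar']
print(none_over_5)  # ['JoesBar', 'NewBar']
```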
Document Order
Comparison by document order: << and >>.
Example: $d/BARS/BEER[@name = "Bud"] << $d/BARS/BEER[@name = "Miller"] is true iff the Bud element appears before the Miller element in the document $d.
Set Operators
union, intersect, except operate on
sequences of nodes.
Meanings analogous to SQL.
Result eliminates duplicates.
Result appears in document order.
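Document order and the node set operators can be modeled in Python by numbering every element in the order iter() visits it. In this sketch the helper names (before, union, intersect, except_) are hypothetical; sets work on node identity, so duplicates disappear, and each result is sorted back into document order.

```python
import xml.etree.ElementTree as ET

BARS_XML = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BEER name="Bud"/>
  <BEER name="Miller"/>
</BARS>"""

root = ET.fromstring(BARS_XML)

# Position of every element in document order (elements hash by identity).
pos = {e: i for i, e in enumerate(root.iter())}

def before(a, b):                 # a << b
    return pos[a] < pos[b]

def in_doc_order(nodes):          # deduplicate, then sort by position
    return sorted(set(nodes), key=pos.__getitem__)

def union(x, y):
    return in_doc_order(set(x) | set(y))

def intersect(x, y):
    return in_doc_order(set(x) & set(y))

def except_(x, y):                # "except" is a Python keyword
    return in_doc_order(set(x) - set(y))

bud = root.find("BEER[@name='Bud']")
miller = root.find("BEER[@name='Miller']")
all_prices = root.findall("BAR/PRICE")
bud_prices = root.findall("BAR/PRICE[@theBeer='Bud']")

print(before(bud, miller))                                  # True
print([p.text for p in union(bud_prices, all_prices)])      # ['2.50', '3.00']
print([p.text for p in intersect(all_prices, bud_prices)])  # ['2.50']
print([p.text for p in except_(all_prices, bud_prices)])    # ['3.00']
```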
XQuery Injection
XQuery Injection is a variant of the classic SQL
injection attack against the XML XQuery Language.
XQuery injection can be used to enumerate elements
on the victim's environment, inject commands to the
local host, or execute queries to remote files and
data sources.
Summary
XQuery
Assignment 5 is posted.
Next Topic: OLAP
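Solutions
3.
let $d := document("stores.xml")
for $p in $d//product
where every $m in $d//product[./name = $p/name]/markup
      satisfies number($m) >= 15
return <result>{$p/name} {$p/price}</result>
(XQuery has no distinct() function, and distinct-values() would return strings rather than product elements, so this version loops over all products; a product sold by several stores is returned once per store. The test assumes markup holds a plain number such as 15 rather than the text "15%".)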