Structured, Semistructured, and Unstructured Data (cont’d.)
Schema information mixed in with data values
Self-describing data
May be displayed as a directed graph
  • Labels or tags on directed edges represent:
    • Schema names
    • Names of attributes
    • Object types (or entity types or classes)
    • Relationships
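The directed-graph view can be sketched in a few lines of Python. This is only an illustration: the node identifiers, edge labels, and sample values below are hypothetical and do not come from the slides. Each edge carries its own label, so the "schema" of a node is simply whatever labels appear on its outgoing edges.

```python
# Semistructured data as a labeled directed graph.
# Each triple is (source node, edge label, target node or leaf value).
# All node names and values here are made up for illustration.
edges = [
    ("root", "company", "c1"),
    ("c1", "project", "p1"),
    ("p1", "name", "ProductX"),
    ("p1", "location", "Bellaire"),
    ("c1", "project", "p2"),
    ("p2", "name", "ProductY"),  # p2 has no location: no fixed schema is enforced
]

def children(node, label):
    """Follow every outgoing edge of `node` that carries `label`."""
    return [target for source, edge_label, target in edges
            if source == node and edge_label == label]

def labels(node):
    """The labels on a node's outgoing edges act as its self-described schema."""
    return sorted({edge_label for source, edge_label, _ in edges if source == node})

project_names = [children(p, "name")[0] for p in children("c1", "project")]
print(project_names)
print(labels("p1"), labels("p2"))
```

Note that the two project nodes expose different label sets, which is what distinguishes semistructured data from data that must conform to a predefined schema.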
Structured, Semistructured, and Unstructured Data (cont’d.)
Unstructured data
  • Limited indication of the type of data
  • Document that contains information embedded within it
HTML tag
  • Text that appears between angle brackets: <...>
End tag
  • Tag with a slash: </...>
Structured, Semistructured, and Unstructured Data (cont’d.)
HTML uses a large number of predefined tags
HTML documents
  • Do not include schema information about the type of data
Static HTML page
  • All information to be displayed is explicitly spelled out as fixed text in the HTML file
XML Hierarchical (Tree) Data Model (cont’d.)
XML attributes
  • Describe properties and characteristics of the elements (tags) within which they appear
  • May reference another element in another part of the XML document
  • Common to use attribute values in one element as the references
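A minimal sketch of attribute-based references, using Python's standard `xml.etree.ElementTree` module. The element names, the attribute names `dno` and `worksFor`, and the sample values are assumptions chosen for illustration; they are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the employee's worksFor attribute holds the
# dno value of a department element elsewhere in the same document.
doc = """<company>
  <department dno="5" name="Research"/>
  <employee ssn="123456789" name="Smith" worksFor="5"/>
</company>"""

root = ET.fromstring(doc)

# Index the referenced elements by the attribute used as their identifier.
dept_by_dno = {d.get("dno"): d for d in root.iter("department")}

# Resolve the reference by looking up the attribute value.
smith = root.find("employee")
smith_dept = dept_by_dno[smith.get("worksFor")].get("name")
print(smith_dept)
```

The parser does not follow the reference on its own here; the application resolves it by matching attribute values.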
XML Documents, DTD, and XML Schema (cont’d.)
Valid
  • Document must be well formed
  • Document must follow a particular schema
  • Start and end tag pairs must follow structure specified in separate XML DTD (Document Type Definition) file or XML schema file
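The first condition, well-formedness, can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are hypothetical. It does not check validity: `ElementTree` is a non-validating parser, so checking a document against a DTD or XML schema would need a validating processor.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as XML, i.e. tags are properly nested and closed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested start and end tags.
good = "<project><name>ProductX</name></project>"
# End tags appear in the wrong order, so the document is not well formed.
bad = "<project><name>ProductX</project></name>"

print(is_well_formed(good), is_well_formed(bad))
```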
XML Documents, DTD, and XML Schema (cont’d.)
Notation for specifying elements
XML DTD
  • Data types in DTD are not very general
  • Special syntax
    • Requires specialized processors
  • All DTD elements always forced to follow the specified ordering of the document
    • Unordered elements not permitted
XML Schema
XML schema language
  • Standard for specifying the structure of XML documents
  • Uses same syntax rules as regular XML documents
    • Same processors can be used on both
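Because a schema document is itself XML, an ordinary XML parser can read it. The sketch below parses a small, hypothetical schema with Python's standard `xml.etree.ElementTree` and lists the elements it declares. It only reads the schema as a document; it does not validate anything against it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # the XML Schema namespace

# A hypothetical schema for a <project> element, written in ordinary XML syntax.
schema_text = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="project">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="number" type="xs:unsignedInt"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# The same parser used for data documents handles the schema document.
schema = ET.fromstring(schema_text)

# Collect (name, type) for every xs:element declaration, in document order.
declared = [(e.get("name"), e.get("type"))
            for e in schema.iter(f"{{{XS}}}element")]
print(declared)
```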
Storing and Extracting XML Documents from Databases (cont’d.)
Designing a specialized system for storing native XML data
  • Called native XML DBMSs
Creating or publishing customized XML documents from preexisting relational databases
  • Use a separate middleware software layer to handle conversions
XML Languages
Two query language standards
XPath
  • Specify path expressions to identify certain nodes (elements) or attributes within an XML document that match specific patterns
XQuery
  • Uses XPath expressions but has additional constructs
XPath: Specifying Path Expressions in XML
XPath expression
  • Returns a sequence of items that satisfy a certain pattern as specified by the expression
  • Either values (from leaf nodes) or elements or attributes
Qualifier conditions
  • Further restrict nodes that satisfy pattern
Separators used when specifying a path:
  • Single slash (/) and double slash (//)
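The two separators and a qualifier condition can be tried out with Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath. Two caveats apply to this sketch: paths are evaluated relative to the element they are called on (so `/company/...` is written `./...` from the `company` root), and only simple equality qualifiers are supported. The sample document and its tag names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

company = ET.fromstring("""<company>
  <department>
    <departmentName>Research</departmentName>
    <employee><employeeName>Smith</employeeName><employeeSalary>30000</employeeSalary></employee>
    <employee><employeeName>Wong</employeeName><employeeSalary>40000</employeeSalary></employee>
  </department>
  <department>
    <departmentName>Headquarters</departmentName>
    <employee><employeeName>Borg</employeeName><employeeSalary>55000</employeeSalary></employee>
  </department>
</company>""")

# Single slash: each step moves to a direct child.
# Corresponds to the XPath /company/department/departmentName.
dept_names = [e.text for e in company.findall("./department/departmentName")]

# Double slash: matches at any depth below the context node.
# Corresponds to the XPath //employeeName.
all_names = [e.text for e in company.findall(".//employeeName")]

# Qualifier condition in square brackets restricts the matching nodes.
wong = company.findall(".//employee[employeeName='Wong']")
wong_salary = wong[0].findtext("employeeSalary")

print(dept_names, all_names, wong_salary)
```

Numeric comparisons in qualifiers, such as a salary threshold, are part of XPath but outside the subset `ElementTree` implements; a full XPath processor would be needed for those.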
XPath: Specifying Path Expressions in XML (cont’d.)
Attribute name prefixed by the @ symbol
Wildcard symbol *
  • Stands for any element
  • Example: /company/*
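A short sketch of the `@` prefix and the `*` wildcard, again using the XPath subset in Python's standard `xml.etree.ElementTree`. The document content is hypothetical. Since `ElementTree` returns elements rather than attribute nodes, the attribute value itself is read with `get()` after the path has selected the elements.

```python
import xml.etree.ElementTree as ET

company = ET.fromstring("""<company>
  <employee ssn="123456789"><name>Smith</name></employee>
  <employee><name>Wong</name></employee>
  <project number="1"><name>ProductX</name></project>
</company>""")

# Wildcard: every child element of company, whatever its tag.
# Corresponds to the XPath /company/*.
child_tags = [e.tag for e in company.findall("./*")]

# @ prefix: restrict to employee elements that carry an ssn attribute.
employees_with_ssn = company.findall(".//employee[@ssn]")
names_with_ssn = [e.findtext("name") for e in employees_with_ssn]

# Read the attribute values from the selected elements.
ssns = [e.get("ssn") for e in employees_with_ssn]

print(child_tags, names_with_ssn, ssns)
```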
XPath: Specifying Path Expressions in XML (cont’d.)
Axes
  • Move in multiple directions from current node in path expression
  • Include self, child, descendant, attribute, parent, ancestor, previous sibling (preceding-sibling), and next sibling (following-sibling)
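Some of these directions can be demonstrated with the abbreviated syntax that Python's standard `xml.etree.ElementTree` understands: `./` for child, `.//` for descendant, and `..` for parent. The ancestor and sibling axes are not part of that subset, so this sketch leaves them out; the sample document is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

company = ET.fromstring("""<company>
  <department dname="Research">
    <employee><name>Smith</name></employee>
    <employee><name>Wong</name></employee>
  </department>
</company>""")

# Child axis: direct children of the current node only.
departments = company.findall("./department")

# Descendant axis: name elements at any depth below the current node.
names = [n.text for n in company.findall(".//name")]

# Parent axis: '..' steps up from the matching employee to its department.
parents = company.findall(".//employee[name='Wong']/..")
parent_dname = parents[0].get("dname")

print(len(departments), names, parent_dname)
```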
XPath: Specifying Path Expressions in XML (cont’d.)
Main restriction of XPath path expressions
  • Path that specifies the pattern also specifies the items to be retrieved
  • Difficult to specify certain conditions on the pattern while separately specifying which result items should be retrieved
Extracting XML Documents from Relational Databases
Creating hierarchical XML views over flat or graph-based data
  • Representational issues arise when converting data from a database system into XML documents
  • UNIVERSITY database example
Other Steps for Extracting XML Documents from Databases
Create correct query in SQL to extract desired information for XML document
Restructure query result from flat relational form to XML tree structure
Customize query to select either a single object or multiple objects into document
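The three steps can be sketched end to end with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. Everything specific here is an assumption made for illustration: the `course` and `section` tables, their columns, the sample rows, and the helper names `course_rows` and `rows_to_xml`. The slides' UNIVERSITY schema is not reproduced.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical two-table database held in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (cno TEXT PRIMARY KEY, cname TEXT);
CREATE TABLE section (sid INTEGER PRIMARY KEY, cno TEXT REFERENCES course(cno), year INTEGER);
INSERT INTO course VALUES ('CS101', 'Intro to CS'), ('CS202', 'Databases');
INSERT INTO section VALUES (1, 'CS101', 2023), (2, 'CS101', 2024), (3, 'CS202', 2024);
""")

def course_rows(conn, cno=None):
    """Steps 1 and 3: SQL query returning flat rows, for one course or for all."""
    sql = ("SELECT c.cno, c.cname, s.sid, s.year "
           "FROM course c JOIN section s ON s.cno = c.cno")
    params = ()
    if cno is not None:
        sql += " WHERE c.cno = ?"  # customize to select a single object
        params = (cno,)
    sql += " ORDER BY c.cno, s.sid"  # keeps rows of one course adjacent
    return conn.execute(sql, params).fetchall()

def rows_to_xml(rows):
    """Step 2: regroup flat rows into a course -> section tree."""
    root = ET.Element("courses")
    current = None
    for cno, cname, sid, year in rows:
        # Start a new <course> element whenever the course number changes.
        if current is None or current.get("cno") != cno:
            current = ET.SubElement(root, "course", cno=cno)
            ET.SubElement(current, "name").text = cname
        section = ET.SubElement(current, "section", sid=str(sid))
        section.text = str(year)
    return root

all_courses = rows_to_xml(course_rows(conn))
one_course = rows_to_xml(course_rows(conn, "CS202"))
print(ET.tostring(one_course, encoding="unicode"))
```

The grouping relies on the `ORDER BY` clause; without it, rows for the same course might not be adjacent and the course element would be duplicated. Choosing `course` as the root of the hierarchy is one of several possible views over the same data.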
Summary
Three main types of data: structured, semistructured, and unstructured
XML standard
  • Tree-structured (hierarchical) data model
  • XML documents and the languages for specifying the structure of these documents
XPath and XQuery languages
  • Query XML data
XML: Introduction To XML, Defining XML Tags, Their Attributes and Values, Document Type Definition, XML Schemas, Document Object Model, XHTML. Parsing XML Data - DOM and SAX Parsers in Java
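The difference between the two parsing styles can be sketched briefly. The topic list names Java parsers; the example below uses Python's standard `xml.dom.minidom` and `xml.sax` modules instead, which expose the same DOM and SAX programming models. The sample document and the `ProjectHandler` class are hypothetical.

```python
import xml.dom.minidom
import xml.sax

doc = "<projects><project>ProductX</project><project>ProductY</project></projects>"

# DOM: the whole document is loaded into an in-memory tree, then navigated.
dom = xml.dom.minidom.parseString(doc)
dom_names = [p.firstChild.data for p in dom.getElementsByTagName("project")]

# SAX: the parser streams through the document and fires callbacks;
# the handler keeps only the state it needs.
class ProjectHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # collects text only while inside <project>

    def startElement(self, name, attrs):
        if name == "project":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "project":
            self.names.append("".join(self._buffer))
            self._buffer = None

handler = ProjectHandler()
xml.sax.parseString(doc.encode("utf-8"), handler)
sax_names = handler.names

print(dom_names, sax_names)
```

DOM is convenient when the document fits in memory and needs random access; SAX suits large documents that are read once from start to end.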