Sunday, January 13, 2013

Databases (Course / Tutorial), Part 2 - XML Documents

XML DATA

XML can be considered an alternative for the Relational Data Model. There are situations in which using an XML rather than Ralational Model is more suitable. In general, you can say that XML is much more flexible because it was originally designed to share data. For example, a Relational Model requires a schema, which means you have to plan and design based on your understanding future of the data model your are building.

Today in many cases XML is use as an intermediary layer between users and database. For example, eBay has an XML that can be used to access their data and build applications from it, without directly accessing their database. Therefore, it could be said that XML is an excellent way to have computers talk to each other. For example airlines have XML Documents at specific locations in the internet that allow programmers and/or travel agencies to access their schedules and availability to create services like cheapoair or kayak.

XML Structure (Well-Formed)

 
 <Bookstore>  
      <Book ISBN="978-456321789" Price="15" Edition="2nd">  
           <Title>Learn Python</Title>  
           <Authors>  
                <Author>  
                     <First_Name>Pythonian</First_Name>  
                     <Last_Name>Pythoner</Last_Name>  
                </Author>  
                <Author>  
                     <First_Name>Ricky</First_Name>  
                     <Last_Name>Rich</Last_Name>  
                </Author>  
           </Authors>  
      </Book>  
      <Book ISBN="4218-127894413" Price="12">  
           <Title>Swimming</Title>  
           <Authors>  
                <Author>  
                     <First_Name>Michael</First_Name>  
                     <Last_Name>Phelps</Last_Name>  
                </Author>  
           </Authors>  
           <Remark>  
                Buy this book bundled with a CD - Great Deal  
           </Remark>  
      </Book>  
 </Bookstore>  

As seen above the document is indented and every tag has a closing tag. The tags can have any name you like or your project needs. Notice that the second book, doesn't have an Edition attribute. This is part of the great flexibility of XML. The structure of the XML can then be thought as:
  • Tagged Elements or Nested Elements
  • Attributes (in the elements)
  • Text
Well-Formed is the most flexible XML. The basic Structural requirements of Well-Formed XML are:
  1. Single Root element (In our example we start with  <Bookstore> , so we must have all our other XML content before the closing </Bookstore> ).
  2. Matched tags and proper Nesting (Above as you see all tags are indented depending to their level of hierarchy. For example, it would be wrong to have <Title>  at the same level of indentation as <Book>).
  3. Unique attributes within elements(For example, having 2 ISBN attributes in the <Book>  tag is not correct and therefore not valid for a valid Well-Formed XML)

XML Parser are used to validate the 3 structural requirements of an Well-Formed Document. When the document is parsed is returns a parsed XML in different standards DOM (Document Object Model), or SAX.

Well-Formed is useful when data is not structured, meaning that it is irregular. If there is a uncertainty about what kind of data a document may carry, then a more structured approach for the design of it could be to complex to use( for example DTD/XSD, explained below).

Dsiplay/Render XML

There are two options to diplay XML documents in a nice organized way. You can use CSS or XSL. CSS is the most know for it's usefulness for HTML styling, however, in XML XSL(Extensible style-sheet Language is often used).

A CSS/XSL interpreter is required. The interpreter has some rules for the conversion to the HTML output generated. The document (XML) should still be parsed to check for structural consistence.

Valid XML

The previously mentioned Well-Formed XML is suitable for many different purposes. However, there are cases in which considering content-specific rules for an XML document is required. Adhering content-specifics to an XML is achieved using DTD's or XML Schema(XSD), from which XML Schema is more powerful and widely used.

The validation process is similar to the Well-Formed XML. The only new difference, is that the parser now takes into account the requirements of the content-specific specification and verifies if the content of the document follows the specification for the document.

DTD

Since DTD is a specification language, what DTD allows you to define is:
  • Elements
  • Attributes
  • Nesting
  • Ordering
  • Frequency of Occurrences
  • ID and IDREF(S): Which are special attributes types that act as pointers within and XML Document
Why using DTD or XSD? This question can be considered in light of flexibility and your validation requirements. For example, if there is data coming in the XML that has DTD/XSD it is possible have the data pre-validated due to usage of structural standard for the XML. Thus, saving processing time and analysis time for consistency. Furthermore, working with documents that have a  defined structure make easier the task of using the information in the Document. It is also useful for documenting the data exchanging processes.

Conclusion

In the next post I will include example for both XSD and DTD. From this post the most important take, is to understand the what an XML is, how it structures make it useful. As you've seen in the code above, it is really simple. Tags can be named according to your needs and there are some basic principles to follow for validation if you are working with data that does not require or needs a structure. If the data requires a structure, you now know that there are ways of giving a the document a sort of schema, like in Relational DB's in the next post I will includes examples of this more structured content-specific satisfaction documents.

Databases (Course / Tutorial), Part 1 - Relational Model

This is the first article of a series of where I will explain the basics of Relational Databases. In this series I will cover Relational Database theory, XML, JSON, SQL, Querying, UML, Indexes, Views, Transactions, Constrains and Triggers, Authorization, Recursion in SQL, OLAP, and NoSQL Systems.

There will be a small section for Relational Algebra, which I consider you can skip. However, I highly recommend to go through it, it helps you to structure your relational analysis when designing a relation model.

There is no need to know a programming language. However, it would be best if you know one. I won't be providing any instructions on any language since this series deals only with databases. At some point I may use Python or Java to illustrate a db real life implementation, but the usage of the programming language is not required to understand the essence of the database theory.

If you have any question feel free to post them on the comments. I will try to answer them to the best of my knowledge, at the same time, I invite you to solve any questions that you may know the answer to in the comments.

Ok, so let's get started.

Introduction

Todays digital world is data driven. While programs are useful to manipulate, gather, analyze and present raw data in a way that it makes sense. Databases run in the background making the our interactions with websites, software, even shopping experience at the supermarket easier.

So in short we can think of databases as useful for:

  • Handling small and/or massive
  • Store data in a way that it outlives the programs that executes on that data. This means that data is persistent
  • Managing data safely following ACID (Atomicity, Consistency, Isolation, Durability). This means that databases should be:
    • Concurrent (multiple users)
    • Reliable (99.99999% up time)
    • Speed/Efficiency to query data
    • Transactions atomicity (i.e that data is process fully not partially)
It is important to know that Data-Intensive applications are not always using Database systems. Examples of data not stored in DB's is files like spreadsheets.

Relational Model Terminology

Database: Set of named Tables(or Relations)
Table/Relation: has a set of named Columns(or attributes) 
Row/Tuple: has a value/data for each Column(or attribute)
Column/Attribute: has a type(or domain)

Schema: Structure (description) of the relations in the database
Instance: Actual contents of the Table(Relation)

key: is an attribute whose values/fields should be unique for each Row/Tuple. The key values/fields uniquely identify a Row.
Null: value for undefined or unknown fields/value in the database.

Querying Relational Databases

To start I will not include any code for how to query a DB. Let's just focus on idea of how queries take place in a DB System.







To Query data we need:
1. Design (Scheme) 
2. Initial Data
3. Query language like SQL

When we perform a query we get a new relation. This makes it possible to query an initial query, that is known as composition.

Query Languages:
  • Relational Algebra - formal: in the example below we are querying employees id's that have a salary greater than 3500 and that have a job title of 'Assistant' to determine a promotion. This creates a natural join (⋈ ) between employee and promotion.
π (id) = σ Salary > 3500 ^ (job_title ="Assistant" (employee ⋈ promotion)
  • SQL = actual/implemented: The same query for the employee id for promotion in SQL would be:
 SELECT Employee.ID  
 FROM Employee, Promotion  
 WHERE Employee.ID = Promotion.ID  
 AND Salary>3500 and title="Assistant"  

Conclusion

From the above there might be a lot that seems confusing at first. Don't be dis-encouraged by the complexity of the Algebraic expressions used for the query. In real life applications you will not use them unless your are working on creating a database. 

From this first post, you learn the terminology and how an SQL looks like. You now know that a Relational Database is nothing more than a group of tables that are related with other tables. Later in this series you will see how relationships are established by using the Keys as the means to create relationships.

Links:

Additional to what I present here, I recommend the following link:

This video may be a little more graphic on the concepts that introduced here: