Why a post that goes about Datalog?
For those who do not have an idea about Datalog, I will start by answering this question “What is Datalog?”. And here I believe I won’t re-express things differently; to my knowledge Datalog contains a deductive database system. Queries and database updates are expressed using Datalog—a declarative logic language in which each formula is a function-free clause and every variable in the head of a clause must appear in the body of the clause. It is a non-procedural query language based on the logic-programming language Prolog. Mentioning Prolog, I would like to quote “There is a flaw in the very foundations of Logic Programming: Prolog is non declarative”.
Then “Why Datalog?”:
- Datalogis a language you will fall in love with
- if you come with a fresh programming mind; you will believe that that’s the way programming should be
- Or even if you come with a strong object oriented or procedural programming background, then you will be amazed by the power of Datalog and how things are done differently.
- Somehow, I think Datalog is sexy; a user describes the information desired without giving a specific procedure for obtaining that information.
- Also Datalog simplifies writing recursive queries and makes query optimization easier.
- And finally because as of today, if you use your favorite search engine and search for “Datalog” you probably won’t find what you were looking for.
The outline of this post is as follow:
- Introduction to logic programming
- Basic Structure/Terminology
- Syntax of Datalog
- Safety & recursion
A. Introduction to logic programming
In logic programming languages, programs are made up not of commands to be executed, but of definitions and statements about the problem to be solved. Declarative programs do not contain explicit instructions to be followed by the executing computer. Instead, the role of the computer is to manipulate the program’s information so as to derive the solution to a given problem. The goal of this post is to give a simple introduction to logic programming, and go in some details through Datalog.
B. Basic Structure/Terminology
Before we get too much further, let’s define a few terms.
- Variables: A variable in Datalog is like a variable in a mathematical equation, it represents some constant that has yet to be determined.
- Constants: A constant is an attribute value.
- Facts: A fact is the same things a Tuple in a relation.
- Rules: A rule is a way to derive new facts. It is part of a query.
- Predicates: A predicate is a statement that may be true or false depending on the values of its variables; a predicate can take the role as either a property or a relation between entities.
C. Syntax of Datalog
Datalog syntaxes vary slightly depending on the implementation. Mainly Datalog is syntactically a subset of Prolog. This section contains a summary of the syntax of Datalog; my choice of the Datalog implementation goes for Datalog LB , — a LogicBlox’ commercial implementation of Datalog used for web-based retail planning and insurance applications
|family:person(p) -> .|
|declares person in the name-space family|
|declares is_male and is_female, can be read as males and females are persons|
|a person cannot be a male and female at the same time
a person must have a defined gender
|this is also equivalent to saying that a person is male or a female, but it does not prevent a person from being both.|
- Relational predicates
|we define relations between two persons (one to many)|
- Functional predicates
|relations of type one to one are defined using functional predicates|
|family:person(p), family:person:id(p : id) -> string(id).
|functional predicates are used to define properties; the primary Key for person is the id. Datalog LB offers variety of simple types such as strings, integers, floats, dates …|
- Derivation rules
|In Datalog LB flipped arrows distinguish Integrity Constraints from derivation rules.|
- Run-time queries
|+family:person:id(p : id),+family:person:is_male(p)<- id = “00-11-22-33-44”.
^family:person:first_name[p] = f_name<- f_name = “John”, family:person:id(p:id)<- id = “00-11-22-33-44”.
-family:person:id(p : id)<- id = “00-11-22-33-44”.
|Run-time queries are not installed in the database, they can only be executed at a define run-time; The “+” is used to insert a new record in the database, The “^” is used to update a record, the “-” is used for removing the record.
The general form is
+|-|^R0(to,…tn ) <− R1(ko,…km),,, Rs(Jo,…Jp), tc=constT,kc=constK,…Jc=constJ .
.in SQL this is equivalent to (INSERT | DELETE | UPDATE ) XXX WHERE YYY.
- Delta rules
|Delta rules are transactional rules (events) similar to queries but they can be installed in the system. For instance this rule removes the person p if he is killed by somebody.|
D. Safety and recursion in Datalog programs
Recursive rules contains the same predicate in the head and the body or the rule. The power of Datalog resides in the recursion; Without recursion, Datalog can express all and only the queries of core relational algebra.
To ensure safety in Datalog programs, every rule should be safe in a way such as any variable occurring in the head of the rule must appear in the body (and vice versa).
Datalog is a restricted form of logic programming with predicates declarations, rules and constraints.
A Datalog rule has the form
R0(t0,1, . . . , t0,k0 ) <− R1(t1,1, . . . , t1,k1 ), . . . , Rn(tn,1, . . . , tn,kn).
Where R0 … Rn are predicate (relation) symbols.
Each term ti,j is either a constant or a variable (0 <= i <= n and 1 <= j <= ki).
The formula R0(t0,1, . . . , t0,k0 ) is called the head of the rule.
The sequence R1(t1,1, . . . , t1,k1 ) … Rn(tn,1, . . . , tn,kn) the body.
If n = 0, then the body is empty and the rule is called a fact.
A rule is safe if all variables occurring in the head also appear in the body.
In logic programming, the software engineer’s role is to create a real life model by declaring a collection of predicates and relations between these predicates and at a second stage to write a set of constraints and rules to solve the problem. The more rules we have, the more fun we get with Datalog since the executing computer uses these rules of inference to derive new formulas from the ones given in the program until it finds the one that expresses the solution.