The goal of this tutorial is to give a quick introduction how to use the relational algebra calculator and its concepts. It assumes that you already know the relational algebra or are learning it from other sources.
There is no real standard for the relational algebra like there is for SQL. So every book or teacher might have its slightly different interpretation and notation.
The goal of this progam was to support the most commonly used "mathematical" notation used by Database Systems The Complete Book 2nd Edition by Hector GarciaMolina, Jeff Ullman, and Jennifer Widom, Datenbanksysteme: Eine Einführung by Alfons Kemper and André Eickler and Wikipedia  Relational Algebra and others.
The core element of the calculator is the relation (or table) which consists of a fixed number of attributes (or columns) in a fixed order (this is called the schema of the relation) and a set of tuples or rows containing the specific values.
Each attribute has three distinct properties: its type, its position and its name.
The type or domain of an attribute is either string, number, date or boolean.
The type is used for example to determine if two values can be compared in a boolean expression or if two schemas are union compatible. In most cases the type of the attributes are obvious if you look at the values.
The position of each attribute in a schema is fixed and can be used to adress the attributes.
An example would be the projection of the first and third attribute or column of an arbitrary relation R: π [1], [3] ( R )
The full qualified name of the attribute is a unique identifier of the attribute within the schema of its relation.
It consists of the name itself and a relation qualifier and are written like in SQL as R.a
where a is the name and R is the relation qualifier.
An example would be the projectoin of the attributes a, b from a relation R: π R.a, R.b ( R )
The default relation qualifier of each attribute is the name of its relation.
If the attributes name without the qualifier is unique within the relation's schema, it also can be used to address a specific attribute.
The previous example could also be written as π a, b ( R )
.
Each relation has a set of tuples (or rows). This means that there are no duplicate tuples within one relation and the duplicateelimination is implicitly executed after every single step of the calculation.
The tuples in the calculator have a defined order and unlike a normal database system all operations are implemented to preserve that order. This should help the users to see what has changed from one step to the next.
so far we covered that
π [1], [2] ( R )
, π a, b ( R )
π R.a, S.a ( R x S )
Before we introduce how to use the operators this should be a quick introduction of a very handy feature of the relational algebra calculator: the alternative plain text notation
The "classic" mathematical notation uses greek letters like π, σ for the unary operations and special symbols like the join symbol ⋈ or the union symbol ∪ for some binary operations.
This symbols can be entered using the toolbar at the top of the editor.
This calculator also supports a alternative syntax for all this symbols that follows two very simple rules: Every greek letter can be substituted with its name spelled out ("pi" for π, "gamma" for γ) and every other symbol has an equivalent name that is borrowed from SQL, programming languages like C and Set theory.
This substitutions should be easy to read and much more important very easy to write because you don't need any toolbar or mouse. The calculator also supports autocomplete: just press [CTRL]+[SPACE] to complete the current keyword.
This feature should help you to write your statements more quickly and fluently.
π R.a, S.a, S.b
σ R.a = S.a ∧ ( R.a > 5 ∨ R.a < 0 ) (
R ⨯ S
)
is equivalent to:
pi R.a, S.a, S.b
sigma R.a = S.a and ( R.a > 5 or R.a < 0 ) (
R cross join S
)
In the following table you can see a list of all supported substitutions:
classical notation  alternative notation 

π  pi 
σ  sigma 
ρ  rho 
τ  tau 
γ  gamma 
∩  intersect 
∪  union 
  \ 
÷  / 
⨯ 

⋈ 

⟕ 

⟖ 

⟗  full outer join 
⋉  left semi join 
⋊  right semi join 
▷ 

←  <
pi new_name < a ( R )

→  >
pi a > new_name ( R )

For this Part we use the "bank example" Dataset with 3 relations: Customers, Accounts and PremiumCustomers. By convention relations start with a uppercase letter and attributes with a lower case letter.
Open the dataset used in this tutorial using the following link to the "bank example" Dataset.
You find the relations and their attributes listed on the side and if you hover a relations name a preview of the first view tuples is displayed.
After you have found the Dataset you can formulate the very first and most basic query in relational algebra: a relation without any further manipulation.
Just enter the name of a relation into the code editor or click on the relation/attribute names to insert them into the code editor.
Note that the editor supports auto completing the relation/attribute names of the current dataset and the operators with [CTRL]+[SPACE]
So if you want all tuples of the relation Customer you should have the following statement: Customer
. And get all the tuples if you press the execute button or press [CTRL]+[RETURN].
All unary operations have the same basic syntax FUNCTION ARGUMENT ( CHILD_EXPRESSION )
.
The braces around the CHILD_EXPRESSION
can be omitted. In this case the predefined operator precedence for relational algebra applies.
A complete list of the supported relalg operations can be found here: general syntax, unary operations and binary operations.
The projection is one of the basic operations that allow to choose which of the attributes of the parent relations should be included in the new one and in which order they should be.
Renaming a relation (ρ) changes the qualifiers of all the relations attributes but does not touch the tuples.
Renaming an attribute (ρ) only changes the name of a specific attribute (and leaves his relationqualifier unchanged).
The statement pi balance ( Accounts )
returns a new relation with only the balance attribute.
The next statement gets the balance with the accountid after renaming the relation to A and renames one of the attributes.
rho account_number < aid (
pi aid, A.balance (
rho A (
Accounts
)
)
)
Like in SQL or most programming languages you can format your statement and use SQL like comments (with  ...
or /* ... */
) to increase the readability.
The next statement uses a selection to filter the tuples of a relation based on a boolean expression. The calculator supports complex boolean expression with functions and built in operator precedence.
The attributes in the boolean expression can be given as name or numeric position like with the projection.
The next statement selects all customersids of customers who have the same value for their firstname and lastname.
 this should return an relation with no tuples:
pi cid (
sigma firstname=lastname (
Customers
)
)
The next example uses a more complex expression to get all accounts with a balance over 100 or under 100.
sigma balance > 100 or (balance*1 > 100)  (balance < 100) would also be correct
Accounts
As a shorter alternative you can use a function in your expression to get the same result:
sigma abs(balance) > 100
Accounts
Everybody can provide datasets that can be used in the relational algebra calculator and share them with others.
We assume the scenario of a teacher wanting to provide a dataset for his/her students for this short tutorial.
The datasets are specified in a simple text format and can be shared with others via GitHub Gists (a simple and free platform to share snippets).
The shared gist gets an unique ID and the relational algebra calculator can load the dataset directly using this ID.
The fist step is the creation of the dataset which is actually only a group of relation definitions with some additional information and is therefore refered as group in the specified format.
The relations can then be used by the students to formulate the there statements.
Lets assume we want to create a dataset of employees of a company.
group: bank example
every header field starts with the name of the field and is followed by a colon for single line values.
The next (optional) header field we should specify is the description. It should contain information like who is maintaining this dataset or where to find additional information.In our example we add a description that takes more than a single line and therefore we enclose the value in double brackets instead of using the colon.
group: bank example
description[[ the data for this dataset was generated using <http://www.generatedata.com/>
* the relation _Customers_ contains basic information about the customers of the bank.
The relation _Accounts_ contains the basic information of a single account.
Note that a customer can have any number of accounts.
* the relation _PremiumCustomers_ contains
the customerids of all customers with a total balance over 1000
]]
The next step is to actually add the relations the students can use for their queries.
The relation definitions are use the relational algebra syntax that can be used in this tool.
Every relation is defined by a single variable assignment where the name of the variable is used as the relations name and the result of the expression defines the relation.
The relalg expression can use all features that can be used in the tool and can also use other relations defined within the same dataset.
Note that the name of the relation is used as the qualifier of each attribute/column.
For the relation Customers
relation we use the inline relation syntax as the most basic method to define the relation and can also be edited using the relation editor which is a simple spreadsheet like editor.
The inline relations in combination with the editor should be very easy to use if you enter the data directly or if you have them as a csv/spreadsheet file.
group: bank example
description[[ the data for this dataset was generated using <http://www.generatedata.com/>
* the relation _Customers_ contains basic information about the customers of the bank.
* the relation _Accounts_ contains the basic information of a single account. Note that a customer can have any number of accounts.
* the relation _PremiumCustomers_ contains the customerids of all customers with a total balance over 1000
]]
Customers = { cid firstname lastname
1 Dexter Simpson
2 Kaseem Gallagher
3 Kuame Hamilton
4 Robert Thompson
5 Rhiannon Valentine
6 Calvin Mays
}
Accounts = {
aid, cid, balance:number
1, 1, 66
2, 1, 304
3, 2, 272
4, 3, 3472
5, 4, 975.7
6, 4, 93
7, 5, 534
8, 5, 75.5
}
As we can see the name of the relations are defined by the assignments.
The inlinerelations are enclosed by curly brackets and contain the names of the attributes/columns in the first line and then a tuple/row per line where the values are simply separated by whitespace. You can also other seperators and can define the types explicitly as we can see at the Accounts relation. For further information can be found at inline relation description.
The last relation we need to add in this example is the relation containing the banks premium Customers.
They are specified by using the other two relations in a simple relational algebra statement that selects all customers with a total balance over 1000.
group: bank example
description[[ the data for this dataset was generated using <http://www.generatedata.com/>
* the relation _Customers_ contains basic information about the customers of the bank.
* the relation _Accounts_ contains the basic information of a single account. Note that a customer can have any number of accounts.
* the relation _PremiumCustomers_ contains the customerids of all customers with a total balance over 1000
]]
Customers = { cid firstname lastname
1 Dexter Simpson
2 Kaseem Gallagher
3 Kuame Hamilton
4 Robert Thompson
5 Rhiannon Valentine
6 Calvin Mays
}
Accounts = {
aid, cid, balance:number
1, 1, 66
2, 1, 304
3, 2, 272
4, 3, 3472
5, 4, 975.7
6, 4, 93
7, 5, 534
8, 5, 75.5
}
PremiumCustomers =
pi cid (
sigma sum > 1000 (
gamma cid; sum(balance)>sum (
Accounts join Customers
)
)
)
We have now seen how to define a dataset with its header containing the name and the description followed by relational algebra assignments defining the relations of the dataset.
We can paste this definition directly in the Group Editor to load and use it.
In the next section we want to look at how we can publish this definition so that we give our students a single url or id to directly load the dataset.
If we want to share the definition of a dataset with other people we could simply give them the definition and they load it using the group editor, but that would not be that practical for most cases.
The simpler solution (for the users) is to publish the definition as a GitHub Gist and share the ID of the gist with others.
Just create a gist with the definition as its content (the filename does not matter) and publish it. The ID of the Gist can now be found at the top of the page as gist:xxxxxxxxxxxx or in the url after the last slash.
This ID can then be shared and loaded in the interface or the calculator can be called directly with a specific ID by using using the parameter ?data=gist:xxxxxxxxxxxxxx
.
For example the simple bank definition of this tutorial has been published as a Gist with the ID 2cfb981fbc5676182d64 and can therefore be loaded directly by appending ?data=gist:2cfb981fbc5676182d64
to the url of the calculator.
syntax  NAME = EXPRESSION 

Defines a new local variable with the name NAME; its content is defined by EXPRESSION
The name of the new relation must be unique.
The definition has to be executed with the statement.
TestA = π a,b R
TestB = σ d > 0 S
 statement using the variable
TestA join TestB
An assignment (= definition of a variable) is no valid relational algebra expression on its own. If you miss the acutal query a error is thrown (Error: only assignments found; query is missing).
 this is the definition of the variable
Test = π Customer.firstname, surname ( Customer )
 this is the actual query/statement using the variable
Test
There is a known problem when the last assignment ends with a natural join and the query consists of a single relation:
A = S join R
A  this is the query
The statement is ambiguous and the parser interprets it as A = (S join R A)
where R is interpreted as a column argument for the thetajoin and therefore detects a
cyclic usage of the variable A.
Solution: To get the expected behaviour you have to set braces around the assigned expression like A = (S join R)
A
syntax   COMMENT_TEXTEXPRESSION 

the '' must be followed by at least one whitespace charater! inserts a comment; its text goes until the end of the line
comments are recognized as whitespace
Test =
 This is the expression that is assigned to Test
π Customer.firstname, surname ( Customer )
syntax  RELATION_NAME 

Uses a pre defined relation with the name RELATION_NAME
The code completion only works for this relations.
( Customers ) cross join ( Accounts )
syntax  /* COMMENT_TEXT */
EXPRESSION 

inserts a comment that can span multiple lines
comments are recognized as whitespace
/* This is
a very
very
long comment */
Test = π Customer.firstname, surname ( Customer )
syntax  {COLUMN_NAME_1:COLUMN_TYPE_1 ...
ROW_1
ROW_2
...
}


The inlinerelation is a temporary relation that can be defined directly in the statement. It is only valid in the defining statement
Every inlinerelation is a valid expression and thus can be used at any position a EXPRESSION is expected.
The inlinerelation is defined by a header, that specifies the schema of the relation and the rows containing the values and is surrounded by curly brackets.
QUALIFIER.COLUMN_NAME:COLUMN_TYPE
separated by any whitespace, comma or semicolon.
The QUALIFIER is optional. Also the COLUMN_TYPE can be omitted if the type is well defined by the values of that column. The first non null value of a column defines its type.
The rows of the relation are defined by a list of values per row with the type of the coresponding column. The values are separated by whitespace comma or semicolon.
Simple strings only containing letters, numbers, hyphens, underlines, dots or periods ([09azAZ\_\.]+) can be written without single quotes. NULL and null are constant values. If null, true or false should be used as string they have te be quoted.
More complex strings must be surrounded by single quotes: 'content' or content but '' or 'long content containing spaces and special characters like !' or 'null'.
Dates are written in ISOformat: YYYYMMDD without single quotes
A nullvalue can be written as null or NULL (without single quotes).
Numbers could be integers in the form (?[09]+) or floats in the form (?[09]+\.[09]+).
Numbers in single quotes are recognized as numbers if the column type is defined as number or has been detected to be number from a previous value; otherwise it will be a string value..
A boolean value is denoted as either true or false (case insensitive).
The header and rows can be indented if needed.
 type for column b is defined by the first value
rho A {a:number, b
1, 2
3, 4
}
cross join
{
a:string X.b:date c:number
Alpha 19700101 1
'Beta 2' 19700102 3
'' 19700103 4
}
A valid relational algebra expression is built by connecting relationname or inlinerelation as atoms with the defined unary and binary operators.
So a relational algebra expression is recursively defined as follows:FUNCTION ARGUMENT ( CHILD_EXPRESSION )
symbol  π 

alternaive syntax  pi 
example 
pi a, b ( R )

The argument is a subset of columns of the schema of the CHILD_EXPRESSION or a value expression
π Customer.firstname, surname ( Customer )
pi c.id, [1] ( ρ c ( Customer ) )
pi c.id, lower(username)>user, concat(firstname, concat(' ', lastname))>fullname (
ρ c (
Customer
)
)
rownum()
expression can be used to get the same information. And it can also be used
directly in the boolean condition of a selection or join.
pi firstname, lastname
sigma rownum() <= 5
tau countOrders desc
Customer
symbol  σ 

alternaive syntax  pi 
example 
sigma a > 2 ( R )

The argument is a boolean expression that each row of CHILD_EXPRESSION is checked on
σ firstname = 'Bob' or firstname = 'Alice' ( Customer )
σ (id > 10 and id < 100) or id = 42 ( Customer )
σ mod(length(firstname),2) = 0 ( Customer )
symbol  ρ 

alternaive syntax  rho 
example 
( R ) join R.a = X.b (rho X ( R ))

π a.id, a.firstname ( ρ a ( Customer ) )
symbol  ρ 

alternaive syntax  rho 
example 
the old and the new column names in a list (see example) "←" can be substituted with "<" pi x, b rho a>x {a, b
1, 2
3, 4
}

ρ myId←id, foobar←firstname (π id, firstname ( Customer ) )
symbol  τ 

alternaive syntax  tau 
example 
tau a asc, b desc ( R )

τ [1], firstname desc (π id, firstname ( Customer ) )
symbol  γ 

alternaive syntax  gamma 
example 
gamma a, count(*) ( R )

γ a, b ; sum(c)>x ( Customer )
If no grouping columns are provided the entire relation is the group.
number  string  date  

COUNT( * )  yes  yes  yes 
COUNT( column )  yes  yes  yes 
MIN( column )  yes  yes  yes 
MAX( column )  yes  yes  yes 
SUM( column )  yes  no  no 
AVG( column )  yes  no  no 
( CHILD_EXPRESSION ) FUNCTION ARGUMENT ( CHILD_EXPRESSION )
symbol  ∩ 

alternaive syntax  intersect 
( Customer ) ∩ ( Customer )
symbol  ∪ 

alternaive syntax  union 
( Customer ) ∪ ( Customer )
symbol  ÷ 

alternaive syntax  / 
( Customer ) ÷ ( Customer )
  ∪ 

alternaive syntax  \ except 
( pi firstname ( Customer ) )  ( rho test<lastname (
pi lastname ( Customer )
) )
symbol  ⨯ 

alternaive syntax  cross join 
symbol  ⋈ 

alternaive syntax  join inner join 
Special case:
The name of a single boolean column (like R join a S
) can not be used directly
as a join condition due to ambiguities in the relational algebra syntax.
The column must either be specified with its qualifier (R join R.a S
) or wrapped in parentheses (R join (a) S
).
This is not necessary for more complex boolean expressions. The problem is only that the single column name
can not be distinguished from a relation name: the expression X=R join S X
could be interpreted as A=(R join S A)
instead of A=(R join S) A
.
symbol  ⋈ 

alternaive syntax  join natural join 
( ρ a ( Customer ) ) ⋈ a.name < b.name ( ρ b ( Customer ) )
symbol  ⟕ 

alternaive syntax  left outer join left join 
symbol  ⟖ 

alternaive syntax  right outer join right join 
symbol  ⟗ 

alternaive syntax  full outer join 
symbol  ⋉ 

alternaive syntax  left semi join 
symbol  ⋊ 

alternaive syntax  right semi join 
symbol  ▷ 

alternaive syntax  anti semi join anti join 
The operator precedence allows to obmit most of braces.
The used precedence is shown in the table below.
All operators are left associative.
order of precedence  Operator 

0  relationname, inlinerelation 
1  projection, selection, rename (columns), rename (relations), group, order by 
2  cross product, theta join, natural join, left outer join, right outer join, full outer join, left semijoin, right semijoin, anti semijoin, division 
3  intersection 
4  union, subtraction 
A join B x C
((A join B) x C)
sigma true A join sigma true B
(sigma true (A)) join (sigma true (B))
rownum()
expression always represents the index of the lefthand relation
syntax  returns type  description 

a AND b 
boolean  logical and 
a OR b 
boolean  logical or 
a XOR b 
boolean  logical exclusive or 
NOT b 
boolean  logical not 
a = b
a != b
a < b
a > b
a <= b
a >= b
a != b

boolean  compares two values of the same type 
a:string LIKE 'PATTERN' 
boolean  returns true if expression evaluating to a string a matches the pattern
given as the second operand.
The pattern has to be given as a string literal;
An underscore ( 
a:string ILIKE 'PATTERN' 
boolean 
same as LIKE but matches caseinsensitive.
This is not in the SQL standard but is a PostgreSQL extension. 
a + b
a  b
a * b
a / b
a % b

number  arithmetic addition, subtraction, multiplication, division, modulo 
rand() 
number  returns a random number in the interval [0, 1] 
rownum() 
number 
returns the index of the current row (starting with 0).
If the function is used in a binary relational algebra expression (e.g. a join) it always represents the index of the left operand. 
length(a:string) 
number  returns the length of the string 
date(a:string) 
date  parses the given string to a date object. The string must be in the form YYYYMMDD 
adddate(a:date, b:number) 
date  adds the given number of days to date a 
subdate(a:date, b:number) 
date  subtracts the given number of days from date a 
now()
transaction_timestamp()
statement_timestamp() 
date  returns a timestamp representing the start of the query execution
transaction and statement start are the very same value due to the lack of a transaction concept 
clock_timestamp() 
date  returns the current timestamp while executing the query 
year(a:date) 
number  returns the year component of a given date 
month(a:date) 
number  returns the month component of a given date as a number. (1 = Sunday, 2 = Monday, ..., 7 = Saturday) 
day(a:date)
dayofmonth(a:date) 
number  returns the day component of a given date as a number in the range 1 to 31 
hour(a:date) 
number  returns the hour component of a given date as a number in the range 0 to 23 
minute(a:date) 
number  returns the minute component of a given date as a number in the range 0 to 59 
second(a:date) 
number  returns the second component of a given date as a number in the range 0 to 59 
concat(a:string [, ...]) 
string  concatenates the given strings 
upper(a:string)
ucase(a:string) 
string  converts the given string to uppercase 
lower(a:string)
lcase(a:string) 
string  converts the given string to lowercase 
strlen(a:string) 
number  number of characters of the string 
abs(a:number) 
number  the absolute value of the given number 
add(a, b)
sub(a, b)
mul(a, b)
div(a, b)
mod(a, b)

number  arithmetic addition, subtraction, multiplication, division or modulo of the given numbers 
round(a)
floor(a)
ceil(a)

number  round to nearest integer, largest integer not greater than the argument or smallest integer not less than the argument 
coalesce(value [, ...]) 
type of value  returns the first of its arguments that is not null or null if all arguments are null. Note that all arguments must have the same datatype. 
CASE WHEN condition THEN result
[WHEN ...]
[ELSE result]
END 
type of result  returns the first result where the condition evaluates to true. If all conditions are false the else part is executed or null is returnt if the else part is missing. Note that all results must have the same datatype. 
order of precedence  Operators 

0  functions, constants, columns 
1  ! (boolean not) 
2   (unary minus) 
3  *, /, % 
4  , + 
5  = (comparison), >=, >, <=, <, <>, !=, LIKE, ILIKE 
6  CASE, WHEN, THEN, ELSE 
7  AND 
8  XOR 
9  OR,  
The goal of the SQL mode of the relational algebra calculator is to provide a translation from SQL to relational algebra to show how they are related. It does not support all features a real database system like PostgreSQL or MySQL does because the goal is to provide a translation into relational algebra. This means that many features like correlatedsubstatements are not supported because the translation into relational algebra is not trivial and modern database systems use an extended set of operators internally that do not require a onetoone translation into "classical" relational algebra. Therefore the learning effect for users of this tool would not be that big.
All keywords are case insensitv.
The following Synopsis is a adapted version of PostgreSQL and shows the general syntax of the supported SQL. Brackets indicate optional parts. Braces and vertical lines indicate that one of the alternatives has to be chosen. Dots mean that the preceding element can be repeated.
[ WITH with_query [, ...] ]
SELECT [ DISTINCT ]
*  expression [ [ AS ] output_name ] [, ...]
FROM from_item [, ...]
[ WHERE condition ]
[ GROUP BY column [, ...] ]
[ HAVING condition ]
[ { UNION  INTERSECT  EXCEPT } [ ALL  DISTINCT ] select ]
[ ORDER BY column [ ASC  DESC ] [, ...] ]
[ LIMIT { count  ALL } ]
[ OFFSET start [ ROW  ROWS ] ]
[ FETCH { FIRST  NEXT } [ count ] { ROW  ROWS } ONLY ]
where from_item can be one of:
table_name [ AS alias ]
with_query_name [ AS alias ]
( select ) AS alias
from_item CROSS JOIN from_item
from_item NATURAL JOIN from_item
from_item [ INNER ] JOIN from_item ON join_condition
from_item [ INNER ] JOIN from_item NATURAL
from_item [ INNER ] JOIN from_item USING ( join_column [, ...] ) from_item
from_item { LEFT  RIGHT  FULL } [ OUTER ] JOIN ON join_condition from_item
from_item { LEFT  RIGHT  FULL } [ OUTER ] JOIN NATURAL from_item
from_item { LEFT  RIGHT  FULL } [ OUTER ] JOIN USING ( join_column [, ...] ) from_item
and with_query is:
with_query_name AS ( select )
The SQL statement is translated directly into relational algebra. To understand some of the effects of the tool it might be helpful to understand the steps of the translation process. As mentioned before, real database systems might take a different (more complex) aproach but this should help to get an idea of how SQL could be translated into "classical" relational algebra.
The following list shows the translation from SQL into relational algebra starting with the inner most relational algebra expression at the top.
,
is a cross join)The direct translation into relational algebra with implicit duplication elimination requires the distinct
keyword to be equivalent. A warning is shown if you omit it.
The selectclause is translated into up to two relalg operators.
select * ...
where no changes are made to the schema of the result. Therefore no projection is needed. You can use the optional tablealiasprefix if the columns of a single table/relation should be returned only: select distinct R.* from R inner join S
select a, b from ...
) then a single projection is used.as foo
. e.g. select a as foo from ...
The allowed expressions are the same as for the projection.
So it can be either the name of the column (with optional renaming using as
) or a complex value expression. In the latter case a outputname must be given using as
because the tool requires every column to have a name.
In its simplest form the fromclause holds a single table/relation name that is used directly in the relalg statement. If the optional tablealias is specified with table as foo
this is reflected by wrapping the relation in a renamerelation operator with the given outputname.
select distinct x.a, x.b
from R as x
Joins can be expressed using the ANSI join syntax
select distinct *
from A, B
inner join C on ...
inner join D natural
natural join E
left outer join F on ...
left outer join G natural
right outer join H on ...
right outer join I natural
full outer join J on ...
full outer join K natural
where ...
The comma is part of the old join syntax and is translated into a cross join.
select distinct *
from R, S as s, T
where s.a = R.a
Instead of the name of relation a noncorrelated substatement can be used. A table alias must be provided with (...) as foo
.
A noncorrelated substatement can be directly translated into relational algebra by just translate the substatement into relational algebra and use the resulting operator tree instead of the relation.
Noncorrelated means that it must not reference/use any columns of tables defined in the outer scope.
This limitation is intentionally because the translation into relational algebra is not trivial and modern database systems use an extended set of operators internally that do not require a onetoone translation into "classical" relational algebra. Therefore the learning effect for users of this tool would not be that big.
select distinct *
from R, (select * from S where a > 0) as x
where x.a = R.a
The boolean condition in the whereclause can be any expression evaluating to boolean.
The where clause is directly translated to an relational algebra selection with the very same condition. This selection is applied after joining relations of the fromclause therefore has to use the original column names.
Subquery Expressions like EXISTS
, IN
, ANY/SOME
or ALL
are not supported because their translation into relational algebra is not trivial and modern database systems use an extended set of operators internally that do not require a onetoone translation into "classical" relational algebra. Therefore the learning effect for users of this tool would not be that big.
The GROUPBYclause takes a list of column names only argument.
The GROUPBYclause is directly translated to the relational algebra groupby operation and is executed directly after the selection built from the WHEREclause and before the projection/renaming build from the SELECTclause. Therefore the column names that can be used are the ones available after all joining all tables.
The aggregations used in the relational algebra groupby operation are taken from the SELECTclause and an outputname must be given using as
because the tool requires every column to have a name.
If no aggregations are present in the SELECTclause a projection is used instead of the groupby operation because sigma without aggregation has the very same effect.
Every nonaggregationcolumn in the SELECTclause must be present in the group by clause because the would not be available after the grouping.
The HAVINGClause represents an optional relational algebra selection. The boolean condition can be any expression evaluating to boolean.
The resulting selection is executed directly after the relational algebra groupby operation.
Unlike PostgreSQL the HAVINGclause is only allowed when either a aggregation or grouping is present.
Order by takes a list of column names or indices of columns (starting with 1) as its argument.
It is directly translated to the extended relational algebra operation order by (tau).
The LIMITclause can be either specified with the LIMITOFFSET syntax used by PostgreSQL and MySQL or the FETCHFIRST syntax introduced in SQL:2008.
It is translated into a relational algebra selection using the rownum()
function to limit the number of rows returned.
The SetOperators UNION, INTERSECT and EXCEPT directly map to the relational algebra operators union, intersection and subtraction.
The keyword DISTINCT
is optional because it represents the default behavior. The keyword ALL
is ignored and a warning is shown because the targeted relational algebra has a implicit elimination duplicate rows.
Parentheses can be used to create more complex statements:
(
select distinct * from S
union
select distinct * from T
)
except select distinct * from T
order by 1
limit 1
The WITHclause (also known as common table expressions) provides a way to define subqueries for single or multiple use in a statement. This can be thought as defining a temporary table for that query in SQL terminology or creating variables in relational algebra. Recursive evaluation is not supported.
Each subquery can be referenced by the name from the WITHclause. The subquery is automatically renamed to the name used in the WITHclause.
Value expressions are used for boolean expressions for WHERE and HAVINGclause, the boolean conditions of joins and calculated values in the SELECTclause. The type of a expression is either string, number, date or boolean and is determined by the used operations and columns.
The supported functions and operations are the same for SQL and relational algebra: value expression