I recently studied SQL as part of an introductory course on databases. SQL itself is not particularly difficult to grasp, yet compared to relational algebra, the division operation is much more complex. In relational algebra, there is a division operator, which has no direct equivalent in SQL. This means that you’ll have to find a workaround. There are a number of ways to express division in SQL, and with the exception of one, they are all quite complex.

Division identifies attribute values from a relation that are paired with all of the values from another relation. Popular textbook examples are the identification of suppliers who deliver all parts of a particular color. An intuitive solution would be to count the number of distinct red parts, and then look at every distributor to find out which of those deliver all those parts. To express this in SQL, you have to use the set theoretic operators “having” and “group by”, and then you simply count the tuples meeting certain criteria.

Let’s say you have table T1 in front of you and want to find out which A’s have both b2 and b3. You can assume that b2 and b3 are the red parts. If you’ve only been exposed to standard textbook treatments of division in SQL, you may be surprised that the problem can be solved as simply as this:

SELECT A FROM T1 WHERE B IN ( SELECT B FROM T2 ) GROUP BY A HAVING COUNT(*) = ( SELECT COUNT (*) FROM T2 );

Of course you can add a Where clause to the last expression.

The previous example is quite easy to grasp. The same can’t be said about how SQL division is commonly taught. Standard database theory textbooks expose you to a statement that is doubly nested and peppered with two negations. No matter how smart you are, it takes longer to parse than the previous example. I think you can safely call the following a monstrosity:

SELECT DISTINCT x.A FROM T1 AS x WHERE NOT EXISTS ( SELECT * FROM T2 AS y WHERE NOT EXISTS ( SELECT * FROM T1 AS z WHERE (z.A=x.A) AND (z.B=y.B) ) );

Which one would you prefer?

The examples above are taken from the paper “A Simpler (and Better) SQL Approach to Relational Division” by Matos and Grasser, published in *Journal of Information Systems Education*, Vol. 13(2). I was quite happy to have come across that paper. Unfortunately, theirs is not a very well-known approach to SQL division. This is unfortunate, since it’s not only easier to grasp, but, as Matos and Grasser write, it also exhibits better computational performance.

NikNo doubt. Your article really begs the question – do textbook authors teach the double negation way because they believe it superior or because they themselves don’t know any better.

-Cheers

Gregor UlmPost authorIt would be easy to say that they just don’t care. However, the reality in the textbook industry is that the person whose name is on the cover of the textbook is not necessarily the person who wrote it. In that case, the real authors arguably just didn’t know better. Further, at university there is the tendency to obfuscate the material to make it more difficult to grasp. This is particularly true if the material itself would not be overly challenging. The more trivial the field, the more complex it will be presented. Just open a textbook on sociology or human resources, if you need an example. Database theory has a reputation of being one of the easier subjects in the CS curriculum, so it could well be that teaching SQL division with double nesting and double negation is an attempt to make the subject appear to be more ‘esteemed’.

KhizarThe first solution is not what relational division does.

Gregor UlmPost authorI’m not sure what you want to express, and it would arguably helped if you cared to elaborate. Both queries result in the same output. In other words, they are equal. To be more precise: both queries exhibit the property of being extensionally equal. Since you seem to say that the second, convoluted, query performs relational division, while the first one does not, we therefore arrive at a contradiction, because, as a matter of fact, the first, much simpler expression, also performs relational division. Allow me to say that it would be incredibly foolish to prefer a more complex method (and slower one!) over a simpler one. Then again, foolishness has been in fashion in professional software development for decades, so you can enjoy the warm and fuzzy feeling of belonging to the majority.

AshokThe guy is right, the first one does not work with what you proposed.

JLGCREATE TABLE T1 (A integer, B integer);

CREATE TABLE T2 (B integer);

INSERT INTO T1 VALUES (1,1),(1,1);

INSERT INTO T2 VALUES (1),(2);

Simple division -> 1

Real division -> nothing

The solution you propose makes the (very strong) assumption that the couple (A,B) is unique in T1 (probably a PK). Otherwise you will count duplicates… In the paper that you cite (section 2.1), ” T1 represents a list of customers and the options they bought for their new cars”. Given their database design, a customer cannot buy two cars with the same option!

danielThe problem I have with the first querty is that it fails if u have “repeated elements”:

ej:

T1 T2

A B C B

1 1 1 1

1 1 2 2

1 2 3

2 1 4

I’ll get no results with the first one, because the first count(*) will return 3

but A=1 will pass in the second one.

danielsry those are the tables:

T1

A B C

1 1 1

1 1 2

1 2 3

2 1 4

T2

B

1

2

anonymousI’d be surprised if the first “simple” query is doing relational division. For A(a,b)/B(b), if there is an ‘a’ having multiple identical ‘b’s in the table A, this query will simply fail because the having count(*) will be larger than count in T2. I’ve verified this on a dataset. I think a more proper description of SQLRA division is here: https://www.simple-talk.com/sql/learn-sql-server/high-performance-relational-division-in-sql-server/

TevesThe problem with the first statement is that it doesn’t consider the duplicates so you will need some pre-processing like Relational Algebra does. The performance issue in the second statements is due to a double nested-query. It could possibly be solved by using NOT EXISTS (… EXCEPT …) instead.

For example, Table T1: [A, milk, 9/2],

[A, eggs, 9/3],

[A, milk, 9/3],

[B, eggs, 9/2],

[B, eggs, 9/3]

Talbe T2: [milk],

TevesTalbe T2: [milk],

[eggs]

If you write the first statement without pre-processing then it will not be correct,

As the count(*) in group A is 3, and the count(*) in group B is 2.

However, you can solve this by: (Assuming the attributes in T1 are name, items, date)

SELECT name

FROM T1 as x

WHERE NOT EXISTS (

SELECT *

FROM T2

EXCEPT

SELECT items

FROM T1

WHERE name=x.name)

TevesForget to say, I am not doing T1 divided by T2 but actually finding who bought all items in T2, which means T1(name, items) divided by T2.

AdamLet’s take it easy on the guy,

he’s completely right. All that is necessary is that is to preface the query with a select distinct.

could look something like this (for the simple case of one item):

with t1p as (select distinct * from t1),

total_items as (select count(*) from (select distinct * from t2) a)

select A

from t1p

where B in (select B from t2)

group by A

having count(*)=(select * from total_items);