Structured Query Language (SQL)

Basic SQL Commands

A basic SQL block is composed of SELECT, FROM and WHERE

select column1, column2, ...
from table_name
where condition
  • select clause used to list attributes as a result of the query
  • from clause is a list of relations to be accessed in the evaluation for the query
  • where clause sets some conditions/predicate involving attributes of the relation in the from clause.

order of operations from, where, select

Danger

If you put in multiple tables under FROM, what SQL really is doing is taking the cartesian product of the tables (1.FROM), and then checking the conditions in (2.WHERE). This can result in slow or wrong answers.

The Cartesian product operation produces rows combining each row from the first table with each row from the second table. For example, each tuple in instructor is combined with every tuple in teaches, even those that refer to a different instructor.

Rename Operation

  • For if attributes in the from clause may have the same name, in which case we have a duplicated in the result
  • For if we want to perform arithmetic in the select clause on some expression
  • Short and for convenience
  • Wish to compare tuples in the same relation
old_name as new_name

The as clause can appear in both the select clause and from clause.

Important

To compare tuples in the same relation

Find the names of all instructors who earn more than the lowest paid instructor in the Biology department OR Find the names of all instructors whose salary is greater than at least one instructor in the Biology department are the same statement:

select distinct T.name
from instructor as T, instructor as S
where T.salary > S.salary and S.dept_name = 'Biology'
  • T and S: think of them as copies of the relation instructor, they are declared as aliases

String Operations

Pattern matching use the operator like (using two special characters):

  • Percent(%): The % character matches any substring
  • Underscore: The _ character matches any character

“Find the names of all departments whose building name includes the substring ‘Watson’.” This query can be written as:

select dept_name
from department
where building like %Watson%

Mismatch by using the not like comparison operator.

* in select clause

The asterisk symbol * can be used in the select clause to indicate all attributes.

Ordering the Display of Tuples

The order by clause cause the tuples in the result of a query to appear in sorted order.

In alphabetical order we write:

select name
from instructor
where dept_name = 'Physics'
order by name
  • By default, order by lists items in ascending order. We can specify the sorting order by adding desc or asc.

Where-Clause Predicates/Conditions

There is between comparison operator to simply where clauses that specify that a value be to some value and to some other value.

Find the names of instructors with salary amounts between 100,000, we can use the between comparison to write:

select name
from instructor
where salary between 90000 and 100000

Similarly, there is a not between comparison operator.

The comparison operators can be used on tuples, and the ordering is defined lexicographically.

select name, course_id
from instructor, teaches
where (instructor.ID, dept_name) = (teaches.ID, 'Biology')
  • Which is the same as where instructor.ID = teaches.ID and dept_name = 'Biology'

Set Operations

Union

Performed on two sets. For instance, if we want to find the set of all courses taught either in Fall2017 or in Spring 2018, we use union.

The union operation automatically eliminates duplicates, unlike the select clause. So, if there are two sections offering the same course in Spring 2019, then it will only appear once.

(select course_id
from section
where semester = 'Fall' and year = 2017)
union
(select course_id
from section
where semester = 'Spring' and year = 2018)

If we want to retain all duplicates, we must write union all.

Intersect

To find the set of all courses taught in both Fall 2017 and Spring 2018 we use intersect. The intersect operation automatically eliminates duplicates. If we want to retain all duplicates, we must write intersect all.

  • rarely used?

Except

The except operation outputs all tuples from its first input that do not occur in the second input, it performs set difference.

For example, if four sections of ECE-101 were taught in the Fall 2017 semester and two sections of ECE-101 were taught in the Spring 2018 semester, the result of the except operation would not have any copy of ECE-101.

If we want to retain duplicates, we must write except all.

Null Values

  • The result of an arithmetic expression (+, -, x, /) is null if any of the input values is null.

  • SQL treats as unknown the result of any comparison involving a null value (other than predicates is null and is not null).

  • If where clause condition evaluates to either false or unknown for a tuple, that tuplis is not added to the result.

Aggregate Functions

  • Average: avg()
  • Minimum: min()
  • Maximum: max()
  • Total: sum()
  • Count: count() Input to sum and avg needs to be numbers.

Careful, need to probably do groub by after. Use distinct in the aggregate function to eliminate duplicates.

Group By

Apply the aggregate function not only to a single set of tuples, but also to a group of sets of tuples; we specify this in SQL using the group by clause.

Used to form groups.

Tuples with same value on all attributes in the group by clause are placed in one group.

Example: Find the average salary in each department.

select dept_name, avg(salary) as avg_salary
from instructor
group by dept_name

Grouped by the dept_name attribute. And then we compute the average salary of each group resulting to:

Example: Find the number of instructors in each department who teach a course in the Spring 2018 semester

select dept_name, count(distinct ID) as instr_count
from instructor, teaches
where instructor.ID=teaches.ID and
	semester = 'Spring 2018' and year = 2018
group by dept_name

Important

Ensure that the only attributes that appear in the select statement without being aggregated are those that are present in the group by clause. That is, any attribute that is not present in the group by clause may appear in the select clause ONLY as an argument to an aggregate function. —erroneous query

Having

https://www.w3schools.com/sql/sql_having.asp Having clause is used to apply a condition to groups rather than to tuples.

The HAVING clause was added to SQL because the WHERE keyword cannot be used with aggregate functions.

select column_name(s)
from table_name
where condition
group by column_name(s)
having condition
order by column_name(s);

Example: Find the only departments where the average salary of the instructors is more than 42000$.

Condition applies to each group constructed by the group by clause.

select dept_name, avg(salary) as avg_salary
from instructor
group by dept_name
having avg(salary) > 42000

Operations order:

  1. From clause first to get a relation.
  2. Where clause condition are applied on the result relation of the from clause.
  3. Tuples satisfying the where conditions are placed into groups by the group by clause if it is present. If not, the entire set of tuples satisfying the where clause is treated as one group.
  4. Having clause if present is applied to each group, groups that don’t satisfy the having clause predicate are removed.
  5. The select clause uses the remaining groups to generate tuples of the result of the query, applying the aggregate functions to get a single result tuple for each group.

Another example: For each course section offered in 2017, find the average total credits (tot_cred) of all students enrolled in the section, if the section has at least 2 students.

select course_id, semester, year, sec_id, avg(tot_cred)
from student, takes
where student.ID=takes.ID and year = 207
group by course_id, semester, year, sec_id
having count(ID)>=2

Aggregation with Null and Boolean Values

All aggregate functions except count(star) ignore null values in their input collection. As a result of null values being ignored, the collection of values may be empty.

Aggregate function some and every can be applied on a collection of Boolean values and compute or and and of the values.

Nested Subqueries

A subquery is a select-from-where expression that is nested within another query.

Common use of subqueries is to perform tests for set membership, make set comparisons and determine set cardinality by nesting subqueries in the where clause.

Can be nested in:

  • where clause
  • from clause
  • with clause

Set Membership

SQL allows testing tuples for membership in a relation.

  • in connective tests for set membership, where set is a collection of values produced by a select clause.
  • not in connective test for the absence of set membership

Example: Find all the courses taught in the both the Fall 2017 and Spring 2018 semesters.

select distinct course_id
from section
where semester = 'Fall' and year = 2017 and
	course_id in(select course_id
				from section
				where semester = 'Spring' and year = 2018)

Note we need to use distinct. It doesn’t automatically remove duplicates like intersect.

Found all course taught in 2018, then we need to find those courses that were taught in Fall 20-17 and that appear in the set of courses obtained in the subquery. nested in the where clause of an outer query.

We use the not in construct in a similar way as in.

Example: Find all the courses taught in the Fall 2017 semester but not in the Spring 2018 semester.

select distinct course_id
from section
where semester = 'Fall' and year = 2017 and
	course_id not in(select course_id
				from section
				where semester = 'Spring' and year = 2018)

Can also use in and not in operators on enumerated sets:

select distinct name
from instructor
where name not in ('Mozart', 'Einstein')

Also to test for membership in an arbitrary relation in SQL:

  • Example: find the total number of (distinct) students who have taken course sections taught by the instructor with ID 110 011.
select count (distinct ID)
from takes
where (course id, sec id, semester, year) in (select course id, sec id, semester, year
											 from teaches 
											 where teaches.ID= '10101');
 

Clever - find the teachers 10101 that teaches the courses that have a specific section, semester and year, and counts the student that takes those courses found in the subquery!

Set Comparison

Example: Find the names of all instructors whose salary is greater than at least one instructor in the Biology department

In SQL, “greater than at least one” is represented by > some.

select name
from instructor
where salary > some (select salary
					 from instructor
					 where dept name = 'Biology')

The subquery generates all salary values of all instructors in the Biology department. The > some comparison in the where clause of the outer select is true if the salary value of the tuple is greater than at least one member of the set of all salary values for instructors in Biology. So basically returns all name of instructor except the lowest one, since all instructor will have a higher salary than at least one, that is the lowest.

SQL allows:

  • < some and < all
  • <= some and <= all
  • >= some and >= all
  • = some and = all
  • <> some and <> all = some is identical to in, but <>some is not the same as not in. Similarly for all.

Another example: Find the names of all instructors that have a salary value greater than that of each instructor in the Biology department.

”greater than all” is > all

select name
from instructor
where salary > all (select salary
					from instructor
					where dept name = 'Biology')
 

Does this outer query only return one tuple? Since only one instructor can have a higher salary than all instructors right???

Another example: Find the departments that have the highest average salary.

  • Start writing query to find all average salaries, then nest it as a subquery of a larger query that finds those departments for which the average salary is greater than or equal to all average salaries.
select dept_name
from instructor
group by dept_name
having avg (salary) >= all (select avg (salary)
							from instructor
							group by dept name)

Test Empty Relations

SQL includes a feature for testing whether a subquery has any tuples in its results. The exists construct returns the value true if the argument subquery is nonempty.

Example: Find all courses taught in both the Fall 2017 semester and in the Spring 2018 semester.

select course id
from section as S
where semester = 'Fall' and year= 2017 and 
exists (select *
		from section as T
		where semester = 'Spring' and year= 2018 and
		S.course id = T .course id)

Another example: Find all students who have taken all courses offered in the Biology department.

Hard for me to understand :(

select S.ID, S.name
from student as S
where not exists ((select course_id
				   from course 
				   where dept name = 'Biology') 
				   except
				   (select T.course_id
				   from takes as T 
				   where S.ID=T.ID)
  • If the “WHERE NOT EXISTS” clause returns false for a student in the “student” table, it means that the student has not taken all the courses offered in the Biology department. Consequently, that student’s ID and name will not be included in the final result set.
  • But from what I can understand is that we first select all courses offered in the Biology department. Then we select all the courses that student S takes. Then we use except to return only the distinct values from the first query that do not exist in the second query. So it returns the courses id that are not taken by each student. The query determines the set of courses that a student has not taken from the Biology department. This set of course IDs is then used to check if there are no missing courses for each student using the “NOT EXISTS” clause.
  1. The subquery retrieves the course IDs from the “course” table where the department name is ‘Biology’. This provides the set of all course IDs offered in the Biology department.
  2. The subquery then uses the “EXCEPT” operator to subtract the course IDs taken by a specific student (retrieved from the “takes” table) from the set of all course IDs offered in the Biology department.
  3. The result of the “EXCEPT” operator is a set of course IDs that the student has not taken from the Biology department.
  4. The “NOT EXISTS” clause checks if there are no records returned by the subquery. If the subquery returns no results, it means that the set of course IDs the student has not taken from the Biology department is empty.
  5. If the set of missing course IDs is empty (i.e., the “NOT EXISTS” clause returns true), it implies that the student has taken all the courses offered in the Biology department. In that case, the student’s ID and name will be included in the final result set. If it returns false, that means the student did not take all courses in the Biology department.

Another example: Find the total number of (distinct) students who have taken course sections taught by the instructor with ID 110 011.

select count (distinct ID)
from takes
where exists (select course_id, sec_id, semester, year
			  from teaches 
			  where teaches.ID= '10101'
			  and takes.course_id = teaches.course_id
			  and takes.sec_id = teaches.sec_id
			  and takes.semester = teaches.semester
			  and takes.year = teaches.year
)
  • The subquery retrieves the relevant columns from the teaches table.
  • where teaches.ID= '10101' This ensures that we are looking for course sections taught by the instructor with the ID ‘10101’.
  • The remaining conditions within the subquery establish the relationship between the “teaches” and “takes” tables. The takes.course id = teaches.course id condition ensures that the course ID matches between the two tables. Similarly, for the rest.
  • The outer query uses the “select count(distinct ID)” statement to count the distinct student IDs. This will give us the total number of distinct students who meet the specified conditions.
  • By using the WHERE EXISTS clause with the subquery, the outer query counts the number of distinct student IDs for which there is at least one record in the “teaches” table that satisfies the specified conditions.
  • In summary, the query counts the distinct student IDs from the “takes” table if there exists a corresponding record in the “teaches” table where the instructor ID is ‘10101’, and the course ID, section ID, semester, and year match between the two tables. The result provides the total number of distinct students who have taken course sections taught by the instructor with the ID ‘10101’.

Test for Absence of Duplicate Tuples

SQL includes a Boolean function for testing whether a subquery has duplicate tuples in its result.

unique construct returns the value true if the argument subquery contains no duplicate tuples. not unique

Danger

The empty set evaluates to true with the unique predicate.

Subqueries in the From Clause

Key concept: select-from-where expression returns a relation a a result and we can insert it into another select-from-where.

Example: Find the average instructors’ salaries of those departments where the average salary is greater than $42,000.

select dept name, avg salary
from (select dept name, avg (salary) as avg salary
	  from instructor 
	  group by dept name)
where avg salary > 42000
  • The outer where clause acts like a having clause.

Another example: Find the maximum across all departments of the total of all instructors’ salaries in each department.

select max (tot salary) 
from (select dept name, sum(salary)
	  from instructor 
	  group by dept name) as dept total (dept name, tot salary)
  • Sums the salary in each department in the subquery and the outer query takes the max salary from all those calculated departments.

Prefix of from clause lateral allows to access attributes of preceding tables or subqueries in the same from clause.

Another example: Print the names of each instructor, along with their salary and the average salary in their department.

select name, salary, avg_salary
from instructor I1, lateral (select avg(salary) as avg_salary
							from instructor I2
							where I2.dept_name = I1.dept_name)
  • subquery have access to I1 correlation variable from the outer query.
  • where I2.dept_name = I1.dept_name makes sure that the department average for that instructor I1 is printed every time for each instructor’s department individually.!!!!

With Clause

The with clause is way of defining temporary relations, where the definition is only available to query in which the clause occurs.

Example: Find all departments where the total salary is greater than the average of the total salary at all department.

with dept_total (dept_name, value) as
	(select dept_name, sum(salary)
	from instructor
	group by dept_name), 
dept total avg(value) as
	(select avg(value)
	from dept_total)
select dept_name 
from dept_total, dept total_avg 
where dept_total.value > dept total_avg.value
  • dept_toal returns the sum of all salary for each department.
  • dept_total_avg returns the average value from all department salaries.
  • Then we can select the department name from those two temporary relations and apply the condition, where we go through all department sums and compare it to the average of all departments.

Scalar Subqueries

Scalar subqueries can occur in select, where and having clauses. May also be defined without aggregates.

Example: lists all departments along with the number of instructors in each department.

select dept_name, (select count(*)
				   from instructor
				   where department.dept name=instructor.deptname)
				as num instructors
from department;

SQL Delete

Can only delete whole tuples. Delete only operates on one relation (table) at a time.

delete from t
where <condition>
  • the where clause can be eliminated if we don’t need it
  • Can also delete using cursors Embedded SQL

We can nest select-from-where in the where clause of a delete request.

Example: Delete the records of all instructors with salary below the average at the university.

delete from instructor
where salary <(select avg(salary)
			  from instructor)
  • Performing all the tests before performing any deletion is important — if some tuples are deleted before other tuples have been tested.

SQL Insert

We either specify a tuple to be inserted or write a query whose result is a set of tuples to be inserted.

insert into course
	values('CS437', 'Database Systems', 'Comp.Sci', 4)
insert into instructor
	select ID, name, dept_name, 18000
	from student
	where dept_name = 'Music' and tot_cred > 144;
  • specified a set of tuples using select
  • evaluates select fully before performing insertion.

SQL Update

Two components:

  1. SET, an assignment of values to attributes; and
  2. WHERE, a search condition.
UPDATE T
SET <assignments>
WHERE <condition>

SQL Multisets

The Answers to SELECT blocks are actually multisets of tuples (remember, this is how we can apply aggregate functions). Therefore, we can actually apply multiset operations on them.

  • Multiset union: Q1Q_1Q1​ UNION ALL Q2Q_2Q2​.
    • behaves like “∨” in range restricted RC with multiset semantics
  • Multiset difference: Q1Q_1Q1​ EXCEPT ALL Q2Q_2Q2​.
    • behaves like “∧¬” in range restricted RC with multiset semantics
  • Multiset intersection: Q1Q_1Q1​ INTERSECT ALL Q2Q_2Q2​.
    • maximum number of tuples common to Q1Q_1Q1​ and Q2Q_2Q2​
    • rarely used