Structured Query Language (SQL)
Basic SQL Commands
A basic SQL block is composed of SELECT
, FROM
and WHERE
- select clause used to list attributes as a result of the query
- from clause is a list of relations to be accessed in the evaluation for the query
- where clause sets some conditions/predicate involving attributes of the relation in the from clause.
order of operations from, where, select
Danger
If you put in multiple tables under
FROM
, what SQL really is doing is taking the cartesian product of the tables (1.FROM
), and then checking the conditions in (2.WHERE
). This can result in slow or wrong answers.
The Cartesian product operation produces rows combining each row from the first table with each row from the second table. For example, each tuple in instructor is combined with every tuple in teaches, even those that refer to a different instructor.
Rename Operation
- For if attributes in the from clause may have the same name, in which case we have a duplicated in the result
- For if we want to perform arithmetic in the select clause on some expression
- Short and for convenience
- Wish to compare tuples in the same relation
The as clause can appear in both the select clause and from clause.
Important
To compare tuples in the same relation
Find the names of all instructors who earn more than the lowest paid instructor in the Biology department OR Find the names of all instructors whose salary is greater than at least one instructor in the Biology department are the same statement:
- T and S: think of them as copies of the relation instructor, they are declared as aliases
String Operations
Pattern matching use the operator like (using two special characters):
- Percent(%): The % character matches any substring
- Underscore: The _ character matches any character
“Find the names of all departments whose building name includes the substring ‘Watson’.” This query can be written as:
Mismatch by using the not like comparison operator.
* in select clause
The asterisk symbol * can be used in the select clause to indicate all attributes.
Ordering the Display of Tuples
The order by clause cause the tuples in the result of a query to appear in sorted order.
In alphabetical order we write:
- By default, order by lists items in ascending order. We can specify the sorting order by adding desc or asc.
Where-Clause Predicates/Conditions
There is between comparison operator to simply where clauses that specify that a value be to some value and to some other value.
Find the names of instructors with salary amounts between 100,000, we can use the between comparison to write:
Similarly, there is a not between comparison operator.
The comparison operators can be used on tuples, and the ordering is defined lexicographically.
- Which is the same as
where instructor.ID = teaches.ID and dept_name = 'Biology'
Set Operations
Union
Performed on two sets. For instance, if we want to find the set of all courses taught either in Fall2017 or in Spring 2018, we use union.
The union operation automatically eliminates duplicates, unlike the select clause. So, if there are two sections offering the same course in Spring 2019, then it will only appear once.
If we want to retain all duplicates, we must write union all.
Intersect
To find the set of all courses taught in both Fall 2017 and Spring 2018 we use intersect. The intersect operation automatically eliminates duplicates. If we want to retain all duplicates, we must write intersect all.
- rarely used?
Except
The except operation outputs all tuples from its first input that do not occur in the second input, it performs set difference.
For example, if four sections of ECE-101 were taught in the Fall 2017 semester and two sections of ECE-101 were taught in the Spring 2018 semester, the result of the except operation would not have any copy of ECE-101.
If we want to retain duplicates, we must write except all.
Null Values
-
The result of an arithmetic expression (+, -, x, /) is null if any of the input values is null.
-
SQL treats as unknown the result of any comparison involving a null value (other than predicates is null and is not null).
-
If where clause condition evaluates to either false or unknown for a tuple, that tuplis is not added to the result.
Aggregate Functions
- Average:
avg()
- Minimum:
min()
- Maximum:
max()
- Total:
sum()
- Count:
count()
Input to sum and avg needs to be numbers.
Careful, need to probably do groub by
after.
Use distinct in the aggregate function to eliminate duplicates.
Group By
Apply the aggregate function not only to a single set of tuples, but also to a group of sets of tuples; we specify this in SQL using the group by clause.
Used to form groups.
Tuples with same value on all attributes in the group by clause are placed in one group.
Example: Find the average salary in each department.
Grouped by the dept_name attribute. And then we compute the average salary of each group resulting to:
Example: Find the number of instructors in each department who teach a course in the Spring 2018 semester
Important
Ensure that the only attributes that appear in the select statement without being aggregated are those that are present in the group by clause. That is, any attribute that is not present in the group by clause may appear in the select clause ONLY as an argument to an aggregate function. —erroneous query
Having
https://www.w3schools.com/sql/sql_having.asp
Having
clause is used to apply a condition to groups rather than to tuples.
The HAVING
clause was added to SQL because the WHERE
keyword cannot be used with aggregate functions.
Example: Find the only departments where the average salary of the instructors is more than 42000$.
Condition applies to each group constructed by the group by clause.
Operations order:
- From clause first to get a relation.
- Where clause condition are applied on the result relation of the from clause.
- Tuples satisfying the where conditions are placed into groups by the group by clause if it is present. If not, the entire set of tuples satisfying the where clause is treated as one group.
- Having clause if present is applied to each group, groups that don’t satisfy the having clause predicate are removed.
- The select clause uses the remaining groups to generate tuples of the result of the query, applying the aggregate functions to get a single result tuple for each group.
Another example: For each course section offered in 2017, find the average total credits (tot_cred) of all students enrolled in the section, if the section has at least 2 students.
Aggregation with Null and Boolean Values
All aggregate functions except count(star) ignore null values in their input collection. As a result of null values being ignored, the collection of values may be empty.
Aggregate function some and every can be applied on a collection of Boolean values and compute or and and of the values.
Nested Subqueries
A subquery is a select-from-where expression that is nested within another query.
Common use of subqueries is to perform tests for set membership, make set comparisons and determine set cardinality by nesting subqueries in the where clause.
Can be nested in:
- where clause
- from clause
- with clause
Set Membership
SQL allows testing tuples for membership in a relation.
- in connective tests for set membership, where set is a collection of values produced by a select clause.
- not in connective test for the absence of set membership
Example: Find all the courses taught in the both the Fall 2017 and Spring 2018 semesters.
Note we need to use distinct. It doesn’t automatically remove duplicates like intersect.
Found all course taught in 2018, then we need to find those courses that were taught in Fall 20-17 and that appear in the set of courses obtained in the subquery. nested in the where clause of an outer query.
We use the not in construct in a similar way as in.
Example: Find all the courses taught in the Fall 2017 semester but not in the Spring 2018 semester.
Can also use in and not in operators on enumerated sets:
Also to test for membership in an arbitrary relation in SQL:
- Example: find the total number of (distinct) students who have taken course sections taught by the instructor with ID 110 011.
Clever - find the teachers 10101 that teaches the courses that have a specific section, semester and year, and counts the student that takes those courses found in the subquery!
Set Comparison
Example: Find the names of all instructors whose salary is greater than at least one instructor in the Biology department
In SQL, “greater than at least one” is represented by > some
.
The subquery generates all salary values of all instructors in the Biology department. The > some
comparison in the where clause of the outer select is true if the salary value of the tuple is greater than at least one member of the set of all salary values for instructors in Biology. So basically returns all name of instructor except the lowest one, since all instructor will have a higher salary than at least one, that is the lowest.
SQL allows:
< some
and< all
<= some
and<= all
>= some
and>= all
= some
and= all
<> some
and<> all
= some
is identical toin
, but<>some
is not the same asnot in
. Similarly forall
.
Another example: Find the names of all instructors that have a salary value greater than that of each instructor in the Biology department.
”greater than all” is > all
Does this outer query only return one tuple? Since only one instructor can have a higher salary than all instructors right???
Another example: Find the departments that have the highest average salary.
- Start writing query to find all average salaries, then nest it as a subquery of a larger query that finds those departments for which the average salary is greater than or equal to all average salaries.
Test Empty Relations
SQL includes a feature for testing whether a subquery has any tuples in its results. The exists construct returns the value true if the argument subquery is nonempty.
Example: Find all courses taught in both the Fall 2017 semester and in the Spring 2018 semester.
Another example: Find all students who have taken all courses offered in the Biology department.
Hard for me to understand :(
- If the “WHERE NOT EXISTS” clause returns false for a student in the “student” table, it means that the student has not taken all the courses offered in the Biology department. Consequently, that student’s ID and name will not be included in the final result set.
- But from what I can understand is that we first select all courses offered in the Biology department. Then we select all the courses that student S takes. Then we use except to return only the distinct values from the first query that do not exist in the second query. So it returns the courses id that are not taken by each student. The query determines the set of courses that a student has not taken from the Biology department. This set of course IDs is then used to check if there are no missing courses for each student using the “NOT EXISTS” clause.
- The subquery retrieves the course IDs from the “course” table where the department name is ‘Biology’. This provides the set of all course IDs offered in the Biology department.
- The subquery then uses the “EXCEPT” operator to subtract the course IDs taken by a specific student (retrieved from the “takes” table) from the set of all course IDs offered in the Biology department.
- The result of the “EXCEPT” operator is a set of course IDs that the student has not taken from the Biology department.
- The “NOT EXISTS” clause checks if there are no records returned by the subquery. If the subquery returns no results, it means that the set of course IDs the student has not taken from the Biology department is empty.
- If the set of missing course IDs is empty (i.e., the “NOT EXISTS” clause returns true), it implies that the student has taken all the courses offered in the Biology department. In that case, the student’s ID and name will be included in the final result set. If it returns false, that means the student did not take all courses in the Biology department.
Another example: Find the total number of (distinct) students who have taken course sections taught by the instructor with ID 110 011.
- The subquery retrieves the relevant columns from the teaches table.
where teaches.ID= '10101'
This ensures that we are looking for course sections taught by the instructor with the ID ‘10101’.- The remaining conditions within the subquery establish the relationship between the “teaches” and “takes” tables. The
takes.course id = teaches.course id
condition ensures that the course ID matches between the two tables. Similarly, for the rest. - The outer query uses the “select count(distinct ID)” statement to count the distinct student IDs. This will give us the total number of distinct students who meet the specified conditions.
- By using the
WHERE EXISTS
clause with the subquery, the outer query counts the number of distinct student IDs for which there is at least one record in the “teaches” table that satisfies the specified conditions. - In summary, the query counts the distinct student IDs from the “takes” table if there exists a corresponding record in the “teaches” table where the instructor ID is ‘10101’, and the course ID, section ID, semester, and year match between the two tables. The result provides the total number of distinct students who have taken course sections taught by the instructor with the ID ‘10101’.
Test for Absence of Duplicate Tuples
SQL includes a Boolean function for testing whether a subquery has duplicate tuples in its result.
unique construct returns the value true if the argument subquery contains no duplicate tuples. not unique
Danger
The empty set evaluates to true with the unique predicate.
Subqueries in the From Clause
Key concept: select-from-where expression returns a relation a a result and we can insert it into another select-from-where.
Example: Find the average instructors’ salaries of those departments where the average salary is greater than $42,000.
- The outer where clause acts like a having clause.
Another example: Find the maximum across all departments of the total of all instructors’ salaries in each department.
- Sums the salary in each department in the subquery and the outer query takes the max salary from all those calculated departments.
Prefix of from clause lateral allows to access attributes of preceding tables or subqueries in the same from clause.
Another example: Print the names of each instructor, along with their salary and the average salary in their department.
- subquery have access to I1 correlation variable from the outer query.
where I2.dept_name = I1.dept_name
makes sure that the department average for that instructor I1 is printed every time for each instructor’s department individually.!!!!
With Clause
The with clause is way of defining temporary relations, where the definition is only available to query in which the clause occurs.
Example: Find all departments where the total salary is greater than the average of the total salary at all department.
- dept_toal returns the sum of all salary for each department.
- dept_total_avg returns the average value from all department salaries.
- Then we can select the department name from those two temporary relations and apply the condition, where we go through all department sums and compare it to the average of all departments.
Scalar Subqueries
Scalar subqueries can occur in select, where and having clauses. May also be defined without aggregates.
Example: lists all departments along with the number of instructors in each department.
SQL Delete
Can only delete whole tuples. Delete only operates on one relation (table) at a time.
- the where clause can be eliminated if we don’t need it
- Can also delete using cursors Embedded SQL
We can nest select-from-where in the where clause of a delete request.
Example: Delete the records of all instructors with salary below the average at the university.
- Performing all the tests before performing any deletion is important — if some tuples are deleted before other tuples have been tested.
SQL Insert
We either specify a tuple to be inserted or write a query whose result is a set of tuples to be inserted.
- specified a set of tuples using select
- evaluates select fully before performing insertion.
SQL Update
Two components:
- SET, an assignment of values to attributes; and
- WHERE, a search condition.
SQL Multisets
The Answers to SELECT blocks are actually multisets of tuples (remember, this is how we can apply aggregate functions). Therefore, we can actually apply multiset operations on them.
- Multiset union: Q1Q_1Q1
UNION ALL
Q2Q_2Q2.- behaves like “∨” in range restricted RC with multiset semantics
- Multiset difference: Q1Q_1Q1
EXCEPT ALL
Q2Q_2Q2.- behaves like “∧¬” in range restricted RC with multiset semantics
- Multiset intersection: Q1Q_1Q1
INTERSECT ALL
Q2Q_2Q2.- maximum number of tuples common to Q1Q_1Q1 and Q2Q_2Q2
- rarely used