Cheatsheets - SQL
Count
SELECT count(*) from classics;
Distinct
select distinct author from classics;
Delete
DELETE FROM classics WHERE title='Little Dorrit';
Query
SELECT author,title FROM classics;
Exact Match:
SELECT author,title FROM classics WHERE author="Mark Twain";
Approx Match:
SELECT author,title FROM classics WHERE author LIKE "Charles%";
SELECT author,title FROM classics WHERE title LIKE "%Species";
SELECT author,title FROM classics WHERE title LIKE "%and%";
Limit the number of results:
SELECT author,title FROM classics LIMIT 3;
SELECT author,title FROM classics LIMIT 1,2;
SELECT author,title FROM classics LIMIT 3,1; (omit first 3, starting from 4, select 1)
MATCH...AGAINST: search in FULLTEXT, could contain multiple words return nothing since 'and' is a stopword:
SELECT author,title FROM classics
WHERE MATCH(author,title) AGAINST('and');
SELECT author,title FROM classics
WHERE MATCH(author,title) AGAINST('old shop');
SELECT author,title FROM classics
WHERE MATCH(author,title) AGAINST('tom sawyer');
SELECT author,title FROM classics
WHERE MATCH(author,title)
AGAINST('+charles -species' IN BOOLEAN MODE);
SELECT author,title FROM classics
WHERE MATCH(author,title)
AGAINST('"origin of"' IN BOOLEAN MODE);
Order by:
SELECT author,title FROM classics ORDER BY author;
SELECT author,title FROM classics ORDER BY title DESC;
SELECT author,title,year FROM classics ORDER BY author,year DESC;
SELECT author,title,year FROM classics ORDER BY author ASC,year DESC;
Group by:
SELECT category,COUNT(author) FROM classics GROUP BY category;
Join:
SELECT name,author,title from customers,classics
WHERE customers.isbn=classics.isbn;
SELECT name,author,title FROM customers
JOIN classics ON customers.isbn=classics.isbn;
SELECT name,author,title FROM customers NATURAL JOIN classics;
As:
SELECT name,author,title from
customers AS cust, classics AS class WHERE cust.isbn=class.isbn;
SELECT author,title FROM classics WHERE
author LIKE "Charles%" AND author LIKE "%Darwin";
SELECT author,title FROM classics WHERE
author LIKE "%Mark Twain%" OR author LIKE "%Samuel Langhorne Clemens%";
SELECT author,title FROM classics WHERE
author LIKE "Charles%" AND author NOT LIKE "%Darwin";
Find MIN
select MIN(DATE) from SomeTable;
Join array to string
SELECT ARRAY_TO_STRING(arr, ",")
FROM ...
title: SQL Interview Questions topic: interview
What are the SQL Standards and Dialects?
The latest SQL standard is SQL:2016.
Systems that support standard SQL:
- Presto: used by Facebook, open source, written in Java, successor of Hive (Presto is based on ANSI SQL but HiveQL is not)
- BigQuery / Spanner, used by Google, both use Google's Standard SQL, based on ANSI SQL:2011
BigQuery and Presto are optimized for batch/read queries.
Dialects / Extensions:
- PL/SQL: Oracle's procedural extension for SQL
- PL/pgSQL: PostgreSQL's version.
- Transact-SQL (T-SQL): Microsoft's extension to SQL
What are DQL, DDL, DCL and DML?
Read More: DQL vs DDL vs DCL vs DML
What are different types of joins?
INNER JOIN
: returns matched rows in both tables- OUTER JOIN:
LEFT JOIN
: returns all rows from the left table and the matched rows from the right table.RIGHT JOIN
: returns all rows from the right table and the matched rows from the left table.FULL JOIN
: returns all the rows from both joined tables, whether they have a matching row or not.
CROSS JOIN
: a cartesian product of the two tables.
How to speed up queries?
- Avoid using
SELECT *
. Applying aLIMIT
clause to aSELECT *
query does not affect the amount of data read - Keep data denormalized. "The storage savings from normalized data are less of a concern in modern systems"
- "Trim the data as early in the query as possible, before the query performs a
JOIN
" JOIN
order: place the largest table first, followed by the smallest, and then by decreasing size.
What is a primary key?
A primary key is a column that can uniquely identify the rows.
What are contraints?
E.g. NOT NULL
, UNIQUE
, PRIMARY KEY
, FOREIGN KEY
.
2 Types: column level constraints and table level constraints.
What is a Trigger in SQL?
A trigger is a database object that is associated with a table. It will be activated when a defined action is executed for the table. E.g. when you run INSERT
, UPDATE
and DELETE
.
What's the difference between WHERE
and HAVING
?
WHERE
: filter all the rows.HAVING
: used afterGROUP BY
, filter grouped results.
What is normalization?
Normalization: decompose tables to remove redundant data. It devides large tables into smaller tables linked by keys.
Normal forms:
- 1NF (First Normal Form)
- Each table cell should contain a single value.
- Each record needs to be unique.
- 2NF (Second Normal Form)
- 1NF + Single Column Primary Key that does not functionally dependant on any subset of candidate key relation.
- 3NF (Third Normal Form)
- 2NF + NO transitive functional dependencies.
- BCNF (Boyce-Codd Normal Form), a.k.a. 3.5NF, to address certain types of anomalies not dealt with by 3NF as originally defined
- 3NF + if a trivial functional dependency X -> Y exist for a relation; then X must be a super key.
- 4NF (Fourth Normal Form)
- no database table instance contains two or more, independent and multivalued data describing the relevant entity
- 5NF (Fifth Normal Form)
- 4NF + it cannot be decomposed into any number of smaller tables without loss of data.
- 6NF (Sixth Normal Form)
What is Denormalization?
The reverse process of normalization: use redundancy to improve performance.
What is transitive functional dependency?
A transitive dependency exists when you have the following functional dependency pattern: A->B and B->C; therefore A->C.
For example: Book -> Author, Author -> Author age, therefore Book -> Author age.
Transitive functional dependency basically means when changing a non-key column, it may cause any of the other non-key columns to change.
What is a Primary Key?
A primary key uniquely identifies a row in a table.
What is a Composite Key?
Composite Key: consists of multiple columns that together uniquely identify a row.
What is a Foreign Key?
A foreign key referneces the primary key of another table.
- it can have a different name from the primary key.
- they do not have to be unique.
- foreign key can be null; primary key cannot.
What is the difference between UDF and TVF?
Both are user-defined. TVF (table-valued function) returns a table. They are different in different systems. For example:
- In BigQuery, UDF returns the result as a value (SQL UDFs are created by
CREATE FUNCTION
orCREATE TEMP FUNCTION
). TVF returns a table (created byCREATE TABLE FUNCTION
). - In SQL Server, there are 2 kinds of UDF: SVF (Scalar-valued Function) and TVF (Table-valued Function).
What is the difference between DELETE
, TRUNCATE
and DROP
?
DELETE
: delete rows from the table based on the condition specified byWHERE
.TRUNCATE
: delete all the rows and free the space.DROP
: drop the table.
What is Pattern Matching in SQL?
Wildcards:
%
matches 0 or more characters._
matches exactly one character.
-- matches any string that contains `bar`
SELECT * FROM Foo WHERE field LIKE '%bar%';
-- matches any string with exact 3 characters
SELECT * FROM Foo WHERE field LIKE '___';
How to create a table?
It might be different in different engines. For example in Presto:
CREATE TABLE IF NOT EXISTS
table_name (
xxx_names ARRAY<VARCHAR>,
ds VARCHAR
)
How to insert values?
INSERT INTO table_name(column1, column2, ...)
VALUES (value1, value2, ...);
How to use CASE
/ WHEN
?
Create is_not_null
column
CASE WHEN <column> IS NOT NULL THEN 1 ELSE 0 END AS is_not_null
How to calculate MAX_BY / MIN_BY?
Presto provides MAX_BY
and MIN_BY
, but BigQuery does not, here's an alternative:
ARRAY_AGG(field1 ORDER BY field2)[OFFSET(0)]
How to calculate Cumulative sum?
SUM(a) OVER (ORDER BY a DESC ROWS UNBOUNDED PRECEDING)
How to calculate % of overall?
SUM(a) / SUM(SUM(a)) OVER ()
How to calculate Conditional SUM?
If bool_val
is true, add up the val.
SUM(IF(bool_val, val, 0))
How to Join ARRAY to STRING
SELECT ARRAY_TO_STRING(arr, ",")
FROM ...
How to transpose a table?
Most of SQL engines do not have built-in functions to convert rows into columns, (a.k.a "transpose" the table). In order to convert a table like this:
id | key | value
-----|-------|------------
id_0 | key_0 | value_1
id_0 | key_1 | value_10
id_1 | key_1 | value_100
id_2 | key_0 | value_1000
To this, so each key is a separate column:
id | key_0 | key_1
-----|------------|------------
id_0 | value_1 | value_10
id_1 | null | value_100
id_2 | value_1000 | null
This can be achieved by GROUP
(by id) and ARRAY_AGG
and filter out un-related info:
-- prep the table
WITH data AS (
SELECT 'id_0' AS id, 'key_0' AS key, 'value_1' AS value
UNION ALL SELECT 'id_0' AS id, 'key_1' AS key, 'value_10' AS value
UNION ALL SELECT 'id_1' AS id, 'key_1' AS key, 'value_100' AS value
UNION ALL SELECT 'id_2' AS id, 'key_0' AS key, 'value_1000' AS value
)
-- actual pivot logic
SELECT
id,
ARRAY_AGG(CASE key WHEN 'key_0' THEN value ELSE NULL END IGNORE NULLS)[SAFE_OFFSET(0)] AS key_0,
ARRAY_AGG(CASE key WHEN 'key_1' THEN value ELSE NULL END IGNORE NULLS)[SAFE_OFFSET(0)] AS key_1
FROM data
GROUP BY 1
Where CASE key WHEN 'key_0' THEN value ELSE NULL END
will nullify values not related to this key, and IGNORE NULLS
will make sure only relevant values goes into ARRAY_AGG
.
What is UNNEST
?
UNNEST
: expands an ARRAY into a relation (i.e. table)
- Presto: supports
ARRAY
to 1 column andMAP
to 2 columns table - BigQuery: only supports
ARRAY
to 1 column table
For example:
SELECT ARRAY[1, 2, 3]
Result is one row
f0_
---
[1, 2, 3]
Use UNNEST
SELECT * FROM UNNEST(ARRAY[1, 2, 3]) AS a
The result is a table of one column
a
---
1
2
3
Equivalent to a map
function:
SELECT a*a FROM UNNEST(ARRAY[1, 2, 3]) AS a
a
---
1
4
9
SELECT ARRAY(SELECT a*a FROM UNNEST(ARRAY[1, 2, 3]) AS a)
f0_
---
[1,4,9]
SELECT ARRAY_TO_STRING(
ARRAY(SELECT CAST(a * a AS STRING) FROM UNNEST(ARRAY[1, 2, 3]) AS a), ","
)
f0_
---
1,4,9
What is Rollup?
ROLLUP(YEAR, MONTH, DAY)
With a ROLLUP
, it will have the following outputs:
YEAR, MONTH, DAY
YEAR, MONTH
YEAR
()
What is Cube?
With CUBE
, it will have the following:
YEAR, MONTH, DAY
YEAR, MONTH
YEAR, DAY
YEAR
MONTH, DAY
MONTH
DAY
()
CUBE
essentially contains every possible rollup scenario for each node whereas ROLLUP
will keep the hierarchy in tact (so it won't skip MONTH and show YEAR/DAY, whereas CUBE will)
How to calculate SUM of a repeated field?
message TopLevelMessage {
repeated FooMessage foos = 1;
...
}
message FooMessage {
int64 bar = 1;
int64 baz = 2;
...
}
SELECT
...
(SELECT SUM(bar) FROM UNNEST(foos)) AS sum_bar
FROM table
How to Change password?
For example in PostgreSQL:
ALTER ROLE username
WITH PASSWORD 'password';