Cloud Spanner Cheatsheet

Cloud Spanner is Google Cloud's globally distributed, externally consistent, relational database service that offers high availability, strong consistency, and horizontal scalability.

Core Concepts

Instance: A logical container for Spanner databases. Instances define geographic placement (regional or multi-region) and compute capacity (nodes or processing units).
- Regional: Data replicated within a single Google Cloud region for lower latency.
- Multi-Region: Data replicated across multiple regions for higher availability and global reads.
- Compute Capacity: Measured in Nodes (1000 Processing Units) or Processing Units (PUs, increments of 100). Determines compute/storage limits and performance.
Database: Contains tables, indexes, and data. Resides within an instance. Uses Google Standard SQL dialect.
Schema: Defined using Data Definition Language (DDL). Includes tables, columns, data types, indexes, and relationships (interleaving).
Tables: Standard relational tables.
Primary Key: Uniquely identifies rows. Crucial for data distribution and locality. Avoid monotonically increasing keys (like simple sequences or timestamps) for the first part of the key to prevent hotspots.
Interleaving: Physically co-locates child table rows with their parent row using INTERLEAVE IN PARENT. Improves performance for common parent-child joins.
Secondary Index: Allows efficient querying on non-primary key columns. Can include (STORING) non-key columns to cover queries. Can be interleaved.
Transactions: Spanner provides ACID guarantees.
- Read-Write Transactions: Standard transactions for reads and writes. Use locking.
- Read-Only Transactions: Do not lock. Can perform strong reads (most up-to-date) or stale reads (read data from a specific timestamp in the past, lower latency).
Mutations: API calls for inserting, updating, or deleting data (alternative to DML statements, often preferred programmatically for batching).
Query Optimizer: Analyzes queries and generates execution plans. Use EXPLAIN or Query Insights to understand performance.
Google Standard SQL: Spanner uses this SQL dialect (ANSI 2011 compliant with extensions).

`gcloud` CLI Commands (`gcloud spanner ...`)

Replace [PLACEHOLDERS] with your values.

Instances

List instances:
```
gcloud spanner instances list
```

Describe an instance:

gcloud spanner instances describe [INSTANCE_ID]

Create an instance:

# Regional (e.g., us-central1)
gcloud spanner instances create [INSTANCE_ID] \
    --config=regional-us-central1 \
    --description="My Regional Instance" \
    --processing-units=100 # Or --nodes=1

# Multi-Region (e.g., nam7)
gcloud spanner instances create [INSTANCE_ID] \
    --config=nam7 \
    --description="My Multi-Region Instance" \
    --nodes=3 # Or --processing-units=3000

Update instance capacity:

gcloud spanner instances update [INSTANCE_ID] --nodes=5
# OR
gcloud spanner instances update [INSTANCE_ID] --processing-units=500

Delete an instance:

gcloud spanner instances delete [INSTANCE_ID]

Databases

List databases in an instance:

gcloud spanner databases list --instance=[INSTANCE_ID]

Describe a database:

gcloud spanner databases describe [DATABASE_ID] --instance=[INSTANCE_ID]

Create a database (optionally with initial DDL):

gcloud spanner databases create [DATABASE_ID] --instance=[INSTANCE_ID] \
    --ddl="CREATE TABLE Singers (SingerId INT64 NOT NULL, Name STRING(MAX)) PRIMARY KEY (SingerId)"

Update database schema (DDL):

gcloud spanner databases ddl update [DATABASE_ID] --instance=[INSTANCE_ID] \
    --ddl="ALTER TABLE Singers ADD COLUMN BirthDate DATE"

Delete a database:

gcloud spanner databases delete [DATABASE_ID] --instance=[INSTANCE_ID]

Database Operations (DDL, Backups)

List long-running operations:

gcloud spanner operations list --instance=[INSTANCE_ID]

Describe an operation:

gcloud spanner operations describe [OPERATION_ID] --instance=[INSTANCE_ID]

IAM (Permissions)

Get IAM policy for an instance:

gcloud spanner instances get-iam-policy [INSTANCE_ID]

Set IAM policy for an instance (use with caution - replaces policy):

gcloud spanner instances set-iam-policy [INSTANCE_ID] [POLICY_FILE.json]

Add IAM policy binding:

gcloud spanner instances add-iam-policy-binding [INSTANCE_ID] \
    --member="user:[email protected]" \
    --role="roles/spanner.databaseUser"

SQL (Google Standard SQL Dialect)

Data Definition Language (DDL)

Create Table:

CREATE TABLE Albums (
  SingerId     INT64 NOT NULL,
  AlbumId      INT64 NOT NULL,
  AlbumTitle   STRING(MAX),
  ReleaseDate  DATE
) PRIMARY KEY (SingerId, AlbumId),
  INTERLEAVE IN PARENT Singers ON DELETE CASCADE; -- Example interleaving

Alter Table:

ALTER TABLE Albums ADD COLUMN MarketingBudget NUMERIC;

Drop Table:
```
DROP TABLE Albums;
```

Create Index:

CREATE INDEX AlbumsByAlbumTitle ON Albums(AlbumTitle);

-- Index with stored columns (covering index)
CREATE INDEX AlbumsByTitleStoringDate ON Albums(AlbumTitle) STORING (ReleaseDate);

Drop Index:
```
DROP INDEX AlbumsByAlbumTitle;
```

Data Manipulation Language (DML)

Insert:

INSERT INTO Singers (SingerId, Name, BirthDate)
VALUES (1, 'Alice', '1990-05-15'),
       (2, 'Bob', NULL);

Update:

UPDATE Singers
SET BirthDate = '1991-12-01'
WHERE SingerId = 2;

Delete:

DELETE FROM Singers
WHERE SingerId = 1;

Batch DML (Execute multiple DML statements atomically):

START BATCH DML;
INSERT INTO Singers (SingerId, Name) VALUES (3, 'Charlie');
UPDATE Albums SET MarketingBudget = 50000 WHERE SingerId = 2 AND AlbumId = 1;
COMMIT BATCH; -- Or RUN BATCH

Note: Mutations API is often preferred over Batch DML for programmatic batching.

Querying (SELECT)

Basic Select:

SELECT SingerId, Name FROM Singers WHERE BirthDate > '1990-01-01' ORDER BY Name;

Joins:

SELECT s.Name, a.AlbumTitle
FROM Singers AS s
JOIN Albums AS a ON s.SingerId = a.SingerId
WHERE a.ReleaseDate >= '2020-01-01';

Query Hints (e.g., force index usage):

SELECT AlbumId, AlbumTitle FROM Albums@{FORCE_INDEX=AlbumsByAlbumTitle}
WHERE AlbumTitle = 'My Great Album';

Transactions

Read-Write Transaction (Conceptual - actual implementation depends on client library):

-- In client library code: Start ReadWrite Transaction
SELECT ... ;
UPDATE ... ;
INSERT ... ;
-- In client library code: Commit Transaction

Read-Only Transaction (Conceptual):

-- In client library code: Start ReadOnly Transaction (can specify strong or staleness)
SELECT ... ;
SELECT ... ;
-- In client library code: Close Transaction (no commit needed)

Key Data Types

BOOL
INT64 (64-bit integer)
FLOAT64 (double precision floating point)
NUMERIC (lossless decimal, preferred for financial data)
STRING(len | MAX)
BYTES(len | MAX)
DATE
TIMESTAMP (nanosecond precision, UTC)
JSON
ARRAY<type> (Arrays of most other types, except ARRAY and STRUCT)
STRUCT<field_name type, ...>

Performance & Best Practices

Schema Design:
- Choose Primary Keys carefully to avoid hotspots (avoid sequential leading keys). Consider UUIDs, hash-based keys, or reversing sequential IDs.
- Use Interleaving for parent-child relationships frequently accessed together.
- Use Secondary Indexes for common query patterns on non-key columns. Use STORING to cover queries.
Writes:
- Use the Mutations API for batching writes programmatically.
- Avoid excessively large transactions.
Reads:
- Use Read-Only transactions when possible.
- Use Stale Reads (exact_staleness or max_staleness) for lower latency when slightly old data is acceptable.
Monitoring:
- Monitor CPU utilization (keep below ~65% for regional, ~45% for multi-region).
- Watch latency, QPS, lock contention (spanner.googleapis.com/transaction/lock_wait_seconds).
- Use Query Insights to identify slow or expensive queries.
Query Optimization:
- Analyze query plans using EXPLAIN (via client libraries or Cloud Console).
- Ensure queries use appropriate indexes.
- Avoid SELECT * if only specific columns are needed.
Instance Sizing: Start with appropriate compute capacity and scale up/down based on monitoring.

Pricing Components

Compute Capacity: Per node-hour or PU-hour (billed hourly).
Database Storage: Per GB-month (includes tables, indexes, metadata).
Backup Storage: Per GB-month.
Network Egress: Per GB (egress between regions or out to internet).