logo

Cloud Spanner Cheatsheet

Cloud Spanner is Google Cloud's globally distributed, externally consistent, relational database service that offers high availability, strong consistency, and horizontal scalability.

Core Concepts

  • Instance: A logical container for Spanner databases. Instances define geographic placement (regional or multi-region) and compute capacity (nodes or processing units).
    • Regional: Data replicated within a single Google Cloud region for lower latency.
    • Multi-Region: Data replicated across multiple regions for higher availability and global reads.
    • Compute Capacity: Measured in Nodes (1000 Processing Units) or Processing Units (PUs, increments of 100). Determines compute/storage limits and performance.
  • Database: Contains tables, indexes, and data. Resides within an instance. Uses Google Standard SQL dialect.
  • Schema: Defined using Data Definition Language (DDL). Includes tables, columns, data types, indexes, and relationships (interleaving).
  • Tables: Standard relational tables.
  • Primary Key: Uniquely identifies rows. Crucial for data distribution and locality. Avoid monotonically increasing keys (like simple sequences or timestamps) for the first part of the key to prevent hotspots.
  • Interleaving: Physically co-locates child table rows with their parent row using INTERLEAVE IN PARENT. Improves performance for common parent-child joins.
  • Secondary Index: Allows efficient querying on non-primary key columns. Can include (STORING) non-key columns to cover queries. Can be interleaved.
  • Transactions: Spanner provides ACID guarantees.
    • Read-Write Transactions: Standard transactions for reads and writes. Use locking.
    • Read-Only Transactions: Do not lock. Can perform strong reads (most up-to-date) or stale reads (read data from a specific timestamp in the past, lower latency).
  • Mutations: API calls for inserting, updating, or deleting data (alternative to DML statements, often preferred programmatically for batching).
  • Query Optimizer: Analyzes queries and generates execution plans. Use EXPLAIN or Query Insights to understand performance.
  • Google Standard SQL: Spanner uses this SQL dialect (ANSI 2011 compliant with extensions).

gcloud CLI Commands (gcloud spanner ...)

Replace [PLACEHOLDERS] with your values.

Instances

  • List instances:

    gcloud spanner instances list
    
  • Describe an instance:

    gcloud spanner instances describe [INSTANCE_ID]
    
  • Create an instance:

    # Regional (e.g., us-central1)
    gcloud spanner instances create [INSTANCE_ID] \
        --config=regional-us-central1 \
        --description="My Regional Instance" \
        --processing-units=100 # Or --nodes=1
    
    # Multi-Region (e.g., nam7)
    gcloud spanner instances create [INSTANCE_ID] \
        --config=nam7 \
        --description="My Multi-Region Instance" \
        --nodes=3 # Or --processing-units=3000
    
  • Update instance capacity:

    gcloud spanner instances update [INSTANCE_ID] --nodes=5
    # OR
    gcloud spanner instances update [INSTANCE_ID] --processing-units=500
    
  • Delete an instance:

    gcloud spanner instances delete [INSTANCE_ID]
    

Databases

  • List databases in an instance:
    gcloud spanner databases list --instance=[INSTANCE_ID]
    
  • Describe a database:
    gcloud spanner databases describe [DATABASE_ID] --instance=[INSTANCE_ID]
    
  • Create a database (optionally with initial DDL):
    gcloud spanner databases create [DATABASE_ID] --instance=[INSTANCE_ID] \
        --ddl="CREATE TABLE Singers (SingerId INT64 NOT NULL, Name STRING(MAX)) PRIMARY KEY (SingerId)"
    
  • Update database schema (DDL):
    gcloud spanner databases ddl update [DATABASE_ID] --instance=[INSTANCE_ID] \
        --ddl="ALTER TABLE Singers ADD COLUMN BirthDate DATE"
    
  • Delete a database:
    gcloud spanner databases delete [DATABASE_ID] --instance=[INSTANCE_ID]
    

Database Operations (DDL, Backups)

  • List long-running operations:
    gcloud spanner operations list --instance=[INSTANCE_ID]
    
  • Describe an operation:
    gcloud spanner operations describe [OPERATION_ID] --instance=[INSTANCE_ID]
    

IAM (Permissions)

  • Get IAM policy for an instance:
    gcloud spanner instances get-iam-policy [INSTANCE_ID]
    
  • Set IAM policy for an instance (use with caution - replaces policy):
    gcloud spanner instances set-iam-policy [INSTANCE_ID] [POLICY_FILE.json]
    
  • Add IAM policy binding:
    gcloud spanner instances add-iam-policy-binding [INSTANCE_ID] \
        --member="user:[email protected]" \
        --role="roles/spanner.databaseUser"
    

SQL (Google Standard SQL Dialect)

Data Definition Language (DDL)

  • Create Table:

    CREATE TABLE Albums (
      SingerId     INT64 NOT NULL,
      AlbumId      INT64 NOT NULL,
      AlbumTitle   STRING(MAX),
      ReleaseDate  DATE
    ) PRIMARY KEY (SingerId, AlbumId),
      INTERLEAVE IN PARENT Singers ON DELETE CASCADE; -- Example interleaving
    
  • Alter Table:

    ALTER TABLE Albums ADD COLUMN MarketingBudget NUMERIC;
    
  • Drop Table:

    DROP TABLE Albums;
    
  • Create Index:

    CREATE INDEX AlbumsByAlbumTitle ON Albums(AlbumTitle);
    
    -- Index with stored columns (covering index)
    CREATE INDEX AlbumsByTitleStoringDate ON Albums(AlbumTitle) STORING (ReleaseDate);
    
  • Drop Index:

    DROP INDEX AlbumsByAlbumTitle;
    

Data Manipulation Language (DML)

  • Insert:
    INSERT INTO Singers (SingerId, Name, BirthDate)
    VALUES (1, 'Alice', '1990-05-15'),
           (2, 'Bob', NULL);
    
  • Update:
    UPDATE Singers
    SET BirthDate = '1991-12-01'
    WHERE SingerId = 2;
    
  • Delete:
    DELETE FROM Singers
    WHERE SingerId = 1;
    
  • Batch DML (Execute multiple DML statements atomically):
    START BATCH DML;
    INSERT INTO Singers (SingerId, Name) VALUES (3, 'Charlie');
    UPDATE Albums SET MarketingBudget = 50000 WHERE SingerId = 2 AND AlbumId = 1;
    COMMIT BATCH; -- Or RUN BATCH
    
    Note: Mutations API is often preferred over Batch DML for programmatic batching.

Querying (SELECT)

  • Basic Select:
    SELECT SingerId, Name FROM Singers WHERE BirthDate > '1990-01-01' ORDER BY Name;
    
  • Joins:
    SELECT s.Name, a.AlbumTitle
    FROM Singers AS s
    JOIN Albums AS a ON s.SingerId = a.SingerId
    WHERE a.ReleaseDate >= '2020-01-01';
    
  • Query Hints (e.g., force index usage):
    SELECT AlbumId, AlbumTitle FROM Albums@{FORCE_INDEX=AlbumsByAlbumTitle}
    WHERE AlbumTitle = 'My Great Album';
    

Transactions

  • Read-Write Transaction (Conceptual - actual implementation depends on client library):
    -- In client library code: Start ReadWrite Transaction
    SELECT ... ;
    UPDATE ... ;
    INSERT ... ;
    -- In client library code: Commit Transaction
    
  • Read-Only Transaction (Conceptual):
    -- In client library code: Start ReadOnly Transaction (can specify strong or staleness)
    SELECT ... ;
    SELECT ... ;
    -- In client library code: Close Transaction (no commit needed)
    

Key Data Types

  • BOOL
  • INT64 (64-bit integer)
  • FLOAT64 (double precision floating point)
  • NUMERIC (lossless decimal, preferred for financial data)
  • STRING(len | MAX)
  • BYTES(len | MAX)
  • DATE
  • TIMESTAMP (nanosecond precision, UTC)
  • JSON
  • ARRAY<type> (Arrays of most other types, except ARRAY and STRUCT)
  • STRUCT<field_name type, ...>

Performance & Best Practices

  • Schema Design:
    • Choose Primary Keys carefully to avoid hotspots (avoid sequential leading keys). Consider UUIDs, hash-based keys, or reversing sequential IDs.
    • Use Interleaving for parent-child relationships frequently accessed together.
    • Use Secondary Indexes for common query patterns on non-key columns. Use STORING to cover queries.
  • Writes:
    • Use the Mutations API for batching writes programmatically.
    • Avoid excessively large transactions.
  • Reads:
    • Use Read-Only transactions when possible.
    • Use Stale Reads (exact_staleness or max_staleness) for lower latency when slightly old data is acceptable.
  • Monitoring:
    • Monitor CPU utilization (keep below ~65% for regional, ~45% for multi-region).
    • Watch latency, QPS, lock contention (spanner.googleapis.com/transaction/lock_wait_seconds).
    • Use Query Insights to identify slow or expensive queries.
  • Query Optimization:
    • Analyze query plans using EXPLAIN (via client libraries or Cloud Console).
    • Ensure queries use appropriate indexes.
    • Avoid SELECT * if only specific columns are needed.
  • Instance Sizing: Start with appropriate compute capacity and scale up/down based on monitoring.

Pricing Components

  • Compute Capacity: Per node-hour or PU-hour (billed hourly).
  • Database Storage: Per GB-month (includes tables, indexes, metadata).
  • Backup Storage: Per GB-month.
  • Network Egress: Per GB (egress between regions or out to internet).

Further Resources