
Protocol Buffers (Protobuf) Cheatsheet

Core Concepts

  • .proto Files: Define the data structure (messages) and services.
  • Messages: The primary data structure unit, similar to a class or struct. Contains typed fields.
  • Fields: Named and typed components within a message. Each field has a unique number.
  • Field Numbers: Unique numeric identifier for each field within a message (1, 2, 3,...). Crucial for binary encoding and backward/forward compatibility. Cannot be changed once in use.
  • Scalar Types: Basic data types like integers, floats, booleans, strings, bytes.
  • Enums: Define a set of named constants.
  • Compiler (protoc): Generates data access classes/structs in your chosen language from .proto definitions.
  • Wire Format: Compact binary encoding optimized for size and speed. Not self-describing (needs the .proto definition to interpret).
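
To illustrate why the wire format is not self-describing, here is a minimal Python sketch that walks a serialized buffer using only the tag layout (field number in the upper bits, wire type in the low 3 bits). The helper names and the hand-encoded sample bytes are assumptions made for illustration, not part of any protobuf library, and only wire types 0, 1, 2, and 5 are handled.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: last byte of this varint
            return result, pos
        shift += 7


def scan_fields(buf):
    """Return (field_number, wire_type, raw_value) tuples without any schema."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # VARINT
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # I64: 8 raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # LEN: varint length, then that many bytes
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # I32: 4 raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        fields.append((field_number, wire_type, value))
    return fields


# Hand-encoded sample: field 1 (LEN) = "Al", field 2 (VARINT) = 150.
sample = b"\x0a\x02Al\x10\x96\x01"
print(scan_fields(sample))  # [(1, 2, b'Al'), (2, 0, 150)]
```

The scan recovers field numbers and raw values, but not field names, and it cannot tell whether field 1 is a string, bytes, or a nested message. That information lives only in the .proto definition.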

Basic Syntax (.proto File)

// Specify the syntax version (proto3 is recommended)
syntax = "proto3";

// Optional: Define a package to prevent name clashes
package my.project.protos;

// Optional: Import definitions from other .proto files
import "google/protobuf/timestamp.proto";

// Define a message structure
message Person {
  // Field definition: type name = field_number;
  string name = 1;
  int32 id = 2; // Unique ID for the Person. The field number (2) must not change once in use.
  string email = 3;
  bool is_active = 4;

  // Enum defined inside the message (scoped)
  enum PhoneType {
    PHONE_TYPE_UNSPECIFIED = 0; // Zero value must be first for proto3 enums
    MOBILE = 1;
    HOME = 2;
    WORK = 3;
  }

  // Repeated field (list/array) of another message type
  repeated PhoneNumber phones = 5;

  // Use an imported type
  google.protobuf.Timestamp last_updated = 6;
}

// Define another message type
message PhoneNumber {
  string number = 1;
  Person.PhoneType type = 2; // Use the nested enum type
}

// Define an enum at the top level
enum Corpus {
  CORPUS_UNSPECIFIED = 0;
  UNIVERSAL = 1;
  WEB = 2;
  IMAGES = 3;
  LOCAL = 4;
  NEWS = 5;
  PRODUCTS = 6;
  VIDEO = 7;
}

// Comments use C++/Java style
// Single line comment

/*
Multi-line comment
*/

Scalar Data Types

.proto Type  Notes                                    C++ Type     Java Type   Python Type                Go Type  C# Type
double       64-bit float                             double       double      float                      float64  double
float        32-bit float                             float        float       float                      float32  float
int32        32-bit integer (variable encoding)       int32_t      int         int                        int32    int
int64        64-bit integer (variable encoding)       int64_t      long        int or long                int64    long
uint32       32-bit unsigned integer                  uint32_t     int         int or long                uint32   uint
uint64       64-bit unsigned integer                  uint64_t     long        int or long                uint64   ulong
sint32       Signed 32-bit integer (ZigZag encoding)  int32_t      int         int                        int32    int
sint64       Signed 64-bit integer (ZigZag encoding)  int64_t      long        int or long                int64    long
fixed32      Fixed 32-bit integer (always 4 bytes)    uint32_t     int         int                        uint32   uint
fixed64      Fixed 64-bit integer (always 8 bytes)    uint64_t     long        int or long                uint64   ulong
sfixed32     Signed Fixed 32-bit integer              int32_t      int         int                        int32    int
sfixed64     Signed Fixed 64-bit integer              int64_t      long        int or long                int64    long
bool         Boolean true/false                       bool         boolean     bool                       bool     bool
string       UTF-8 encoded string                     std::string  String      str (py3) / unicode (py2)  string   string
bytes        Arbitrary sequence of bytes              std::string  ByteString  bytes                      []byte   ByteString
  • Variable Encoding (int32, int64, uint32, uint64): Uses fewer bytes for smaller numbers. Negative int32/int64 values always take 10 bytes.
  • ZigZag Encoding (sint32, sint64): Encodes negative numbers far more compactly than int32/int64.
  • Fixed Encoding (fixed32, fixed64, sfixed32, sfixed64): Always uses 4 or 8 bytes. More efficient than variable encoding when values are often large (above 2^28 for the 32-bit types, above 2^56 for the 64-bit types).
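
The size trade-offs between these encodings can be checked with a short Python sketch. It is a minimal model of base-128 varints and ZigZag mapping for 64-bit values; the function names are assumptions chosen for illustration and do not come from a protobuf library.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int64(value):
    """int32/int64 style: negative values are sent as 64-bit two's complement."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag64(value):
    """sint64 style: map 0, -1, 1, -2, ... to 0, 1, 2, 3, ..."""
    return (value << 1) ^ (value >> 63)


def encode_sint64(value):
    return encode_varint(zigzag64(value))


print(len(encode_int64(1)))        # 1
print(len(encode_int64(-1)))       # 10
print(len(encode_sint64(-1)))      # 1
print(len(encode_varint(1 << 28))) # 5, versus 4 bytes for fixed32
```

A small negative number costs 10 bytes as int64 but 1 byte as sint64, and a value of 2^28 already needs 5 varint bytes, one more than fixed32.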

Field Rules (Proto3)

  • Implicit presence (no label): Singular fields are optional by default. If a field is not set, it reads as its default value (0 for numbers, empty string for strings, false for bools, empty for bytes/repeated/maps, zero enum value). A scalar field holding its default value is not serialized, so a reader cannot tell "not set" from "set to default".
  • repeated: The field can appear zero or more times (list/array).
    repeated string tags = 1; // A list of strings
    
  • optional (Keyword): Explicitly marks a field as optional. Allows checking if the field was explicitly set, even if set to its default value. Useful for distinguishing between "not set" and "set to default (e.g., 0 or false)".
    optional int32 page_size = 2; // Can check if page_size was actually provided
    optional bool use_cache = 3;  // Can differentiate not set vs explicitly set to false
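
The difference between the two presence modes shows up directly in the serialized bytes. The following Python sketch hand-encodes a single int32 field under both rules; the encoder functions are hypothetical helpers written for this illustration, with None standing in for "not set".

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def _int32_field(field_number, value):
    # Tag = (field_number << 3) | wire type 0 (VARINT).
    # Negative int32 values are sign-extended to 64 bits on the wire.
    return encode_varint(field_number << 3) + encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_implicit_int32(field_number, value):
    """No label: a field equal to its default (0) is simply not written."""
    if value == 0:
        return b""
    return _int32_field(field_number, value)


def encode_explicit_int32(field_number, value):
    """`optional` keyword: any set value is written, including 0."""
    if value is None:  # None models "not set"
        return b""
    return _int32_field(field_number, value)


print(encode_implicit_int32(2, 0))     # b''
print(encode_explicit_int32(2, 0))     # b'\x10\x00'
print(encode_explicit_int32(2, None))  # b''
```

With the optional keyword, an explicit 0 for page_size occupies two bytes on the wire, which is what lets the receiver distinguish it from an absent field.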
    

Field Numbers

  • Unique integers from 1 to 2^29 - 1.
  • Cannot use numbers 19000 through 19999 (reserved for the Protocol Buffers implementation).
  • Numbers 1 to 15 use 1 byte for encoding (field number + wire type). Use them for frequently set fields.
  • Numbers 16 to 2047 use 2 bytes.
  • Crucial: Once a field number is used, it should never be changed or reused for a different field in that message to maintain backward/forward compatibility. If deleting a field, reserve its number.
message MyMessage {
  reserved 2, 15; // These field numbers cannot be reused
  reserved "foo", "bar"; // Field names can also be reserved

  string name = 1;
  // Field 2 was removed
  int32 count = 3;
  // Field 15 was removed
}
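
The 1-byte and 2-byte thresholds follow from how the tag is built: the field number is shifted left by 3 bits to make room for the wire type, and the result is written as a varint. A minimal Python sketch of that arithmetic, with tag_size as a hypothetical helper name:

```python
MAX_FIELD_NUMBER = (1 << 29) - 1  # 536,870,911


def tag_size(field_number, wire_type=0):
    """Number of bytes the tag (field number + wire type) takes on the wire."""
    if not 1 <= field_number <= MAX_FIELD_NUMBER:
        raise ValueError("field number out of range")
    if 19000 <= field_number <= 19999:
        raise ValueError("19000-19999 is reserved")
    tag = (field_number << 3) | wire_type
    size = 1
    while tag >= 0x80:  # each varint byte carries 7 bits of payload
        tag >>= 7
        size += 1
    return size


print(tag_size(15))    # 1
print(tag_size(16))    # 2
print(tag_size(2047))  # 2
print(tag_size(2048))  # 3
```

One varint byte holds 7 bits, 3 of which go to the wire type, leaving 4 bits for the field number (1 to 15). Two bytes hold 14 bits, leaving 11 bits (up to 2047).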

Enumerations (enum)

  • Define a set of named constant values.
  • The first defined value must be zero in proto3.
  • Values must be 32-bit integers.
  • Multiple names may map to the same number only if the enum sets option allow_alias = true.
enum Status {
  STATUS_UNSPECIFIED = 0; // Zero value first
  PENDING = 1;
  RUNNING = 2;
  COMPLETED = 3;
  FAILED = 4;
}

enum Sharing {
  option allow_alias = true; // Allow multiple names for the same value
  UNKNOWN = 0;
  PUBLIC = 1;
  PRIVATE = 2;
  SECRET = 2; // Alias for PRIVATE
}

Nested Types

  • Messages and enums can be defined inside other messages.
message Outer {
  message Inner { // Nested message
    int64 value = 1;
  }
  enum InnerEnum { // Nested enum
    INNER_DEFAULT = 0;
    INNER_VALUE = 1;
  }

  Inner inner_field = 1;
  InnerEnum enum_field = 2;
}

// Usage outside:
// Outer.Inner my_inner = 1; // Reference using Outer.Inner
// Outer.InnerEnum my_enum = 1;

Maps

  • Define associative maps (dictionaries). Cannot be repeated.
  • map<key_type, value_type> map_field = field_number;
  • key_type can be any integral or string type (including bool). Cannot be float, double, bytes, enum, or message types.
  • value_type can be any type except another map.
message Config {
  map<string, string> settings = 1; // Map from string to string
  map<int32, Project> projects_by_id = 2; // Map from int32 to Project (a message assumed to be defined elsewhere)
}

Oneof

  • Defines a set of fields where at most one field can be set at a time. Setting one field automatically clears the others in the oneof.
  • Fields within a oneof share memory. A oneof cannot contain repeated or map fields.
message Result {
  oneof data {
    string success_message = 1;
    int32 error_code = 2;
    bytes raw_output = 3;
  }
  // Fields outside the oneof are independent (Timestamp needs import "google/protobuf/timestamp.proto")
  google.protobuf.Timestamp timestamp = 4;
}
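
The "setting one clears the others" rule can be modeled in a few lines of Python. This is a toy model of the oneof data group above, not the API that protoc generates; the class and method names are assumptions made for this sketch.

```python
class ResultData:
    """Toy model of `oneof data`: at most one member holds a value."""

    _CASES = ("success_message", "error_code", "raw_output")

    def __init__(self):
        self._case = None   # name of the member currently set, if any
        self._value = None

    def set(self, name, value):
        if name not in self._CASES:
            raise KeyError(name)
        # Storing a single (case, value) pair means the previous member
        # is cleared automatically when a new one is set.
        self._case, self._value = name, value

    def which(self):
        """Return the name of the member that is set, or None."""
        return self._case

    def get(self, name):
        """Return the value if `name` is the active member, else None."""
        return self._value if name == self._case else None


data = ResultData()
data.set("success_message", "ok")
data.set("error_code", 404)          # replaces success_message
print(data.which())                  # error_code
print(data.get("success_message"))   # None
```

Generated code in real protobuf libraries exposes a similar "which member is set" query, though the exact method name differs per language.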

Services (service, rpc)

  • Define RPC service interfaces. The Protobuf compiler can generate client and server stubs.
  • Requires request and response message types.
  • Methods can accept/return streams using the stream keyword.
// Define the service
service SearchService {
  // Simple RPC: takes SearchRequest, returns SearchResponse
  rpc Search (SearchRequest) returns (SearchResponse);

  // Server-side streaming RPC
  rpc SearchStreamResponse (SearchRequest) returns (stream SearchResponse);

  // Client-side streaming RPC
  rpc SearchStreamRequest (stream SearchRequest) returns (SearchResponse);

  // Bidirectional streaming RPC
  rpc SearchBidiStream (stream SearchRequest) returns (stream SearchResponse);
}

// Define request and response messages
message SearchRequest {
  string query = 1;
  int32 page_number = 2;
}

message SearchResponse {
  repeated Result results = 1;
  int32 total_count = 2;
}

message Result {
  string url = 1;
  string title = 2;
}

Packages & Imports

  • package my.package.name;
    • Declares a namespace for the .proto file to prevent naming conflicts between types.
    • Affects generated code namespaces/packages.
  • import "path/to/other.proto";
    • Allows using definitions (messages, enums) from another .proto file.
    • protoc needs to be able to find the imported file via the -I / --proto_path argument.
  • import public "path/to/other.proto";
    • Publicly imports definitions. Anyone importing your file also implicitly imports the publicly imported file's definitions. Use sparingly.
  • Well-Known Types: Common useful types provided by Google. Need to be imported.
    • import "google/protobuf/timestamp.proto"; (Timestamp)
    • import "google/protobuf/duration.proto"; (Duration)
    • import "google/protobuf/wrappers.proto"; (Wrappers for scalars like StringValue, Int32Value - useful for distinguishing unset from default)
    • import "google/protobuf/struct.proto"; (Arbitrary JSON-like structure)
    • import "google/protobuf/any.proto"; (Container for arbitrary message type)
    • import "google/protobuf/empty.proto"; (Empty message type)
    • import "google/protobuf/field_mask.proto"; (Field mask for partial updates)

Options

  • Annotations that provide metadata or influence code generation. Can be file-level, message-level, field-level, enum-level, etc.
  • Defined using option option_name = value;.
// File-level options
option java_package = "com.example.myproject.protos";
option java_multiple_files = true;
option go_package = "example.com/myproject/protos";

message MyMessage {
  option deprecated = true; // Mark message as deprecated

  string old_field = 1 [deprecated = true]; // Mark field as deprecated
  int32 id = 2;

  // Custom options (requires defining the option type first)
  // option (my_project.custom_option) = "some_value";
}

Generating Code (protoc)

  • The Protocol Buffer Compiler (protoc) generates code for your target language(s).
  • Basic Command:
    protoc --proto_path=IMPORT_PATH --<lang>_out=DST_DIR path/to/your_proto_file.proto [path/to/another.proto]
    
  • Arguments:
    • --proto_path=IMPORT_PATH or -I=IMPORT_PATH: Specifies the directory where protoc should look for imported .proto files (can be specified multiple times). Often the current directory (.).
    • --<lang>_out=DST_DIR: Specifies the output directory for the generated code for a specific language (<lang>).
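      • Built-in <lang> values include cpp, csharp, java, kotlin, objc, php, python, ruby. Go (protoc-gen-go) and, in recent protoc releases, JavaScript (protoc-gen-js) are provided by separate plugins rather than by protoc itself.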
      • Language-specific options can often be passed after _out:, e.g., --js_out=import_style=commonjs,binary:.
    • path/to/your_proto_file.proto: The input .proto definition file(s).
  • Plugins: For languages not directly supported or for custom code generation (e.g., gRPC, Go plugins).
    • --plugin=protoc-gen-NAME=path/to/plugin
    • --NAME_out=DST_DIR
    • Example (gRPC Go): protoc --proto_path=. --go_out=. --go-grpc_out=. my_service.proto (requires protoc-gen-go and protoc-gen-go-grpc in your PATH).

Best Practices

  • Use proto3 syntax unless you need specific proto2 features (like required fields or explicit default values).
  • Use descriptive names for messages, fields, enums, and services.
  • Assign field numbers carefully and never reuse or change them. Use reserved for removed field numbers and names.
  • Use package declarations to avoid name collisions.
  • Prefer optional keyword in proto3 when you need to distinguish between a field not being set and being set to its default value (0, false, empty string).
  • Use oneof for fields where only one can be present.
  • Keep .proto files focused on data structure; avoid putting business logic in generated code.
  • Evolve schemas carefully: prefer adding new fields or marking existing fields as deprecated. Changing a field's number, or changing its type to one with a different wire encoding, breaks compatibility.

FAQ

Is there a size limit for a proto message?

According to the official Protocol Buffers documentation, any proto message in its serialized form must be smaller than 2 GiB (Gibibytes).  

Many implementations will refuse to serialize or parse messages that meet or exceed this 2 GiB limit. This limit often stems from the use of 32-bit integers for size calculations within the libraries.

Recommendation: keep individual protobuf messages relatively small.
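
One common way to follow this recommendation is to split a large dataset into many small messages and write them one after another, each prefixed with its length as a varint. The Python sketch below shows that framing over opaque byte strings standing in for already-serialized messages; frame and unframe are hypothetical helper names, not functions from a protobuf library.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf, pos):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def frame(messages):
    """Concatenate serialized messages, each prefixed with its varint length."""
    return b"".join(encode_varint(len(m)) + m for m in messages)


def unframe(stream):
    """Split a framed byte stream back into the individual messages."""
    messages = []
    pos = 0
    while pos < len(stream):
        length, pos = read_varint(stream, pos)
        end = pos + length
        if end > len(stream):
            raise ValueError("truncated frame")
        messages.append(stream[pos:end])
        pos = end
    return messages


chunks = [b"abc", b"", b"x" * 300]
assert unframe(frame(chunks)) == chunks
```

A reader can then parse and discard one small message at a time instead of holding a single multi-gigabyte message in memory.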