Protocol Buffers (Protobuf) Cheatsheet
Core Concepts
- `.proto` Files: Define the data structure (messages) and services.
- Messages: The primary data structure unit, similar to a class or struct. Contains typed fields.
- Fields: Named and typed components within a message. Each field has a unique number.
- Field Numbers: Unique numeric identifier for each field within a message (1, 2, 3,...). Crucial for binary encoding and backward/forward compatibility. Cannot be changed once in use.
- Scalar Types: Basic data types like integers, floats, booleans, strings, bytes.
- Enums: Define a set of named constants.
- Compiler (`protoc`): Generates data access classes/structs in your chosen language from `.proto` definitions.
- Wire Format: Compact binary encoding optimized for size and speed. Not self-describing (needs the `.proto` definition to interpret).
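To make "not self-describing" concrete, here is a minimal Python sketch that walks a serialized payload without any schema. It assumes the standard tag layout (field number in the upper bits, wire type in the low 3 bits); the helper names `read_varint` and `scan_fields` are made up for this illustration and are not part of any Protobuf library. The scan recovers only field numbers, wire types, and raw values. Names and exact types still have to come from the `.proto` definition.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: last byte of this varint
            return result, pos
        shift += 7


def scan_fields(buf: bytes) -> list[tuple[int, int, object]]:
    """List (field_number, wire_type, raw_value) triples without a schema."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint: int32, int64, bool, enum, ...
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # 8 fixed bytes: fixed64, sfixed64, double
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes, messages
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # 4 fixed bytes: fixed32, sfixed32, float
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


# Hand-encoded payload: field 1 = "Al" (length-delimited), field 2 = 150 (varint).
payload = bytes([0x0A, 0x02, 0x41, 0x6C, 0x10, 0x96, 0x01])
print(scan_fields(payload))  # [(1, 2, b'Al'), (2, 0, 150)]
```

Note that the scan cannot tell whether field 1 is a string, bytes, or a nested message; that ambiguity is exactly what the schema resolves.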
Basic Syntax (`.proto` File)
// Specify the syntax version (proto3 is recommended)
syntax = "proto3";

// Optional: Define a package to prevent name clashes
package my.project.protos;

// Optional: Import definitions from other .proto files
import "google/protobuf/timestamp.proto";

// Define a message structure
message Person {
  // Field definition: type name = field_number;
  string name = 1;
  int32 id = 2; // Unique ID number for the Person. Cannot be changed.
  string email = 3;
  bool is_active = 4;

  // Enum defined inside the message (scoped)
  enum PhoneType {
    PHONE_TYPE_UNSPECIFIED = 0; // Zero value must be first for proto3 enums
    MOBILE = 1;
    HOME = 2;
    WORK = 3;
  }

  // Repeated field (list/array) of another message type
  repeated PhoneNumber phones = 5;

  // Use an imported type
  google.protobuf.Timestamp last_updated = 6;
}

// Define another message type
message PhoneNumber {
  string number = 1;
  Person.PhoneType type = 2; // Use the nested enum type
}

// Define an enum at the top level
enum Corpus {
  CORPUS_UNSPECIFIED = 0;
  UNIVERSAL = 1;
  WEB = 2;
  IMAGES = 3;
  LOCAL = 4;
  NEWS = 5;
  PRODUCTS = 6;
  VIDEO = 7;
}

// Comments use C++/Java style
// Single line comment
/*
   Multi-line comment
*/
Scalar Data Types
.proto Type | Notes | C++ Type | Java Type | Python Type | Go Type | C# Type |
---|---|---|---|---|---|---|
double | 64-bit float | double | double | float | float64 | double |
float | 32-bit float | float | float | float | float32 | float |
int32 | 32-bit integer (variable encoding) | int32_t | int | int | int32 | int |
int64 | 64-bit integer (variable encoding) | int64_t | long | int or long | int64 | long |
uint32 | 32-bit unsigned integer | uint32_t | int | int or long | uint32 | uint |
uint64 | 64-bit unsigned integer | uint64_t | long | int or long | uint64 | ulong |
sint32 | Signed 32-bit integer (ZigZag encoding) | int32_t | int | int | int32 | int |
sint64 | Signed 64-bit integer (ZigZag encoding) | int64_t | long | int or long | int64 | long |
fixed32 | Fixed 32-bit integer (always 4 bytes) | uint32_t | int | int | uint32 | uint |
fixed64 | Fixed 64-bit integer (always 8 bytes) | uint64_t | long | int or long | uint64 | ulong |
sfixed32 | Signed Fixed 32-bit integer | int32_t | int | int | int32 | int |
sfixed64 | Signed Fixed 64-bit integer | int64_t | long | int or long | int64 | long |
bool | Boolean true/false | bool | boolean | bool | bool | bool |
string | UTF-8 encoded string | std::string | String | str (py3), unicode (py2) | string | string |
bytes | Arbitrary sequence of bytes | std::string | ByteString | bytes | []byte | ByteString |
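- Variable Encoding (`int32`, `int64`, etc.): Uses fewer bytes for smaller numbers. Negative `int32`/`int64` values are sign-extended and always take 10 bytes.
- ZigZag Encoding (`sint32`, `sint64`): More efficient for encoding negative numbers than standard `int32`/`int64`, because small negative values map to small unsigned values.
- Fixed Encoding (`fixed32`, etc.): Always uses a fixed number of bytes. More efficient than variable encoding if values are often large (roughly above 2^28 for 32-bit types and 2^56 for 64-bit types).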
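The size trade-offs between the three encodings can be checked with a short Python sketch. The function names (`encode_varint`, `encode_int64`, `zigzag64`, `encode_sint64`) are invented for this illustration; real code would rely on the generated classes rather than hand-rolled encoders.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("varints encode unsigned values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_int64(value: int) -> bytes:
    """int32/int64 style: negatives are sent as 64-bit two's complement."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag64(value: int) -> int:
    """sint64 mapping: 0, -1, 1, -2, 2, ... becomes 0, 1, 2, 3, 4, ..."""
    return ((value << 1) ^ (value >> 63)) & 0xFFFFFFFFFFFFFFFF


def encode_sint64(value: int) -> bytes:
    return encode_varint(zigzag64(value))


print(len(encode_int64(1)))            # 1
print(len(encode_int64(-1)))           # 10
print(len(encode_sint64(-1)))          # 1
print(len(encode_varint(2**28)))       # 5
print(len(struct.pack("<I", 2**28)))   # 4 (fixed32 is always 4 bytes)
```

The last two lines show the crossover point: from 2^28 upward a varint needs 5 bytes, so a fixed 4-byte encoding becomes the smaller choice.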
Field Rules (Proto3)
- Implicit `optional`: Fields are optional by default. If a field is not set, it takes its default value (0 for numbers, empty string for strings, false for bools, empty for bytes/repeated/maps, zero enum value). Fields holding their default value are not written to the wire, so a reader cannot tell "not set" from "set to the default".
- `repeated`: The field can appear zero or more times (list/array).
  repeated string tags = 1; // A list of strings
- `optional` (Keyword): Explicitly marks a field as optional. Allows checking if the field was explicitly set, even if set to its default value. Useful for distinguishing between "not set" and "set to default (e.g., 0 or false)".
  optional int32 page_size = 2; // Can check if page_size was actually provided
  optional bool use_cache = 3; // Can differentiate not set vs explicitly set to false
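The difference between the two presence models shows up directly on the wire. The following Python sketch hand-encodes a single `int32` field both ways; `encode_implicit` and `encode_explicit` are hypothetical helpers written for this illustration, with `None` standing in for "not set".

```python
from typing import Optional

VARINT = 0  # wire type used by int32, int64, bool, and enums


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_int32_field(field_number: int, value: int) -> bytes:
    """Tag (field number + wire type) followed by the varint payload."""
    tag = encode_varint((field_number << 3) | VARINT)
    return tag + encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_implicit(field_number: int, value: int) -> bytes:
    """Field without a label: the default value 0 is simply not written."""
    return b"" if value == 0 else encode_int32_field(field_number, value)


def encode_explicit(field_number: int, value: Optional[int]) -> bytes:
    """Field marked `optional`: written whenever it was set, even to 0."""
    return b"" if value is None else encode_int32_field(field_number, value)


print(encode_implicit(2, 0))     # b''
print(encode_explicit(2, None))  # b''
print(encode_explicit(2, 0))     # b'\x10\x00'
```

With implicit presence, "set to 0" and "never set" produce identical bytes; with `optional`, the explicit zero survives the round trip.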
Field Numbers
- Unique integers from 1 to 2^29 - 1.
- Cannot use numbers 19000 through 19999 (reserved for the Protocol Buffers implementation).
- Numbers 1 to 15 use 1 byte for encoding (field number + wire type). Use them for frequently set fields.
- Numbers 16 to 2047 use 2 bytes.
- Crucial: Once a field number is used, it should never be changed or reused for a different field in that message to maintain backward/forward compatibility. If deleting a field, mark its number as `reserved`.
message MyMessage {
  reserved 2, 15; // These field numbers cannot be reused
  reserved "foo", "bar"; // Field names can also be reserved
  string name = 1;
  // Field 2 was removed
  int32 count = 3;
  // Field 15 was removed
}
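The 1-byte and 2-byte ranges follow from how the tag is built: the field number is shifted left by 3 bits, combined with the wire type, and written as a varint. A small Python sketch (the function `tag_size` is a name chosen for this illustration) reproduces the boundaries listed above:

```python
def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Number of bytes used by a field's tag: varint of (number << 3 | wire type)."""
    if not 1 <= field_number <= 2**29 - 1:
        raise ValueError("field numbers range from 1 to 2^29 - 1")
    if not 0 <= wire_type <= 7:
        raise ValueError("wire type must fit in 3 bits")
    tag = (field_number << 3) | wire_type
    size = 1
    while tag >= 0x80:  # each varint byte carries 7 bits of payload
        tag >>= 7
        size += 1
    return size


for number in (1, 15, 16, 2047, 2048):
    print(number, tag_size(number))
# 1 1
# 15 1
# 16 2
# 2047 2
# 2048 3
```

Because the tag is repeated for every occurrence of a field, the saving from numbers 1 to 15 matters most for fields that appear in almost every message or inside large repeated structures.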
Enumerations (`enum`)
- Define a set of named constant values.
- The first defined value must be zero in proto3.
- Values must be 32-bit integers.
- Can optionally allow aliasing if multiple names map to the same number.
enum Status {
  STATUS_UNSPECIFIED = 0; // Zero value first
  PENDING = 1;
  RUNNING = 2;
  COMPLETED = 3;
  FAILED = 4;
}

enum Sharing {
  option allow_alias = true; // Allow multiple names for the same value
  UNKNOWN = 0;
  PUBLIC = 1;
  PRIVATE = 2;
  SECRET = 2; // Alias for PRIVATE
}
Nested Types
- Messages and enums can be defined inside other messages.
message Outer {
  message Inner { // Nested message
    int64 value = 1;
  }
  enum InnerEnum { // Nested enum
    INNER_DEFAULT = 0;
    INNER_VALUE = 1;
  }
  Inner inner_field = 1;
  InnerEnum enum_field = 2;
}

// Usage outside (as fields of some other message):
// Outer.Inner my_inner = 1; // Reference using Outer.Inner
// Outer.InnerEnum my_enum = 2;
Maps
- Define associative maps (dictionaries). Cannot be `repeated`.
- Syntax: `map<key_type, value_type> map_field = field_number;`
- `key_type` can be any integral or string type (any scalar type except floating-point types and `bytes`). Cannot be float, double, bytes, enum, or message types.
- `value_type` can be any type except another map.
message Config {
  map<string, string> settings = 1; // Map from string to string
  map<int32, Project> projects_by_id = 2; // Map from int32 to Project message (Project defined elsewhere)
}
Oneof
- Defines a set of fields where at most one field can be set at a time. Setting one field automatically clears the others in the `oneof`.
- Fields within a `oneof` share memory and cannot be `repeated` or map fields.
message Result {
  oneof data {
    string success_message = 1;
    int32 error_code = 2;
    bytes raw_output = 3;
  }
  // Fields outside the oneof are independent
  google.protobuf.Timestamp timestamp = 4;
}
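The "set one, clear the others" behaviour can be modelled in a few lines of Python. The class below is a hypothetical stand-in for the `data` oneof in `Result`, written only to show the semantics; it is not what `protoc` generates, and the method names `set`, `which`, and `get` are assumptions of this sketch.

```python
class ResultData:
    """Toy model of `oneof data` in Result: at most one member holds a value."""

    _MEMBERS = ("success_message", "error_code", "raw_output")

    def __init__(self) -> None:
        self._case = None   # name of the member currently set, if any
        self._value = None  # single shared storage slot

    def set(self, member: str, value: object) -> None:
        if member not in self._MEMBERS:
            raise AttributeError(member)
        # Overwriting the shared slot is what clears the previous member.
        self._case, self._value = member, value

    def which(self):
        """Return the name of the member that is set, or None."""
        return self._case

    def get(self, member: str):
        if member not in self._MEMBERS:
            raise AttributeError(member)
        return self._value if member == self._case else None


data = ResultData()
data.set("success_message", "done")
data.set("error_code", 7)           # replaces success_message
print(data.which())                 # error_code
print(data.get("success_message"))  # None
```

Keeping a single storage slot plus a "which case" marker mirrors why oneof members share memory.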
Services (`service`, `rpc`)
- Define RPC service interfaces. The Protobuf compiler can generate client and server stubs.
- Requires request and response message types.
- Methods can accept/return streams using the `stream` keyword.
// Define the service
service SearchService {
  // Simple RPC: takes SearchRequest, returns SearchResponse
  rpc Search (SearchRequest) returns (SearchResponse);
  // Server-side streaming RPC
  rpc SearchStreamResponse (SearchRequest) returns (stream SearchResponse);
  // Client-side streaming RPC
  rpc SearchStreamRequest (stream SearchRequest) returns (SearchResponse);
  // Bidirectional streaming RPC
  rpc SearchBidiStream (stream SearchRequest) returns (stream SearchResponse);
}

// Define request and response messages
message SearchRequest {
  string query = 1;
  int32 page_number = 2;
}

message SearchResponse {
  repeated Result results = 1;
  int32 total_count = 2;
}

message Result {
  string url = 1;
  string title = 2;
}
Packages & Imports
- `package my.package.name;`
  - Declares a namespace for the `.proto` file to prevent naming conflicts between types.
  - Affects generated code namespaces/packages.
- `import "path/to/other.proto";`
  - Allows using definitions (messages, enums) from another `.proto` file.
  - `protoc` needs to be able to find the imported file via the `-I`/`--proto_path` argument.
- `import public "path/to/other.proto";`
  - Publicly imports definitions. Anyone importing your file also implicitly imports the publicly imported file's definitions. Use sparingly.
- Well-Known Types: Common useful types provided by Google. Need to be imported.
  - `import "google/protobuf/timestamp.proto";` (Timestamp)
  - `import "google/protobuf/duration.proto";` (Duration)
  - `import "google/protobuf/wrappers.proto";` (Wrappers for scalars like `StringValue`, `Int32Value`; useful for distinguishing unset from default)
  - `import "google/protobuf/struct.proto";` (Arbitrary JSON-like structure)
  - `import "google/protobuf/any.proto";` (Container for arbitrary message type)
  - `import "google/protobuf/empty.proto";` (Empty message type)
  - `import "google/protobuf/field_mask.proto";` (Field mask for partial updates)
Options
- Annotations that provide metadata or influence code generation. Can be file-level, message-level, field-level, enum-level, etc.
- Defined using `option option_name = value;`.
// File-level options
option java_package = "com.example.myproject.protos";
option java_multiple_files = true;
option go_package = "example.com/myproject/protos";

message MyMessage {
  option deprecated = true; // Mark message as deprecated
  string old_field = 1 [deprecated = true]; // Mark field as deprecated
  int32 id = 2;
  // Custom options (requires defining the option type first)
  // option (my_project.custom_option) = "some_value";
}
Generating Code (`protoc`)
- The Protocol Buffer Compiler (`protoc`) generates code for your target language(s).
- Basic Command:
  protoc --proto_path=IMPORT_PATH --<lang>_out=DST_DIR path/to/your_proto_file.proto [path/to/another.proto]
- Arguments:
  - `--proto_path=IMPORT_PATH` or `-I=IMPORT_PATH`: Specifies a directory where `protoc` should look for imported `.proto` files (can be specified multiple times). Often the current directory (`.`).
  - `--<lang>_out=DST_DIR`: Specifies the output directory for the generated code for a specific language (`<lang>`).
    - Common `<lang>` values: `cpp`, `csharp`, `java`, `kotlin`, `objectivec`, `php`, `python`, `ruby`, `go`, `js` (JavaScript).
    - Language-specific options can often be passed before the output directory, separated from it by a colon, e.g., `--js_out=import_style=commonjs,binary:.` (here the output directory is `.`).
  - `path/to/your_proto_file.proto`: The input `.proto` definition file(s).
- Plugins: For languages not directly supported or for custom code generation (e.g., gRPC, Go plugins).
  - `--plugin=protoc-gen-NAME=path/to/plugin --NAME_out=DST_DIR`
  - Example (gRPC Go): `protoc --proto_path=. --go_out=. --go-grpc_out=. my_service.proto` (requires `protoc-gen-go` and `protoc-gen-go-grpc` in your PATH).
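When `protoc` is driven from a build script, it can help to assemble the argument list programmatically rather than by string concatenation. The Python sketch below does this using only the flags described above; `build_protoc_command` and its parameters are hypothetical names for this illustration, and the sketch only builds the command, it does not run it.

```python
import shlex


def build_protoc_command(proto_files, lang, out_dir, proto_paths=(".",), plugin_outs=None):
    """Return a protoc argv list: import paths, output flags, then input files."""
    if not proto_files:
        raise ValueError("at least one .proto file is required")
    cmd = ["protoc"]
    cmd += [f"--proto_path={path}" for path in proto_paths]
    cmd.append(f"--{lang}_out={out_dir}")
    # Plugin outputs follow the --NAME_out=DST_DIR pattern.
    for name, dst in (plugin_outs or {}).items():
        cmd.append(f"--{name}_out={dst}")
    cmd += list(proto_files)
    return cmd


cmd = build_protoc_command(["my_service.proto"], "go", ".", plugin_outs={"go-grpc": "."})
print(shlex.join(cmd))
# protoc --proto_path=. --go_out=. --go-grpc_out=. my_service.proto
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would invoke the compiler, assuming `protoc` and any required plugins are installed and on the PATH.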
Best Practices
- Use `proto3` syntax unless you need specific proto2 features (like `required` fields or explicit default values).
- Use descriptive names for messages, fields, enums, and services.
- Assign field numbers carefully and never reuse or change them. Use `reserved` for removed fields/numbers.
- Use `package` declarations to avoid name collisions.
- Prefer the `optional` keyword in proto3 when you need to distinguish between a field not being set and being set to its default value (0, false, empty string).
- Use `oneof` for fields where only one can be present.
- Keep `.proto` files focused on data structure; avoid putting business logic in generated code.
- Evolve schemas carefully: only add new fields or mark existing fields as `deprecated`. Changing a field's number, or changing its type to one that is not wire-compatible, breaks compatibility.
FAQ
Is there a size limit for a proto message?
According to the official Protocol Buffers documentation, any proto message in its serialized form must be smaller than 2 GiB (Gibibytes).
Many implementations will refuse to serialize or parse messages that meet or exceed this 2 GiB limit. This limit often stems from the use of 32-bit integers for size calculations within the libraries.
Recommendation: keep individual protobuf messages relatively small.
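A guard along these lines can be applied before sending or storing a serialized message. This is a minimal Python sketch: the 2 GiB ceiling comes from the limit described above, while the 4 MiB soft budget and the name `classify_size` are assumptions chosen for illustration, not Protobuf rules.

```python
MAX_SERIALIZED_BYTES = 2**31 - 1    # serialized size must stay below 2 GiB
SOFT_LIMIT_BYTES = 4 * 1024 * 1024  # assumption: a project-specific budget


def classify_size(num_bytes: int, soft_limit: int = SOFT_LIMIT_BYTES) -> str:
    """Classify a serialized message size against the hard and soft limits."""
    if num_bytes < 0:
        raise ValueError("size cannot be negative")
    if num_bytes > MAX_SERIALIZED_BYTES:
        return "too_large"           # implementations may refuse to serialize or parse this
    if num_bytes > soft_limit:
        return "consider_splitting"  # legal, but worth chunking or streaming
    return "ok"


print(classify_size(1024))        # ok
print(classify_size(10 * 2**20))  # consider_splitting
print(classify_size(2**31))       # too_large
```

In practice the input would be the length of the serialized bytes, and oversized payloads are usually split into several smaller messages or sent over a streaming RPC.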