Proto2 vs Proto3
Proto1 is deprecated.
Proto3 is a simplification of Proto2. Both Proto2 and Proto3 are active.
Common
Proto2 and proto3 are wire compatible: the same construct in proto2 and proto3 has the same binary representation. This means they can reference symbols across versions and generate code that works well together.
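To see why the syntax version does not appear on the wire, it can help to encode a field by hand. The following is a minimal sketch using only the Python standard library, not the protobuf runtime; the helper names encode_varint and encode_int32_field are illustrative. It encodes an integer field as a tag (field number and wire type) followed by a varint value, and nothing in those bytes records whether the schema was declared as proto2 or proto3.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


WIRE_TYPE_VARINT = 0


def encode_int32_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: tag = (field_number << 3) | wire_type, then the value."""
    tag = (field_number << 3) | WIRE_TYPE_VARINT
    return encode_varint(tag) + encode_varint(value)


# Field number 1 holding the value 150. The bytes depend only on the
# field number, wire type, and value, not on the schema's syntax version.
encoded = encode_int32_field(1, 150)
print(encoded.hex())  # 089601
```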
Differences
Presence
- Proto2: supports optional natively. The wrapper types were recommended but should be avoided in new applications.
- Proto3: originally did not support presence tracking for primitive fields. As of 2020, proto3 supports both optional fields, which have has_foo() methods, and "singular" fields, which do not. Be sure to use optional if your protocol requires knowledge of field presence.
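The practical difference shows up when a field is explicitly set to its zero value. The sketch below is a toy model in plain Python, not the protobuf library: the classes SingularInt32 and OptionalInt32 and the on_wire() method are hypothetical names used only to illustrate the two presence disciplines.

```python
from typing import Optional


class SingularInt32:
    """Toy model of a proto3 singular int32 field (no presence tracking)."""

    def __init__(self) -> None:
        self.value = 0

    def on_wire(self) -> bool:
        # Without presence tracking, a value equal to the zero default is
        # not serialized, so "set to 0" and "never set" look the same.
        return self.value != 0


class OptionalInt32:
    """Toy model of an optional int32 field (presence is tracked)."""

    def __init__(self) -> None:
        self._value: Optional[int] = None

    def set(self, value: int) -> None:
        self._value = value

    def has(self) -> bool:
        # Plays the role of the generated has_foo() method.
        return self._value is not None

    def get(self) -> int:
        return 0 if self._value is None else self._value

    def on_wire(self) -> bool:
        # With presence tracking, an explicitly set zero is still serialized.
        return self.has()


singular = SingularInt32()
singular.value = 0      # indistinguishable from never being set

explicit = OptionalInt32()
explicit.set(0)         # remembered as "set", even though the value is zero

print(singular.on_wire(), explicit.has(), explicit.get())  # False True 0
```

If a receiver must distinguish "the sender chose 0" from "the sender said nothing", only the second model preserves that information.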
Default values
Proto3 does not permit custom default values. All fields in proto3 have consistent zero defaults.
Required fields
Proto3 removes support for required fields.
Enums Defaults
- Proto3: enums require an entry with the value 0 to act as the default value.
- Proto2: enums use the first syntactic entry in the enum declaration as the default value where it is otherwise unspecified.
Enums Unrecognized
In languages with closed enums (e.g. Java):
- All proto3 enums generate an UNRECOGNIZED entry to accommodate unknown enum values. Proto3 setters prohibit UNRECOGNIZED values, so a simple copy of an enum field from one proto to another will crash if the enum field value is UNRECOGNIZED.
- Proto2 enums never represent unknown enum values, but instead place them in the unknown field set. A proto2 enum can have confusing behavior (e.g. repeated fields report incorrect counts and are reordered in reserialization when an unknown value is encountered).
Enums cross reference
- A proto2 message can reference a proto3 enum or message.
- A proto3 message cannot reference a proto2 enum due to differences in semantics.
Extensions / Any
Proto3 removes support for extensions; instead, use Any fields to represent untyped fields. The extensions mechanism is wire compatible with a normal field declaration whereas Any is not, so a field cannot be changed to an Any as the schema evolves, while it could be changed to an extension in proto2.
Any is significantly more verbose on the wire as it uses a string based type_url as a key while extensions use a varint encoded field number.
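The size difference can be estimated by hand-encoding the same payload both ways. This is a rough sketch using only the Python standard library, under several assumptions: the extension field number 1000, the containing field number 5, the message name example.Payload, and the two-byte payload are all made up for illustration. The Any layout used here (type_url as field 1, value as field 2) follows the published any.proto definition.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


WIRE_TYPE_LEN = 2  # length-delimited: strings, bytes, embedded messages


def encode_len_field(field_number: int, payload: bytes) -> bytes:
    """Encode a length-delimited field: tag, payload length, payload."""
    tag = (field_number << 3) | WIRE_TYPE_LEN
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Hypothetical serialized inner message (one small varint field).
payload = b"\x08\x01"

# As an extension: keyed by a varint field number (1000 is an assumption).
as_extension = encode_len_field(1000, payload)

# As an Any: keyed by a type URL string, then wrapped in a containing
# field (field number 5 is an assumption).
type_url = b"type.googleapis.com/example.Payload"
any_body = encode_len_field(1, type_url) + encode_len_field(2, payload)
as_any = encode_len_field(5, any_body)

print(len(as_extension), len(as_any))
```

In this sketch the extension costs a few bytes of framing, while the Any carries the full type URL in every message, so the gap grows with the length of the package and message name.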
Parsed eagerly or lazily:
- Extensions (other than MessageSet) are parsed eagerly (and sometimes selectively if you provide a custom ExtensionRegistry).
- Any is always parsed lazily.
This difference in performance profile may be important for some applications (e.g. an Android app may prefer to parse messages off the UI thread).
String field validation
Protocol Buffer string fields have always been documented to be UTF-8 encoded.
- Proto2 does not validate that inbound / outbound bytes are indeed UTF-8 encoded.
- Proto3 validates that all string fields are appropriately UTF-8 encoded during parsing and in byte-oriented setters.
This validation means that parsing string fields in proto3 is more CPU intensive, and parse failures are possible when the parser is passed an improperly encoded string field. The flip side is that eager validation ensures that the problem can be identified quickly and resolved at the source.
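The trade-off can be sketched in plain Python. This is not the protobuf parser; the two functions below, parse_string_unvalidated and parse_string_validated, are hypothetical stand-ins for the proto2 and proto3 behaviors described above.

```python
def parse_string_unvalidated(raw: bytes) -> bytes:
    """Proto2-style sketch: accept the bytes as they are, with no UTF-8 check."""
    return raw


def parse_string_validated(raw: bytes) -> str:
    """Proto3-style sketch: decode strictly, so invalid UTF-8 fails at parse time."""
    # bytes.decode is strict by default and raises UnicodeDecodeError.
    return raw.decode("utf-8")


good = "héllo".encode("utf-8")
bad = b"\xff\xfe not utf-8"  # 0xFF can never appear in valid UTF-8

# Valid input round-trips either way.
assert parse_string_validated(good) == "héllo"

# Invalid input passes silently without validation...
unchecked = parse_string_unvalidated(bad)

# ...and is rejected immediately with validation.
try:
    parse_string_validated(bad)
    rejected = False
except UnicodeDecodeError:
    rejected = True

print(rejected)  # True
```

Without validation the bad bytes travel further into the system and fail later, possibly far from where they were produced; with validation the extra decoding work is paid on every parse.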
String field parsing
In Java, proto3 parses String fields as UTF-8 eagerly whereas proto2 parses them lazily.
JSON support
Proto3 defines a canonical JSON specification for all features whereas there is no specification for various proto2 features like extensions. The behavior of proto2 features is thus implementation-dependent.
Others
- Proto3 adds int min/max sentinels to C++ enums, preventing use of -Werror, -Wswitch.
- In proto3, optional fields cannot be changed to repeated because that will cause old messages to be declared invalid.
- It is unsafe to rename or change proto packages of any proto used in an Any proto. Extension resolution is numeric, like field numbers. Any proto resolution is string based, like Stubby methods.