The Multibase Data Format

Protocol Labs

548 Market Street, #51207 San Francisco CA 94104 US +1 619 957 7606 juan@protocol.ai http://juan.benet.ai/

Digital Bazaar

203 Roanoke Street W. Blacksburg VA 24060 US +1 540 961 4469 msporny@digitalbazaar.com http://manu.sporny.org/

Security base-encoding base64 base58

Raw binary data is often encoded using a mechanism that enables the data to be included in human-readable text-based formats. This mechanism is often referred to as "base-encoding the data". Base-encoding is often used when expressing binary data in hyperlinks, cryptographic keys in web pages, or security tokens in application software. There are a variety of base-encodings, such as base32, base58, and base64. It is not always possible to differentiate one base-encoding from another. The purpose of this specification is to provide a mechanism for deterministically identifying the base-encoding of a particular string of data.

This specification is a joint work product of Protocol Labs, the W3C Digital Verification Community Group, and the W3C Credentials Community Group. Feedback related to this specification should be logged in the issue tracker or sent to public-credentials@w3.org.

This specification describes a forward-compatible data model for expressing raw binary data in a variety of base-encoding formats such as base32, base58, and base64.

When text is encoded as bytes, we can usually use a one-size-fits-all encoding (UTF-8) because we're always encoding to the same set of 256 bytes. When that doesn't work, usually for historical or performance reasons, we can usually infer the encoding from the context.

However, when bytes are encoded as text (using a base encoding), the choice of base encoding is often restricted by the context. Worse, these restrictions can change based on where the data appears in the text. In some cases, we can only use [a-z0-9]. In others, we can use a larger set of characters but need a compact encoding. This has led to a large set of "base encodings", one for every use case. Unlike when encoding text to bytes, we can't just standardize around a single base encoding because there is no optimal encoding for all cases.

Unfortunately, it's not always clear which base encoding is used; that's where this specification comes in. It answers the question: Given data 'd' encoded into text 's', what base is it encoded with?
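To make the ambiguity concrete, the following minimal Python sketch decodes one string under two different base encodings. The input "cafe" is a hypothetical value chosen for illustration only; it is a valid base16 string and also a valid base64 string, and the two readings yield different bytes.

```python
import base64

# Hypothetical input: every character is legal in both base16 and base64.
text = "cafe"

as_base16 = bytes.fromhex(text)      # interpret the string as hexadecimal
as_base64 = base64.b64decode(text)   # interpret the same string as base64

print(as_base16.hex())  # ca fe   (2 bytes)
print(as_base64.hex())  # 71 a7 de (3 bytes)

# Without out-of-band context there is no way to tell which was intended.
assert as_base16 != as_base64
```

A multibase prefix removes the guesswork: under the registry in this document, "fcafe" would mark the base16 reading and "mcafe" the base64 reading.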

A multibase-encoded value follows a simple format:

base-encoding-character base-encoded-data

The base-encoding-character is a single character that identifies the encoding algorithm and is always the first byte of the data. The possible values for this field are provided in the Multibase Algorithms Registry.
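The format can be sketched in Python as a pair of helpers that attach and detach the leading character. The function names `join_multibase` and `split_multibase` are hypothetical and are not defined by this specification; the sketch assumes the value is handled as a text string and does not check the prefix against the registry.

```python
from typing import Tuple


def join_multibase(prefix: str, encoded: str) -> str:
    """Prepend a one-character algorithm identifier to base-encoded data."""
    if len(prefix) != 1:
        raise ValueError("the base-encoding-character must be exactly one character")
    return prefix + encoded


def split_multibase(value: str) -> Tuple[str, str]:
    """Return (base-encoding-character, base-encoded-data) for a multibase value."""
    if not value:
        raise ValueError("an empty string carries no base-encoding-character")
    return value[0], value[1:]


prefix, payload = split_multibase("z2NEpo7TZRRrLZSi2U")
print(prefix)   # z
print(payload)  # 2NEpo7TZRRrLZSi2U
```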

The following is an encoding of "Hello World!" using the version of base-58 that utilizes the Bitcoin encoding character set:

z2NEpo7TZRRrLZSi2U

The first byte (z) specifies the multibase encoding algorithm. The rest of the data is the output of that encoding algorithm.
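The example can be reproduced with a short Python sketch. The base58 routine below is one common way to implement the Bitcoin-alphabet encoding (treat the input as a big-endian integer and repeatedly divide by 58); it is an illustration, not a normative algorithm, and the function names are hypothetical.

```python
# Bitcoin base58 alphabet: digits and letters without 0, O, I, and l.
BASE58_BTC = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_encode(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet (no multibase prefix)."""
    number = int.from_bytes(data, "big")
    digits = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        digits = BASE58_BTC[remainder] + digits
    # Each leading zero byte is conventionally written as the first
    # alphabet character ("1") so that it survives the integer conversion.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + digits


def multibase_base58btc(data: bytes) -> str:
    """Prefix the base58btc encoding with its identifier, 'z'."""
    return "z" + base58btc_encode(data)


print(multibase_base58btc(b"Hello World!"))
```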

RFC 2119, RFC 4648

There are a number of security considerations to take into account when implementing or utilizing this specification. TBD

The multibase examples are chosen to show different encoding algorithms and different output lengths at play. The input test data for all of the examples in this section is:

Multibase is awesome! \o/

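base16upper (identifier F):

F4D756C74696261736520697320617765736F6D6521205C6F2F

base32upper (identifier B):

BJV2WY5DJMJQXGZJANFZSAYLXMVZW63LFEEQFY3ZP

base58btc (identifier z):

zYAjKoNbau5KiqmHPmSxYCvn66dA1vLmwbt

base64pad (identifier M):

MTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw==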
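Three of the four values above can be regenerated with Python's standard `base64` module, as in the sketch below. The base58 value is left out because the standard library has no base58 codec. The dictionary keys are the registry names for the identifiers F, B, and M; the variable names are otherwise arbitrary.

```python
import base64

# The input test data; the backslash is escaped for the Python literal.
data = b"Multibase is awesome! \\o/"

vectors = {
    # base16upper: uppercase hexadecimal, identifier "F"
    "base16upper": "F" + base64.b16encode(data).decode("ascii"),
    # base32upper: RFC 4648 base32 with padding stripped, identifier "B"
    "base32upper": "B" + base64.b32encode(data).decode("ascii").rstrip("="),
    # base64pad: RFC 4648 base64 with padding kept, identifier "M"
    "base64pad": "M" + base64.b64encode(data).decode("ascii"),
}

for name, value in vectors.items():
    print(name, value)
```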

The editors would like to thank the following individuals for feedback on and implementations of the specification (in alphabetical order):

The following initial entries should be added to the Multibase Algorithms Registry to be created and maintained at (the suggested URI) http://www.iana.org/assignments/multibase-algorithms:

Algorithm          Identifier (character)  Status  Specification
-----------------  ----------------------  ------  -------------
identity           0x00                    active  8-bit binary (encoder and decoder keep data unmodified)
base2              0                       active  binary (01010101)
base8              7                       active  octal
base10             9                       active  decimal
base16             f                       active  hexadecimal
base16upper        F                       active  hexadecimal
base32hex          v                       active  RFC 4648 case-insensitive - no padding - highest char
base32hexupper     V                       active  RFC 4648 case-insensitive - no padding - highest char
base32hexpad       t                       active  RFC 4648 case-insensitive - with padding
base32hexpadupper  T                       active  RFC 4648 case-insensitive - with padding
base32             b                       active  RFC 4648 case-insensitive - no padding
base32upper        B                       active  RFC 4648 case-insensitive - no padding
base32pad          c                       active  RFC 4648 case-insensitive - with padding
base32padupper     C                       active  RFC 4648 case-insensitive - with padding
base32z            h                       active  z-base-32 (used by Tahoe-LAFS)
base36             k                       active  base36 [0-9a-z] case-insensitive - no padding
base36upper        K                       active  base36 [0-9a-z] case-insensitive - no padding
base58btc          z                       active  base58 bitcoin
base58flickr       Z                       active  base58 flickr
base64             m                       active  RFC 4648 no padding
base64pad          M                       active  RFC 4648 with padding - MIME encoding
base64url          u                       active  RFC 4648 no padding
base64urlpad       U                       active  RFC 4648 with padding
proquint           p                       active  PRO-QUINT https://arxiv.org/html/0901.4016

NOTE: The most up to date place for developers to find the table above is https://github.com/multiformats/multibase/blob/master/multibase.csv.
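A decoder can use the registry as a dispatch table keyed by the identifier character. The Python sketch below covers only the eight entries that map directly onto the standard library (base16, base32, and base64 variants); the names `DECODERS`, `_pad`, and `multibase_decode` are hypothetical. It is deliberately lenient: for example, `bytes.fromhex` accepts either letter case, so the sketch does not enforce the case distinction between "f" and "F".

```python
import base64


def _pad(text: str, block: int) -> str:
    """Restore the '=' padding that the no-padding variants omit."""
    return text + "=" * (-len(text) % block)


# Identifier character -> decoder, for a subset of the registry entries.
DECODERS = {
    "f": lambda s: bytes.fromhex(s),                         # base16
    "F": lambda s: bytes.fromhex(s),                         # base16upper
    "b": lambda s: base64.b32decode(_pad(s.upper(), 8)),     # base32
    "B": lambda s: base64.b32decode(_pad(s, 8)),             # base32upper
    "m": lambda s: base64.b64decode(_pad(s, 4)),             # base64
    "M": lambda s: base64.b64decode(s),                      # base64pad
    "u": lambda s: base64.urlsafe_b64decode(_pad(s, 4)),     # base64url
    "U": lambda s: base64.urlsafe_b64decode(s),              # base64urlpad
}


def multibase_decode(value: str) -> bytes:
    """Decode a multibase string by dispatching on its first character."""
    prefix, payload = value[:1], value[1:]
    if prefix not in DECODERS:
        raise ValueError(f"unsupported multibase identifier: {prefix!r}")
    return DECODERS[prefix](payload)


print(multibase_decode("MTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw=="))
print(multibase_decode("BJV2WY5DJMJQXGZJANFZSAYLXMVZW63LFEEQFY3ZP"))
```

Supporting the remaining entries (for example base58btc or proquint) would mean adding further decoder functions to the table; the dispatch logic itself would not change.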