Package org.webpki.cbor


CBOR - Encoder, Decoder, Signatures, and Encryption

This package contains Java support for CBOR [RFC 8949].

Supported Primitives

The following table shows the currently supported CBOR primitives and their mapping to Java:
CBOR Primitive   Notes   Java Mapping   Implementation
integer          1       long           CBORInt
bignum           1       BigInteger     CBORBigInt
floating point   2       double         CBORFloat
byte string              byte[]         CBORBytes
text string              String         CBORString
true/false               boolean        CBORBoolean
null                                    CBORNull
array                                   CBORArray
map                                     CBORMap
tag                                     CBORTag
  1. The distinction between unsigned and negative values is dealt with automatically.
  2. Floating point data covers the 16, 32, and 64-bit IEEE 754 variants. The encoded representation is determined by the size and precision of a value.

This implementation does not support CBOR "simple" values beyond true, false, null, and the three floating point variants.

Deterministic Encoding

For certain security-related applications, it has proven useful to perform cryptographic operations, such as hashing and signing, on "raw" CBOR data. To make this possible without additional processing, data must be in a stable form "on the wire". This can be achieved either by using the binary "as is", or through deterministic encoding. This section describes a variant of the latter, making compliant CBOR systems less dependent on specific encoder and decoder designs.
It is in this context worth noting that XML and JSON do not support deterministic encoding.
Although APIs may be quite different compared to the API of this package, the deterministic encoding scheme described here is intended as a standard, not limited to specific platforms.
To facilitate mainstream adoption, the encoding scheme is aligned with current best practices for encoding CBOR primitives and should, with moderate effort, work with most existing CBOR encoders and decoders. In fact, having a single way of encoding CBOR data should simplify both implementations and associated testing. The encoding scheme has been verified to also be usable in highly constrained systems, albeit requiring detailed knowledge of how a specific tool addresses encoding of CBOR objects. Note that a compliant implementation does not have to support all CBOR primitives; it is sufficient that the ones actually used by the associated applications conform to this specification.
The encoding scheme adheres to section 4.2 of RFC 8949, but adds a few constraints (denoted by RFC+), where the RFC offers choices. The encoding rules are as follows:
  • RFC+: Floating point and integer objects must be treated as distinct types regardless of their numeric value. This is compliant with Rule 2 in section 4.2.2 of RFC 8949.
  • RFC: Integers, represented by the integer and bignum types, must use the integer type if the value is between -2^64 and 2^64-1, otherwise the bignum type must be used. The following table holds a few sample values and their proper CBOR encoding:
    Value                    Encoding
    0                        00
    -1                       20
    255                      18ff
    256                      190100
    -256                     38ff
    -257                     390100
    1099511627775            1b000000ffffffffff
    18446744073709551615     1bffffffffffffffff
    18446744073709551616     c249010000000000000000
    -18446744073709551616    3bffffffffffffffff
    -18446744073709551617    c349010000000000000000
    Note that integers must not be supplied with leading zero bytes (like 1900ff) unless the CBOR representation offers no alternative (like 1b000000ffffffffff).
    Note that the integer encoding scheme above does not always produce the most compact representation; the value 1099511627775 (0xffffffffff) would actually be two bytes shorter using the bignum type.
  • RFC+: Floating point data must use the shortest IEEE 754 variant and associated CBOR encoding. The following table holds floating point values needing special considerations as well as a small set of "edge cases":
    Value                       Encoding
    0.0                         f90000
    -0.0                        f98000
    Infinity                    f97c00
    -Infinity                   f9fc00
    NaN                         f97e00
    Assorted Edge Cases
    -5.960464477539063e-8       f98001
    -5.960465188081798e-8       fab3800001
    65504.0                     f97bff
    65504.00390625              fa477fe001
    65536.0                     fa47800000
    10.559998512268066          fa4128f5c1
    10.559998512268068          fb40251eb820000001
    3.4028234663852886e+38      fa7f7fffff
    3.402823466385289e+38       fb47efffffe0000001
    1.401298464324817e-45       fa00000001
    5.0e-324                    fb0000000000000001
    -1.7976931348623157e+308    fbffefffffffffffff
    Note that NaN "signaling" (like f97e01) must be flagged as an error.
    Note that the shortest encoding may result in subnormal numbers like f98001.
  • RFC: Map keys must be sorted in the bytewise lexicographic order of their deterministic encoding. Duplicate keys must be rejected.
  • RFC+: Since CBOR encoding according to this specification maintains type and data uniqueness, there are no specific restrictions or tests needed in order to determine map key equivalence. As an example, the floating point numbers 0.0 and -0.0, and the integer number 0 represent the distinct keys f90000, f98000, and 00 respectively.
  • RFC+: Deterministic CBOR according to this specification may also be provided in Diagnostic Notation.
Any deviation from the rules above causes the standard decoder (CBORObject.decode(byte[])) to throw an exception. For more control of the decoding process, including dealing with CBOR sequences, see: CBOR decoding options.
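
The integer rule in the list above can be illustrated with a minimal sketch that uses only the Java standard library. The class and method names (IntegerEncodingSketch, encode, head, fixedHex) are hypothetical and not part of this package; the sketch merely reproduces the sample encodings in the integer table and says nothing about how CBORInt or CBORBigInt are implemented.

```java
import java.math.BigInteger;

// Hypothetical illustration, not part of org.webpki.cbor.
class IntegerEncodingSketch {

    // Returns the deterministic CBOR encoding of an integer as lowercase hex.
    static String encode(BigInteger value) {
        boolean negative = value.signum() < 0;
        // CBOR stores a negative integer n as the unsigned argument -1 - n.
        BigInteger argument = negative ? value.negate().subtract(BigInteger.ONE) : value;
        if (argument.bitLength() <= 64) {
            // Fits the integer type: major type 0 (unsigned) or 1 (negative).
            return head(negative ? 0x20 : 0x00, argument);
        }
        // Otherwise bignum: tag 2 (c2) or tag 3 (c3) wrapping a byte string
        // that holds the argument without leading zero bytes.
        String magnitude = argument.toString(16);
        if (magnitude.length() % 2 != 0) {
            magnitude = "0" + magnitude;
        }
        return (negative ? "c3" : "c2")
                + head(0x40, BigInteger.valueOf(magnitude.length() / 2))
                + magnitude;
    }

    // Encodes the initial byte plus the shortest argument (0, 1, 2, 4, or 8 bytes).
    static String head(int majorType, BigInteger argument) {
        if (argument.compareTo(BigInteger.valueOf(24)) < 0) {
            return fixedHex(BigInteger.valueOf(majorType | argument.intValue()), 1);
        }
        int bits = argument.bitLength();
        int info = bits <= 8 ? 24 : bits <= 16 ? 25 : bits <= 32 ? 26 : 27;
        int length = 1 << (info - 24);
        return fixedHex(BigInteger.valueOf(majorType | info), 1) + fixedHex(argument, length);
    }

    // Zero-padded hex of a non-negative value occupying exactly 'bytes' bytes.
    static String fixedHex(BigInteger value, int bytes) {
        String hex = value.toString(16);
        return "0".repeat(bytes * 2 - hex.length()) + hex;
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"0", "-1", "255", "-257", "1099511627775",
                "18446744073709551616", "-18446744073709551616"}) {
            System.out.println(sample + " -> " + encode(new BigInteger(sample)));
        }
    }
}
```

Note how 1099511627775 needs more than 32 bits and therefore takes the 8-byte argument form, which is why its encoding carries leading zero bytes.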

On output (CBORObject.encode()), deterministic encoding is always performed, regardless of whether the CBOR data was parsed or created programmatically.
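
As an illustration of the "shortest IEEE 754 variant" rule, the following sketch selects between the 16, 32, and 64-bit encodings for a Java double. It uses only the standard library; FloatEncodingSketch and its methods are hypothetical names, and the code is an assumption about one way to meet the rule, not a description of how CBORFloat works internally.

```java
// Hypothetical illustration, not part of org.webpki.cbor.
class FloatEncodingSketch {

    // Returns the shortest exact CBOR floating point encoding as lowercase hex.
    static String encode(double value) {
        if (Double.isNaN(value)) {
            return "f97e00"; // the single NaN encoding permitted by the rules
        }
        float single = (float) value;
        if ((double) single != value) {
            // Precision or range is lost in 32 bits: 64 bits are required.
            return String.format("fb%016x", Double.doubleToRawLongBits(value));
        }
        int half = toHalfBits(single);
        return half >= 0
                ? String.format("f9%04x", half)
                : String.format("fa%08x", Float.floatToRawIntBits(single));
    }

    // Returns the 16-bit pattern if 'single' is exactly representable, else -1.
    static int toHalfBits(float single) {
        int bits = Float.floatToRawIntBits(single);
        int sign = (bits >>> 16) & 0x8000;
        int exponent = (bits >>> 23) & 0xff;
        int mantissa = bits & 0x7fffff;
        if (exponent == 0xff) {
            return mantissa == 0 ? sign | 0x7c00 : -1; // infinity (NaN is handled by the caller)
        }
        if (exponent == 0) {
            return mantissa == 0 ? sign : -1; // zero; 32-bit subnormals are too small
        }
        int e = exponent - 127; // unbiased exponent
        if (e >= -14 && e <= 15) {
            // Normal 16-bit range: the 13 low mantissa bits must be zero.
            return (mantissa & 0x1fff) == 0 ? sign | ((e + 15) << 10) | (mantissa >> 13) : -1;
        }
        if (e >= -24 && e <= -15) {
            // Subnormal 16-bit range: shift in the implicit leading one.
            int full = mantissa | 0x800000;
            int shift = -e - 1;
            return (full & ((1 << shift) - 1)) == 0 ? sign | (full >> shift) : -1;
        }
        return -1;
    }

    public static void main(String[] args) {
        for (double sample : new double[] {0.0, -0.0, 65504.0, 65536.0, Double.MIN_VALUE}) {
            System.out.println(sample + " -> " + encode(sample));
        }
    }
}
```

The subnormal branch is what makes a value such as -5.960464477539063e-8 come out as f98001 rather than as a 32-bit encoding.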

For maximum interoperability with other CBOR implementations, map key types should be limited to integer and text string, and the two types should not be mixed in the same map.
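
The map key ordering rule (bytewise lexicographic order of the encoded keys, with duplicates rejected) can be sketched as follows. This is a standalone illustration using the Java standard library; MapKeyOrderSketch and its helpers are hypothetical names, and keys are passed as hex strings of their already encoded form purely for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration, not part of org.webpki.cbor.
class MapKeyOrderSketch {

    // Bytes are compared as unsigned values, shorter prefixes sort first.
    static final Comparator<byte[]> BYTEWISE = Arrays::compareUnsigned;

    // Sorts encoded keys (given as hex) and rejects duplicates.
    static List<String> sortKeys(List<String> encodedKeysHex) {
        TreeSet<byte[]> sorted = new TreeSet<>(BYTEWISE);
        for (String hex : encodedKeysHex) {
            if (!sorted.add(fromHex(hex))) {
                throw new IllegalArgumentException("Duplicate map key: " + hex);
            }
        }
        List<String> result = new ArrayList<>();
        for (byte[] key : sorted) {
            result.add(toHex(key));
        }
        return result;
    }

    static byte[] fromHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Keys 0.0, -0.0, and 0 are distinct and sort by their encodings.
        System.out.println(sortKeys(List.of("f98000", "f90000", "00")));
    }
}
```

The unsigned comparison matters: with Java's signed bytes, an encoding starting with f9 would otherwise sort before one starting with 00.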

Input Data Validation

A properly designed system validates input data before acting upon it. This section describes how this can be achieved using this particular CBOR implementation.

During CBORObject.decode(byte[]), CBOR data is checked for well-formedness and, by default, for adherence to the deterministic encoding scheme.

After successful decoding, the CBOR data is provided as a CBORObject. For extracting the data of CBOR primitives in a Java-compatible way, there are type-specific access methods such as CBORObject.getInt() and CBORObject.getString(). For accessing structured CBOR objects, the CBORObject.getMap(), CBORObject.getArray(), and CBORObject.getTag() methods return container objects, which in turn facilitate access to the individual CBOR objects of the structure.

If the underlying CBOR data type does not match the access method (like performing CBORObject.getInt() on a CBORBigInt), an exception is thrown. That is, this implementation performs strict type checking.

However, you typically also want to verify that CBORMap objects do not contain unexpected keys, and that CBORArray objects do not contain unread elements. This can be achieved by calling CBORObject.checkForUnread() after all expected objects have been read. This method verifies that the current CBOR object (including possible child objects) has been accessed; otherwise an exception is thrown.
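
The idea behind strict type checking combined with an "unread" check can be shown with a small standalone model. The class below is hypothetical and deliberately independent of this package: it wraps a plain java.util.Map, records which keys have been read, and mimics the behavior described for the typed access methods and checkForUnread(). The actual CBORMap implementation may work quite differently.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of read tracking, not part of org.webpki.cbor.
class ReadTrackingMapSketch {
    private final Map<String, Object> entries;
    private final Set<String> read = new HashSet<>();

    ReadTrackingMapSketch(Map<String, Object> entries) {
        this.entries = new LinkedHashMap<>(entries);
    }

    String getString(String key) {
        return get(key, String.class);
    }

    long getLong(String key) {
        return get(key, Long.class);
    }

    // Strict type checking: a mismatch is an error, never a silent conversion.
    private <T> T get(String key, Class<T> type) {
        Object value = entries.get(key);
        if (value == null) {
            throw new IllegalStateException("Missing key: " + key);
        }
        if (!type.isInstance(value)) {
            throw new IllegalStateException("Key '" + key + "' is not a " + type.getSimpleName());
        }
        read.add(key);
        return type.cast(value);
    }

    // Fails if any entry was never accessed, which catches unexpected keys.
    void checkForUnread() {
        for (String key : entries.keySet()) {
            if (!read.contains(key)) {
                throw new IllegalStateException("Unread key: " + key);
            }
        }
    }

    public static void main(String[] args) {
        ReadTrackingMapSketch map =
                new ReadTrackingMapSketch(Map.of("id", 5L, "name", "example"));
        System.out.println(map.getLong("id") + " " + map.getString("name"));
        map.checkForUnread(); // passes: every entry has been read
    }
}
```

Calling the check only after all expected items have been read turns "the input contained something I did not ask for" into an explicit failure instead of silently ignored data.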

Built-in cryptographic support classes like CBORValidator and CBORPublicKey perform strict type checking as well as verifying that there are no unexpected objects inside of their respective containers.

"Schema" Support

Although this package does not support a CBOR counterpart to XML Schema, similar functionality can be achieved using the programmatic constructs described in the previous section. For an example, turn to Typed Objects.

Cryptographic Support

To aid the use of cryptography, support for Signatures and Encryption is integrated in the package.

Diagnostic Notation

Output of CBOR data in diagnostic notation (as described in section 8 of RFC 8949) is provided by the CBORObject.toString() method.

However, through the CBORDiagnosticNotation class, CBOR data may also be provided in diagnostic (textual) notation, making CBOR useful for "config" and test data files as well.

By adhering to the Deterministic Encoding specification above, CBOR data can be bidirectionally converted between its native (binary) format and diagnostic notation without getting corrupted. Note though that text-binary-text "roundtrips" do not necessarily return identical text: 0x10 used as diagnostic notation input will return 16 as diagnostic notation output. Caveat: for reliable conversions, floating point values must be aligned with IEEE 754 encoding and rounding rules.

The following table shows how CBOR objects should be represented in diagnostic notation:
CBOR Object      Syntax               Notes     Description
                 / comment text /     7         Multi-line comment. Multi-line comments are treated as whitespace and may thus also be used between CBOR objects.
                 # comment text       7         Single-line comment. Single-line comments are terminated by a newline character ('\n') or EOF. Single-line comments may also terminate lines holding regular CBOR items.
integer, bignum  {sign}{0b|0o|0x}n    1, 2      Arbitrary sized integers without fractional components or exponents. See CBOR integer encoding. Binary, octal, and hexadecimal notation is supported by prepending numbers with 0b, 0o, and 0x respectively. The latter also permit arbitrary insertions of '_' characters between digits to enable grouping of data like 0b100_000000001.
floating point   {sign}n.n{n}         1, 2      Floating point values must include a decimal point and an optional exponent. See CBOR floating point encoding.
                 NaN                            Not a number. See CBOR floating point encoding.
                 {sign}Infinity       2         Infinity. See CBOR floating point encoding.
byte string      h'hex data'          3, 6      Byte data provided in hexadecimal notation. Each byte must be represented by two hexadecimal digits.
                 b64'base64 data'     3, 6, 7   Byte data provided in base64 or base64URL notation. Padding with '=' characters is optional.
                 'text'               4, 5, 7   Byte data provided as UTF-8 encoded text.
                 << object >>         7         Construct holding a CBOR object which is subsequently embedded in a byte string.
text string      "text"               4, 5      UTF-8 encoded text string.
true             true                           Boolean value.
false            false                          Boolean value.
null             null                           Null value.
array            [ objects ]                    Array with zero or more comma separated CBOR objects.
map              { key:value }                  Map with zero or more comma separated key/value pairs. Keys and values are expressed as CBOR objects.
tag              n( object )          1         Tag holding a CBOR object.
                 ,                    7         Separator character for CBOR sequences.
  1. The letter n in the Syntax column denotes one or more digits.
  2. The optional {sign} must be a single hyphen ('-') character.
  3. Whitespace characters (' ', '\t', '\r', '\n') inside of string quotes are ignored.
  4. Whitespace characters ('\t', '\r', '\n') inside of string quotes become a part of the text. To avoid getting newline characters ('\n') included in multi-line text strings, a line continuation marker consisting of a backslash ('\') immediately preceding the newline may be used.
  5. Text strings may also include JavaScript compatible escape sequences ('\'', '\"', '\\', '\b', '\f', '\n', '\r', '\t', '\uhhhh').
  6. Zero-length strings ('') return byte strings of length zero.
  7. Only applicable for input. That is, CBORObject.toString() does not produce this item.
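
To make the integer syntax {sign}{0b|0o|0x}n more concrete, here is a minimal parsing sketch based on java.math.BigInteger. DiagnosticIntegerSketch is a hypothetical name and the code is not taken from CBORDiagnosticNotation. One assumption is labeled in the code: since the table says '_' may appear between digits, the sketch rejects a leading or trailing '_'.

```java
import java.math.BigInteger;

// Hypothetical illustration, not part of org.webpki.cbor.
class DiagnosticIntegerSketch {

    // Parses an integer written as {sign}{0b|0o|0x}n.
    static BigInteger parse(String text) {
        boolean negative = text.startsWith("-");
        String digits = negative ? text.substring(1) : text;
        int radix = 10;
        if (digits.startsWith("0b")) {
            radix = 2;
        } else if (digits.startsWith("0o")) {
            radix = 8;
        } else if (digits.startsWith("0x")) {
            radix = 16;
        }
        if (radix != 10) {
            digits = digits.substring(2);
            // Assumption: '_' is only valid between digits.
            if (digits.startsWith("_") || digits.endsWith("_")) {
                throw new NumberFormatException("Misplaced '_' in: " + text);
            }
            digits = digits.replace("_", "");
        }
        // BigInteger itself accepts a sign, so a second sign is rejected here.
        if (digits.isEmpty() || digits.startsWith("-") || digits.startsWith("+")) {
            throw new NumberFormatException("Not an integer: " + text);
        }
        BigInteger value = new BigInteger(digits, radix);
        return negative ? value.negate() : value;
    }

    public static void main(String[] args) {
        // Text-binary-text roundtrips normalize the notation: 0x10 comes back as 16.
        System.out.println(parse("0x10"));
        System.out.println(parse("0b100_000000001"));
    }
}
```

Because the result is a BigInteger, the same routine covers both the integer and bignum rows of the table; which CBOR type is used is decided later by the deterministic encoding rules.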