The second edition of IAB Europe’s Transparency and Consent Framework (TCF 2.0) has entered public beta. To help publishers and vendors achieve faster loading times in the browser, AudienceProject has built two entirely new TCF 2.0 consent string parsers that are much smaller than the official ones.

On March 31st, the IAB Transparency and Consent Framework (TCF) 2.0 entered public beta. Therefore, publishers and vendors around the world will start writing and parsing consent strings in the new format. This format replaces the old format used in TCF 1.1 and is not backwards compatible. New parsers will have to be embedded into existing solutions, covering all the languages commonly used in web stacks.

At the top of the stack, this is almost exclusively JavaScript, which was indeed the first language for which an official parser was released. However, this parser had a few shortcomings, the most important being its large size in a production bundle.

For backend-oriented languages the situation is different. We use Scala extensively in our backend, so we were looking for a JVM-compatible parser. At the time of writing, the official IAB Consent String SDK for Java has yet to be made compatible with the new format. IAB has released a version 2.0 parser, but that project is still labelled as an alpha release.

To address these issues, AudienceProject has built two entirely new TCF 2.0 consent string parsers.

The JavaScript parser

We started by building our own JavaScript parser, because we needed a “slim” implementation for the browser with a minimal network footprint (less than 1.5KB compressed) and a minimal memory footprint. The same parser also works with Node.js.

It is available on GitHub here.

The Java parser

Once the JavaScript parser was completed, we also needed a TCF 2.0-compatible consent string parser for Java, and here we took a slightly different approach.

It is available on GitHub here.

A High-Level Abstraction for Binary Structures

The consent string format specified in TCF 2.0 is a Base64-encoded binary block format. Parsing a binary block format by hand has a number of pitfalls that a developer implementing transparency/consent business logic should not have to worry about. It also makes the parser code difficult to read (and thus to maintain) when the domain-specific content of the consent string is mixed up with binary parsing logic.
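To give a feel for those pitfalls, here is a minimal hand-written sketch in Java. It is not code from either of our libraries: the class name HandParsedHeader and the input string are made up for illustration, and it assumes that a single segment of the consent string uses the web-safe Base64 alphabet. It extracts only the first two fields, a 6-bit version followed by a 36-bit creation timestamp, and already needs hand-computed shifts, masks and byte offsets.

```java
import java.util.Base64;

// Hypothetical helper for illustration only; not part of any published library.
final class HandParsedHeader {

    // Assumption: the segment is encoded with the web-safe Base64 alphabet.
    static byte[] decode(String segment) {
        return Base64.getUrlDecoder().decode(segment);
    }

    // version occupies bits 0-5, i.e. the top six bits of the first byte.
    static int version(byte[] b) {
        // Java bytes are signed, so mask with 0xFF before shifting;
        // forgetting this is one of the classic pitfalls.
        return (b[0] & 0xFF) >>> 2;
    }

    // created occupies bits 6-41: the low two bits of byte 0,
    // all of bytes 1 to 4, and the top two bits of byte 5.
    static long created(byte[] b) {
        return ((long) (b[0] & 0x03) << 34)
             | ((long) (b[1] & 0xFF) << 26)
             | ((long) (b[2] & 0xFF) << 18)
             | ((long) (b[3] & 0xFF) << 10)
             | ((long) (b[4] & 0xFF) << 2)
             | ((b[5] & 0xFF) >>> 6);
    }
}
```

For the synthetic input "CAAAAABAAAAACBBA" (not a real consent string), version returns 2 and created returns 1. Every further field would need its own set of offsets, and a single wrong shift silently corrupts the value, which is exactly the kind of detail that should not sit next to business logic.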

Enter Kaitai Struct:

The Kaitai Struct framework allowed us to specify the entire IAB consent string format (both version 1.1 and 2.0) in a declarative style using the Kaitai expression language, which is easily human-readable and reads much the same as the official specification of the format from IAB.

For example, take a look at the definition of the first four fields in the core string – and how they correspond one to one with the core string description on IAB’s GitHub page.

- id: version
  type: b6
- id: created
  type: b36
- id: last_updated
  type: b36
- id: cmp_id
  type: b12

Written this way, the mapping from domain-specific content to the actual technical implementation almost becomes a matter of taste.

Using the Kaitai Struct Compiler, the declarative specification is quickly turned into concrete Java classes. The Kaitai runtime dependency reads the consent string once it has been decoded from Base64 and returns instances of those generated classes. From here on, the consent string is easy to work with from any language targeting the JVM (such as our own Scala backend).

In the library here, we have added a few extra domain-specific utilities and classes to make it easier to work with the consent string. We have also made one crucial implementation choice: the byte-stream output of the Base64 decoder is read as an unaligned bit stream. This is necessary to parse the TCF format correctly, and it was not the default behaviour of the Kaitai runtime.
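What reading an unaligned bit stream means can be sketched in a few lines of plain Java. This is an illustration only, not the code used in the library or in the Kaitai runtime. The class BitCursor, the method readHeader and the input string are hypothetical, and the sketch assumes web-safe Base64 and most-significant-bit-first ordering. The cursor advances one bit at a time, so a field may start and end anywhere inside a byte.

```java
import java.util.Base64;

// Hypothetical sketch; the names are illustrative and not a library API.
final class BitCursor {
    private final byte[] data;
    private int position; // current offset, counted in bits

    BitCursor(byte[] data) {
        this.data = data;
    }

    // Reads `width` bits (1 to 63), most significant bit first,
    // regardless of where the cursor sits relative to byte boundaries.
    long read(int width) {
        long value = 0;
        for (int i = 0; i < width; i++, position++) {
            int current = data[position >> 3] & 0xFF;        // byte holding this bit
            int bit = (current >> (7 - (position & 7))) & 1; // pick the bit, MSB first
            value = (value << 1) | bit;
        }
        return value;
    }

    // Reads the first four core-string fields in declaration order:
    // version (6 bits), created (36), last_updated (36), cmp_id (12).
    static long[] readHeader(String segment) {
        BitCursor in = new BitCursor(Base64.getUrlDecoder().decode(segment));
        return new long[] { in.read(6), in.read(36), in.read(36), in.read(12) };
    }
}
```

Calling BitCursor.readHeader("CAAAAABAAAAACBBA") on this synthetic input (not a real consent string) yields 2, 1, 2 and 65. The field widths are the same ones that appear in the Kaitai definition, so the only thing left in code is the order and size of the fields.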

Parsing in other languages

The exact same specification files that we used for the Java parser could be used to target several other languages, because the Kaitai Struct Compiler supports a wide array of languages, not only Java. When IAB releases a new version of the standard and the format changes, reflecting those changes is as easy as editing the specification files and running the Kaitai Struct Compiler to generate new sources.