
Ground Up Functional API Design in C++

There are plenty of reasons why functional APIs look and feel different from more common object-oriented APIs. A tell-tale sign of a functional API is the existence of a core abstraction and a set of methods with algebraic properties. The abstraction part is, of course, about being able to talk about just the necessary details and nothing more. The algebraic part is about being able to take one or more instances of the same abstraction and create a new one following some laws. In other words, lawfully composing smaller parts into a bigger thing that is itself an instance of the same abstraction as the original instances.

Composition by itself is nothing new to OO programmers. The Composite and Decorator design patterns are all about putting together smaller pieces into something bigger. However, they miss out on the lawful part. The functional paradigm gives you extra guard rails so that you can bake some well-understood mathematical properties into the abstraction.

I think most people with even basic OO experience "get" the abstraction part. The algebraic part, however, needs some thinking, some training, and most importantly some well-made, simple-to-understand examples.

Speaking of examples, C++ has already adopted the ranges library as a technical specification (TS). At its core are lawful algebraic properties. Understanding the functional principles from an industrial-strength implementation is possible but harder. RxCpp is another example. Both libraries use operator overloading, which I think is a convenience and not a core part of the design. My focus today is understandability. So, here's my attempt to demystify the art of lawful algebraic API design with a simple example of data generators. See cpp-generators on GitHub.


This talk presents two classic techniques from the functional domain — composable data generators and property-based testing — implemented in C++14 for testing a generic serialization and deserialization library (RefleX). It presents a systematic technique for constructing data generators from a mere random number generator, and random type generation using compile-time meta-programming. The talk describes the laws of monoids, functors, applicatives, and monads and how they are implemented in a simple-to-understand abstraction of data generators.

Check out the slides. A transcript is below for a quick read.

2:30
This is an attempt to demystify algebraic API design and buzzwords like functor, monoid, and monad.

4:05
I wish the book Functional Programming in C++ (Manning) had a chapter on algebraic API design like the material covered in this talk.

7:30
Property-based testing---the motivation behind data generators. Testing a serialization and deserialization library (RefleX).

12:00
The RefleX library---Compile-time reflection in C++ for DDS-XTypes

14:30
How to test RefleX?  What types and data should be used to test RefleX? We'll use property-based testing.

17:50
A code and type generator machinery that hammers the code with unthinkable tests. It is a developer's nightmare but also their best friend.

Progression of the data generator abstraction...

20:00
A simple random number generator---a built-in library function

22:00 
A bool generator, an uppercase character generator---a custom function on top of the library function.
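A minimal sketch of what such custom functions could look like, layered on the standard library's `std::rand` (the function names here are illustrative, not necessarily those used in the talk):

```cpp
#include <cstdlib>

// A bool generator built on top of the library's random number function.
bool gen_bool() {
  return std::rand() % 2 == 0;
}

// An uppercase character generator: fold a random number into 'A'..'Z'.
char gen_upper() {
  return static_cast<char>('A' + std::rand() % 26);
}
```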

23:00
A generator of random numbers within a range---a function with some state (a closure)
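A possible shape for such a stateful generator is a lambda that captures its bounds — i.e., a closure (identifiers here are mine, for illustration):

```cpp
#include <cstdlib>

// A range generator: returns a closure capturing the bounds [lo, hi].
// Each call produces a fresh random number within the range.
auto make_range_gen(int lo, int hi) {
  return [lo, hi]() { return lo + std::rand() % (hi - lo + 1); };
}
```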

26:00
A oneof generator from an std::initializer_list<T>.
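One way this could look (a sketch; the talk's actual implementation may differ): copy the alternatives out of the initializer_list — which does not own its storage beyond the full expression — and pick one at random per call.

```cpp
#include <cstdlib>
#include <initializer_list>
#include <vector>

// A oneof generator: picks one of the given alternatives at random.
// The values are copied into a vector the closure owns.
template <class T>
auto make_oneof_gen(std::initializer_list<T> values) {
  std::vector<T> v(values);
  return [v]() { return v[std::rand() % v.size()]; };
}
```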

28:00
Gen<T> template with a generate function. A named type for the function object. It's structurally the same as a closure, so it's not more powerful.
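A minimal sketch of such a named type, assuming a `std::function` member (the real code in cpp-generators may differ in detail). The point is that the name gives us a place to hang member functions later:

```cpp
#include <cstdlib>
#include <functional>

// Gen<T>: a named wrapper around a generate function.
// Structurally equivalent to a closure, but nameable and extensible.
template <class T>
struct Gen {
  std::function<T()> generate;
};

// A primitive integer generator wrapped in the named type.
Gen<int> make_int_gen() {
  return Gen<int>{ []() { return std::rand(); } };
}
```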

29:00
Is a function object (think closures) a sufficient abstraction for all kinds of generators we might need? No.

31:30
Now a much bigger jump in abstraction level---a composable generator---an algebraic generator with methods such as map, concat, reduce, zip, and where.

33:00
Composition is not going to be arbitrary. It will be governed by some well-known rules (laws). These are guard rails.

34:45
Mapping over a generator. Composition of a primitive generator and a mapping function gives another generator.

36:48
An excellent question from the audience: why is the while loop infinite? The reason is that we've not seen termination as a requirement for the generators yet. It later becomes important for making sure the generator abstraction is a well-behaved monoid. To satisfy the monoid laws, especially the identity law, we have to have an empty generator and a way to convey that a generator has ended. Hence we introduce termination in the core abstraction. Guard rails and all...

37:30
Introducing Functor---you can map over it. Functor laws. Composition law and the identity law.

40:30
Implementing the map function for generator
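A sketch of how map could be implemented as a free function over a minimal `Gen<T>` (assumptions: the `std::function`-based `Gen` from above; the talk's version may be a member function instead):

```cpp
#include <functional>
#include <utility>

template <class T>
struct Gen {
  std::function<T()> generate;
};

// map: composing a generator with a function yields another generator,
// whose element type is whatever the function returns.
template <class T, class F>
auto map(Gen<T> g, F f) {
  using U = decltype(f(std::declval<T>()));
  return Gen<U>{ [g, f]() { return f(g.generate()); } };
}
```

For example, mapping `x * 2` over a constant generator of 21 yields a generator of 42 — another `Gen<int>`, ready for further composition.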

42:30
Overloading lvalue- and rvalue-reference-qualified functions (e.g., map) for a fluent API. Algebraic APIs commonly implemented using the Fluent Interface technique generate a number of temporary objects. The overhead of temporary objects can be eliminated using move semantics. You can overload functions specifically for temporary objects (rvalues) using rvalue reference qualification.
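A sketch of the two overloads on a member `map` (assuming the minimal `std::function`-based `Gen` used earlier): the `const &` overload copies the wrapped callable, while the `&&` overload, selected for temporaries in a fluent chain, moves it out instead.

```cpp
#include <functional>
#include <utility>

template <class T>
struct Gen {
  std::function<T()> generate;

  // Lvalue overload: the source generator lives on, so copy its callable.
  template <class F>
  auto map(F f) const & {
    auto gen = generate;                 // copy
    using U = decltype(f(gen()));
    return Gen<U>{ [gen, f]() { return f(gen()); } };
  }

  // Rvalue overload: the source is a temporary, so steal its callable.
  // Selected in fluent chains like make_gen().map(f).map(h).
  template <class F>
  auto map(F f) && {
    auto gen = std::move(generate);      // move, no copy
    using U = decltype(f(gen()));
    return Gen<U>{ [gen, f]() { return f(gen()); } };
  }
};
```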

44:00
Introducing Monoid---the fundamental ability to combine smaller generators to create a bigger generator. Identity law---appending an empty generator gives back the original generator. Without an empty generator it's not a monoid because the identity law is not satisfied. A notion of termination must be introduced. We do that by throwing a std::out_of_range exception.
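One way to sketch this monoid structure (assumptions: the `std::function`-based `Gen` from earlier; the state-sharing via `shared_ptr` is my simplification, not necessarily the talk's):

```cpp
#include <functional>
#include <memory>
#include <stdexcept>

template <class T>
struct Gen {
  std::function<T()> generate;
};

// The identity element: a generator that is exhausted from the start.
template <class T>
Gen<T> empty_gen() {
  return Gen<T>{ []() -> T { throw std::out_of_range("empty generator"); } };
}

// concat: draw from the first generator until it signals exhaustion
// (std::out_of_range), then fall through to the second.
template <class T>
Gen<T> concat(Gen<T> a, Gen<T> b) {
  auto first_done = std::make_shared<bool>(false);
  return Gen<T>{ [a, b, first_done]() {
    if (!*first_done) {
      try { return a.generate(); }
      catch (const std::out_of_range &) { *first_done = true; }
    }
    return b.generate();
  } };
}
```

With this, `concat(empty_gen<T>(), g)` and `concat(g, empty_gen<T>())` both behave like `g` — the identity law.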

50:00
Associative law of Monoid. Grouping does not matter.

53:00
Introducing Applicatives: zipping generators. Implementation of zip_with. Generator is an applicative.
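A sketch of `zip_with` over the minimal `Gen<T>` used earlier: a binary function combines one value drawn from each of two generators, producing a generator of the combined results.

```cpp
#include <functional>
#include <utility>

template <class T>
struct Gen {
  std::function<T()> generate;
};

// zip_with: applicative-style composition. Draw one value from each
// generator and combine them with f.
template <class F, class T, class U>
auto zip_with(F f, Gen<T> a, Gen<U> b) {
  using R = decltype(f(std::declval<T>(), std::declval<U>()));
  return Gen<R>{ [f, a, b]() { return f(a.generate(), b.generate()); } };
}
```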

56:30
Introducing Monad to handle dependent generators. Generators of generators come up often. concat_map flattens nested generators. Monad laws.
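A simplified sketch of `concat_map` (monadic bind): `f` maps each generated value to a whole generator, and the nested `Gen<Gen<U>>` is flattened by immediately drawing from the inner generator. This sketch draws one value per inner generator rather than draining it; the full exhaustion-aware version is left out for brevity.

```cpp
#include <functional>

template <class T>
struct Gen {
  std::function<T()> generate;
};

// concat_map: f maps a T to a Gen<U>; the result is flattened to Gen<U>.
// This lets a later generator depend on a value from an earlier one.
template <class T, class F>
auto concat_map(Gen<T> g, F f) {
  using U = decltype(f(g.generate()).generate());
  return Gen<U>{ [g, f]() { return f(g.generate()).generate(); } };
}
```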

1:03:00
Recapping algebraic API design based on mathematical abstractions. Algebra = a first-class lawful abstraction + a compositional API.

1:06:00
Category-theoretic intuition is a much deeper thing. I'm not sure I get it. For example, intuitively understanding "a monad is just a monoid in the category of endofunctors".

1:08:00
We can generate random types at compile-time if we can generate random numbers at compile-time.

1:10:00
Generating random numbers at compile-time. A 16-bit Linear Feedback Shift Register (LFSR) constexpr function.

constexpr int LFSR_bit(int prev) { 
  return (((prev >> 0) ^ (prev >> 2) ^ 
           (prev >> 3) ^ (prev >> 5)) & 1); 
} 

constexpr int LFSR(int prev) { 
  return ((prev >> 1) | (LFSR_bit(prev) << 15)); 
}
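As a usage sketch: starting from 0xACE1 (a customary LFSR seed; my choice here, not necessarily the talk's), the sequence advances deterministically at compile time, which a `static_assert` can check.

```cpp
// 16-bit LFSR with taps at bits 0, 2, 3, and 5 (as in the talk).
constexpr int LFSR_bit(int prev) {
  return (((prev >> 0) ^ (prev >> 2) ^
           (prev >> 3) ^ (prev >> 5)) & 1);
}

constexpr int LFSR(int prev) {
  return ((prev >> 1) | (LFSR_bit(prev) << 15));
}

// Evaluated entirely at compile time.
static_assert(LFSR(0xACE1) == 0x5670, "first step of the LFSR");
```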

1:13:30
Random type generation at compile-time using a recursive TypeMap and an LFSR. See type_generator.

1:20:00
Generating a random 51-levels-deep nested tuple or, worse, a tuple with a static size of 2.7 MB. Test your generic library with monster types.
