Monday, November 26, 2012

The Importance of Function Objects

Function objects allow for freedoms that free functions do not. They are more compatible with higher order functions, composition, and generic programming. They are easier to use; extensible and convenient.  They are often even more efficient.


Higher Order Functions.


Write a function that adds two arguments of any given type.

template< class X, class Y >
constexpr auto add( X&& x, Y&& y ) 
    -> decltype( std::declval() + std::declval() )
{
    return std::forward(x) + std::forward(y);
}


Easy, right? Now pass that in to std::accumulate.

int main() {
    std::vector<int> v = { 0, 1, 2, 3, 4, 5 };
    int sum = std::accumulate( v.begin(), v.end(), 0, add );
    std::cout << "sum = " << sum << std::endl;
}

And compile!

fo.cpp:20:68: error: no matching function for call to 'accumulate(std::vector<int>::iterator, std::vector<int>::iterator, int, <unresolved overloaded function type>)'

Huh, GCC can't deduce the type of add. Really, the lexical name, add, refers to a whole set of possible template overloads. It could pick one from the set if we supplied some arguments, but not by the name alone. We can fix this by specifying the type arguments.

    int sum = std::accumulate( v.begin(), v.end(), 0, add<int&,int&> );

Why do we specify int& instead of int? If we specify int, then because of the quirks in perfect forwarding, signature is

    int add( int&&, int&& )

Since its inputs will be lvalues, this will fail to compile. Instead, it will look like this:

    int add( int& &&, int& && )

and the arguments will be forwarded to the addition as int&'s.

At this point, falling back on std::plus<int>() would make the most sense, but what if we don't want to have to specify the type? The proposal n3421 expresses this concern, and would allow for the syntax, std::plus<>(), which would be a function object version of add. Finally, that line would compile, but n3421 isn't standard yet. Why not do this?

struct Add {
    template< class X, class Y >
    constexpr auto operator () ( X&& x, Y&& y ) 
        -> decltype( std::declval<Y>() + std::declval<X>() )
    {
        return std::forward<X>(x) + std::forward<Y>(y);
    }
};

constexpr auto add = Add();
 
...
 
int sum = std::accumulate( v.begin(), v.end(), 0, add ); 

Here, add would be essentially no different from an instance of std::plus<>, but because it is already instantiated, we can avoid much of the syntactic cruft.


Programming Compositionally.


Let's say we have a two-dimensional array

std::vector<int> v = { 1, 2, 3, 4, 5 };
std::vector<std::vector<int>> vv = { v, v, v };

and we want to find the sum of all its element. We might first write a small function to find the sum of a one-dimensional array.

template< class S >
int sum( const S& s ) {
    return std::accumulate( s.begin(), s.end(), 0, add );
}

With this, we could write a simple for loop to iterate over each element, adding the sums of each vector, but that's the whole point of accumulate! We could use std::transform to build a list of sums and accumulate that, but what's the point? The easiest thing to do is use function composition.

Previously, I have talked about function composition. I defined a type, Composition<F,G>, which passed its first argument to G and the result of that along with any following arguments to F.

std::accumulate expects a function where the first value is the accumulator (the sum) and the second value comes from the sequence (in this case: another std::vector). We want the sum of the vector, and then to add the results together. One could fairly trivially write a function that does this explicitly, but that will have to be done for every similar use-case. Patterns of composition can be abstracted away.

template< class F, class G > struct AccumRight {
    F f = F();
    G g = G();

    constexpr AccumRight() { }


    constexpr AccumRight( F f, G g) 
        : f(f), g(g) { }

    template< class X, class Y >
    constexpr auto operator () ( X&& x, Y&& y ) 
        -> decltype( f(std::declval<X>(), g(std::declval<Y>()) ) )
    {
        return f( std::forward<X>(x), g(std::forward<Y>(y)) );
    }
};

template< class F, class G, class A = AccumRight<F,G> >
constexpr A rAccumulation( F f, G g ) {
    return A( f, g );
}

Finally, we can write that function. How does it look if we use free functions?

sum = std::accumulate( vv.begin(), vv.end(), 0, 
                       rAccumulation(add<int&,int>,sum<std::vector<int>>) );

Notice how, since the right-hand argument of add will be an rvalue (the result of sum), we can just tell it int. Now, how about function objects?

sum = std::accumulate( vv.begin(), vv.end(), 0, 
                       rAccumulation(add,sum) );

Much nicer, no? Imagine more complex examples involving passing higher-order functions to higher-order functions. Without using such techniques, higher order functional programming in C++ would be daunting. Manually deducing types is difficult and error prone--sometime impossible due to implementation details, but the compiler can do it instead, if given the chance. Because of this, techniques in C++ that would otherwise be thought of as impossible become easy.


Finer Control.


std::get<N> is a nice function for working with pairs and tuples, but doesn't work on normal containers. Lets define a function object for that!

struct Selector {
    int i;
    
    Selector( int j=0 ) : i(j) { }
 
    template< class S >
    typename const S::value_type& operator () ( const S& s ) const {
        return *std::next( s.begin(), i );
    }
};
 
Selector first(0), third(2); 

int main() {
    std::vector<int> v = { 1, 2, 3, 4, 5 };
    std::cout << "first = " << first(v) << std::endl;
    std::cout << "third = " << third(v) << std::endl;
    std::cout << "fourth = " << Selector(third(v))(v) << std::endl;
}

This will output

    first =1
    third = 3
    fourth = 4

We typically think of functions as static, unchanging things, but Selector is dynamic. It could be used, for example, as a function returning the current player in a game by keeping its member, i, always up-to-date.

For another example, consider a function object, Printer, that prints its arguments to some file. We might write that like this:
  
namespace std {

template< class X >
string to_string( const std::vector<X>& v ) {
    string str = "[ ";
    for( const X& x : v ) str += to_string(x) + " ";
    str += "]";
    return str;
}

// Because the standard doesn't define this.
string to_string( const std::string& s ) {
    return s;
}

template< class T >
string to_string( const T& s ) {
    ostringstream ss;
    ss << s;
    return ss.str();
}

} // namespace std

struct Printer {
    std::string destination;
    
    Printer( std::string filename ) 
        : destination( std::move(filename) )
     {
     }
     
     void redirect( std::string filename ) {
         destination = std::move( filename );
     }
     
     template< class X >
     void operator () ( const X& x ) {
         std::ofstream s( destination, std::ios_base::app );
         s << std::to_string(x);
     }
};

int main() {
    std::vector<int> v = { 1, 2, 3, 4, 5 };
    std::vector<std::vector<int>> vv = { v, v, v };
    
    Printer prnt( "file1" );
    prnt( v );
    prnt.redirect( "file2" );
    prnt( vv );
}

Here, the dynamics come from being able to change the destination of our output at runtime. It outputs:

    file1: [ 1 2 3 4 5 ]
    file2: [ [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] ]


Extensibility.

I have talked before about generalizing associativity, transitivity, and other such things in function objects. Inheritance adds a new dimension to functions. Derivatives can benefit from functionality offered in base classes and base classes can benefit from their derivations via the curiously recurring template pattern.

For a different example, let's have the Printer class time-stamp its output.

#include <ctime>
struct StampedPrinter : Printer {
    StampedPrinter( std::string filename )
        : Printer( std::move(filename) )
    {
    }
    
    template< class X >
    void operator () ( const X& x ) {
        std::time_t t = std::time( nullptr );
        std::tm tm = *std::localtime( &t );
        
        std::string time = std::to_string(tm.tm_min) + ":" + 
                           std::to_string(tm.tm_sec) + " -- ";
                           
        Printer::operator()( time + std::to_string(x) + "\n" );
    }
};

By simply changing the type of prnt in the previous example to StampedPrinter, it will output:

    file1:
    6:46 -- [ 1 2 3 4 5 ]

    file2:
    6:46 -- [ [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] ]


And if run again,

    file1:
    6:46 -- [ 1 2 3 4 5 ]
    13:13 -- [ 1 2 3 4 5 ]

    file2:
    6:46 -- [ [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] ]
    13:13 -- [ [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] ]


In conclusion,

Function objects can do anything that free functions can do, but they cooperate better. They are more cohesive and orthogonal. I hope that these simple demonstrations have been thought provoking.

The code for this post: https://gist.github.com/4151880
See also: An interesting article that talks about a trampoline function that runs a recursive function object for tail-call optimization: http://fendrich.se/blog/2012/11/22/compile-time-loops-in-c-plus-plus-11-with-trampolines-and-exponential-recursion/