Running Median II¶
Title: Find the Running Median
Link: https://www.hackerrank.com/challenges/find-the-running-median/
The Complexity Conundrum¶
As we saw with the previous examples, “Complexity Killed The Cat”: as soon as we lacked an efficient insertion method to keep the container sorted, even our Floating Iterator approach was not enough, although it reduced the work needed to calculate the running median to a single iterator increment/decrement operation (and not even always that).
The only other way to reduce complexity may be to look at a container offering a “RandomAccessIterator”. This is the only iterator category that can “randomly” jump and therefore does not have to be incremented n times (expensive) to reach the expected destination; a single operation suffices.
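A tiny standalone illustration of the difference (assuming nothing beyond the standard library): std::advance has to apply ++ repeatedly on a std::list iterator, but can jump directly on a std::vector iterator.
#include <iostream> // std::cout
#include <iterator> // std::advance
#include <list> // std::list
#include <vector> // std::vector
int
main(int, char *[]) {
    auto l = std::list<int>{1, 2, 3, 4, 5};
    auto v = std::vector<int>{1, 2, 3, 4, 5};
    auto lit = l.begin(); // bidirectional iterator
    auto vit = v.begin(); // random access iterator
    std::advance(lit, 3); // three ++ steps behind the scenes
    std::advance(vit, 3); // a single += 3 jump
    std::cout << *lit << " " << *vit << "\n"; // prints "4 4"
    return 0;
}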
Let us give std::vector<T> a try. The problem that one can foresee is clear:
- An insertion is expensive.
- Re-allocations and copies are needed to fit new and old elements in a contiguous array.
Here is the code; it beats all test cases when CASE1 is not defined.
#include <algorithm> // std::find_if, std::lower_bound
#include <functional> // std::greater
#include <iostream> // std::cout/cin
#include <iomanip> // std::setprecision, ...
#include <iterator> // std::istream/ostream_iterator, std::back_inserter
#include <vector> // std::vector
// Main
int
main(int, char *[]) {
auto in = std::istream_iterator<int>{std::cin}; // input iterator
auto out = std::ostream_iterator<double>{std::cout, "\n"}; // out iter
std::cout << std::fixed << std::setprecision(1); // fixed 1 decimal
auto v = std::vector<int>{};
for(auto t = *in++, odd = 1; t--; odd = not odd) {
auto val = *in++; // fetch next val
#ifdef CASE1
const auto fgreater = [&val](const auto &x) { return val > x; };
auto idx = std::find_if(v.begin(), v.end(), fgreater);
#else
constexpr auto fgreater = std::greater<int>{};
auto idx = std::lower_bound(v.begin(), v.end(), val, fgreater);
#endif
v.insert(idx, val); // insert val just before idx
auto left = static_cast<int>((v.size() - 1) / 2);
auto outval = static_cast<double>(v[left]);
if (not odd) // median is avg of 2 values if size is even
outval = (outval + static_cast<double>(v[left + 1])) / 2;
*out++ = outval;
}
return 0;
}
As was the case with the std::list implementation, we need to look for the insertion point to keep things always sorted. The “good case” is to use std::lower_bound. If you wonder why that works, it is because it uses the goodies of the RandomAccessIterator to give us the insertion point by repeatedly partitioning the range until it finds its target.
The “bad case” mimics what we did with std::list by using std::find_if to get the insertion point. It works, but it is again so expensive that the complexity-limited test cases will fail.
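A rough sketch of the idea behind std::lower_bound on a random access range (simplified, with a hypothetical my_lower_bound helper; the real implementation also works with weaker iterators, just less efficiently):
#include <functional> // std::greater
#include <iostream> // std::cout
#include <vector> // std::vector
// Simplified idea: halve the range until the first element for which
// comp(element, value) is false has been isolated.
template <typename It, typename T, typename Comp>
It my_lower_bound(It first, It last, const T &value, Comp comp) {
    while (first != last) {
        auto mid = first + (last - first) / 2; // O(1) jump, needs random access
        if (comp(*mid, value))
            first = mid + 1; // insertion point is in the right half
        else
            last = mid; // insertion point is in the left half (mid included)
    }
    return first;
}
int
main(int, char *[]) {
    auto v = std::vector<int>{9, 7, 5, 3, 1}; // sorted descending, as in the solution
    auto it = my_lower_bound(v.begin(), v.end(), 6, std::greater<int>{});
    std::cout << it - v.begin() << "\n"; // prints 2: 6 belongs right before the 5
    return 0;
}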
After that, the contiguous nature of std::vector offers us another advantage: we do not need to float an iterator along the running median. Using operator[] is enough to get to the running median position in a single operation, and to the next position in cases where the size is even (after the insertion).
Having to move items to make room for the inserted element, and to reallocate and copy elements if needed, is not enough to kill the benefits of the low complexity of the RandomAccessIterator that std::vector features.
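Spelled out on its own, the indexing rule used in the loop above looks like this (a small illustration with a hypothetical helper, not part of the solution):
#include <iostream> // std::cout
#include <vector> // std::vector
// For a sorted vector of size n the middle element sits at index (n - 1) / 2;
// when n is even the median also needs the element right after it. The rule
// works whether the vector is sorted ascending or descending.
double median_of_sorted(const std::vector<int> &v) {
    auto left = (v.size() - 1) / 2;
    auto med = static_cast<double>(v[left]);
    if (v.size() % 2 == 0)
        med = (med + static_cast<double>(v[left + 1])) / 2;
    return med;
}
int
main(int, char *[]) {
    std::cout << median_of_sorted({1, 2, 3}) << "\n"; // prints 2
    std::cout << median_of_sorted({1, 2, 3, 4}) << "\n"; // prints 2.5
    return 0;
}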
Living On The Heap¶
This is not an afterthought, nor is it about having second thoughts about the previous implementations. The breadcrumbs leading to this problem on HackerRank give it away: “Data Structures > Heap > Find the Running Median”. It may well be that we missed the entire point of the challenge by thinking too much outside of the box with our std::multiset solution and then focusing on complexity with the additional std::vector approach.
The Heap-Wish is fulfilled with a new std::priority_queue implementation.
#include <functional> // std::greater
#include <iostream> // std::cout/cin
#include <iomanip> // std::setprecision, ...
#include <iterator> // std::istream/ostream_iterator, std::back_inserter
#include <queue> // std::priority_queue
#include <vector> // std::vector
// Main
int
main(int, char *[]) {
auto in = std::istream_iterator<int>{std::cin}; // input iterator
auto out = std::ostream_iterator<double>{std::cout, "\n"}; // out iter
std::cout << std::fixed << std::setprecision(1); // fixed 1 decimal
auto ql = std::priority_queue<int>{}; // left heap / right heap (inverted)
auto qr = std::priority_queue<int, std::vector<int>, std::greater<int>>{};
for(auto t = *in++, odd = 0; t--; odd = not odd) {
odd ? ql.push(*in++) : qr.push(*in++);
odd ? qr.push(ql.top()) : ql.push(qr.top());
odd ? ql.pop() : qr.pop();
auto rmed = static_cast<double>(ql.top());
if (odd) // was odd, we made it even
rmed = (rmed + static_cast<double>(qr.top())) / 2;
*out++ = rmed;
}
return 0;
}
The queue acts like a heap, always keeping the largest element at the top. Unless, of course, a comparison function like std::greater<T> is given to keep the minimum element at the top instead.
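A quick standalone illustration of the two orderings:
#include <functional> // std::greater
#include <iostream> // std::cout
#include <queue> // std::priority_queue
#include <vector> // std::vector
int
main(int, char *[]) {
    auto maxq = std::priority_queue<int>{}; // default: largest at the top
    auto minq = std::priority_queue<int, std::vector<int>, std::greater<int>>{};
    for (auto x : {3, 1, 2}) {
        maxq.push(x);
        minq.push(x);
    }
    std::cout << maxq.top() << " " << minq.top() << "\n"; // prints "3 1"
    return 0;
}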
Combining a left heap (max-at-the-top) and a right heap (min-at-the-top) we can simulate the Floating Iterator approach, because we have direct access to the elements that allow us to calculate the running median.
odd ? ql.push(*in++) : qr.push(*in++);
odd ? qr.push(ql.top()) : ql.push(qr.top());
odd ? ql.pop() : qr.pop();
These three lines are the key. Push to one heap, depending on parity, and move (top + pop) the top element to the other heap to keep them balanced (in “odd” cases the left heap has one element more than the right heap).
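To see the balancing at work, here is a hand trace for an illustrative input stream, 12 4 5 3 8 7 (ql holds the smaller half with its maximum at the top, qr the larger half with its minimum at the top):
after 12:  ql = {12}         qr = {}           median = 12.0
after  4:  ql = {4}          qr = {12}         median = (4 + 12) / 2 = 8.0
after  5:  ql = {5, 4}       qr = {12}         median = 5.0
after  3:  ql = {4, 3}       qr = {5, 12}      median = (4 + 5) / 2 = 4.5
after  8:  ql = {5, 4, 3}    qr = {8, 12}      median = 5.0
after  7:  ql = {5, 4, 3}    qr = {7, 8, 12}   median = (5 + 7) / 2 = 6.0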
The one-liners are an “odd” construction, with the formatting showing that we perform the same operations during each round, only with the roles of the heaps swapped. We would ideally use polymorphism to avoid having to know which heap we operate on. However, std::priority_queue<int> and its counterpart std::priority_queue<int, std::vector<int>, std::greater<int>> are different types, and we are not in the business of using void * to duck types.
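A two-line check makes the point (purely illustrative): the comparator is part of the type, so the compiler sees the two heaps as unrelated types with no common base.
#include <functional> // std::greater
#include <queue> // std::priority_queue
#include <type_traits> // std::is_same_v
#include <vector> // std::vector
using MaxHeap = std::priority_queue<int>;
using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;
static_assert(not std::is_same_v<MaxHeap, MinHeap>, "different comparator, different type");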
Type Erasure Polymorphism¶
What we need is to create our own inheritance chain to make the types polymorphic, i.e., create a wrapper with templates that mimics the needed interface and then create our templated “subclasses”. This is called Type Erasure, although nobody actually erases anything. The type is hidden behind another type. This new type usually presents the same interface, but it can also adapt interfaces, effectively mixing really different types. Our types are only different because of the comparison function, std::less vs std::greater. For all intents and purposes they are polymorphic, and we use Type Erasure to present them as such.
// Type Erasure Idiom for the Heap
template<typename T>
struct Heap {
virtual ~Heap() = default;
auto *get_basepointer() { return this; }
// abstract interface
virtual void push(const T &) = 0;
virtual T top() const = 0;
virtual void pop() = 0;
virtual size_t size() const = 0;
};
template<typename T, template <typename> typename Comp = std::less>
struct MyHeap : Heap<T> {
using PrioQ = std::priority_queue<T, std::vector<T>, Comp<T>>;
PrioQ m_prioq;
void push(const T &x) override { m_prioq.push(x); }
T top() const override { return m_prioq.top(); }
void pop() override { m_prioq.pop(); }
size_t size() const override { return m_prioq.size(); }
};
The only difference with regard to usual implementations is that the base class takes a template parameter T. That is needed to declare the type taken by push and returned by top without hard-coding it. T could also be recovered inside the MyHeap implementation by looking at PrioQ::value_type. Both our heaps use the same T, an int, and will therefore be subclasses of our Heap<T> abstract base class. Notice that we also have a get_basepointer method there, so that any subclass can hand out a pointer to Heap<T> to easily work with polymorphism.
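As a side note, a hypothetical variation (not the listing used below) could take the whole queue type as the template parameter and recover T from PrioQ::value_type:
// Hypothetical variation: deduce the element type from the container itself
template<typename PrioQ>
struct QueueHeap : Heap<typename PrioQ::value_type> {
    using T = typename PrioQ::value_type; // T recovered from the queue
    PrioQ m_prioq;
    void push(const T &x) override { m_prioq.push(x); }
    T top() const override { return m_prioq.top(); }
    void pop() override { m_prioq.pop(); }
    size_t size() const override { return m_prioq.size(); }
};
In the main function the heaps are then used through their base pointers: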
std::cout << std::fixed << std::setprecision(1); // fixed 1 decimal
auto ql = MyHeap<int>{}; // left heap (max at top)
auto qr = MyHeap<int, std::greater>{}; // right heap (min at top)
auto *qlp = ql.get_basepointer(), *qrp = qr.get_basepointer();
for(auto t = *in++; t--; std::swap(qlp, qrp)) {
qlp->push(*in++); // push to get it sorted
qrp->push(qlp->top()); // balance to other side moving top
qlp->pop(); // complete move by removing the moved value
auto rmed = static_cast<double>(qrp->top()); // side we really pushed to
if (qlp->size() == qrp->size()) // same size -- use other side too
rmed = (rmed + static_cast<double>(qlp->top())) / 2;
*out++ = rmed;
}
We create the heaps and get the pointers to them. The same auto allows us to declare the pointer for both, because we are getting a pointer to Heap<int> in both cases, i.e., the same type. This is where polymorphism starts to show up, as it does in the for loop where we use std::swap(qlp, qrp) to swap the pointers held in the variables. Were they not compatible types, we could not do that.
The algorithm now operates without knowing whether we are in an odd or even case: it simply pushes to one heap, keeping the values sorted, and then balances whatever is at the top over to the other side. Instead of keeping tabs on the parity of the count, we simply compare the sizes of the polymorphic queues to know if we need just one value for the running median or the average of two.
This solution, like its previous non-polymorphic implementation, passes all the tests. Mission accomplished.
#include <functional> // std::greater
#include <iostream> // std::cout/cin
#include <iomanip> // std::setprecision, ...
#include <iterator> // std::istream/ostream_iterator
#include <queue> // std::priority_queue
#include <vector> // std::vector
// Type Erasure Idiom for the Heap
template<typename T>
struct Heap {
virtual ~Heap() = default;
auto *get_basepointer() { return this; }
// abstract interface
virtual void push(const T &) = 0;
virtual T top() const = 0;
virtual void pop() = 0;
virtual size_t size() const = 0;
};
template<typename T, template <typename> typename Comp = std::less>
struct MyHeap : Heap<T> {
using PrioQ = std::priority_queue<T, std::vector<T>, Comp<T>>;
PrioQ m_prioq;
void push(const T &x) override { m_prioq.push(x); }
T top() const override { return m_prioq.top(); }
void pop() override { m_prioq.pop(); }
size_t size() const override { return m_prioq.size(); }
};
// Main
int
main(int, char *[]) {
auto in = std::istream_iterator<int>{std::cin}; // input iterator
auto out = std::ostream_iterator<double>{std::cout, "\n"}; // out iter
std::cout << std::fixed << std::setprecision(1); // fixed 1 decimal
auto ql = MyHeap<int>{}; // left heap (max at top)
auto qr = MyHeap<int, std::greater>{}; // right heap (min at top)
auto *qlp = ql.get_basepointer(), *qrp = qr.get_basepointer();
for(auto t = *in++; t--; std::swap(qlp, qrp)) {
qlp->push(*in++); // push to get it sorted
qrp->push(qlp->top()); // balance to other side moving top
qlp->pop(); // complete move by removing the moved value
auto rmed = static_cast<double>(qrp->top()); // side we really pushed to
if (qlp->size() == qrp->size()) // same size -- use other side too
rmed = (rmed + static_cast<double>(qlp->top())) / 2;
*out++ = rmed;
}
return 0;
}
Variant (Pseudo-)Polymorphism¶
The question is whether we really need polymorphism with C++17, or whether we can use modern facilities to achieve the same effect without resorting to tricks like creating our own hierarchy to achieve Type Erasure.
We are only really interested in reaching the push, top and pop methods of our heaps, i.e., the methods of the underlying std::priority_queue instances we use as heaps. Recall that when we were addressing the “Types, Types, Types” challenge, we resorted to using std::variant, a newcomer in C++17 described as a type-safe union that can therefore hold different types, always one at a time.
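As a quick refresher, a toy example (unrelated to the heaps) of how std::visit hands the currently held alternative to a generic lambda:
#include <iostream> // std::cout
#include <variant> // std::variant, std::visit
int
main(int, char *[]) {
    auto v = std::variant<int, double>{42}; // currently holds an int
    std::visit([](auto &&x) { std::cout << x << "\n"; }, v); // prints 42
    v = 2.5; // now holds a double
    std::visit([](auto &&x) { std::cout << x << "\n"; }, v); // prints 2.5
    return 0;
}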
That is going to be the cornerstone of our variant-polymorphism: we will have a std::variant capable of holding the types of both heaps, the left one (max-at-the-top) and the right one (min-at-the-top). We will then have two such variants, each holding an instance of one of the heaps, and we will use them to push, balance, calculate the running median and then swap them as we did before, all without having to define an inheritance hierarchy to implement Type Erasure.
#include <functional> // std::greater
#include <iostream> // std::cout/cin
#include <iomanip> // std::setprecision, ...
#include <iterator> // std::istream/ostream_iterator, std::back_inserter
#include <queue> // std::priority_queue
#include <variant> // std::variant, std::visit
#include <vector> // std::vector
// Main
int
main(int, char *[]) {
auto in = std::istream_iterator<int>{std::cin}; // input iterator
auto out = std::ostream_iterator<double>{std::cout, "\n"}; // out iter
std::cout << std::fixed << std::setprecision(1); // fixed 1 decimal
using LHeap = std::priority_queue<int>;
using RHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;
using VHeap = std::variant<LHeap, RHeap>;
auto vql = VHeap{LHeap{}}, vqr = VHeap{RHeap{}};
const auto visitor = [&in, &out](auto &&ql, auto &&qr) {
ql.push(*in++); // push to get it sorted
qr.push(ql.top()); // balance to other side moving top
ql.pop(); // complete move by removing the moved value
auto rmed = static_cast<double>(qr.top()); // side we really pushed to
if (ql.size() == qr.size()) // same size -- use other side too
rmed = (rmed + static_cast<double>(ql.top())) / 2;
*out++ = rmed;
};
for(auto t = *in++; t--; std::swap(vql, vqr))
std::visit(visitor, vql, vqr);
return 0;
}
The only drawback in the code above, if we can call it that, is that the push-balance-calculate logic has to be placed in a lambda expression. This is what goes into std::visit, the facility that feeds our heaps to the algorithm. Using std::visit is the key, because it passes us the type currently held in the std::variant without forcing us to know it beforehand or to use the index.
Given that a lambda expression can declare its parameters to be of type auto, the combination is ideal.
The algorithm is exactly the same as in the previous solution but no pointers are involved.