Running Median¶
Title: Find the Running Median
Link: https://www.hackerrank.com/challenges/find-the-running-median/
A Set is not a "Set¶
The problem gives us first an int. This is the number of values to read. After we read each value, we have a “set” of integer values and we have to print the running median. I.e,: the value of the int located in the middle (of the sorted “set”) for an odd count of integers and the mean of the two values in the middle if the count is even.
Given that the problem is specifically talking about a “set”, let us use an std::set
to solve the problem. It is ideal because the “set” is always sorted. We will just insert the value and look for the values in the middle.
#include <iostream> // std::cout/cin
#include <iomanip> // std::setprecision, ...
#include <iterator> // std::istream/ostream_iterator, std::back_inserter
#include <set> // std::set
// Main
int
main(int, char *[]) {
auto in = std::istream_iterator<int>{std::cin}; // input iterator
auto out = std::ostream_iterator<double>{std::cout, "\n"}; // out iter
std::cout << std::fixed << std::setprecision(1);
auto s = std::set<int>{};
for(auto t = *in++; t--;) { // number of elements until exhausted
s.insert(*in++);
auto left = static_cast<int>((s.size() - 1) / 2);
double val = static_cast<double>(*std::next(s.begin(), left));
if (not (s.size() % 2))
val = (val + *std::next(s.begin(), left + 1)) / 2;
*out++ = val;
}
return 0;
}
Easy, compact, down to 21 lines with no major efforts and it passes our local basic test.
-
We insert the next value
-
We get an iterator to the middle of the
std::set
(already auto-sorted) and the value located there -
If the size even we get another iterator to the next value, add it to our previous value and calculate the average
-
Output.
The devil, as always, is in the details. We have a second test with more integer values that fails to pass.
$ make 01 test01 | less
Creating build dir
g++ -std=c++17 -o build/running-median-01 running-median-01.cpp
./build/running-median-01 < running-median.input01 > /tmp/tmp.g7s4f2Zt3o.output
Solution 01: FAILED
---- Expected Output Begin ----
...
---- Expected Output End ----
---- Output Being ----
...
---- Output End ----
---- Diff Begin ----
--- running-median.output01
+++ /tmp/tmp.g7s4f2Zt3o.output
@@ -480,41 +480,36 @@
51079.5
51055.0
51079.5
+51079.5
51055.0
...
And the failure has to be given to the use of “set” in the problem description. The input can, and will, have repeated values. What was declared to be a “set” is actually not a set.
A MultiSet is a "Set¶
The standard library is generous and in this case offers us an std::multiset
that directly fulfills our requirements. It is a “set” that can host multiple repeated keys and it is still sorted.
#include <iostream> // std::cout/cin
#include <iomanip> // std::setprecision, ...
#include <iterator> // std::istream/ostream_iterator, std::back_inserter
#include <set> // std::multiset
// Main
int
main(int, char *[]) {
auto in = std::istream_iterator<int>{std::cin}; // input iterator
auto out = std::ostream_iterator<double>{std::cout, "\n"}; // out iter
std::cout << std::fixed << std::setprecision(1); // fixed 1 decimal
auto s = std::multiset<int>{}; // allow repeated keys
for(auto t = *in++; t--;) { // number of elements until exhausted
s.insert(*in++); // add to multi-set
auto left = static_cast<int>((s.size() - 1) / 2); // pos of left val
auto itleft = std::next(s.begin(), left); // iterator to left median val
auto outval = static_cast<double>(*itleft);
if (not (s.size() % 2)) // median is avg of 2 values if size is even
outval = (outval + *(++itleft)) / 2;
*out++ = outval;
}
return 0;
}
We have added 1 line of code to keep a reference to the itleft
iterator. This is to optimize the std::next(s.begin, distance)
calls. We had two of them and we only have one. The second can be a simple *(*++itleft)
to get the next position and dereference it.
This passes the basic local test and the additional test01
item with repeated values. But once again this is not the full solution. This code will fail in many test cases with a “Time Limit Exceeded” message.
The Floating Iterator in the Middle¶
Even if we have optimized away one of the std::next
calls the other one is terribly expensive. It may not seem so, but the iterator of an std::multiset
(and std::set
too) is a BidirectionalIterator and cannot be propelled to our required position with the operator +=
. It only supports ++
and --
. This means that std::next
behind the scenes has to execute ++
from the beginning for as many times as we want to move the iterator.
For a local case that is not a problem, but we may be talking of sets hosting millions of elements.
The solution is clear, we need an iterator that keeps floating in the middle of the set after we insert a value. Sometimes it will have to move 1 position forward or backwards after we insert and change the size of the set to be “odd” or “even”. But it will be a single ++
or --
operation.
We are changing the complexity of that bit from O(n)
to O(1)
, i.e., from linear to constant time (and sometimes we will not even move the iterator). The other complexity factors in the problem remain constant.
#include <iostream> // std::cout/cin
#include <iomanip> // std::setprecision, ...
#include <iterator> // std::istream/ostream_iterator, std::back_inserter
#include <set> // std::multiset
// Main
int
main(int, char *[]) {
auto in = std::istream_iterator<int>{std::cin}; // input iterator
auto out = std::ostream_iterator<double>{std::cout, "\n"}; // out iter
std::cout << std::fixed << std::setprecision(1); // fixed 1 decimal
auto s = std::multiset<int>{}; // allow repeated keys
auto rmed = s.end(); // running median iterator
for(auto t = *in++, odd = 1, val = 0, rval = 0; t--; odd = not odd) {
s.insert(val = *in++); // add to multi-set and store in "val"
rmed = (val >= rval) ? std::next(rmed, odd) : std::prev(rmed, not odd);
auto outval = static_cast<double>(rval = *rmed); // rval for next round
if (not odd) // median is avg of 2 values if size is even
outval = (outval + static_cast<double>(*std::next(rmed))) / 2;
*out++ = outval;
}
return 0;
}
Wow! Even adding that logic we have kept the same line count: 22
. Our trick is to use initially auto rmed = s.end()
to get the iterator we will float around the middle point. Choosing s.end
is just cosmetic to indicate we just need an iterator and not really the value behind it. At that point in time we could also have gone for s.begin()
. With an empty std::multiset
both point to the same position.
The key is the logic to keep the iterator afloat.
The one-liner could have been written as:
if (val >= rval)
if (odd)
rmed = std::next(rmed);
else if (not odd)
rmed = std::prev(rmed);
Translated: if the new inserted val is equal or greater than the midpoint we currently val (rval => running val
) and if the insertion has made the size of the container to be odd
, move the iterator to the right (forward) one position. Else and if the size if now even
, move the iterator one position to the left (backwards).
Yes, we used old tricks such as assignments when a parameter is passed to save a couple of lines. Example: s.insert(val = *in++);
. In this case val
is assigned the value of the dereferenced iterator and this value is also the parameter for insert
. The instantiation of the for
loop has also gotten a bit more of noise.
Not All Containers Can Be Floated¶
It would seem that if we apply the “Floating Iterator” technique to another container we would also succeed. But we will not. See it here applied to an std::list
, that also has an “BidirectionalIterator”.
#include <algorithm> // std::cout/cin
#include <iostream> // std::cout/cin
#include <iomanip> // std::setprecision, ...
#include <iterator> // std::istream/ostream_iterator, std::back_inserter
#include <list> // std::list
// Main
int
main(int, char *[]) {
auto in = std::istream_iterator<int>{std::cin}; // input iterator
auto out = std::ostream_iterator<double>{std::cout, "\n"}; // out iter
std::cout << std::fixed << std::setprecision(1); // fixed 1 decimal
auto l = std::list<int>{}; // allow repeated keys
auto rmed = l.begin(); // running median iterator, ref to end
auto inspos = rmed;
for(auto t = *in++, odd = 1, rval = 0; t--; odd = not odd) {
auto val = *in++; // fetch next val
if (val >= rval) {
auto fpos = [val](auto x) { return val <= x; }; // to find insert pos
inspos = std::find_if(std::next(rmed), l.end(), fpos); // find the pos
} else {
auto fpos = [val](auto x) { return val < x; }; // to find insert pos
inspos = std::find_if(l.begin(), rmed, fpos); // find the pos
}
l.insert(inspos, val); // insert val just before inspos
rmed = (val >= rval) ? std::next(rmed, odd) : std::prev(rmed, not odd);
auto outval = static_cast<double>(rval = *rmed);
if (not odd) // median is avg of 2 values if size is even
outval = (outval + static_cast<double>(*std::next(rmed))) / 2;
*out++ = outval;
}
return 0;
}
The motto here is: “Find the Difference”. And the difference is in the application of std::find_if
to locate the insertion point. Where our std::multiset
handled the insertion in the background, std::list
needs that we find the insertion point.
The complexity of insertion in std::multiset
instances is O(log n)
and not linear, because the implementation is done with “red-black trees” or a similar structure.
In our code above we have tried to reduce the O(n)
complexity, by only looking for the insertion point on the half before/after the floating iterator. But that will not change things by an order of magnitude.
All this means this solution will fail due to the complexity.
Summary¶
Let us summarize it here with a well-known saying: “Complexity killed the cat”. We had to move from an initial std::set
to an std::multiset
to account for multiple repeated keys and had to “invent” a floating iterator to triumph over the complexity requirements of the problem.
But as we have proven, the “Floating Iterator” is not the panacea that will cure all evil. An std::list
with its Bidirectional Iterator and linear nature (it is a list) cannot achieve the complexity target.
Let us see what the next chapter will bring to us.