Here at Cyolo, we build software that handles network connections in a rather unusual topology: our implementation receives connection objects from a dynamic array of channels. This problem has a well-known solution, which we'll discuss shortly, but unfortunately that solution doesn't support our scale, a fact that sent us off to explore the subject. This blog post explains the challenge we encountered, the interesting solution we found, and our analysis of all the available approaches to this problem.
We've been using Go's `reflect.Select` for some time now, and it served us pretty well with fair performance until we hit its limit of 65,536 cases, so we had to find another method without compromising on performance.
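For context, receiving from a dynamic slice of channels with `reflect.Select` looks roughly like this. This is a minimal sketch, not our production code; the `recvAny` helper and the `int` element type are illustrative:

```go
// Sketch: receive the first available value from a dynamic slice of
// channels using reflect.Select. Note that reflect.Select panics when
// given more than 65,536 cases.
package main

import (
	"fmt"
	"reflect"
)

// recvAny blocks until one of the channels yields a value and returns it.
func recvAny(chans []chan int) int {
	cases := make([]reflect.SelectCase, len(chans))
	for i, ch := range chans {
		cases[i] = reflect.SelectCase{
			Dir:  reflect.SelectRecv,
			Chan: reflect.ValueOf(ch),
		}
	}
	// Blocks until some case is ready; returns the chosen index and value.
	_, value, _ := reflect.Select(cases)
	return int(value.Int())
}

func main() {
	chans := make([]chan int, 3)
	for i := range chans {
		chans[i] = make(chan int, 1)
	}
	chans[1] <- 42 // only one channel has a value ready
	fmt.Println(recvAny(chans)) // prints 42
}
```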
We needed a more robust solution that could, without compromising on performance, receive from a dynamic list that sometimes consists of hundreds of thousands of channels. There has always been the obvious solution: iterate over the array of channels, launch a goroutine per channel, and forward the results into an aggregation channel that we eventually receive from. But we didn't want to go there, because it's obviously a resource-heavy approach, and we knew there must be a more efficient solution.
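That obvious fan-in pattern looks something like the following sketch (the `fanIn` name and `int` element type are illustrative):

```go
// Sketch of the goroutine-per-channel fan-in: one goroutine forwards
// each source channel into a shared aggregate channel. With hundreds of
// thousands of channels, this means hundreds of thousands of goroutines.
package main

import "fmt"

// fanIn forwards every value from every source channel into one channel.
func fanIn(chans []chan int) <-chan int {
	out := make(chan int)
	for _, ch := range chans {
		go func(c chan int) {
			for v := range c {
				out <- v
			}
		}(ch)
	}
	return out
}

func main() {
	chans := make([]chan int, 3)
	for i := range chans {
		chans[i] = make(chan int, 1)
		chans[i] <- i + 1
		close(chans[i])
	}
	out := fanIn(chans)
	sum := 0
	for i := 0; i < 3; i++ {
		sum += <-out // arrival order is nondeterministic, so sum instead
	}
	fmt.Println(sum) // prints 6
}
```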
It's a pretty common pattern in software optimization that the optimal solution, performance-wise, is also the weirdest one. This is why it's called "thinking outside the box." Such is the solution we came up with.
We statically created batches of `select..case` statements, one for every power of two with exponent up to 32, along with a function that routes channels to the different cases and aggregates the results through an aggregate channel. We call this method `selectN`.
An example of such a batch:
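The following is a sketch of what such a batch could look like, shown here with four cases for brevity (the `select4` name and `int` element type are illustrative; the real batches extend to much larger powers of two):

```go
// Sketch of one statically written select..case batch: a select over
// exactly four channels that forwards the first ready value to an
// aggregate channel.
package main

import "fmt"

// select4 receives the first available value among four channels and
// forwards it to agg.
func select4(chans []chan int, agg chan<- int) {
	select {
	case v := <-chans[0]:
		agg <- v
	case v := <-chans[1]:
		agg <- v
	case v := <-chans[2]:
		agg <- v
	case v := <-chans[3]:
		agg <- v
	}
}

func main() {
	chans := make([]chan int, 4)
	for i := range chans {
		chans[i] = make(chan int, 1)
	}
	agg := make(chan int, 1)
	chans[2] <- 9 // only one case is ready
	select4(chans, agg)
	fmt.Println(<-agg) // prints 9
}
```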
Here's the logic that aggregates the first result from any number of channels using these kinds of `select..case` batches:
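A hedged sketch of that routing logic, using small illustrative batch sizes (`select1`, `select2`, `select4`) rather than the full set of power-of-two sizes; the greedy decomposition into largest-fitting batches is the idea being shown, not our exact implementation:

```go
// Sketch of a selectN router: split the channel slice into power-of-two
// batches, run each batch's static select in its own goroutine, and take
// the first value that reaches the aggregate channel.
package main

import "fmt"

// select1, select2, select4 are static select..case batches.
func select1(c []chan int, agg chan<- int) {
	agg <- <-c[0]
}

func select2(c []chan int, agg chan<- int) {
	select {
	case v := <-c[0]:
		agg <- v
	case v := <-c[1]:
		agg <- v
	}
}

func select4(c []chan int, agg chan<- int) {
	select {
	case v := <-c[0]:
		agg <- v
	case v := <-c[1]:
		agg <- v
	case v := <-c[2]:
		agg <- v
	case v := <-c[3]:
		agg <- v
	}
}

// selectFirst greedily routes the slice to the largest fitting batches and
// returns the first value received from any of them. In this sketch,
// batches that never fire stay blocked; a real implementation would also
// signal them to stop.
func selectFirst(chans []chan int) int {
	agg := make(chan int, 1)
	for len(chans) > 0 {
		switch {
		case len(chans) >= 4:
			go select4(chans[:4], agg)
			chans = chans[4:]
		case len(chans) >= 2:
			go select2(chans[:2], agg)
			chans = chans[2:]
		default:
			go select1(chans[:1], agg)
			chans = chans[1:]
		}
	}
	return <-agg
}

func main() {
	chans := make([]chan int, 7) // decomposed as 4 + 2 + 1
	for i := range chans {
		chans[i] = make(chan int, 1)
	}
	chans[5] <- 13
	fmt.Println(selectFirst(chans)) // prints 13
}
```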
We compared all three solutions: `reflect.Select`, a goroutine per channel, and our idea of static batches of `select..case`. We used Go's built-in benchmarking mechanism, which runs a piece of code multiple times and reports the average result. The results shed light on which method performs better while using fewer resources.
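A comparison along these lines can be set up with the `testing` package; this sketch drives `testing.Benchmark` programmatically, and the channel count and setup are illustrative, not our actual harness:

```go
// Sketch: benchmarking one reflect.Select receive per iteration using
// Go's testing package. The same harness shape can wrap the other two
// methods for comparison.
package main

import (
	"fmt"
	"reflect"
	"testing"
)

// benchReflectSelect measures one reflect.Select receive per iteration.
func benchReflectSelect(b *testing.B) {
	chans := make([]chan int, 128) // illustrative channel count
	cases := make([]reflect.SelectCase, len(chans))
	for i := range chans {
		chans[i] = make(chan int, 1)
		cases[i] = reflect.SelectCase{
			Dir:  reflect.SelectRecv,
			Chan: reflect.ValueOf(chans[i]),
		}
	}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		chans[i%len(chans)] <- i // make exactly one case ready
		reflect.Select(cases)    // receive it back
	}
}

func main() {
	res := testing.Benchmark(benchReflectSelect)
	fmt.Println(res) // ns/op and allocs/op; numbers vary by machine
}
```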
The goroutine method had the highest latency, due to the large amount of memory it uses. The latency of the other two methods was fairly low and similar.
The goroutine method also had the highest memory usage. `reflect.Select` does not use much memory until it passes more than one hundred channels. `selectN` uses the least memory of the three.
The previous two graphs explain this one: the goroutine method requires the most memory allocations, followed by `reflect.Select`, while `selectN` stays stable and quick.
`reflect.Select` works pretty fast. Even though it performs many allocations, it doesn't consume too much memory. Given a small number of channels, it is the fastest method, but beyond roughly 100 channels it becomes slower than our `selectN` batches. This is due to the growing number of variables and pointers that `reflect.Select` maintains.
The goroutine-per-channel method with an aggregate channel performed the worst across all metrics, due to the large amount of memory it uses.
The third idea, statically created batches of select cases, outperformed all other methods. It especially excelled in memory usage, achieving up to an 800% improvement over `reflect.Select`. In terms of latency, it gained only a 200% improvement over `reflect.Select`, which is resource-intensive in CPU and processing, and that comes at a cost.
The solution we found achieved more than we expected, solved our problem, and has already been running on our production servers for a few months now.
We're very happy to share this knowledge with you, and we hope you enjoyed reading about it.