Here at Cyolo, we build software that handles network connections in a rather unusual topology: our implementation receives connection objects from a dynamic array of channels. This problem has a well-known solution, which we'll discuss shortly, but unfortunately that solution doesn't support our scale, a fact that sent us off to explore the subject. This blog post explains the challenge we encountered, the interesting solution we found, and our analysis of all the available approaches to this problem.
We've been using Go's `reflect.Select` for some time now, and it served us pretty well with fair performance until we hit its limit of 65,536 cases, so we had to find another method without compromising on performance.
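For context, receiving from a dynamic slice of channels with `reflect.Select` looks roughly like this. This is a minimal sketch, not our production code; the `recvAny` helper and the `int` element type are illustrative:

```go
// Sketch: receive the first available value from a dynamic slice of
// channels using reflect.Select. Note that reflect.Select panics when
// given more than 65,536 cases.
package main

import (
	"fmt"
	"reflect"
)

// recvAny blocks until one of the channels yields a value and returns it.
func recvAny(chans []chan int) int {
	cases := make([]reflect.SelectCase, len(chans))
	for i, ch := range chans {
		cases[i] = reflect.SelectCase{
			Dir:  reflect.SelectRecv,
			Chan: reflect.ValueOf(ch),
		}
	}
	// Blocks until some case is ready; returns the chosen index and value.
	_, value, _ := reflect.Select(cases)
	return int(value.Int())
}

func main() {
	chans := make([]chan int, 3)
	for i := range chans {
		chans[i] = make(chan int, 1)
	}
	chans[1] <- 42 // only one channel has a value ready
	fmt.Println(recvAny(chans)) // prints 42
}
```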
We needed a more robust solution that could, without compromising on performance, receive from a dynamic list that sometimes consists of hundreds of thousands of channels. There has always been the obvious solution: iterate over the array of channels, launch a goroutine per channel, and forward the results into an aggregation channel that we eventually receive from. But we didn't want to go there, because it's obviously a resource-heavy approach, and we knew there must be a more efficient solution.
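That obvious fan-in pattern looks something like the following sketch (the `fanIn` name and `int` element type are illustrative):

```go
// Sketch of the goroutine-per-channel fan-in: one goroutine forwards
// each source channel into a shared aggregate channel. With hundreds of
// thousands of channels, this means hundreds of thousands of goroutines.
package main

import "fmt"

// fanIn forwards every value from every source channel into one channel.
func fanIn(chans []chan int) <-chan int {
	out := make(chan int)
	for _, ch := range chans {
		go func(c chan int) {
			for v := range c {
				out <- v
			}
		}(ch)
	}
	return out
}

func main() {
	chans := make([]chan int, 3)
	for i := range chans {
		chans[i] = make(chan int, 1)
		chans[i] <- i + 1
		close(chans[i])
	}
	out := fanIn(chans)
	sum := 0
	for i := 0; i < 3; i++ {
		sum += <-out // arrival order is nondeterministic, so sum instead
	}
	fmt.Println(sum) // prints 6
}
```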
It's a pretty common pattern in software optimization that the optimal solution, performance-wise, is also the weirdest one. This is why it's called "thinking outside the box." Such is the solution we came up with.
We statically created batches of `select..case` statements, one for every power of two with exponent up to 32, along with a function that routes channels to the different cases and aggregates the results through an aggregate channel. We call this method `selectN`.
An example of such a batch:
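The following is a sketch of what such a batch could look like, shown here with four cases for brevity (the `select4` name and `int` element type are illustrative; the real batches extend to much larger powers of two):

```go
// Sketch of one statically written select..case batch: a select over
// exactly four channels that forwards the first ready value to an
// aggregate channel.
package main

import "fmt"

// select4 receives the first available value among four channels and
// forwards it to agg.
func select4(chans []chan int, agg chan<- int) {
	select {
	case v := <-chans[0]:
		agg <- v
	case v := <-chans[1]:
		agg <- v
	case v := <-chans[2]:
		agg <- v
	case v := <-chans[3]:
		agg <- v
	}
}

func main() {
	chans := make([]chan int, 4)
	for i := range chans {
		chans[i] = make(chan int, 1)
	}
	agg := make(chan int, 1)
	chans[2] <- 9 // only one case is ready
	select4(chans, agg)
	fmt.Println(<-agg) // prints 9
}
```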
Here's the logic that aggregates the first result from any number of channels using these kinds of `select..case` batches:
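A hedged sketch of that routing logic, using small illustrative batch sizes (`select1`, `select2`, `select4`) rather than the full set of power-of-two sizes; the greedy decomposition into largest-fitting batches is the idea being shown, not our exact implementation:

```go
// Sketch of a selectN router: split the channel slice into power-of-two
// batches, run each batch's static select in its own goroutine, and take
// the first value that reaches the aggregate channel.
package main

import "fmt"

// select1, select2, select4 are static select..case batches.
func select1(c []chan int, agg chan<- int) {
	agg <- <-c[0]
}

func select2(c []chan int, agg chan<- int) {
	select {
	case v := <-c[0]:
		agg <- v
	case v := <-c[1]:
		agg <- v
	}
}

func select4(c []chan int, agg chan<- int) {
	select {
	case v := <-c[0]:
		agg <- v
	case v := <-c[1]:
		agg <- v
	case v := <-c[2]:
		agg <- v
	case v := <-c[3]:
		agg <- v
	}
}

// selectFirst greedily routes the slice to the largest fitting batches and
// returns the first value received from any of them. In this sketch,
// batches that never fire stay blocked; a real implementation would also
// signal them to stop.
func selectFirst(chans []chan int) int {
	agg := make(chan int, 1)
	for len(chans) > 0 {
		switch {
		case len(chans) >= 4:
			go select4(chans[:4], agg)
			chans = chans[4:]
		case len(chans) >= 2:
			go select2(chans[:2], agg)
			chans = chans[2:]
		default:
			go select1(chans[:1], agg)
			chans = chans[1:]
		}
	}
	return <-agg
}

func main() {
	chans := make([]chan int, 7) // decomposed as 4 + 2 + 1
	for i := range chans {
		chans[i] = make(chan int, 1)
	}
	chans[5] <- 13
	fmt.Println(selectFirst(chans)) // prints 13
}
```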
We compared all three solutions: `reflect.Select`, a goroutine per channel, and our idea of static batches of `select..case`. We used Go's built-in benchmarking mechanism, which runs a piece of code multiple times and reports the average result. The results shed light on which method performs better while using fewer resources.
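A comparison along these lines can be set up with the `testing` package; this sketch drives `testing.Benchmark` programmatically, and the channel count and setup are illustrative, not our actual harness:

```go
// Sketch: benchmarking one reflect.Select receive per iteration using
// Go's testing package. The same harness shape can wrap the other two
// methods for comparison.
package main

import (
	"fmt"
	"reflect"
	"testing"
)

// benchReflectSelect measures one reflect.Select receive per iteration.
func benchReflectSelect(b *testing.B) {
	chans := make([]chan int, 128) // illustrative channel count
	cases := make([]reflect.SelectCase, len(chans))
	for i := range chans {
		chans[i] = make(chan int, 1)
		cases[i] = reflect.SelectCase{
			Dir:  reflect.SelectRecv,
			Chan: reflect.ValueOf(chans[i]),
		}
	}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		chans[i%len(chans)] <- i // make exactly one case ready
		reflect.Select(cases)    // receive it back
	}
}

func main() {
	res := testing.Benchmark(benchReflectSelect)
	fmt.Println(res) // ns/op and allocs/op; numbers vary by machine
}
```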
The goroutine method had the highest latency, due to the large amount of memory it uses. The latency of the other two methods was fairly low and similar.
The goroutine method also had the highest memory usage. `reflect.Select` does not use much memory until it passes more than one hundred channels. `selectN` uses the least memory of the three.
The previous two graphs explain this one: the goroutine method requires the most memory allocations, followed by `reflect.Select`, while `selectN` stays stable and quick.
`reflect.Select` works pretty fast. Even though it performs many allocations, it doesn't consume too much memory. Given a small number of channels, it is the fastest method, but beyond roughly 100 channels it becomes slower than our `selectN` batches. This is due to the growing number of variables and pointers that `reflect.Select` maintains.
The goroutine-per-channel method with an aggregate channel performed the worst across all metrics, due to the large amount of memory it uses.
The third idea, statically created batches of select cases, outperformed all other methods. It especially excelled in memory usage, achieving up to an 800% improvement over `reflect.Select`. In terms of latency, it gained only a 200% improvement over `reflect.Select`, which is resource-intensive in CPU and processing, and that comes at a cost.
The solution we found achieved more than we expected, solved our problem, and has already been running on our production servers for a few months now.
We're very happy to share this knowledge with you, and we hope you enjoyed reading about it.