Developing an LRUCache in Go with GitHub Copilot
In the last post I talked about my experience testing Copilot for a week. In this one we are going to implement a data structure, an LRUCache, with the help of AI assistants to see how well they perform.
Since I started with computers I have always been interested in data structures. I love to study them, understand them and learn what real use cases they have. In the end, one of the things I love is optimization and thinking about how to speed up the algorithms we use daily without realizing it: image processing, video streaming, reading data from disk... there are endless examples I could bring up.
Then there are also the counterexamples: in the end, many applications today are "simple" CRUDs, and it is surprisingly easy to make them slow. This never ceases to amaze me, because I personally place a lot of value on optimizing and taking care of certain parts of a product.
This is why I love studying data structures, and one I have been wanting to implement for some time is an LRUCache, which also tends to come up as an interview question to see how a person reasons and which data structures they know.
In the video that accompanies this post you can see the development of this data structure using GitHub Copilot, so you can draw your own conclusions, and you can access the code in the following repository.
What is a LRUCache?
An LRUCache (Least Recently Used cache) is a data structure used to store and access a series of data while limiting its space or capacity, so that when the maximum number of elements is reached it is computationally cheap to evict the least recently used ones. There is another variant, the LFU (Least Frequently Used), but I will not cover it today.
So the LRUCache is a structure that needs two data structures internally:
- It uses a hash map to store the data, which makes inserting and retrieving information efficient.
- It uses a doubly linked list to order the elements it stores, so that the last element is always the least recently accessed one and, at the same time, it is computationally cheap to move an element to the front of the list when it is accessed.
Precisely for this reason it is a very efficient data structure both for accessing elements and for finding the element we want to invalidate. In Big O notation, it is O(n) in storage, O(1) for access (thanks to the hash map) and O(1) for reaching the least recently used element. As for storage, besides being O(n), since it has two internal data structures we are duplicating some of the information.
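The two structures described above can be sketched in Go using the standard library's `container/list` for the doubly linked list and a plain `map` for the lookups. This is a minimal sketch, not the implementation from the video; names like `LRUCache`, `NewLRUCache`, `Get` and `Put` are my own choices for illustration:

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is stored in each list node so that, on eviction,
// we can also delete the key from the map.
type entry struct {
	key   string
	value string
}

// LRUCache combines a map (O(1) lookup) with a doubly linked
// list (O(1) reordering and O(1) eviction from the back).
type LRUCache struct {
	capacity int
	items    map[string]*list.Element
	order    *list.List // front = most recently used, back = least recently used
}

func NewLRUCache(capacity int) *LRUCache {
	return &LRUCache{
		capacity: capacity,
		items:    make(map[string]*list.Element),
		order:    list.New(),
	}
}

// Get returns the value and marks the key as most recently used.
func (c *LRUCache) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put inserts or updates a key, evicting the least recently
// used element when the cache is at capacity.
func (c *LRUCache) Put(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		if oldest != nil {
			c.order.Remove(oldest)
			delete(c.items, oldest.Value.(*entry).key)
		}
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
}

func main() {
	cache := NewLRUCache(2)
	cache.Put("a", "1")
	cache.Put("b", "2")
	cache.Get("a")      // "a" becomes most recently used
	cache.Put("c", "3") // evicts "b", the least recently used
	_, ok := cache.Get("b")
	fmt.Println(ok) // false: "b" was evicted
}
```

Note how the duplication mentioned above shows up here: the key lives both in the map and inside each list node, which is the price paid for O(1) eviction.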
The LRU algorithm
This data structure is relatively simple and has a multitude of applications today. The first one that comes to mind is a web server, for example a page built with Next.js, whose pages are computationally expensive to render since rendering works with strings. So if our page has a lot of traffic, it will be in our interest to cache this output to reduce the amount of CPU needed.
So when we add layers of cache, for example Redis, one eviction policy that can be used is LRU. And if you think about it, it is a remarkably good strategy despite its simplicity because, in most cases, what is accessed the most will be kept in memory while what is accessed the least can be evicted. It is a very simple prediction algorithm, but it has a sound logical basis behind it.
This statement must be qualified, though: if you do not have enough memory to hold most of the data being accessed, or if the access pattern of your information tends to be random, the cache may not be useful.
For example, think of a CDN such as Cloudflare or Amazon CloudFront. All the information does not fit on a single disk, or even on a single server, so it is distributed according to access patterns, and these systems quite possibly use an LRU-style algorithm to decide which information is the most accessed, keep it "hot" at the edge, and fetch from other, non-edge data servers the less frequently accessed information that does not fit.
In fact, the same reasoning can be applied to CPU caches when data must be brought in from slower levels of the memory hierarchy.
Conclusion
In this case I want to use GitHub Copilot to analyze whether its suggestions are really helpful when implementing something more complex than a CRUD, although it is difficult to evaluate whether I am faster on my own, with or without Copilot, because I might feel more productive thanks to the suggestions even when they are not correct. Read my previous post to find out my opinion of GitHub Copilot.
Did you find this article useful? Subscribe to my newsletter and take the first step to launch IT products faster. You will receive exclusive tips that will bring you closer to your goals.