Understanding The Memcached Source Code - Event Driven III

We continue examining the other two operations, i.e., create and delete, in the event-driven context. As usual, we start with a command sent to a Memcached server. In fact, most of the logic involved in this post has been discussed before, for instance in LRU III and Event Driven II, so this post only fills in the missing parts and the linking points.

Continue Reading →

Understanding The Memcached Source Code - Event Driven II

In classic multithreading, blocking I/O operations constrain the maximum number of requests a server can handle, so an asynchronous, event-driven model is used to eliminate the throughput bottleneck. The synchronous and potentially slow process is divided into logic segments that are free of I/O and are executed asynchronously. Asynchrony, however, requires extra space to store contexts, because the logic segments, which may belong to different sessions, are executed in an interleaved fashion. For instance, when asynchrony is implemented (emulated) with synchronous multithreading, the “extra space” takes the form of thread stacks, whilst in an event-driven model the contexts are maintained in user land.
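To make “contexts maintained in user land” concrete, here is a minimal sketch (an assumed illustration, not Memcached's actual code) of a libevent-based server in which each connection owns a small context struct, so every I/O-free logic segment can resume exactly where the previous segment of the same session left off:

```c
/* Minimal sketch of per-session context kept in user land with libevent. */
#include <event2/event.h>
#include <stdlib.h>
#include <unistd.h>

enum conn_state { READING_COMMAND, WRITING_REPLY };

struct conn {                  /* the "extra space" replacing a thread stack */
    int fd;
    enum conn_state state;     /* where to resume on the next readiness event */
    char buf[1024];
    size_t used;
    struct event *ev;
};

static void on_ready(evutil_socket_t fd, short which, void *arg)
{
    struct conn *c = arg;      /* restore this session's context ... */
    (void)which;
    switch (c->state) {        /* ... and run one I/O-free segment */
    case READING_COMMAND: {
        ssize_t n = read(fd, c->buf + c->used, sizeof(c->buf) - c->used);
        if (n > 0)
            c->used += (size_t)n;
        /* once a full command has been parsed, switch segments */
        c->state = WRITING_REPLY;
        break;
    }
    case WRITING_REPLY:
        /* write the reply, then go back to reading the next command */
        c->state = READING_COMMAND;
        break;
    }
}

/* called for every accepted, non-blocking socket */
void register_conn(struct event_base *base, int fd)
{
    struct conn *c = calloc(1, sizeof(*c));
    c->fd = fd;
    c->state = READING_COMMAND;
    c->ev = event_new(base, fd, EV_READ | EV_PERSIST, on_ready, c);
    event_add(c->ev, NULL);
}
```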

Continue Reading →

Understanding The Memcached Source Code - Event Driven I

In classic multithreading, large numbers of slow, blocking operations, mostly I/O, can easily drain the available thread resources, which severely constrains the maximum number of requests a server can handle per unit time. More specifically, threads are scheduled out and put to sleep in the middle of procedures that contain blocking I/O, even as request packets pile up in the network stack's queues. In such a situation, the server exhibits low throughput, low CPU utilization and high latency.
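For contrast, an illustrative sketch (assumed, not taken from the post) of the classic thread-per-connection pattern described above; the thread sleeps inside read() whenever the client is slow, pinning one thread per in-flight request:

```c
#include <pthread.h>
#include <unistd.h>

static void *handle_conn(void *arg)
{
    int fd = (int)(long)arg;
    char buf[1024];
    ssize_t n;
    /* the thread is scheduled out and sleeps here whenever the client
     * is slow, even while new request packets queue up in the kernel */
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        /* parse the request, do the work, write() the reply
         * (write() may block as well) */
    }
    close(fd);
    return NULL;
}
```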

Continue Reading →

Understanding The Memcached Source Code - LRU III

In previous posts, we have discussed the different facets of an item, i.e., slab, hash map and LRU list, as well as their associated (CRUD) methods, which make up the internal procedures that carry out client requests after the corresponding commands are parsed by the drive machine. This time we will walk through those procedures by issuing telnet commands to a Memcached instance and see how the modules discussed so far work together on various item operations. We will also see the whole picture of the LRU lists, which maintain the property of ‘least recently used’ in accordance with those operations.
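For a taste of what such a walkthrough looks like, here is a small telnet session against a local Memcached instance (the key name, value and port are arbitrary), exercising create, read and delete through the text protocol:

```
$ telnet 127.0.0.1 11211
set mykey 0 0 5
hello
STORED
get mykey
VALUE mykey 0 5
hello
END
delete mykey
DELETED
```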

Continue Reading →

Understanding The Memcached Source Code - LRU II

More often than not, the LRU algorithm is combined with a hash map, and the combination is referred to as an LRU cache. In an LRU cache, the hash map enables fast access to cached objects, while LRU keeps the cache from growing indefinitely by marking expired, or so-called least recently used, objects. This time we examine Memcached's implementation of the hash map.
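As a mental model, here is a simplified sketch (not Memcached's exact code) of the lookup path of such a chained hash map: hash the key, pick a bucket, walk the chain. Memcached's assoc_find() has the same shape, chaining items through their h_next pointers:

```c
#include <stddef.h>
#include <string.h>

#define HASHPOWER 16
#define hashsize(n) ((size_t)1 << (n))
#define hashmask(n) (hashsize(n) - 1)

typedef struct item {
    struct item *h_next;   /* next item in the same bucket */
    char *key;
    size_t nkey;
    /* ... value, LRU links, etc. */
} item;

static item *hashtable[hashsize(HASHPOWER)];

/* hv is the key's precomputed hash value */
item *find_item(const char *key, size_t nkey, unsigned int hv)
{
    item *it = hashtable[hv & hashmask(HASHPOWER)];
    while (it != NULL) {
        if (it->nkey == nkey && memcmp(it->key, key, nkey) == 0)
            return it;     /* expected O(1): chains stay short */
        it = it->h_next;
    }
    return NULL;
}
```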

Continue Reading →

Understanding The Memcached Source Code - Slab II

This time we continue examining how slab memory is allocated. Firstly, we look at the two arguments of slabs_init that were passed over in the previous article. The first one is settings.maxbytes, which limits the overall memory that can be used by the Memcached instance. In slabs_init, the value of settings.maxbytes is assigned to the global variable mem_limit, which will be used very soon. The other argument is preallocate, which determines whether to preallocate slabs for each slab class; it is toggled with the -L command-line argument.
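The behaviour described above can be condensed into the following sketch, simplified from memcached 1.4.x's slabs.c (not the verbatim source): settings.maxbytes arrives as `limit` and is stored in the global mem_limit, while `prealloc` decides whether the whole pool is grabbed up front.

```c
#include <stdbool.h>
#include <stdlib.h>

static size_t mem_limit = 0;
static void  *mem_base  = NULL;

void slabs_init(const size_t limit, const double factor, const bool prealloc)
{
    mem_limit = limit;          /* overall memory cap for this instance */
    if (prealloc) {
        /* reserve the entire pool once; slab pages are later carved out
         * of mem_base instead of being malloc()ed one by one */
        mem_base = malloc(mem_limit);
    }
    /* ... build the slab classes using `factor` (covered later) */
    (void)factor;
}
```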

Continue Reading →

Understanding The Memcached Source Code - Slab I

The slab allocator (I - this article, II, III) is the core module of the cache system, and it largely determines how efficiently the bottleneck resource, memory, can be utilized. The other three parts, namely the LRU algorithm (I, II, III) for entry expiration, the event-driven model (I, II, III) based on libevent, and consistent hashing (not complete) for data distribution, are built around it. Variants of the slab allocator are implemented in other systems, such as nginx and the Linux kernel, to fight a common problem called memory fragmentation. This article will, of course, focus on Memcached's implementation of the algorithm. Memcached version: 1.4.28. Firstly, let's answer some questions.
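As a quick illustration of the idea (a hypothetical sketch, not Memcached's exact numbers), slab classes form a sequence of fixed chunk sizes that grow by a constant factor; serving each object from the nearest fixed-size class is what keeps fragmentation bounded. Memcached's default growth factor is 1.25 (the -f option):

```c
#include <stdio.h>

int main(void)
{
    double factor = 1.25;   /* default growth factor */
    size_t size = 96;       /* illustrative smallest chunk size */
    for (int clsid = 1; clsid <= 8; clsid++) {
        printf("class %2d: chunk size %zu\n", clsid, size);
        size = (size_t)(size * factor);
        size = (size + 7) & ~(size_t)7;   /* 8-byte alignment, as memcached does */
    }
    return 0;
}
```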

Continue Reading →

setsockopt, TCP_NODELAY and Packet Aggregation I

More often than not, latency, rather than throughput, turns out to be the system bottleneck. However, a TCP socket enables the so-called Nagle algorithm by default, which delays an egress packet in order to coalesce it with data that might be sent in the near future into a single TCP segment. This effectively reduces the number of TCP segments and the bandwidth overhead of TCP headers, whilst potentially imposing latency on every network request (or response) being sent. Lock, and his temperamental brother, Block, are the two notorious villains in the world of programming. In the beginning, they always show up to assist, but sooner or later they will kick your back end really hard. The Nagle algorithm strikes me as another scenario involving blocking operations that are meant to be helpful. So I decided to put my hands on a keyboard and test whether I am wrong.

Software setup
Client OS: Debian 4.9.88
Server OS (LAN & WAN): Ubuntu 16.04
gcc: 6.3.0

Hardware (or VM) setup
Server (LAN): Intel® Core™2 Duo CPU E8400 @ 3.00GHz × 2, 4GB
Server (WAN): t2.micro, 1GB
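Disabling Nagle is a one-line setsockopt() call; this is the standard knob the experiments in this series revolve around:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* with TCP_NODELAY set, small writes go out immediately instead of
 * waiting to be coalesced with later data into one segment */
int disable_nagle(int sockfd)
{
    int one = 1;
    return setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}
```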

Continue Reading →