KLabGames Tech Blog - English

KLab develops and provides service for a variety of smartphone games. The world of mobile games is growing by leaps and bounds, and the development process calls for a different set of skills than traditional console games.

Greetings fellow Python enthusiasts, @methane here. compact dict was merged into Python 3.6 back in September (right before it went into beta). As a result of this fortunate turn of events, I received a recommendation and became a Python Core Developer in October of last year.

 

Please allow me to clarify. It’s not like I’m a full-time committer for Python employed by KLab. However, thanks to KLab’s flex-time system, I’ve been able to spend the majority of my work hours contributing to OSS and pouring over code. Especially in the last 3 months, I’ve been spending a lot of time working with Python specifically, so it’s almost like I’m a full-time Python coder.

 

I don’t really get very many opportunities to share this part of my engineering efforts here in Japan, so after being absolutely blown away by what Money Forward’s Urabe-san wrote about in his recent article on ruby-core, I decided to buckle down and write about some of the stuff that’s been going on in the world of Python recently.


Python 3.6 Released

Python 3.6 was released on December 23. There were so many important improvements included in this update that there’s no way I could cover them all in one blog. However, the easiest one to explain (even for people who don’t normally work with Python) is probably f-string.

 

f-string is a feature that enables you to write methods in the middle of strings. Many LL have this feature. However, when used too much, it makes it more difficult to maintain the code you’re writing. I’d like to suggest the following–just as you would write f"{foo.name} = {foo.value}" instead of "{foo.name} = {foo.value}".format(foo=foo), simply use it to replace .format(name=name)  and off you go.

 

Personally, I mostly contributed to speeding up compact dict and asyncio. This is a little off-point, but I thought I’d share this little tidbit: 2 days after Python 3.6 was released, a new hash that’s almost identical to compact dict was implemented in Ruby 2.4.0. What a coincidence.


UTF-8 Support Improved for C locale (Mainly)

In Linux, Python looks at the locale for determining the encoding for Terminal, standard input-output, and file paths.

 

However, C locale in POSIX (also known as POSIX locale) uses ASCII as a general rule. When non-ASCII characters are used, it generates a Unicode Encode Error. As for standard input-output, you can control it with the environment variable PYTHONIOENCODING. However, it can’t be used to set the encodings for command line functions or file paths.

 

C locale is the default locale, so when you don’t define it in crontab, it gets used automatically at somewhat inopportune times–like when the shh sends a LANG=ja_JP.UTF-8 to a server connection that doesn’t have a  ja_JP.UTF-8 locale.  Many engineers choose to use C locale on purpose when they want to avoid translated error messages (it’s hard to write reports in English), or when they don’t want the behavior of commands to change. When creating small Linux environments with containers or other special insertions, apart from C, no other locales even exist in order to conserve space.

 

Thanks to the above reasons and more, sometimes Unicode Encode Errors are generated when using Python in C locale. This especially creates problems for people who aren’t actually Python engineers themselves, but are in a situation in which they are using Python-created tools.

 

Python has a broad base of users. This means that the way these engineers use locales also varies widely. To be brutally honest, there just aren’t too many users out there using C locale who really want to use ASCII. That’s why it’s been proposed that the new default setting for C locale should be changed to UTF-8.

 

(Because this is still in the proposal phase, you can’t use it even if you check out Python’s in-development branch.)


PEP 540: Add a New UTF-8 Mode

We’ve proposed adding a UTF-8 mode to Python. In UTF-8 mode, it ignores the encoding designated by the locale, and the file paths and standard input-output are all set to UTF-8.

 

This mode comes in three basic settings: disabled, enabled, and strict. The difference between enabled and strict is as follows. You should use enabled when you plan on using surrogate escape to handle transparent byte strings that aren’t encoded in UTF-8. With strict, byte strings that aren’t encoded in UTF-8 throw errors. With C locale, the mode is set to enabled by default. This makes it possible to use file paths and read and write standard input-output not encoded in UTF-8. These days you might not ever really use this, but this is extremely useful when mounting external file systems (just like back in the day).

 

For locales other than C, the encoding set by locale uses strict. This means that any data besides UTF-8 generates an error. This is useful when you want to eliminate everything but UTF-8. For example, let’s say you want to catch and eliminate any data that may become corrupted and turn into an illegible string of weird characters before it becomes a problem.

 

This mode can be controlled by environment variables such as PYTHONUTF8 and options like  -X utf-8. For engineers who want to ignore the locale altogether, you can write .bashrc to export PYTHONUTF8=1. You could also write PYTHONUTF8=1 to /etc/environment if you prefer.


PEP 538: Coercing the Legacy C locale to C.UTF-8

This PEP (Python Enhancement Proposal) proposes changing the locale to C.UTF-8 (if the system supports it.) when C locale is the one designating the environment variables. By doing so, not only will it affect Python itself, but we can also expect it to make other libraries operate in UTF-8 instead of ASCII or latin1.

 

For example, this applies to readline when being used by Python’s REPL. We’ve received reports that readline is able to handle UTF-8 conveniently and easily without changing any settings on REPL when being used by Python on an Android device.


Expanding New Calling Conventions (METH_FASTCALL)

Fundamentally speaking, when looking at the way Python calls functions from the perspective of the world of C, ordered parameters are passed as doubles and keyword variables are passed via dict.

 

With the advent of Python 3.6, a new calling convention emerged that took the head pointer array and the number of ordered variables and passed them on, instead of using doubles. This meant that the variables stored in the stack on the calling side didn’t have to keep them as doubles–they were free to pass them on just as they are.

 

Currently, efforts to make use of a new calling convention for “functions that can be called by Python” created in the C language are already underway. In fact, the most important parts have already been finished. Additionally, this is still kind of hush-hush, but apart from functions and methods created by PyMethodDef (name, flag showing which calling convention to use, structure made of function pointers), there is a form that makes entire objects callable. (For engineers familiar with Python, you might want to think of it as a  operator.itemgetter() that returns objects.) The function pointer tp_call is included in the structure made up of this metadata.


Because PyMethodDef traditionally uses flags to offer support for multiple calling conventions, we had to use a special API when making calls from the C language. Until now, tp_call only offered support for standard calling conventions that merely received doubles and dict. Now the external library has a chance of calling the function pointer directly without going through the API.

 

At the same time, in order to maintain compatibility, the function pointer tp_fastcall has been added. For designs that have made room for tp_fastcall already, there is a patch currently under review that embeds a function that automatically converts tp_fastcall to tp_call.

 

At the end of the day, while it may look like we are only calling one variable from the world of Python, this variable is often passed between multiple functions in the world of C. As a result, when both the new and traditional methods are used inside the same code, it creates a risk in which an inordinate number of conversions may take place internally when they don’t need to be. With the advent of Python 3.7, the passing of variables internally has been unified to only use the new method. I believe the new calling conventions are able to flex their true muscle and show of their true power in this environment.

Background


There was an article about ordered dictionaries posted on the PyPy developers blog back in January 2015:

Faster, more memory efficient and more ordered dictionaries on PyPy


The new version of PyPy (2.5.1) released afterwards continued offering insertion order support for dict, while also reducing the amount of memory used.


In other news, someone proposed an innovative idea for CPython in PEP 468: We can preserve the order of arguments by receiving keyword arguments in the **kwargs syntax as placeholder arguments.


For example, inside the SQLAlchemy query, if we wrote .filter_by(name="methane", age=32), there was no way to be sure if it would create a query that read WHERE name = "methane" AND age = 32 or WHERE age = 32 AND name="methane". This new change brings the added benefit of being able to preserve the order of arguments.


(filter_by and other workarounds are special shortcut functions. You can also preserve the order of arguments if you use the filter method, which doesn’t use keyword arguments.)


The person who proposed this idea re-implemented the OrderedDict class that was part of pure Python in C. Since Python 3.5, OrderedDict has become significantly faster and is much more memory-efficient. (The reason he avoided revising dict was that it has already been optimized in such a multifaceted and complicated way for the sake of the Python interpreter.)


However, while it has been re-implemented in C, OrderedDict, which manages the order with bi-directional links, still had some pretty costly overhead of its own. It was using almost twice the amount of memory.


Python 3.5.1 (default, Dec  7 2015, 17:23:22)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> d = {i:i for i in range(100)}
>>> from collections import OrderedDict
>>> od = OrderedDict((i,i) for i in range(100))
>>> sys.getsizeof(d), sys.getsizeof(od)
(6240, 11816)


Because of that and other factors, PEP 468 was put on the shelf.


All that being said, I too was resistant to the idea of changing the specs if there was a chance that it would reduce the performance of all keyword arguments, even if it was useful in some use cases. I had also taken an interest in PyPy’s dict, so I thought I would take a stab at it with just enough time to leave room for the chance for it to be in time for Python 3.6.


(The beta version is planned for release in the first half of September, after which no one will be allowed to add any more features. I would need to implement the changes, run an evaluation verification test, and discuss it on the mailing list before this deadline in order for it to be merged in the the new version of Python.)


Data Structures

The only data structure I changed was PyDictKeysObject. However, the memory layout has become much more dynamic than before.


struct _dictkeysobject {
   Py_ssize_t dk_refcnt;
   Py_ssize_t dk_size;
   dict_lookup_func dk_lookup;
   Py_ssize_t dk_usable;
   Py_ssize_t dk_nentries;  /* How many entries are used. */
   char dk_indices[8];    /* Dynamically sized. 8 is the minimum. */
};

#define DK_SIZE(dk) ((dk)->dk_size)
#define DK_IXSIZE(dk) (DK_SIZE(dk) <= 0xff ? 1 : DK_SIZE(dk) <= 0xffff ? 2 : \
                      DK_SIZE(dk) <= 0xffffffff ? 4 : sizeof(Py_ssize_t))
#define DK_ENTRIES(dk) ((PyDictKeyEntry*)(&(dk)->dk_indices[DK_SIZE(dk) * \
                       DK_IXSIZE(dk)]))

dk_get_index(PyDictKeysObject *keys, Py_ssize_t i)
{
   Py_ssize_t s = DK_SIZE(keys);
   if (s <= 0xff) {
       return ((char*) &keys->dk_indices[0])[i];
   }
   else if (s <= 0xffff) {
       return ((PY_INT16_T*)&keys->dk_indices[0])[i];
   }
   else if (s <= 0xffffffff) {
       return ((PY_INT32_T*)&keys->dk_indices[0])[i];
   }
   else {
       return ((Py_ssize_t*)&keys->dk_indices[0])[i];
   }
}

dk_set_index(PyDictKeysObject *keys, Py_ssize_t i)
{


The previous hash tables were arrays of three-word (hash, key, value) PyDictKeyEntry data structures. However, in the new method I was trying, I changed these hash table elements to integer values. char dk_index[8] is used when declaring the data structure, but this is only to set the minimum dk_size to 8 at this time. A larger size will be acquired when it’s allocated. This integer type actually allows for a dk_size of up to 128 is a char. However, starting at 256 it becomes an int16_t. This is how I was able to cut down the size of the hash table as much as possible.


Furthermore, we are unable to declare the data structure directly because the size of dk_indices is dynamic. However, I carefully placed the PyDictKeyEntry data structure array after this data structure. The size of this array is not dk_size, but rather it is 2/3 of that. (I introduced this idea in a previous article. This is the maximum number of elements that can be inserted into this hash table.) When inserting new elements, we simply write them into this array, and save the index in dk_indices. “dk_nentries” refers to the number of elements found inside the array.


The following pseudo-code, which assumes that the same key doesn’t already exist, shows what the code looks like when inserting new elements.


// Search for the place of insertion inside dk_indices
pos = lookup(keys, key, hash);


// Write the new entry into the entries array

DK_ENTRIES(mp)[keys->dk_nentries].me_hash = hash;
DK_ENTRIES(mp)[keys->dk_nentries].me_key = key;
DK_ENTRIES(mp)[keys->dk_nentries].me_value = value;

// Save the index of that entry into dk_indices
dk_set_index(keys, pos, keys->dk_nentries);

// Lastly, we increment the entry number we just finished using
mp->dk_nentries++;


Deleting Items

In order to delete items from this version of dict, we have to insert a placeholder “dummy element” in the same position inside dk_indices. (The reason each index is dealt with using a 1-byte entry number that goes up to 128 and not 256 is because we need to account for negative values that will be inserted as dummy values and represented as empty spaces.)


There are two ways to delete items from our entries.


When the “compact dict” idea was first posted to the Python developers mailing list, it was suggested that we would be able to closely preserve the entries array by moving the last element to the location where the deleted element had previously been. Using this method, the last element moves forward. This means that when we delete an element, we lose the attribute of “being able to preserve the order in which elements were inserted.”



On the other hand, what the PyPy community and I decided to go with at this time is simply to insert a handy NULL in the newly-vacated position.


// Search for the position of the index of the element that was deleted inside dk_indices
pos = lookup(keys, key, hash);

// Get the position of the element to be deleted inside the entry array
index = dk_get_index(keys, pos);

// Delete the element
DK_ENTRIES(mp)[index].me_key = NULL;
DK_ENTRIES(mp)[index].me_value = NULL;

// Add a “dummy” inside dk_indices
dk_set_index(keys, pos, DUMMY);


Handling things this way, the entry array becomes filled with dummy entries when a lot insertions and deletions are performed. This gives rise to a slightly unpleasant situation that calls for a compaction to be run in order to take care of this. However, in the method I first proposed, the hash table became filled with dummy entries when insertions and deletions are performed repeatedly, giving rise to the possibility that we will no longer be able to perform searches. Either way, we are going to have to run a compaction. At the end of the day, I believe it’s better to be able to preserve the insertion order.


By the way, deleting the last element in the entry array and decrementing dk_nentries, .popitem() preserves the average calculation amount in 0(1). In this case as well, we do not increment the “remaining number of insertable elements” known as dk_usable. This means that when we repeatedly delete and insert items, we will need to run a compaction and restructure the hash table.


Shared-Key Dict

Now we can get on the to the problem known as shared key dict.


This is what I thought when I first started working on the project: If we do NOT insert dummy elements into the hash table and if the entry array side is set to NULL, we can tell this is just another dummy element, just like before when we implemented compact dict.


Boy, did I have a lot to learn. This way, we won’t be able to preserve the order of insertion for dict the first time a new element is added to shared key.


>>> class A:
...     pass
...
>>> a = A()
>>> b = A()
>>> a.a = 1
>>> a.b = 2
>>> b.b = 1
>>> b.a = 2
>>> a.__dict__.items()
dict_items([('a', 1), ('b', 2)])
>>> b.__dict__.items()  # Even though the actual insertion order is b, a…
dict_items([('a', 2), ('b', 1)])


In order to address this problem, we need to consider the following three methods. After discussing this on the mailing list, we have no way of knowing which way the language will proceed until a final decision is made by either Guido or a core developer appointed by Guido himself.


  1. Simply Accept Things the Way They Are

The way the language specs for Python work right now, the order of dict is indefinite. As a result, even though currently the behavior of the language preserves insertion order (EXCEPT for dict, which manages the characteristics of instances), there is no problem from the point of view of the actual specs of the language.


When using compact dict, the size of shared key dict and the ma_values array becomes 2/3 the size of dk_keys, making it much more compact. As a result, we should simply accept this as a blessing from above.


On the other hand, as a downside of all this, we have to work extra hard to preserve the insertion order in almost all cases. This means that programmers who don’t take the time to check on the language specs might mistake this as part of the specs. Blaming this problem on the person for mistaking the specs of the language is hardly fair. For example, in order to avoid this downside in Go, the order when map is iterated is left indefinite on purpose (by using pseudo-random numbers generated at high speeds).


2. Stop Using Shared Key If the Insertion Order Is Wrong

If the new element attempts to insert itself in an order that differs from the order held by shared key, you could always just stop using shared key immediately.


While this looks like the safest route at first glance, it’s difficult to tell just how long you can maintain shared key, as it becomes difficult to predict the amount of resources that will be consumed. Also, when a rarely used path differs from the inserted order, shared key will be disengaged. Even though we’re continuing to use the same-sized dict for around the same amount, the amount of memory used will continue to increase slowly but steadily. These are just some of the problems that may arise from adopting this method.


For programs such as web applications that run for extended periods of time, no one is happy about the fact that the amount of memory used becomes difficult to predict and, on top of that, gradually increases. Choosy programmers should choose to stay away from this method.


3. Stop Using Shared Key Dict

shared key dict is an interesting beast. When it fits just right, it is extremely effective. However, compact ordered dict is more stable and more effective overall. On top of that, in order to support shared key dict, the implementation of dict becomes much more complicated. I tried seeing how many implementations of shared key dict I could remove. Out of a total of 4100 lines, I was able to remove around 500. I simply just deleted them, so if you did a little refactoring, you could probably get rid of even more.


While this is effective, when I built a Python document using Sphinx and measured maxrss using /usr/bine/time, I got the following results:


  • shared: 176312k

  • compact + shared: 158104k

  • compact only: 166888k


As you can see, even if you quit using shared key, the amount of memory usage reduced via the effect on compact dict is larger.


(Of course, this is just the results of a test run on one application. If anyone knows of another practical application that’s good at measuring other indicators, and has stable running times and memory usage amounts when using classes and instances a reasonable amount, please drop me a line. I’d love to chat.)


Also, when deleting shared key and running the remaining portion of the program, you can aim for an even higher level of efficiency than compact + shared by implementing a specialized dict that is separate but even more efficient. The idea i have right now is currently being implemented in POC. If it gets adopted, I’ll share it with you all later.


OrderedDict

I’ll close out this article by adding a little more detail as to how to speed up compact dict by using OrderedDict.


Python 3 includes the move_to_end(key, last=True) method that was not available in Python 2.7.  These keyword arguments are rather tricky to use, but by using move_to_end(key, last=False), you can move elements to the front of the line. (The actual functionality aside, I think this method is extremely poorly named. It should be called move_to_front(key).)


I have an idea. In order to implement this feature, you would need to handle the dk_entries array as a fixed-capacity deque, and not as a fixed-capacity dynamic array. In other words, right now we’re using everything from dk_entries[0] to dk_entries[dk_nentries-1]. However, in addition to this, when adding elements to the front of the line, we’ll add them behind dk_entries and move towards the front of the line when inserting them.


In order to make this idea a reality, we’ll have to create a bizarre version of dk_nentries, modifying it to be able to use the scanning and resizing functions of hash table. That should do the trick just nicely. By adding 1 word (8 bytes) for per OrderedDict, we should be able to cut the amount of memory consumed in half.


All that being said, we still have our hands full with the shared key problem. On top of that, once we are able to preserve the insertion order for dict, the number of chances to use OrderedDict are reduced as a result, leaving us with little motivation to implement the idea. At the very least, I don’t think this will be ready in time for Python 3.6 (unless someone does the heavy lifting for me.)



@methane

Greetings netizens, pandax381 here. A new version of Keepalived (ver. 1.3.0) was released on November 20. I take it we’re all up-to-date on our version upgrades?


2016-11-20 | Release 1.3.0

New MAJOR release with stabilization fixes. Support to DBus. Conf extensions. Parser error log. Security extensions to run scripts more securely. Refer to ChangeLog for more info.


(In the dev mailing list, they included an announcement hinting that ver. 2.0.0 was close at hand.)


This is a quick email to announce a new major Keepalived release. We are planning with Quentin to push a new release soon as the 2.0.0 release. This one will fix and extend previous parts. It also comes with a Security fix for those making extensive use of scripts.


Released on November 20, the biggest changes in version 1.3.0 include added support for DBus as well as a new security structure for running scripts. Surprise! These two features came from patches KLab created specifically for this package. They were quietly included in the recent update, so in order to set the record straight, I’ve written about them in detail below.


Improved Healthcheck Patch

Keepalived comes with a healthcheck feature that lets you monitor real servers. However, the protocols it supports by default are limited to TCP, HTTP(S), and SMTP. Apart from HTTP and SMTP, you had to add your own healthcheck script if you wanted to do more than just check your TCP connections.


MISC_CHECK, the part of the program that runs our hand-made healthcheck script, has a few problems of its own. That’s why the engineers over here at KLab developed and released a patch that really does a wonder on Keepalived’s healthcheck. (I’ll talk more about this little issue later.)



In this patch, we added support protocols for FTP, DNS, and SSL to the healthcheck. Inside the article it says, “this patch hasn’t been applied to the DSAS live environment.” However, soon after this article was posted (way back in 2007) we officially started using the patch. In fact, we still use it today. ...Or at least until Keepalived 1.3.0 came out.


Independently-developed patches are often made obsolete by new and improved official versions of software. Thus is the fate of most independent patches. We wrote our healthcheck patch over a decade ago. A lot has changed since then. There’s no way we could keep using the patch as-is with subsequent versions of Keepalived, so we’ve made an effort to update the patch regularly in order to stay ahead of the curve. Every time a new version of Keepalived comes out, we’ve had to update the patch to keep it useable. Needless to say, it was becoming a bit of a chore. The only way to escape this vicious cycle of recursive updates was for our independently developed baby patch to grow up and leave the nest to be merged into the official version of Keepalived.



Just in case you missed the title of this blog, KLab’s in-house patch has been officially merged into Keepalived. It is now a standard feature as of the version 1.3.0 update.



Here’s the pull request from the healthcheck improvement patch.

The original patch only included support for FTP, DNS, and SSL. These days we only use DNS with DSAS, so we added a DNS healthcheck into the standard healthcheck functionality. Here’s the format for DNS_CHECK.


  # one entry for each realserver
   real_server <IPADDR> <PORT>
   {
          # DNS healthchecker
          DNS_CHECK
          {
              # ======== generic connection options
              # Optional IP address to connect to.
              # The default is the realserver IP
              connect_ip <IP ADDRESS>
              # Optional port to connect to
              # The default is the realserver port
              connect_port <PORT>
              # Optional interface to use to
              # originate the connection
              bindto <IP ADDRESS>
              # Optional source port to
              # originate the connection from
              bind_port <PORT>
              # Optional connection timeout in seconds.
              # The default is 5 seconds
              connect_timeout <INTEGER>
              # Optional fwmark to mark all outgoing
              # checker packets with
              fwmark <INTEGER>

              # Number of times to retry a failed check
              # The default is 3 times.
              retry <INTEGER>
              # DNS query type
              #   A | NS | CNAME | SOA | MX | TXT | AAAA
              # The default is SOA
              type <STRING>
              # Domain name to use for the DNS query
              # The default is . (dot)
              name <STRING>
          }
   }


There’s a lot of options in there. However, as long as you set “type” and “name” correctly, the code will do its job.


  realserver 192.0.2.100 53 {
          DNS_CHECK {
              type A
              name www.klab.com
          }
   }


You know the healthcheck was successful if you get 1 or more answers for ANSWER SECTION. You have to be careful here. Even if you get a response packet back, you can ignore these packets if ANSWER SECTION is empty. I’m not suggesting this is practical, but you could even control the result of the healthcheck via the registration situation of the DNS record if you wanted to.



Fixing Bugs, Getting Presents

As previously mentioned, MISC_CHECK, which ran our healthscript, had a few problems of its own. Until the latest version of Keepalived was released, the script called by MISC_CHECK to perform the healthcheck had the nasty habit of causing the number of processes running to multiply out of control. An unfortunate bug indeed.


Here’s an easy-to-follow example. You can easily recreate this problem by using the following settings when running MISC_CHECK.


MISC_CHECK {
   misc_path "/bin/sleep 3600"
   misc_timeout 10
}


MISC_CHECK has an interesting structure. If the script being run doesn’t end by the time misc_timeout passes, MISC_CHECK sends a signal that forces the script to end. However, there is a problem that exists inside the code that processes the sending of this signal. This problem keeps the process alive even though it should have been ended once misc_timeout times out. It also makes it so that new processes are generated, which leaves us with an entire nest of processes.


UID   PID  PPID  PGID   SID COMMAND
 0 41010     1 41010 41010 /sbin/keepalived
 0 41013 41010 41010 41010  \_ /sbin/keepalived
 0 41361 41013 41010 41010  |   \_ /sbin/keepalived
 0 41362 41361 41010 41010  |   |   \_ sh -c /bin/sleep 3600
 0 41363 41362 41010 41010  |   |       \_ /bin/sleep 3600
 0 41364 41013 41010 41010  |   \_ /sbin/keepalived
 0 41365 41364 41010 41010  |   |   \_ sh -c /bin/sleep 3600
 0 41366 41365 41010 41010  |   |       \_ /bin/sleep 3600
 0 41367 41013 41010 41010  |   \_ /sbin/keepalived
 0 41368 41367 41010 41010  |       \_ sh -c /bin/sleep 3600
 0 41369 41368 41010 41010  |           \_ /bin/sleep 3600
 0 41014 41010 41010 41010  \_ /sbin/keepalived
 0 41019     1 41010 41010 sh -c /bin/sleep 3600
 0 41020 41019 41010 41010  \_ /bin/sleep 3600
 0 41025     1 41010 41010 sh -c /bin/sleep 3600
 0 41026 41025 41010 41010  \_ /bin/sleep 3600
 0 41031     1 41010 41010 sh -c /bin/sleep 3600
 0 41032 41031 41010 41010  \_ /bin/sleep 3600

In order to run the healthcheck script, MISC_CHECK runs fork(2), then runs system(3). From the perspective of the process sending the signal, the process it wants to end is a great-grandchild. In the original code, the signal is only sent to the child process. This means that only the child process is ended, while the grandchild and great grandchild processes continue to run. Sneaky, sneaky.



The pull request above fixes this bug. When MISC_CHECK fork(2) is run, setpgid(2) is called and the process group separated. signal(2) is then able to designate individual process groups. It then is able to send signals to all of the processes belonging to that process group. This fix allows the patch to send signals to all the child, grandchild, and great grandchild processes. Problem solved!


This fix has also been merged into Keepalived version 1.3.0. Now we should be able to use MISC_CHECK without any problems.


*2: There are actually even more bugs than this. The child processes, which are absolutely essential to this program, simply ignore the signals (SIGTERM) unintentionally. The program was made to work by sending a force-quit signal (SIGKILL), which was included as a last-resort measure, to the processes which in effect did kill them when received, but I think we can all agree this was a less-than-elegant solution to the problem. This has all been resolved now.



All in a Day’s Work

In the article above I made it sound like they took our code as-is and just added it into the package. To be completely honest, they pretty much re-wrote our patch from scratch. Sadly, most of the code we wrote can’t be found inside the official version of Keepalived.


There’s a very good reason for this. The framework for the healthcheck that comes with Keepalived is designed for TCP. It wasn’t designed with UDP in mind. That’s why we wrote all the socket-related code from scratch in our original patch. This left us with some interesting results. Compared to standard healthchecks, our code is full of programming faux-pas and really misses out on a lot of best practices. For example, if you sent a pull request as-is, it might upset the entire program.


That being said, the framework wasn’t designed for UDP. We wracked our brains for days, but since there really wasn’t any other good way to go about it, we ended up revising the core of Keepalived, changing the way the very fabric of the framework. (The individual functions of healthcheck are positioned like modules. You can add new functions easily, but it’s a pretty big undertaking to revise the main body.)



In this commit, we took a framework that was designed with only TCP in mind and made it support UDP. In order to avoid changing the code being used by the framework, we used inline functions to create a wrapper, all while keeping their compatibility intact. This fix makes it much easier to create a healthchecker with a UDP base, so there’s a chance that the number of supported protocols will increase in the future.


At the end of the day, writing the message for the pull request was harder than writing the actual code itself. The hardest part for me personally was the language barrier. My English is pretty deplorable (thank you ghost-writer for translating this blog!), so if someone told me to do all this with MISC_CHECK, that would have been the end of the story. I poured my heart and soul into writing a message in English that conveyed exactly why I felt it was so important that they add this part to the default healthchecker. I was relying a lot on Google Translate (which was pretty bad back in the day for Japanese→English), and it was giving me a lot of grief, but in the end, I was able to get the results I had sought after for so long.

This may very well be the biggest and most-well known product my own code has ever been included in. Simply put, I am ecstatic. I’m not sure exactly how much demand there is for this kind of thing, but if you’re reading this blog, please try out the DNS healthcheck feature!

Back to Top