Background


There was an article about ordered dictionaries posted on the PyPy developers blog back in January 2015:

Faster, more memory efficient and more ordered dictionaries on PyPy


Since the version of PyPy released afterwards (2.5.1), dict preserves insertion order while also using less memory.


Meanwhile, on the CPython side, someone proposed an interesting idea in PEP 468: preserve the order of keyword arguments as they are received through the **kwargs syntax.


For example, in a SQLAlchemy query, if we wrote .filter_by(name="methane", age=32), there was no way to be sure whether it would generate WHERE name = "methane" AND age = 32 or WHERE age = 32 AND name = "methane". Preserving the order of keyword arguments would give us that guarantee.


(filter_by and the like are just convenience shortcuts. If you use the filter method, which doesn’t take keyword arguments, you can already control the order.)
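

As a rough illustration of what ordered keyword arguments buy us, here is a sketch of my own (the where helper below is hypothetical, not part of SQLAlchemy, and it relies on **kwargs preserving order, which is exactly what PEP 468 proposes):


def where(**conditions):
    # With ordered **kwargs, the generated fragment follows the order
    # in which the caller wrote the arguments.
    return " AND ".join("{} = {!r}".format(k, v) for k, v in conditions.items())

print(where(name="methane", age=32))
# name = 'methane' AND age = 32   (guaranteed only if **kwargs preserves order)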


The person who proposed this idea re-implemented OrderedDict, until then written in pure Python, in C. Since Python 3.5, OrderedDict has therefore become significantly faster and more memory-efficient than the pure-Python version. (The reason he avoided touching dict itself is that it has already been optimized in many intertwined, complicated ways for the sake of the Python interpreter.)


However, even re-implemented in C, OrderedDict maintains the order with a doubly linked list, which carries a hefty overhead of its own: it uses almost twice as much memory as a plain dict.


Python 3.5.1 (default, Dec  7 2015, 17:23:22)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> d = {i:i for i in range(100)}
>>> from collections import OrderedDict
>>> od = OrderedDict((i,i) for i in range(100))
>>> sys.getsizeof(d), sys.getsizeof(od)
(6240, 11816)


Because of that and other factors, PEP 468 was put on the shelf.


All that being said, I too was reluctant to change the spec if there was a chance it would slow down every keyword-argument call, however useful the feature might be in some cases. I had also taken an interest in PyPy’s dict, so I decided to take a stab at it while there was still enough time for it to possibly land in Python 3.6.


(The first beta is planned for the first half of September, after which no new features may be added. To get this merged into the new version of Python, I would need to implement the changes, evaluate and verify them, and discuss the results on the mailing list before that deadline.)


Data Structures

The only data structure I changed was PyDictKeysObject, but its memory layout has become far more dynamic than before.


struct _dictkeysobject {
   Py_ssize_t dk_refcnt;
   Py_ssize_t dk_size;
   dict_lookup_func dk_lookup;
   Py_ssize_t dk_usable;
   Py_ssize_t dk_nentries;  /* How many entries are used. */
   char dk_indices[8];    /* Dynamically sized. 8 is the minimum. */
};

#define DK_SIZE(dk) ((dk)->dk_size)
#define DK_IXSIZE(dk) (DK_SIZE(dk) <= 0xff ? 1 : DK_SIZE(dk) <= 0xffff ? 2 : \
                      DK_SIZE(dk) <= 0xffffffff ? 4 : sizeof(Py_ssize_t))
#define DK_ENTRIES(dk) ((PyDictKeyEntry*)(&(dk)->dk_indices[DK_SIZE(dk) * \
                       DK_IXSIZE(dk)]))

static Py_ssize_t
dk_get_index(PyDictKeysObject *keys, Py_ssize_t i)
{
    Py_ssize_t s = DK_SIZE(keys);
    if (s <= 0xff) {
        return ((char*) &keys->dk_indices[0])[i];
    }
    else if (s <= 0xffff) {
        return ((PY_INT16_T*)&keys->dk_indices[0])[i];
    }
    else if (s <= 0xffffffff) {
        return ((PY_INT32_T*)&keys->dk_indices[0])[i];
    }
    else {
        return ((Py_ssize_t*)&keys->dk_indices[0])[i];
    }
}

/* Mirror of dk_get_index: store entry index ix into slot i, using the
   smallest integer type that can hold dk_size. */
static void
dk_set_index(PyDictKeysObject *keys, Py_ssize_t i, Py_ssize_t ix)
{
    Py_ssize_t s = DK_SIZE(keys);
    if (s <= 0xff) {
        ((char*) &keys->dk_indices[0])[i] = (char)ix;
    }
    else if (s <= 0xffff) {
        ((PY_INT16_T*)&keys->dk_indices[0])[i] = (PY_INT16_T)ix;
    }
    else if (s <= 0xffffffff) {
        ((PY_INT32_T*)&keys->dk_indices[0])[i] = (PY_INT32_T)ix;
    }
    else {
        ((Py_ssize_t*)&keys->dk_indices[0])[i] = ix;
    }
}


The previous hash table was an array of three-word PyDictKeyEntry structures (hash, key, value). In the new approach I was trying, each hash table slot holds only a small integer. char dk_indices[8] appears in the struct declaration, but that merely reflects the minimum dk_size of 8; a larger area is obtained when the object is actually allocated. The width of the integers is also dynamic: each index is a char while dk_size is at most 128, becomes an int16_t from 256 onward, and so on. This is how I was able to shrink the hash table itself as much as possible.


Furthermore, because dk_indices is dynamically sized, the entries array cannot be declared as a struct member. Instead, I place the PyDictKeyEntry array in memory immediately after dk_indices (that is what the DK_ENTRIES macro computes). The size of this array is not dk_size but 2/3 of it. (I introduced this in a previous article: that is the maximum number of elements that can be inserted into a hash table of that size.) When inserting a new element, we simply append it to this array and store its index in dk_indices. dk_nentries is the number of entries used so far.
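

To get a feel for the savings, here is a back-of-the-envelope calculation of my own. It assumes a 64-bit build where a pointer and Py_ssize_t are 8 bytes, and it counts only the index table and the entries, ignoring the fixed header and the dict object itself:


def old_table_bytes(dk_size):
    # Old layout: dk_size slots, each a full three-word PyDictKeyEntry.
    return dk_size * 3 * 8

def new_table_bytes(dk_size):
    # New layout: dk_size narrow indices plus a dense entries array
    # that only needs 2/3 of dk_size slots.
    if dk_size <= 0xff:
        index_width = 1
    elif dk_size <= 0xffff:
        index_width = 2
    elif dk_size <= 0xffffffff:
        index_width = 4
    else:
        index_width = 8
    return dk_size * index_width + (dk_size * 2 // 3) * 3 * 8

for size in (8, 256, 65536):
    print(size, old_table_bytes(size), new_table_bytes(size))
# 8 192 128
# 256 6144 4592
# 65536 1572864 1310704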


The following pseudo-code shows what inserting a new element looks like, assuming the same key does not already exist.


// Find the insertion position inside dk_indices
pos = lookup(keys, key, hash);

// Write the new entry into the entries array
DK_ENTRIES(keys)[keys->dk_nentries].me_hash = hash;
DK_ENTRIES(keys)[keys->dk_nentries].me_key = key;
DK_ENTRIES(keys)[keys->dk_nentries].me_value = value;

// Record that entry's index in dk_indices
dk_set_index(keys, pos, keys->dk_nentries);

// Lastly, bump the number of entries in use
keys->dk_nentries++;


Deleting Items

In this version of dict, deleting an item requires leaving a placeholder “dummy” in the corresponding slot of dk_indices. (This is also why the 1-byte index type is only used up to a dk_size of 128 rather than 256: the type is signed, because negative values are reserved to represent dummy and empty slots.)


There are two ways to delete items from our entries.


When the “compact dict” idea was first posted to the Python developers’ mailing list, the suggestion was to keep the entries array densely packed by moving the last entry into the slot of the deleted one. With this method the last element moves forward, which means that deleting an element sacrifices the property of “preserving the order in which elements were inserted.”
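

A tiny illustration of my own of why that breaks the ordering:


entries = ['a', 'b', 'c', 'd']   # insertion order
# Delete 'b' by moving the last entry into its slot, keeping the array dense:
entries[1] = entries.pop()
print(entries)                   # ['a', 'd', 'c'] -- 'd' now precedes 'c'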



The approach PyPy took, and the one I went with this time as well, is simply to leave a NULL in the vacated entry slot.


// Find the slot in dk_indices that points to the element being deleted
pos = lookup(keys, key, hash);

// Get the element's position inside the entries array
index = dk_get_index(keys, pos);

// Clear the entry
DK_ENTRIES(keys)[index].me_key = NULL;
DK_ENTRIES(keys)[index].me_value = NULL;

// Leave a "dummy" in dk_indices
dk_set_index(keys, pos, DUMMY);


With this approach, the entries array fills up with NULL holes when a lot of insertions and deletions are performed, which is slightly unpleasant because a compaction eventually has to be run to clean it up. However, even with the first method described above, the hash table fills up with dummy slots as insertions and deletions repeat, to the point where searches could no longer proceed, so a compaction is needed there as well. Since we have to compact either way, I believe it’s better to keep preserving the insertion order.
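

To tie the insertion and deletion pseudo-code above together, here is a minimal pure-Python model of the indices/entries scheme. It is an illustrative sketch of my own, not the CPython code: it uses naive linear probing and never resizes or compacts.


EMPTY, DUMMY = -1, -2

class CompactDict:
    def __init__(self, size=8):
        self.indices = [EMPTY] * size   # the small hash table (dk_indices)
        self.entries = []               # dense (hash, key, value) list in insertion order

    def _lookup(self, key):
        # Naive linear probing; returns (slot, entry index or None).
        mask = len(self.indices) - 1
        i = hash(key) & mask
        while True:
            ix = self.indices[i]
            if ix == EMPTY:
                return i, None
            if ix >= 0 and self.entries[ix] is not None and self.entries[ix][1] == key:
                return i, ix
            i = (i + 1) & mask          # DUMMY and colliding slots are skipped

    def __setitem__(self, key, value):
        slot, ix = self._lookup(key)
        if ix is None:
            self.indices[slot] = len(self.entries)        # record the entry's index
            self.entries.append((hash(key), key, value))  # append to the dense array
        else:
            self.entries[ix] = (hash(key), key, value)

    def __delitem__(self, key):
        slot, ix = self._lookup(key)
        if ix is None:
            raise KeyError(key)
        self.indices[slot] = DUMMY      # leave a dummy in the hash table
        self.entries[ix] = None         # leave a NULL-ed hole in the entries array

    def items(self):
        # Iteration simply walks the entries array, skipping holes,
        # so the insertion order survives deletions.
        return [(e[1], e[2]) for e in self.entries if e is not None]

d = CompactDict()
d['a'] = 1; d['b'] = 2; d['c'] = 3
del d['b']
d['d'] = 4
print(d.items())    # [('a', 1), ('c', 3), ('d', 4)]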


Incidentally, .popitem() removes the last element of the entries array and decrements dk_nentries, so it keeps its amortized O(1) cost. In this case too we do not increment dk_usable (the number of remaining insertable slots), which means that repeatedly deleting and inserting items will still eventually force a compaction and a rebuild of the hash table.
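

For example, on a build whose dict is the compact ordered dict described here (CPython 3.6 and later), popping always takes the entry at the tail of the entries array:


>>> d = {}
>>> d['a'] = 1; d['b'] = 2; d['c'] = 3
>>> d.popitem()     # removes the last entry in the entries array
('c', 3)
>>> list(d)
['a', 'b']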


Shared-Key Dict

Now we can move on to the problem known as shared-key dict.


This is what I thought when I first started working on the project: as before compact dict, we could avoid inserting dummy elements into the (shared) hash table and instead treat an entry whose value side is NULL as if it were a dummy.


Boy, did I have a lot to learn. With this scheme, the order that gets preserved is the order in which keys were first added to the shared keys object, so a dict whose elements are inserted in a different order cannot preserve its own insertion order.


>>> class A:
...     pass
...
>>> a = A()
>>> b = A()
>>> a.a = 1
>>> a.b = 2
>>> b.b = 1
>>> b.a = 2
>>> a.__dict__.items()
dict_items([('a', 1), ('b', 2)])
>>> b.__dict__.items()  # Even though the actual insertion order is b, a…
dict_items([('a', 2), ('b', 1)])


To address this problem, there are three options to consider. This will be discussed on the mailing list, and we won’t know which way things will go until a final decision is made by Guido or by a core developer Guido delegates it to.


1. Simply Accept Things the Way They Are

Under the current Python language spec, the order of dict is undefined. So even if the implementation now preserves insertion order in every case EXCEPT the dicts that hold instance attributes, there is no problem from the point of view of the spec itself.


With compact dict, shared-key dicts also benefit: the ma_values array only needs to be 2/3 the size of the hash table, making them even more compact. We could simply accept that as a welcome bonus.


The downside is that insertion order is now preserved in nearly every other case, so programmers who don’t take the time to check the language spec may mistake it for guaranteed behavior. Blaming them for misreading the spec is hardly fair. Go, for example, deliberately leaves map iteration order indefinite (using fast pseudo-random numbers) precisely to avoid this trap.


2. Stop Using Shared Key If the Insertion Order Is Wrong

If an element is inserted in an order that differs from the order recorded in the shared keys, simply stop sharing keys for that dict on the spot.


While this looks like the safest route at first glance, it makes it hard to predict how long key sharing will survive, and therefore how much memory will be consumed. A rarely executed code path that happens to insert attributes in a different order will silently disable sharing, so even though the program keeps using about the same number of dicts of about the same size, memory usage creeps up slowly but steadily. These are some of the problems this method could bring.
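

For reference, here is a small way of my own to poke at this from Python. I deliberately show no numbers, because whether and how the sizes diverge depends on the CPython version and platform:


import sys

class A:
    pass

a, b = A(), A()
a.x = 1; a.y = 2       # 'a' establishes the order recorded in the shared keys
b.y = 3; b.x = 4       # 'b' inserts its attributes in a different order

# If the implementation gives up key sharing for 'b', its __dict__ must carry
# its own keys table and will typically report a larger size than 'a's.
print(sys.getsizeof(a.__dict__), sys.getsizeof(b.__dict__))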


For programs such as web applications that run for extended periods of time, no one is happy about the fact that the amount of memory used becomes difficult to predict and, on top of that, gradually increases. Choosy programmers should choose to stay away from this method.


3. Stop Using Shared Key Dict

Shared-key dict is an interesting beast: when it fits just right, it is extremely effective. But compact ordered dict is effective in a more stable, across-the-board way, and supporting shared keys makes the implementation of dict considerably more complicated. I tried removing as much of the shared-key implementation as I could: out of a total of about 4100 lines, I was able to drop around 500. I only deleted code, so with a little refactoring you could probably remove even more.


To see how effective this would be, I built the Python documentation with Sphinx and measured maxrss with /usr/bin/time, with the following results:


  • shared: 176312k

  • compact + shared: 158104k

  • compact only: 166888k


As you can see, even if we stop using shared keys entirely, memory usage still ends up lower than today, because the savings from compact dict outweigh the loss.


(Of course, this is the result from just one application. If anyone knows of a practical benchmark that exercises classes and instances to a reasonable degree, has stable run time and memory usage, and is good for measuring other indicators, please drop me a line. I’d love to chat.)


Also, once shared keys are removed, the use case they served could be covered by a separate, specialized dict for instance attributes, which could be even more efficient than compact + shared. I’m currently implementing the idea I have as a proof of concept (POC); if it gets adopted, I’ll write about it later.


OrderedDict

I’ll close out this article with a little more detail on how OrderedDict could be made faster and smaller by building it on top of compact dict.


Python 3’s OrderedDict has a move_to_end(key, last=True) method that was not available in Python 2.7. The last keyword argument is rather awkward, but move_to_end(key, last=False) moves an element to the front. (The functionality aside, I think this method is extremely poorly named. It should be called move_to_front(key).)


Here is my idea. To support this, the dk_entries array would have to be treated not as a fixed-capacity dynamic array but as a fixed-capacity deque. Right now we use the slots from dk_entries[0] through dk_entries[dk_nentries-1]; in addition to that, elements added to the front would be written starting from the back end of dk_entries, growing toward the front.


To make this work, we would need an odd counterpart to dk_nentries and would have to teach the hash table’s scanning and resizing code about it, but that should do the trick. At the cost of one extra word (8 bytes) per OrderedDict, we should be able to cut its memory consumption roughly in half.
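

Here is a toy sketch of my own of that layout idea, showing only the entries array (no hash table, holes, or resizing): ordinary insertions fill the array from the front, front-insertions fill it from the back, and one extra index remembers where the front part begins.


class DequeEntries:
    def __init__(self, capacity=8):
        self.entries = [None] * capacity
        self.nentries = 0         # next free slot for ordinary appends
        self.first = capacity     # first used slot of the front-inserted part

    def append(self, item):
        # Ordinary insertion: grows upward from index 0, as dk_entries does today.
        self.entries[self.nentries] = item
        self.nentries += 1

    def append_front(self, item):
        # What move_to_end(key, last=False) needs: grows downward from the back.
        self.first -= 1
        self.entries[self.first] = item

    def in_order(self):
        # Logical order: front-inserted items (newest first), then appended items.
        return self.entries[self.first:] + self.entries[:self.nentries]

d = DequeEntries()
d.append('a'); d.append('b')
d.append_front('z')
print(d.in_order())   # ['z', 'a', 'b']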


All that being said, we still have our hands full with the shared-key problem. On top of that, once dict itself preserves insertion order, there will be fewer occasions to use OrderedDict at all, which leaves little motivation to implement the idea. At the very least, I don’t think this will be ready in time for Python 3.6 (unless someone does the heavy lifting for me).



@methane