Greetings, fellow Python enthusiasts, @methane here. Compact dict was merged into Python 3.6 back in September (right before it went into beta). As a result of this fortunate turn of events, I was recommended for the role and became a Python core developer in October of last year.

 

Please allow me to clarify. It's not as if KLab employs me as a full-time Python committer. However, thanks to KLab's flex-time system, I've been able to spend the majority of my work hours contributing to OSS and poring over code. In the last 3 months especially, I've been spending a lot of that time on Python, so it's almost like I'm a full-time Python developer.

 

I don’t really get very many opportunities to share this part of my engineering efforts here in Japan, so after being absolutely blown away by what Money Forward’s Urabe-san wrote about in his recent article on ruby-core, I decided to buckle down and write about some of the stuff that’s been going on in the world of Python recently.


Python 3.6 Released

Python 3.6 was released on December 23. This release included so many important improvements that there's no way I could cover them all in one blog post. However, the easiest one to explain (even to people who don't normally work with Python) is probably f-strings.

 

An f-string is a string literal that lets you embed expressions directly inside the string. Many lightweight languages (LL) have this feature. However, when it is overused, it makes the code harder to maintain. My suggestion is to keep it simple and use it only as a drop-in replacement for calls like .format(name=name): for example, write f"{foo.name} = {foo.value}" instead of "{foo.name} = {foo.value}".format(foo=foo), and off you go.
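As a minimal sketch of that suggestion, the snippet below builds the same string both ways. The Setting record and its field values are hypothetical and exist only for illustration.

```python
from collections import namedtuple

# Hypothetical record type, used only for illustration.
Setting = namedtuple("Setting", ["name", "value"])
foo = Setting(name="timeout", value=30)

# Pre-3.6 style: the object is passed explicitly to str.format().
old_style = "{foo.name} = {foo.value}".format(foo=foo)

# Python 3.6+ f-string: the same attributes are read from the enclosing scope.
new_style = f"{foo.name} = {foo.value}"

print(new_style)
```

Both expressions produce the same text; the f-string simply removes the repeated foo=foo plumbing.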

 

Personally, I mostly contributed to speeding up compact dict and asyncio. This is a little off-topic, but here is a tidbit I thought I'd share: 2 days after Python 3.6 was released, Ruby 2.4.0 came out with a new hash implementation that's almost identical to compact dict. What a coincidence.


UTF-8 Support Improved for C locale (Mainly)

On Linux, Python looks at the locale to determine the encoding for the terminal, standard input and output, and file paths.

 

However, the C locale in POSIX (also known as the POSIX locale) generally means ASCII, so using non-ASCII characters raises a UnicodeEncodeError. For standard input and output, you can override the encoding with the PYTHONIOENCODING environment variable. However, it can't be used to set the encoding for command-line arguments or file paths.
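To see which encodings your own interpreter has picked, something like the following sketch may help. The helper names describe_encodings and encode_like_c_locale are made up for this example, and the second helper only imitates the ASCII behavior described above by encoding explicitly as ASCII; it does not switch the locale.

```python
import locale
import sys


def describe_encodings():
    """Return the encodings Python chose for this process."""
    return {
        "locale": locale.getpreferredencoding(False),
        "stdout": sys.stdout.encoding,
        "filesystem": sys.getfilesystemencoding(),
    }


def encode_like_c_locale(text):
    """Encode text as ASCII, the encoding traditionally implied by the C locale."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        # This is the error users run into when non-ASCII text meets ASCII.
        return f"UnicodeEncodeError: {exc.reason}"


for name, encoding in describe_encodings().items():
    print(f"{name}: {encoding}")
print(encode_like_c_locale("日本語"))
```

Running the script once normally and once with LC_ALL=C set is a quick way to compare what changes.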

 

The C locale is the default locale, so it ends up being used at somewhat inopportune times: when you don't set a locale in crontab, for instance, or when ssh sends LANG=ja_JP.UTF-8 to a server that doesn't have the ja_JP.UTF-8 locale. Many engineers also choose the C locale on purpose, either to avoid translated error messages (which make it hard to write bug reports in English) or to keep the behavior of commands from changing. And in small Linux environments built for containers and similar purposes, often no locale other than C exists at all, in order to save space.

 

For these reasons and more, Python sometimes raises a UnicodeEncodeError when it runs under the C locale. This is especially troublesome for people who aren't Python engineers themselves but find themselves using tools written in Python.

 

Python has a broad base of users, and the way they deal with locales varies just as widely. To be brutally honest, very few of the users running under the C locale actually want ASCII. That's why it has been proposed that the default encoding under the C locale be changed to UTF-8.

 

(Because this is still in the proposal phase, you can’t use it even if you check out Python’s in-development branch.)


PEP 540: Add a New UTF-8 Mode

We've proposed adding a UTF-8 mode to Python. In UTF-8 mode, Python ignores the encoding designated by the locale and uses UTF-8 for file paths and for standard input and output.

 

This mode has three settings: disabled, enabled, and strict. The difference between enabled and strict is how they treat byte strings that aren't valid UTF-8. Enabled uses the surrogateescape error handler to pass such bytes through transparently, while strict raises an error for them. Under the C locale, the mode is set to enabled by default. This makes it possible to handle file paths and to read and write standard input and output that aren't encoded in UTF-8. You might rarely need this these days, but it is extremely useful when, for example, you mount an external file system that still uses an older encoding.
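The difference between the two error-handling styles can be sketched with the codec error handlers Python already has. The file name below is a made-up example; the mode itself is not switched on here, only the surrogateescape and strict handlers it is described as using.

```python
# A byte string that is not valid UTF-8 (0xFF never appears in UTF-8),
# standing in for a hypothetical legacy file name.
raw = b"report-\xff.txt"

# "enabled"-style handling: surrogateescape carries the bad byte through
# as a lone surrogate code point, so nothing is lost.
text = raw.decode("utf-8", errors="surrogateescape")
round_tripped = text.encode("utf-8", errors="surrogateescape")

# "strict"-style handling: the same bytes are rejected outright.
try:
    raw.decode("utf-8", errors="strict")
    strict_result = "decoded"
except UnicodeDecodeError:
    strict_result = "rejected"

print(repr(text), round_tripped == raw, strict_result)
```

The round trip is what makes surrogateescape suitable for file paths: the original bytes can always be recovered, even though the intermediate string is not printable as UTF-8.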

 

For locales other than C, the encoding set by the locale is used with strict error handling. In a UTF-8 locale, this means that any data that isn't valid UTF-8 raises an error. This is useful when you want to keep out everything but UTF-8. For example, you may want to catch corrupted data early, before it turns into an illegible string of garbage characters and becomes a problem.

 

This mode can be controlled with the PYTHONUTF8 environment variable or with the -X utf8 command-line option. If you want to ignore the locale altogether, you can add export PYTHONUTF8=1 to your .bashrc. You could also write PYTHONUTF8=1 to /etc/environment if you prefer.
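The following is a rough sketch of how one might check the effect of the variable, assuming an interpreter that implements this proposal; on an interpreter that does not, PYTHONUTF8 is simply ignored and both runs report the same encoding. The helper name child_stdout_encoding is hypothetical.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Run a child interpreter under the C locale and report its stdout encoding."""
    env = dict(os.environ, LC_ALL="C", **extra_env)
    env.pop("PYTHONIOENCODING", None)  # keep the I/O override out of the picture
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


print("without PYTHONUTF8:", child_stdout_encoding({}))
print("with PYTHONUTF8=1: ", child_stdout_encoding({"PYTHONUTF8": "1"}))
```

Setting the variable per child process like this is also a reasonable way to try the mode out before committing it to .bashrc or /etc/environment.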


PEP 538: Coercing the Legacy C locale to C.UTF-8

This PEP (Python Enhancement Proposal) proposes that, when the environment variables select the C locale, Python change the locale to C.UTF-8 (if the system supports it). This affects not only Python itself; we can also expect it to make other libraries operate in UTF-8 instead of ASCII or latin1.
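Whether a given system supports C.UTF-8 can be probed from Python. This is only a sketch of such a check, not the logic the PEP itself uses, and locale_is_available is a hypothetical helper name.

```python
import locale


def locale_is_available(name):
    """Return True if the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore it


print("C.UTF-8 available:", locale_is_available("C.UTF-8"))
```

The helper restores the previous LC_CTYPE setting, so probing does not change how the rest of the process behaves.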

 

One example is readline, which Python's REPL uses. We've received a report that, with this change, readline handles UTF-8 without any trouble in the REPL on an Android device, and without changing any settings.


Expanding New Calling Conventions (METH_FASTCALL)

Fundamentally, when you look at a Python function call from the C side, positional arguments are passed as a tuple and keyword arguments are passed as a dict.
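The same packaging is visible from the Python side, where *args and **kwargs mirror the traditional C-level convention. A small sketch, with show_call as a hypothetical helper:

```python
def show_call(*args, **kwargs):
    """Report how the interpreter packaged the arguments it received."""
    return type(args).__name__, type(kwargs).__name__, args, kwargs


print(show_call(1, 2, sep="-"))
# → ('tuple', 'dict', (1, 2), {'sep': '-'})
```

Every call that goes through this convention has to build that tuple (and, when keywords are present, the dict) before the callee sees anything, which is the cost the newer convention tries to avoid.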

 

Python 3.6 introduced a new calling convention that, instead of a tuple, passes a pointer to the first element of an array of arguments together with the number of positional arguments. This means the caller no longer has to pack the arguments sitting on its stack into a tuple; it is free to pass them on just as they are.

 

Work is already underway to use the new calling convention for "functions that can be called from Python" written in C. In fact, the most important parts are already finished. In addition, and this is not widely known, apart from functions and methods defined through PyMethodDef (a structure made up of a name, a flag showing which calling convention to use, and a function pointer), there is a mechanism that makes entire objects callable. (If you're familiar with Python, think of the object that operator.itemgetter() returns.) The function pointer for this, tp_call, is stored in the structure that holds the type's metadata.
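From the Python side, callable objects of this kind look like the sketch below. The Multiplier class is hypothetical; in CPython, a class that defines __call__ gets its tp_call slot filled in, which is what makes its instances callable.

```python
import operator

# itemgetter() returns an object, not a function, yet it can be called.
getter = operator.itemgetter(1)


class Multiplier:
    """Hypothetical class; defining __call__ makes its instances callable."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return value * self.factor


twice = Multiplier(2)

print(type(getter).__name__, callable(getter), getter("abc"))
print(type(twice).__name__, callable(twice), twice(21))
```

Neither object is a function or a method, which is why they are reached through tp_call rather than through a PyMethodDef entry.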


Because PyMethodDef has always used flags to support multiple calling conventions, calls from C have had to go through a dedicated API. tp_call, on the other hand, has so far supported only the standard convention of receiving a tuple and a dict, so external libraries may be calling that function pointer directly without going through the API.

 

At the same time, in order to maintain compatibility, a new function pointer, tp_fastcall, has been added. For types that provide tp_fastcall, there is a patch currently under review that fills in tp_call with a function that automatically converts between the two conventions.

 

In the end, even when it looks like a single function call from the Python side, the arguments are often handed through several functions on the C side. As a result, when the new and traditional conventions are mixed in the same code path, there is a risk of a needless number of conversions taking place internally. For Python 3.7, internal argument passing has been unified to use only the new convention. I believe this is the environment in which the new calling convention can flex its muscles and show off its true power.