KLabGames Tech Blog - English

KLab develops and provides service for a variety of smartphone games. The world of mobile games is growing by leaps and bounds, and the development process calls for a different set of skills than traditional console games.

The other day, I was able to mark a major milestone off of my list of things to do when a patch I made for Go was officially adopted into the programming language. I am now a contributor to Go.

This patch fixed a pesky problem: when database/sql.Stmt was used heavily, performance stopped scaling on machines with 16 or more cores. How did I go about investigating this problem? I’ll answer that question as I go over the steps involved in getting my patch into Go.


I help maintain a TechEmpower project called Framework Benchmarks. I’m mainly in charge of Python-related tests. The project involves implementing the same app in many different programming languages and frameworks, and then benchmarking each version. I’m fascinated by Go, so I play around with it by adding the Gin framework and participating in code reviews when I get the chance.

For Round 9, the most recent round of the Benchmark project held on May 1, 2014, Peak Hosting was added to the live testing environment. This super high-spec environment features dual Xeon E5-2660 v2 machines and a 10 Gigabit Ethernet connection.

This environment produced some unusual results. In other environments, a large number of Go frameworks placed near the top, alongside the Java and nginx-based entries. In this particular environment, however, these former success stories performed at an almost mediocre level, losing to frameworks built on Scala, Node.js, and even PHP.

I received a pull request aimed at fixing this problem on November 2, 2014. The request proposed working around the lock contention by creating multiple instances of *sql.DB and using them in a round-robin fashion. I was initially opposed to the idea. The purpose of this project was not merely to aim for a high score. My goal was to provide practical code samples for each framework to as many people as possible, along with performance figures that could serve as useful reference material. Not only was creating multiple instances of *sql.DB not how database/sql was designed to be used, I didn’t even think the idea was practical.

About the Architecture for database/sql

I’m going to take a short detour and talk about the basic architecture for database/sql. Feel free to skip this section if you’ve ever written a program in Go that accesses a database.

database/sql is a lot like PDO for PHP. It sits on top of the drivers used to connect to each database and provides a uniform interface. For MySQL, for instance, go-sql-driver/mysql is the most well-known driver. Users generally don’t use these drivers directly.

sql.DB is the heart of database/sql. It serves as a connection pool that can be used in the following ways.

  1. Send queries directly with DB.Exec() or DB.Query().

  2. Create a Stmt object that represents a prepared statement with DB.Prepare(), then use Stmt.Exec() or Stmt.Query().

  3. Create a Tx object used to represent transactions with DB.Begin(), then use Tx.Exec() or Tx.Query().

When using transactions, the Tx object hangs on to the connection it checks out from the DB. In any other case, the query is made using a connection from the DB’s pool, and the user cannot choose which connection is used. (An API that lets you check out connections without transactions is planned to be added to Go 1.5 in order to cover additional use cases not accounted for here.)

With the first use case, unless the query has no placeholders, DB.Query() runs the equivalent of DB.Prepare(), Stmt.Exec(), and Stmt.Close() internally. This means the code triggers three round trips per query, which takes a toll on overall performance. go-sql-driver/mysql doesn’t substitute placeholders on the client side either. (I drummed up a pull request that takes care of this, but it's still under review.) This means the best option performance-wise is use case number two.
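To make the cost difference concrete, here is a minimal sketch in Go that counts how many Prepare calls reach the driver. The `counting` driver is a hypothetical stand-in that talks to no real database and, like the situation described above, does not handle queries with placeholders directly; `preparesFor` is likewise an illustrative helper, not part of database/sql. With a real networked driver, each Prepare (and each Close) would be an extra round trip.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"sync/atomic"
)

// prepares counts how many times the fake driver is asked to prepare a statement.
var prepares int64

// countingDriver, countingConn, and countingStmt form a hypothetical driver
// used only for this illustration. It never contacts a database.
type countingDriver struct{}
type countingConn struct{}
type countingStmt struct{}

func (countingDriver) Open(name string) (driver.Conn, error) { return countingConn{}, nil }

func (countingConn) Prepare(query string) (driver.Stmt, error) {
	atomic.AddInt64(&prepares, 1)
	return countingStmt{}, nil
}
func (countingConn) Close() error              { return nil }
func (countingConn) Begin() (driver.Tx, error) { return nil, errors.New("transactions not supported") }

func (countingStmt) Close() error  { return nil }
func (countingStmt) NumInput() int { return -1 } // -1: skip argument count checks
func (countingStmt) Exec(args []driver.Value) (driver.Result, error) {
	return driver.RowsAffected(1), nil
}
func (countingStmt) Query(args []driver.Value) (driver.Rows, error) {
	return nil, errors.New("queries not supported")
}

func init() { sql.Register("counting", countingDriver{}) }

// preparesFor runs n executions, either directly on the DB (use case 1) or
// through one reused Stmt (use case 2), and reports the number of Prepare calls.
func preparesFor(n int, reuseStmt bool) (int64, error) {
	db, err := sql.Open("counting", "")
	if err != nil {
		return 0, err
	}
	defer db.Close()
	db.SetMaxOpenConns(1) // a single connection keeps the count deterministic

	const query = "UPDATE world SET n = ? WHERE id = ?"
	atomic.StoreInt64(&prepares, 0)
	if reuseStmt {
		stmt, err := db.Prepare(query)
		if err != nil {
			return 0, err
		}
		defer stmt.Close()
		for i := 0; i < n; i++ {
			if _, err := stmt.Exec(i, 1); err != nil {
				return 0, err
			}
		}
	} else {
		for i := 0; i < n; i++ {
			if _, err := db.Exec(query, i, 1); err != nil {
				return 0, err
			}
		}
	}
	return atomic.LoadInt64(&prepares), nil
}

func main() {
	direct, err := preparesFor(3, false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	reused, err := preparesFor(3, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Prepare calls with DB.Exec:", direct, "with a reused Stmt:", reused)
}
```

In this sketch the direct path prepares once per execution, while the reused Stmt prepares only once for the connection it runs on.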

When using the second use case, Stmt keeps track of the result of Prepare() for each connection. Thanks to this sophisticated design, the user doesn’t have to worry about which connection Prepare() was called on. This design turns out to be at the heart of the problem here.

All things considered, it’s pretty odd to create multiple DB instances with this type of architecture. Ideally, a single DB instance per database should scale as far as necessary.

Thorough Testing

First, I had to create an environment that was close to the one in which the problem occurred on Peak Hosting. I launched two c3.8xlarge instances on Amazon Web Services (AWS), enabled enhanced networking, and put them both in the same placement group. This gave me two 32-core machines connected by a stable 10 Gigabit network. Amazon Linux AMI has enhanced networking enabled by default, which makes everything a lot easier. If you use Spot Instances, you can probably rent two c3.8xlarge machines for roughly $0.70/hour.

Next, I ran the benchmark, keeping a close eye on CPU usage with the top command while trying to recreate the problem. A mutual exclusion (mutex) lock is cheap when it can be acquired right away, but it burns some CPU cycles once goroutines start competing for it. Contention tends to show up in one of two ways. If the problematic mutex is held for a long period of time, the number of runnable threads decreases, lowering CPU usage. If the mutex is locked many times for short periods, contention occurs more frequently, and CPU usage tends to rise even though performance doesn’t scale with the number of cores. In this particular test, I was able to confirm the second pattern.

Finally, I used net/http/pprof to determine the root of the problem. net/http/pprof not only includes a CPU profiler, it also comes with multiple diagnostic features you can use to take a more informed look at your system. The feature that helps the most with lock contention is the full stack dump. Here’s the actual stack dump.
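For readers who want to try this themselves, here is a minimal sketch of fetching a full stack dump through net/http/pprof. Importing the package registers its handlers on http.DefaultServeMux, and requesting /debug/pprof/goroutine?debug=2 returns the stacks of all goroutines. The helper names (`stackDump`, `localStackDump`, `countGoroutines`) and the throwaway httptest server are assumptions made for illustration; a real service would typically expose the handlers on its own listener instead.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // registers /debug/pprof/ handlers on http.DefaultServeMux
	"strings"
)

// stackDump fetches the full goroutine stack dump from a server that exposes net/http/pprof.
func stackDump(baseURL string) (string, error) {
	resp, err := http.Get(baseURL + "/debug/pprof/goroutine?debug=2")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// localStackDump starts a temporary local server and dumps the goroutines of this process.
func localStackDump() (string, error) {
	srv := httptest.NewServer(http.DefaultServeMux)
	defer srv.Close()
	return stackDump(srv.URL)
}

// countGoroutines counts the "goroutine N [state]:" header lines in a dump.
func countGoroutines(dump string) int {
	n := 0
	for _, line := range strings.Split(dump, "\n") {
		if strings.HasPrefix(line, "goroutine ") {
			n++
		}
	}
	return n
}

func main() {
	dump, err := localStackDump()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("the dump contains %d goroutines\n", countGoroutines(dump))
}
```

Taking the dump while the benchmark is under load is what makes the blocked goroutines visible.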

Let’s start by locating a Mutex.Lock() call where a lot of goroutines are blocked. In this case, you’ll find a lot of stack traces like the following. You can check it out in the source code here.

goroutine 81 [semacquire]:
/home/ec2-user/local/go/src/sync/mutex.go:66 +0xd3
database/sql.(*Stmt).connStmt(0xc2080c4280, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/ec2-user/local/go/src/database/sql/sql.go:1357 +0xa6
database/sql.(*Stmt).Query(0xc2080c4280, 0xc2089a1be8, 0x1, 0x1, 0x0, 0x0, 0x0)
/home/ec2-user/local/go/src/database/sql/sql.go:1438 +0x120

We’ve gotten pretty close to the root of the problem here, but we can’t be absolutely certain about the culprit just yet. How badly a lock is contended is roughly the product of how often it is acquired and how long it is held each time. A stack dump, on the other hand, is dominated by whichever stack occurs most often, which here is the crowd of goroutines waiting for the lock. By excluding all identical stack dumps from the list, we’ll be able to look for the code that actually holds Stmt.mu a bit more easily. That’s when we come across the following stack dump.

goroutine 158 [semacquire]:
/home/ec2-user/local/go/src/sync/mutex.go:66 +0xd3
database/sql.(*DB).connIfFree(0xc208096dc0, 0xc2083080c0, 0x0, 0x0, 0x0)
/home/ec2-user/local/go/src/database/sql/sql.go:695 +0x67
database/sql.(*Stmt).connStmt(0xc2080c4280, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/ec2-user/local/go/src/database/sql/sql.go:1378 +0x316
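Filtering out identical stacks by hand gets tedious, so here is a rough sketch of how it could be automated in Go. It is not the tool I used; `groupStacks` and the sample dump are hypothetical. The idea is to drop each goroutine's header line and the argument values (which differ between otherwise identical stacks), then count how many goroutines share each remaining stack.

```go
package main

import (
	"fmt"
	"strings"
)

// groupStacks splits a goroutine dump into per-goroutine blocks and counts
// how many goroutines share the same stack, ignoring argument values.
func groupStacks(dump string) map[string]int {
	counts := make(map[string]int)
	for _, block := range strings.Split(strings.TrimSpace(dump), "\n\n") {
		lines := strings.Split(block, "\n")
		if len(lines) < 2 || !strings.HasPrefix(lines[0], "goroutine ") {
			continue // not a goroutine block
		}
		var frames []string
		for _, line := range lines[1:] {
			line = strings.TrimSpace(line)
			// Function lines end with "(0x..., 0x...)"; cut the arguments off.
			if i := strings.LastIndex(line, "(0x"); i >= 0 {
				line = line[:i]
			}
			frames = append(frames, line)
		}
		counts[strings.Join(frames, "\n")]++
	}
	return counts
}

// sampleDump is a shortened, made-up dump in the same format as the ones shown here.
const sampleDump = `goroutine 81 [semacquire]:
database/sql.(*Stmt).connStmt(0xc2080c4280, 0x0)
	/src/database/sql/sql.go:1357 +0xa6

goroutine 82 [semacquire]:
database/sql.(*Stmt).connStmt(0xc2080c4290, 0x1)
	/src/database/sql/sql.go:1357 +0xa6

goroutine 158 [semacquire]:
database/sql.(*DB).connIfFree(0xc208096dc0)
	/src/database/sql/sql.go:695 +0x67
`

func main() {
	for stack, n := range groupStacks(sampleDump) {
		fmt.Printf("%d goroutine(s):\n%s\n\n", n, stack)
	}
}
```

Stacks that appear only once or twice are the interesting ones, since they are more likely to be the lock holders than the waiters.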

This goroutine holds Stmt.mu at the same location described above, but it is itself waiting for the DB.mu lock. Reading the code for Stmt.connStmt() and DB.connIfFree() shows that connStmt() calls connIfFree() in a loop over Stmt.css (the list of connection and prepared-statement pairs). Because connIfFree() locks DB.mu on every call, we know that DB.mu is being locked very frequently. However, since this code runs while Stmt.mu is held, only one goroutine can be at this location at any given time, so it must be contending with a DB.mu lock taken somewhere else. Going back to the stack dump to look for other places that lock DB.mu, we finally find the contention in DB.addDep().

# DB.addDep holding DB.mu
goroutine 18 [runnable]:
database/sql.(*DB).addDep(0xc208096dc0, 0x7f0932f0e4d8, 0xc2080c4280, 0x73d340, 0xc2082ec540)
database/sql.(*Stmt).Query(0xc2080c4280, 0xc2087cbbe8, 0x1, 0x1, 0x0, 0x0, 0x0)
/home/ec2-user/local/go/src/database/sql/sql.go:1455 +0x49a

# DB.addDep waiting for DB.mu. There are multiple stack dumps exactly like this.
goroutine 64 [semacquire]:
/home/ec2-user/local/go/src/sync/mutex.go:66 +0xd3
database/sql.(*DB).addDep(0xc208096dc0, 0x7f0932f0e4d8, 0xc2080c4280, 0x73d340, 0xc2085f0540)
/home/ec2-user/local/go/src/database/sql/sql.go:364 +0x38
database/sql.(*Stmt).Query(0xc2080c4280, 0xc20899fbe8, 0x1, 0x1, 0x0, 0x0, 0x0)
/home/ec2-user/local/go/src/database/sql/sql.go:1455 +0x49a

We finally have a complete grasp of the problem at hand.

  • Stmt.connStmt() holds Stmt.mu for a (relatively) long period of time, preventing a single Stmt from being used in parallel.

  • Stmt.connStmt() calls DB.connIfFree() repeatedly, and each call locks DB.mu, so it is slowed down by contention on DB.mu.

  • DB.addDep() competes for the same DB.mu lock.

Solution #1: Isolate DB.mu

When you take a closer look at DB.addDep(), it appears to be managing resources using reference counting, releasing each resource when its last reference goes away. The map used for reference counting and the release processing are protected by DB.mu. However, this protection can be split off into an independent mutex.

We can speed up DB.connIfFree() provided that DB.mu doesn’t cause any lock contention. This solves the problem without changing the overall structure of the code.
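The idea of splitting the lock can be sketched as follows. This is a heavily simplified, hypothetical model and not the actual database/sql code: `registry`, `depMu`, and the string-keyed reference counts are all made up for illustration. The point is only that the reference-counting map gets its own mutex, so callers of addDep never wait on the mutex that guards the connection pool.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical, simplified stand-in for sql.DB.
type registry struct {
	mu   sync.Mutex // guards the connection pool state (free)
	free []int

	depMu sync.Mutex     // guards only deps, so addDep never contends with mu
	deps  map[string]int // reference count per resource
}

func newRegistry() *registry {
	return &registry{deps: make(map[string]int)}
}

// addDep records one more reference to the named resource.
func (r *registry) addDep(name string) {
	r.depMu.Lock()
	defer r.depMu.Unlock()
	r.deps[name]++
}

// removeDep drops one reference and reports whether the resource can now be released.
func (r *registry) removeDep(name string) bool {
	r.depMu.Lock()
	defer r.depMu.Unlock()
	r.deps[name]--
	if r.deps[name] <= 0 {
		delete(r.deps, name)
		return true
	}
	return false
}

// putConn returns a connection to the pool; it only touches mu.
func (r *registry) putConn(id int) {
	r.mu.Lock()
	r.free = append(r.free, id)
	r.mu.Unlock()
}

func main() {
	r := newRegistry()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			r.addDep("stmt") // takes depMu only
			r.putConn(id)    // takes mu only
		}(i)
	}
	wg.Wait()
	fmt.Println("references to stmt:", r.deps["stmt"], "free connections:", len(r.free))
}
```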

However, once we actually attempt to isolate the lock, we find that performance only increases by 20%. This outcome is disappointing compared to the solution offered by having multiple instances of DB and calling them in a round-robin fashion. Even if we solve the lock contention problem for DB.mu, Stmt.connStmt() is still locking Stmt.mu from the inside, slowing down the overall process. The lock contention problem for Stmt.mu still has not been solved.

We’ll throw in a patch to isolate the mutex above, then we’ll work on speeding up Stmt.connStmt() next.

Solution #2: Accelerate Stmt.connStmt() and DB.connIfFree()

Stmt.connStmt() calls DB.connIfFree() in a loop over Stmt.css. The purpose of this loop is to find a connection that has already been prepared and is currently available. The loop is also used to remove prepared connections from Stmt.css once they are closed. However, as long as DB.mu is held, we can quickly determine whether a connection is closed or in use, all without referring to any DB members.

Looking at the inside of Stmt.connStmt(), you can lock DB.mu directly from the outside of the Stmt.css loop. At the same time, you can run the connection check right inside the loop. By reducing the number of times DB.connIfFree() is called, we are able to accelerate the process to the same speed as when using multiple databases in a round-robin fashion.
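The change boils down to hoisting the lock out of the loop. Here is a minimal sketch of that pattern under stated assumptions: `pool`, `conn`, and `countingMutex` are hypothetical types, not the real database/sql internals, and the mutex wrapper counts acquisitions purely so the difference is visible.

```go
package main

import (
	"fmt"
	"sync"
)

// countingMutex wraps sync.Mutex and counts acquisitions (for illustration only).
type countingMutex struct {
	sync.Mutex
	locks int
}

func (m *countingMutex) Lock() {
	m.Mutex.Lock()
	m.locks++ // safe: updated while the lock is held
}

// conn is a hypothetical connection whose flags are guarded by pool.mu.
type conn struct{ inUse, closed bool }

type pool struct{ mu countingMutex }

// isFree mirrors the "before" shape: it takes the pool lock on every call.
func (p *pool) isFree(c *conn) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return !c.closed && !c.inUse
}

// firstFreePerCall locks once per candidate connection.
func (p *pool) firstFreePerCall(conns []*conn) *conn {
	for _, c := range conns {
		if p.isFree(c) {
			return c
		}
	}
	return nil
}

// firstFreeLockOnce takes the lock once outside the loop and checks the flags inline.
func (p *pool) firstFreeLockOnce(conns []*conn) *conn {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, c := range conns {
		if !c.closed && !c.inUse {
			return c
		}
	}
	return nil
}

// lockCounts scans n busy connections followed by one free connection with
// both strategies and returns how many times each one locked the mutex.
func lockCounts(n int) (perCall, once int) {
	conns := make([]*conn, 0, n+1)
	for i := 0; i < n; i++ {
		conns = append(conns, &conn{inUse: true})
	}
	conns = append(conns, &conn{})

	a, b := &pool{}, &pool{}
	a.firstFreePerCall(conns)
	b.firstFreeLockOnce(conns)
	return a.mu.locks, b.mu.locks
}

func main() {
	perCall, once := lockCounts(4)
	fmt.Println("locks per scan, per-call:", perCall, "lock-once:", once)
}
```

The trade-off is that the lock is held for the whole scan, which is acceptable here because each check only reads two flags.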

This is the solution that looked the most promising, so I submitted a patch based on it. You can check it out here.

Steps for Adding Patches to Go

Let’s start by reviewing the Contribution Guidelines for Go. For those of you short on time, here’s a quick rundown of the most pertinent points.

First, you must reach a consensus on your general plan for the change on GitHub Issues.

Before you begin your revision, be sure to sign Google’s CLA. (Details here)

Although the repository for Go recently transitioned from Mercurial to Git, this doesn't mean that the repository moved to GitHub. GitHub is being used as an issue tracker and hosts a mirror of the repository. Since the original repository is located at https://go.googlesource.com/go, if you cloned it from GitHub, you’ll have to make a few minor changes, such as pointing origin at the original repository with git remote set-url.

Code reviews for Go take place on Gerrit. Head over to the site and log in. Remember to get yourself a password that will be required when you upload patches from the command line tool.

Next, we’ll install the git-codereview tool.

go get -u golang.org/x/review/git-codereview

This tool manages your patches with one branch per commit, and sends the patch to Gerrit. If you’re already used to Git, the following explanation should sound familiar.

  • git codereview change fix-abc is equivalent to git checkout -b fix-abc.

  • If it’s your first time committing the code, git codereview change is equivalent to git commit. After that, it’s closer to git commit --amend. (You can use -a just like with commit.)

  • git codereview sync is equivalent to git fetch origin && git rebase origin/master.

  • git codereview mail -diff is equivalent to git diff HEAD..master.

  • git codereview mail is used to send your patch to Gerrit.

Your patch will be reviewed once it is uploaded. (Go is constantly under development. I was totally blown away when I received a reply right away when I uploaded a patch on New Year's Day. Do these guys ever take a break?)

Here’s a list of criteria that will be reviewed:

  1. Overall architecture

  2. Code and commit log review

  3. Test structure

  4. Comment review

The points above paint a general picture of things to watch out for as your patch progresses through the different stages of the review process. Keep in mind that your patch will be reviewed by multiple Googlers several times.

Closing Thoughts

At the beginning of the review, we were able to greatly simplify the architecture by eliminating the logic that gave priority to connections that were already prepared. This idea had already come up when the issue was being discussed on GitHub Issues. If I had only taken the time to read the comments carefully, I would’ve been able to save everyone a lot of trouble.

As for the tests, I had simply edited the preexisting benchmarks. For benchmarks measuring parallelism, however, someone pointed out that I should use RunParallel. That reviewer also posted a comment that included sample code I could use as a draft. Even if the code surrounding your change is written in a legacy style, there’s no way they will let you get away with writing new code in that style.
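For reference, a parallel benchmark written with RunParallel looks roughly like the sketch below. This is not the benchmark from my patch; the mutex-guarded `cache` type and the helper names are hypothetical. testing.Benchmark is used so the example can run as an ordinary program, whereas in a real package the function would be named BenchmarkXxx and run with go test -bench.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// cache is a hypothetical mutex-guarded map used as the shared resource under test.
type cache struct {
	mu sync.Mutex
	m  map[int]int
}

func (c *cache) get(k int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.m[k]
}

// benchmarkCacheGet spreads b.N calls across multiple goroutines,
// which is what makes lock contention show up in the result.
func benchmarkCacheGet(b *testing.B) {
	c := &cache{m: map[int]int{1: 42}}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			c.get(1)
		}
	})
}

// iterations runs the benchmark once and returns how many iterations were executed.
func iterations() int {
	return testing.Benchmark(benchmarkCacheGet).N
}

func main() {
	fmt.Println(testing.Benchmark(benchmarkCacheGet))
}
```

Running it with different GOMAXPROCS values (for example via the -cpu flag of go test) shows whether throughput scales with the number of cores.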

Towards the end of the review, I received a lot of advice on my phrasing and the general composition of my English sentences. Some suggested that it was okay to omit the subject of the sentence when writing on the line to the right of the comment. For longer texts that get moved above the original comment though, I was advised to begin each sentence with the appropriate subject. These points weren’t related to my code really, but they were still part of the process. I ended up submitting my patch a total of 13 times before it was finally accepted.

To be completely honest, it would’ve been a lot easier to just report the problem in the Issues section of my test results, and leave the actual revisions to the Googlers. This would’ve made life simpler for both me and the reviewers. However, having multiple Googlers review my code with so much care was an extremely valuable experience. I’m glad they didn’t just say, “We’ll take care of it,” but instead stuck with me till the end of the review process. Thanks Googlers.


Virtual Reality: The New Frontier

There was a huge virtual reality (VR) craze when I was in high school. I remember going to a Virtual Reality Expo with this exhibit that let you experience the weightless magic of riding on a cloud, just like Goku in Dragon Ball. Another attraction had this game with neat, high-speed LEDs that lit up from top to bottom, giving the game unique, 3D-like effects. Needless to say, the thrills were virtually endless.  

These past few years, VR has stepped back into the spotlight. The advent of products like Oculus Rift and Sony’s Project Morpheus has brought all sorts of advancements in the world of head-mounted display technology. Thanks to cutting-edge computer graphics, improvements in GPUs, and other new technology, an insanely real, fully-immersive virtual reality experience is now well within our grasp.

The most exciting part of this new generation of VR may very well be Google’s Cardboard kit. It stands out from the competition thanks to its affordable price and sheer simplicity. Anyone can get their hands on this kit.

In this post, we’ll get into the nitty-gritty of the Cardboard kit. My goal for this project is to build a game demo starring a robot that’s piloted through the magic of VR.

Nothin’ Hard About Cardboard

Google Cardboard is a VR attachment designed for smartphones. Yes, it is made of cardboard. The design schematics are available for anyone interested in building their own VR headset. All of the required materials (magnifying glass lenses, some cardboard, and a few strips of fairly sturdy tape) can be purchased at your local dollar store.


Bringing Cardboard to Life–the Easy Way

I built two Cardboard VR kits by following the schematics posted on Google’s website. The lenses can be purchased from your local dollar store. All you have to do is buy a magnifying glass and tear it apart. To my surprise, the photo magnifying glass I bought included two lenses. Lucky me! I was able to get my hands on all the lenses I needed for the price of one.


Cutting out the cardboard is very easy to do if you have a laser cutter on hand. Since I didn’t, I printed out the template for the headset, taped it onto a piece of cardboard, and cut it out with a regular box cutter.

Since cardboard boxes don’t usually come with bands to wrap around your head, I decided to create a very basic headband by using some Velcro tape and elastic bands from an arts and crafts kit. Voila!

Here’s my first cardboard headset in all its glory.


Google Cardboard with Unity (Durovis Dive Version)

The first thing I tried was Durovis Dive’s Tracking Tech. I downloaded the SDK found on the developers page (see link) and imported the Unity Package to get started.

I took the “Dive_Camera” Prefab found in the “Dive/Prefabs/” folder I imported and added it to the scene I was working on. This transforms the device into a binocular camera complete with head movement tracking abilities. By setting this camera as your main camera, you can easily adapt the plugin for use with almost any smartphone VR attachment, including Cardboard.

Google Cardboard with Unity (Cardboard SDK Version)

Next, I tried out the official Google Cardboard Unity SDK. As you can see on the Cardboard Developer Page (Unity SDK), you can download the Unity Package from the Download and Samples page. Import the package into your project to create a Cardboard folder.

Just like with Durovis Dive, add the “CardboardMain” Prefab found in the “Cardboard/Prefabs/” folder. This will add camera functionalities to the device. Now you’re ready to roll with Google Cardboard.

Which One to Choose...

I tried both SDKs in this grand experiment, but I didn’t have time to create a comparative analysis of the two methods. I ended up using Durovis Dive for the app simply because the Cardboard Unity SDK wasn’t released when I first started this experiment.

How Does It Work?

When working with VR attachments for smartphones, one of the biggest challenges lies in designing the user interface. With the Cardboard headset, I solved this conundrum by creating a switch based on magnetism and the smartphone’s magnetic sensor. The smartphone’s magnetic sensor detects the changes in magnetism induced by position changes in the two magnets, which in turn allows data to be input from a single button. This setup is pretty sweet.

For the second Cardboard headset, I used a piece of conductive fabric and a lever to create a mechanism that touches the screen whenever a button is pressed. This setup is pretty cool too. On the other hand, neither input method allows for a lot of interaction. They also introduce some lag per interaction, and are a little cumbersome to control. This could create a few hurdles for games with a lot of action.

It Takes Two, Baby

That’s when it hit me. Why not use a smartphone’s acceleration sensor and touch panel as a controller? Smartphones are flooding the market at an incredible rate. What’s wrong with taking two of my old phones lying around in my desk drawer and using them as controllers?

This plan calls for a combination of three smartphones–one for display, one for controlling the left side of the robot, and one more for controlling the right. My wife and I had an old iPhone 4 and an iPhone 4s we weren’t using any more. Jackpot! I decided to use these as my controllers.

Connect the Controllers to the VR Headset via Wireless Network

Here comes the tricky part. How exactly should one go about connecting the two controller smartphones and the smartphone used for running and displaying the game? I decided to use WebSocket to solve this quandary. By going through a server, I was able to connect the brain smartphone with the controllers using a WebSocket connection.


For the Unity engine being used on the client-side, I used a modified version of KLab’s C# WebSocket for Unity to build a receiver for running the program. It looked a lot like this:

using UnityEngine;
using System.Collections;
using WebSocketSharp;

public class WebSocketClient : MonoBehaviour {
   public const int RIGHT = 0;
   public const int LEFT = 1;
   public const int B1 = 0;
   public const int B2 = 1;
   public Vector3[] accel = new Vector3[2];
   public bool[,] button = new bool[2,2]{{false,false},{false,false}};

   private WebSocket ws;

   // Use this for initialization
   void Start () {
       ws = new WebSocket ("ws://URL/chat/test");
       ws.OnMessage += (object sender, MessageEventArgs e) => {
           // Messages look like "ID:side:type:args...", e.g. "1:L:B:1:DOWN"
           string [] message = e.Data.Split(new char[]{':'});
           int controllerNo=RIGHT;// Default Right is bad
           // Field 1 selects the controller
           switch (message[1]) {
           case "R":
               controllerNo = RIGHT;
               break;
           case "L":
               controllerNo = LEFT;
               break;
           }
           // Field 2 selects the kind of input
           switch (message[2]) {
           case "AC":
               // Assumption: x, y, and z arrive in fields 3 to 5
               // (the original listing is cut off after the first value)
               accel[controllerNo] = new Vector3(float.Parse(message[3]),
                   float.Parse(message[4]), float.Parse(message[5]));
               break;
           case "B":
               // Buttons are numbered from 1 in the message
               button[controllerNo,int.Parse(message[3])-1] =
                   message[4].Equals("DOWN");
               break;
           }
       };
       ws.Connect ();
   }

   // Update is called once per frame
   void Update () {
   }
}

This class takes information from the player’s controller script and other sources in order to pilot the robot in the game.
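The message format the receiver handles can be summarized with a small parser. The sketch below is written in Go rather than C# and is only an illustration: the `event` type and `parseMessage` are hypothetical names, and the layout of the "AC" message (three numeric fields for x, y, and z) is an assumption, since only the button messages such as "1:L:B:1:DOWN" appear in full in this article.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// event is a hypothetical decoded controller message.
type event struct {
	ID     string     // controller ID, e.g. "1"
	Side   string     // "L" or "R"
	Kind   string     // "AC" (acceleration) or "B" (button)
	Accel  [3]float64 // x, y, z for "AC" messages (assumed layout)
	Button int        // button number for "B" messages, starting at 1
	Down   bool       // true for "DOWN", false for "UP"
}

// parseMessage decodes messages of the form "ID:side:type:args...".
func parseMessage(msg string) (event, error) {
	f := strings.Split(msg, ":")
	if len(f) < 3 {
		return event{}, fmt.Errorf("too few fields in %q", msg)
	}
	ev := event{ID: f[0], Side: f[1], Kind: f[2]}
	if ev.Side != "L" && ev.Side != "R" {
		return event{}, fmt.Errorf("unknown side %q", ev.Side)
	}
	switch ev.Kind {
	case "B":
		if len(f) != 5 {
			return event{}, fmt.Errorf("button message needs 5 fields: %q", msg)
		}
		n, err := strconv.Atoi(f[3])
		if err != nil {
			return event{}, err
		}
		ev.Button = n
		ev.Down = f[4] == "DOWN"
	case "AC":
		if len(f) != 6 {
			return event{}, fmt.Errorf("acceleration message needs 6 fields: %q", msg)
		}
		for i := 0; i < 3; i++ {
			v, err := strconv.ParseFloat(f[3+i], 64)
			if err != nil {
				return event{}, err
			}
			ev.Accel[i] = v
		}
	default:
		return event{}, fmt.Errorf("unknown message type %q", ev.Kind)
	}
	return ev, nil
}

func main() {
	for _, msg := range []string{"1:L:B:1:DOWN", "1:R:AC:0.5:-9.8:0"} {
		ev, err := parseMessage(msg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", ev)
	}
}
```

Validating the field count up front avoids the index-out-of-range errors that a malformed message would otherwise cause.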

I made the controller using an HTML file. Not exactly fancy, but it works just fine by simply displaying it in the browser.

<!DOCTYPE html>
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">

<body style='user-select: none; -webkit-user-select: none;'>
<script type="text/javascript">
// Assumption: the touchmove handler only keeps the page from scrolling.
document.addEventListener('touchmove', function(e) {
  e.preventDefault();
}, false);

var ws;
ws = new WebSocket("ws://URL/chat/test");

// Send a message once the connection is open.
function send(message){
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(message);
  }
}

// Assumption: tilt readings are sent in the same "ID:side:type:args" format as the buttons.
window.addEventListener('devicemotion', function(evt) {
 var x = evt.accelerationIncludingGravity.x; //side-to-side tilt
 var y = evt.accelerationIncludingGravity.y; //vertical tilt
 var z = evt.accelerationIncludingGravity.z; //up-and-down tilt
 send('1:L:AC:' + x + ':' + y + ':' + z);
}, false);
</script>
<div style="width: 100px; height: 100px; background-color: red; margin: 20px; float: right;" ontouchstart="send('1:L:B:1:DOWN')" ontouchend="send('1:L:B:1:UP')" ></div>
<div style="clear: right;"></div>
<div style="width: 100px; height: 100px; background-color: blue; margin: 20px; float: right;" ontouchstart="send('1:L:B:2:DOWN')" ontouchend="send('1:L:B:2:UP')" ></div>
</body>

With this HTML, I changed the ID number for each controller to create two independent controllers―one for the left side of the robot, and one for the right.

The server that relays controller messages does nothing more than broadcast each message it receives to the connected clients over simple WebSocket connections. I used this Clojure and Aleph chatroom service for reference.
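The broadcast logic itself is tiny. The following Go sketch shows one way to structure it, with channels standing in for client connections. It is not the Clojure and Aleph service mentioned above; `hub`, `join`, `leave`, and `broadcast` are hypothetical names, and the WebSocket transport is left out entirely because the Go standard library does not include one.

```go
package main

import (
	"fmt"
	"sync"
)

// hub is a hypothetical chat-room style relay: every message is copied to all clients.
type hub struct {
	mu      sync.Mutex
	clients map[chan string]struct{}
}

func newHub() *hub {
	return &hub{clients: make(map[chan string]struct{})}
}

// join registers a new client and returns the channel it will receive messages on.
func (h *hub) join() chan string {
	ch := make(chan string, 16) // small buffer so one slow client doesn't block the rest
	h.mu.Lock()
	h.clients[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

// leave unregisters a client and closes its channel.
func (h *hub) leave(ch chan string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if _, ok := h.clients[ch]; ok {
		delete(h.clients, ch)
		close(ch)
	}
}

// broadcast copies msg to every client and returns how many clients accepted it.
// Messages for clients whose buffer is full are dropped rather than queued.
func (h *hub) broadcast(msg string) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	sent := 0
	for ch := range h.clients {
		select {
		case ch <- msg:
			sent++
		default:
		}
	}
	return sent
}

func main() {
	h := newHub()
	display := h.join()
	controller := h.join()
	n := h.broadcast("1:L:B:1:DOWN")
	fmt.Println("delivered to", n, "clients:", <-display, <-controller)
	h.leave(controller)
	h.leave(display)
}
```

Dropping messages for slow clients is a deliberate choice in this sketch: for controller input, a stale reading is worth less than a fresh one.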

An Ode to Cyber Troopers Virtual-On

It’s hard to talk about piloting a futuristic robot with two joysticks without bringing up Cyber Troopers Virtual-On, one of the most innovative titles in the “giant fighting robots” gaming genre. I borrowed a few ideas from the series for my game. The angle at which each device is tilted is detected by the built-in acceleration sensor. This in turn is used to move the robot forward when you tilt both left and right hand controller phones forward. Tilt them both backwards to move the robot in reverse. Move them forward or backward in different directions to make the robot spin around. Tilt them horizontally in the same direction to move the robot side to side. Spread the controllers apart horizontally to make the robot jump, and bring the controllers closer together to make the robot move down. Classy.
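The control scheme above can be written down as a small decision table. The Go sketch below is an illustration rather than the code from the demo: the `tilt` type, the sign conventions (positive Forward means tilted forward, positive Side means tilted to the right), the dead-zone threshold, and the choice of which opposite tilt spins the robot which way are all assumptions.

```go
package main

import "fmt"

// tilt is a hypothetical normalized reading from one controller phone.
// Forward > 0 means tilted forward; Side > 0 means tilted to the right.
type tilt struct{ Forward, Side float64 }

// threshold is an assumed dead zone so small hand movements are ignored.
const threshold = 0.3

// sign reduces a reading to -1, 0, or +1 using the dead zone.
func sign(v float64) int {
	switch {
	case v > threshold:
		return 1
	case v < -threshold:
		return -1
	}
	return 0
}

// command maps the tilt of the left and right controllers to a robot action.
func command(left, right tilt) string {
	lf, rf := sign(left.Forward), sign(right.Forward)
	ls, rs := sign(left.Side), sign(right.Side)
	switch {
	case lf == 1 && rf == 1:
		return "forward"
	case lf == -1 && rf == -1:
		return "backward"
	case lf == 1 && rf == -1:
		return "spin right" // assumed direction, like a tank's treads
	case lf == -1 && rf == 1:
		return "spin left"
	case ls == 1 && rs == 1:
		return "strafe right"
	case ls == -1 && rs == -1:
		return "strafe left"
	case ls == -1 && rs == 1:
		return "jump" // controllers spread apart
	case ls == 1 && rs == -1:
		return "descend" // controllers brought together
	}
	return "idle"
}

func main() {
	fmt.Println(command(tilt{Forward: 1}, tilt{Forward: 1}))
	fmt.Println(command(tilt{Side: -1}, tilt{Side: 1}))
	fmt.Println(command(tilt{}, tilt{}))
}
```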

The screen on the controller phones includes buttons for attacking and boosting. Tapping the buttons triggers the desired effect.


As you can see in the photo above, I used an in-browser HTML page for the controller screen. (Not a lot of bells and whistles here.) The red square is the attack button, and the blue one is the boost button. It works on both Android and iOS devices, but the axes reported by the acceleration sensor are oriented differently on each OS, so be careful.

Everybody Do the Robot

Here’s a shot of me using my Cardboard headset. Looking good.


And here’s a screenshot of the actual display.



This experiment asks a lot since you need three phones to make it work. All things considered though, it’s not that hard to get your hands on three smartphones these days. When I started scrounging around the house, I found a 3GS, 4, 4s, 5, 5s, 6, and a first-generation iPad, and that’s just for iOS! For Android, we had an HTC Aria, a Kindle Fire HD, and a Kobo Arc. You probably have at least three old phones lying around somewhere.

There’s a little bit of latency that comes into play when you send messages from controllers through the server via the WebSocket connection, but it’s barely noticeable once you start playing. With a little in-game tweaking, you should be able to cover that lag up quite nicely so that it’s virtually unnoticeable.

Overall, this project was a blast. I loved making the VR headsets. Time was flying and so was I. My only regret is that this article fails to convey the full extent of how much fun this project really was! My goal was to create a robot that you could climb in and pilot yourself. The demo’s gameplay was pretty lackluster, but I'm looking forward to taking what I learned from this project and applying it to the games we’ll make in the future.

Thanks for reading!
