Raster

After a whole lot of development over in Enlightenment land, finally EFL 1.20 is out. Over 1600 commits since 1.19 from over 60 different authors.

We've been improving our code quality. Coverity has been scanning our code for several years now, and we've worked hard to pay attention to and address the issues it brings up. We're now down to 0.03 defects per 1,000 lines of code. For comparison, Qt gets 0.72 defects per 1,000 lines of code, glib gets 0.44, whilst the Linux kernel gets 0.47 and OpenSSL gets 0.32. Note that these numbers go up and down with each scan. This does not mean EFL is bug-free. Far from it. But we are paying attention to catching the bugs and issues that can be caught and fixed. We take improving code stability seriously.


We've been improving our Wayland and DRM display support, adding, amongst many other things, atomic mode setting and multiseat support. I personally run Enlightenment (git master) with EFL on a Raspberry Pi 3 (ARM + VC4), ODROID XU3/XU4 (ARM + Mali-6xx), several Intel devices (touchscreen laptops with i5, i7 and Bay Trail cores) and a laptop with Nouveau-driven NVIDIA graphics for testing. Below is some of my collection, most running in Wayland mode and some in X11 (two of them are the ARM devices above).

[Photo: a collection of test devices]

We've been working to improve and flesh out our core debug daemon and infrastructure. There has been work on hooking Clouseau into this, and there's talk of also folding the profiling viewer work into Clouseau, to make a complete one-stop debugging tool for inspecting object trees, timeline execution and much more.

Now if only I can get my hand back from malloc so I can get some work done...


Raster

First, let me preface this by saying that the facts at hand are based on this article, and I am assuming it is true as written at the time of writing this post.

So ... it seems Grsecurity want to sue Bruce Perens for expressing his opinion that their added restrictions to the GPL license are in violation of it. Let's just leave that alone for now and get to another matter. The matter in the title of this post.

Bruce Perens

First, Bruce Perens is someone I happen to know and deeply respect. He has been a pillar of the Open Source and Linux community for decades. He's one of the people who helped build it. I first met him about 20 years ago, and from personal, direct experience can tell you that Bruce is a careful, considered and intelligent person. Also, if there is one thing he REALLY knows, it's Open Source: licenses and community. Bruce is the kind of guy that God himself turns to when looking for advice about Open Source license issues.

Going after Bruce Perens legally for an opinion that is, IMHO, very likely reliable isn't just a disagreement; it shows a failure to understand what Open Source is all about, specifically Open Source as it applies to the Linux kernel and GPLv2. Forget legal facts here. I'm talking about honor. Open Source is an honor system (backed by a set of licenses) that basically goes like this:

"Here you go. I (and possibly many others) wrote this code, and am/are giving it to you on the basis that you can use it and do what you want, as long as you also share it along, don't restrict anyone further than that, and ensure they get all the rights and freedoms I gave to you. If you don't follow this, you're not following the Open Source way."

Let's not get bogged down in the details of GPL vs. LGPL vs. GPLv2 vs. v3 vs. BSD vs. Apache etc. ... the above is still a core part of what Open Source is all about. This is not about nitpicking legalities. It's about the honor, or lack thereof, in this move.

So I'm going to make the claim that if you trust Grsecurity because they claim to understand Open Source, you should re-think that.

Bruce is the kind of guy who helped build this amazing Open Source world ... and if you go after Bruce for standing up for Open Source licenses like the GPL, then you are not only going after him, you are going after all of us who helped build this Open Source world that so many millions of people and billions of dollars of business depend on. You are sending a clear signal that you just don't get the honor system this world of Open Source is built on, and that you're likely a threat to it instead.

Congratulations, Grsecurity. You have made it onto my "naughty list". A lump of coal for you this Christmas, and for many after that.

And Bruce, you have my support, my respect, and my gratitude.

Raster

Using GOTO in your code is awesome and you should do it more

There. I said it. It needs to be said. Given the plethora of "GOTO is evil; never use it" advice that goes around, from the professors who taught you to code through to colleagues and coding guidelines, someone has to take the side of the poor, maligned GOTO.

Poor little GOTO. Show some love people!

Now let's get back to the real world. I was recently hammering away at a hot path in our code, trying to lower its overhead. Suffice it to say, the code looked something like this (pseudocode written specially here for illustration - not the exact original code):

if (shared) lock();
if (data == INVALID) {
  log_error("blah %s:%i %s() -> %p\n", __FILE__, __LINE__, __func__, data);
  if (shared) unlock();  // don't leak the lock on the error path
  return NULL;
}
// smallish body of code hunting through some nested tables using data
if (shared) unlock();
return realdata;

Now some more background. lock() and unlock() are static inline to avoid function call overhead, as they are very often used in hot paths: the cost of pushing things onto the stack, doing a call and returning, on top of the real function they call, would not be a great thing to impose. They provide a level of portability across OSes, so really we want the compiler to inline the real OS function without the developer needing to know what it is. They also provide some basic error checking and logging so you don't have to do it all yourself, so they are not that trivial. The point is that neither the locking nor the error-handling if with its logging were trivial pieces of code, so effectively if (shared) lock(); becomes something more like:

if (shared) {
  if (lock == VALID) {
    if (!do_lock(lock)) {
      log_error("lock fail %s:%i %s() -> %p\n", __FILE__, __LINE__, __func__, lock);
    }
  }
}

And similarly for unlock().
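For illustration only, here is a minimal sketch of what such a static inline wrapper could look like on top of pthreads. The names (Lock, do_lock, lock_checked) and the validity check are hypothetical stand-ins, not EFL's actual API:

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical portable lock wrapper -- not EFL's real API. */
typedef struct {
  int valid;            /* non-zero once the lock is initialized */
  pthread_mutex_t mtx;  /* the underlying OS primitive */
} Lock;

static inline int do_lock(Lock *l)
{
  return pthread_mutex_lock(&l->mtx) == 0;
}

static inline int do_unlock(Lock *l)
{
  return pthread_mutex_unlock(&l->mtx) == 0;
}

/* Because these are static inline, the compiler can paste the body
 * straight into each call site: the hot path pays only a couple of
 * branches plus the real OS call, with no extra function-call
 * overhead, and the error checking/logging comes along for free. */
static inline void lock_checked(Lock *l)
{
  if (l->valid) {
    if (!do_lock(l))
      fprintf(stderr, "lock fail %s:%i %s() -> %p\n",
              __FILE__, __LINE__, __func__, (void *)l);
  }
}

static inline void unlock_checked(Lock *l)
{
  if (l->valid) {
    if (!do_unlock(l))
      fprintf(stderr, "unlock fail %s:%i %s() -> %p\n",
              __FILE__, __LINE__, __func__, (void *)l);
  }
}
```

The portability point is that callers only ever see lock_checked()/unlock_checked(); swapping pthreads for a Windows CRITICAL_SECTION or anything else happens inside the inline bodies.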

So: the code above may need to lock a shared resource, go hunt down some data, possibly unlock, and return it, as well as handle errors along the way. I was doing some profiling, and this little segment of code was eating up 6-7% of our CPU time (it's a really hot path). Yes, there are caches to avoid full lookups (in the "smallish body of code"), etc., but still this was way too much overhead.

So I spent time re-organizing it like:

if (shared) {
  lock();
  if (data == INVALID) {
    log_error("blah %s:%i %s() -> %p\n", __FILE__, __LINE__, __func__, data);
    unlock();  // don't leak the lock on the error path
    return NULL;
  }
  // smallish body of code hunting through some nested tables using data
  unlock();
} else {
  if (data == INVALID) {
    log_error("blah %s:%i %s() -> %p\n", __FILE__, __LINE__, __func__, data);
    return NULL;
  }
  // smallish body of code hunting through some nested tables using data
}
return realdata;

And yet it still used 6-7%. I stared at the data from perf, reading the assembly matched against the C and how often each code path was actually taken, and then something stood out. Firstly, the shared code path is rare right now, hence the reason to avoid locking and unlocking. Yes, locking is EXPENSIVE. A lock + unlock can take anywhere from 6ns to 160ns depending on CPU generation and architecture (I've checked the numbers across various x86 and ARM CPUs). The same work without locking is almost 0ns. Locking is not free. Every lock+unlock cycle is a cost, and it is not cheap. You have to balance the number of lock+unlocks against the time something stays locked, as well as trying to design lockless algorithms or implementations. I noticed that lock() and unlock() each expanded to maybe 15-20 instructions. I also noticed the code for if (data == INVALID) { ... } was maybe 20 or so instructions too, as it pushed various things onto the stack and did a call. Then it dawned on me...

There is a separate instruction cache, and the CPU fetches code one cacheline at a time (or more in advance, depending), and a cacheline is generally 64 bytes. Averaging maybe 4 bytes per instruction, only about 16 instructions fit in a cacheline. So if we do a compare + branch and SKIP almost a whole cacheline worth of instructions to jump to another set of instructions that isn't cached yet (we execute a lot of code outside this hot function, polluting at least the L1 cache if not more), we most likely take a cache miss here every time, because compilers STILL like to order code as written. At least, that's what perf was displaying, with the ASM and the matching C code next to it and how often it saw each line being hit. Of course this is obvious... we all know this, but we always think of our DATA, not our CODE, when it comes to cache hits. Code can get a miss too, and when it does, the CPU stalls. We just never keep this in mind when writing code. Continually focusing on our data and whether it will be nicely cached, we forget that the poor instructions are also data our CPU must consume, and they need to avoid cache misses too.

So, time to bring out the GOTO to handle the exceptions that are rare, moving all the "rare case" code to the end of the function, out of the way of the most common path of execution and helping reduce cache misses. The code became something like this:

if (!shared) {
  if (data != INITIALIZED) goto doinit;
doinit_back:
  if (data == INVALID) goto err_invalid;
  // smallish body of code hunting through some nested tables using data
} else {
  lock();
  if (data != INITIALIZED) goto doinit_shared;
doinit_shared_back:
  if (data == INVALID) goto err_invalid;
  // smallish body of code hunting through some nested tables using data
  unlock();
  return realdata;
}
return realdata;
doinit:
  // a few lines of initting data here
  goto doinit_back;
doinit_shared:
  // a few lines of initting data here
  goto doinit_shared_back;
err_invalid:
  if (shared) unlock();  // don't leak the lock on the error path
  log_error("blah %s:%i %s() -> %p\n", __FILE__, __LINE__, __func__, data);
  return NULL;

And presto: we dropped down to about 2.5% CPU usage. Moving the "common case" of non-shared data into the early code helped. Moving the big fat chunk of instructions involved in taking and releasing a lock helped. Moving the "rare case" of invalid data and its error handling helped. All of this moved rarely executed code out of the hot path and out of the L1 cachelines being fetched, so the CPU could stall much less.

Remember this. GOTO is not the pure evil everyone keeps telling you it is. It has its use cases. It's an excellent way of moving code that handles exceptional cases out of the hot path. Compilers still are not that bright, and even if you use things like __builtin_expect() to hint at which result is likely or unlikely, it's good to make your code flow more easily by letting the reader run through the common cases without having to skip over large sections of unlikely code. They can look into the exceptions later when needed, as there is a goto telling them where to find the code handling them. This also helps the CPU simply perform better, and in this case it led to a 2-3x speedup. The lesson here is that nothing is totally evil. Everything should be taken with a grain of salt. Of course you should not use GOTO to replace for or while loops, and you shouldn't use it for general flow control. But for moving code out of the hot path and handling exceptions, it makes your code more readable AND provides major performance improvements that are worth any perceived ugliness.
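For reference, the __builtin_expect() hints mentioned above are usually wrapped in macros like the ones sketched below. The likely/unlikely names are a common convention (e.g. in the Linux kernel), not standard C, and lookup() is a made-up example, not the real function:

```c
#include <stddef.h>

/* GCC/Clang-specific branch hints; conventional names, not standard C. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical lookup: the hint tells the compiler which arm is hot,
 * so it can lay the cold error path out of line -- the same layout
 * the GOTO rewrite achieves by hand. */
static void *lookup(void *data)
{
  if (unlikely(data == NULL))
    return NULL;   /* cold: error path */
  return data;     /* hot: common path */
}
```

The hint only influences code layout and static branch prediction; it doesn't restructure your source the way the explicit goto does, which is why the two techniques complement rather than replace each other.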

GOTO IS AWESOME

P.S. The actual function in question is _eo_obj_pointer_get() here, and you'd need a lot of extra background to really know what it is and why it's there, so I wasn't going to explain all of that background too.