Avoiding Gitastrophy
by beetlefeet

At Agworld we use git (and github). No surprises there.

Our source control and release workflow is loosely based off of A successful git branching model

On occasion we run into significant issues, Or as we like to call them ‘gitastrophies’.

Often these problems stem from something simple that could have been avoided. I hope to do a few posts outlining an issue we’ve had and hopefully some means for fixing the error and future prevention.

Problem #1 - rebase followed by pull

This is a relatively minor issue but worth keeping an eye out for. Lets say you’ve been working on a feature branch and made 2 commits. The log looks this:

* 011eba2 2013-04-02 | Fix race condition in multithreading (HEAD, origin/feature/multithreaded, feature/multithreaded
* 7b70092 2013-04-02 | Initial work on multithreading [Jack Casey]
* 32cd3f5 2013-04-02 | Starting point for thimble rb library. [Jack Casey]
* 184961a 2013-04-02 | Initial commit. [Jack Casey]

In the meantime someone has been working and committed to master. And say you’ve pushed your feature branch to the server (for collaboration or CI etc). So you rebase and squash your commits into one and rebase the lot onto the latest state of master while you’re at it. Now the log looks like this:

* cdc265c 2013-04-02 | Added multithreading support. (HEAD, feature/multithreaded) [Jack Casey]
* 7e14b88 2013-04-02 | Renamed this great library to thumble. (origin/master, origin/HEAD, master) [Jason Hutchens]
* 32cd3f5 2013-04-02 | Starting point for thimble rb library. [Jack Casey]
* 184961a 2013-04-02 | Initial commit. [Jack Casey]

So tidy!

Then you want to push your changes (still in the feature branch). So you git pull and git push and All looks good! Buuuut you just merged your old commits with your new commits! Git is ‘clever’ and doesn’t re-apply the same change twice so the codebase comes out the other end looking ok but take a look at your git log:

*   cd4727b 2013-04-02 | Merge branch 'feature/multithreaded' of /Users/jack/dev/trump into feature/multithreaded (HEA
|\
| * 011eba2 2013-04-02 | Fix race condition in multithreading (origin/feature/multithreaded) [Jack Casey]
| * 7b70092 2013-04-02 | Initial work on multithreading [Jack Casey]
* | cdc265c 2013-04-02 | Added multithreading support. [Jack Casey]
* | 7e14b88 2013-04-02 | Renamed this great library to thumble. (origin/master, origin/HEAD, master) [Jason Hutchens]
|/
* 32cd3f5 2013-04-02 | Starting point for thimble rb library. [Jack Casey]
* 184961a 2013-04-02 | Initial commit. [Jack Casey]

Eww…

This isn’t the end of the world but is just plain messy, and if you were rebasing in the first place then presumably your goal was to make the history cleaner.

The way to avoid this is to be careful about merging with the remote tracking branch. Git will actually warn you about this by forcing you to enter a commit message. Just make sure to think about merges. Here we wanted to force push but accidentally pulled. Force push is (apart from an awesome star wars telekinetic power) pretty dangerous and definitely a topic for a future post!

So the prevention is to get out of the habit of just randomly typing git pull and if possible get into the habit of not using git pull much at all, instead use git fetch and git merge separately.

The fix for this is to (hopefully) realise what you’ve done immediately and to git reset —hard to before the bad merge, fixing your local branch before it’s pushed.

In this case the command would be:

git reset --hard cdc265c

Or to at least fix the feature branch remotely before it’s merged upstream. If you’ve pushed something ugly and redundant like that into eg master. I’d recommend just leaving it. C’est la vie. It can be a reminder of a lesson learned.

Rewriting history on a ‘community’ or often used branch (ie master) is almost always a huge no no.

PS:

I regularly use a command I found somewhere on the net called git hist as a variation of git log. It’s wonderful and everyone should use it all the time. I used it for the git log output in this post.

alias.hist=log --pretty=format:"%h %ad | %s%d [%an]" --graph --date=short

PPS:

The “Pro Git” book ( http://git-scm.com/book ) is really useful, please read it if you use git!

The Importance of Integration Benchmarking
by Mindful Mindlessness

Of late and not so late times I’ve been working with a number of my colleagues on a platform for sample analysis. It’s interesting work purely as a matter of the domain, but technically it’s been interesting - or frustrating - as the system needed to translate between a number of external systems. This sort of necessitates some fairly chunky Rosetta Stones and the machinery required to swap in the correct one.

To put this into more practical terms we translate sample results from any number of labs to our internal representation that we can then use to run mathematical models from any number of external modelling systems. Aside from basic translations like this there are also translations of model definitions that use various languages of their own. We also allow users to update a result on a sample in our system and then send these updates to the lab. 

We went with the pattern of using JSON to represent essentially everything in the system that would require translation. We wrote a library that has a number of public API endpoints and contains strategies for each modelling system we want to talk to. Some use REST, some don’t, some use JSON externally, some don’t and we can handle all of this by converting to and from a standard set of objects and then to and from JSON for the public API.

So far this is all pretty standard stuff; except it was so slow.

After some searching we noticed that the JSON conversions for some calls seemed to be taking quite a long time. We though it might be the library we were using but our testing harness didn’t show any outstanding slow down. However what we did notice was that garbage collection was running whilst the JSON was being generated.

In the testing harness with a much smaller heap the GC runs were adding negligible time to allocate more. However when running in the context of a  our application with a much larger heap we were losing multiple seconds in GC. 

In reality we’d made a decision here that JSON was what we were always going to use but we were using a library that enabled us to do a whole lot more. Swapping this out for a more lightweight - and arguably more readable - option reduced our GC runs to less than one from the original six or more. Most calls are now at least and order of magnitude quicker.

Long story to get to the point, but I think it’s important to make sure you’ve got the entire context before making your decision. If you’re benchmarking something then take into account how it’s operating within your entire stack.

Sangaku: Finding the Pole of Inaccessibility
by Jason Hutchens

Here at Agworld we deal a lot with Paddock boundaries. One of the things we’d like to know is where best to place an icon indicating where the paddock is, given its boundary. This isn’t a trivial problem. Consider a paddock in the shape of the letter ‘L’. The naïve approach of putting the icon in the centre of the bounding box obviously will break down badly in this case.

A better solution might be to find the centroid of the polygon that defines the paddock boundary. But the algorithms that do that are complicated, needing work-arounds for concave polygons and breaking down altogether for self-intersecting polygons. Plus, the centroid may often be rather close to a vertex, which is less than ideal.

What we really want to find is the centre of the blobbiest bit of the paddock. Basically the place that a human being would point to and say “put the icon there”. For every conceivable shape. Automagically.

Turns out that the best solution would be to find the so-called “Pole of Inaccessibility” for the paddock. That’s defined (by Wikipedia) as “the most distant point from the coastline”. I first heard about it from Stephen Fry on his QI programme, but have since done some research to figure out how to estimate a solution via the following iterative technique. Because it’s an NP-Hard problem, this technique isn’t guaranteed to find the maxima, but it does a pretty fast approximation.

The Algorithm

Step 1: Place an AABB Around the Polygon

This is easy; you just build up an AABB (which stands for Axis-Aligned Bounding Box) by looking at all of the points in the polygon. You then make it 10% larger, to avoid edge-cases. Note that here we’re also showing the initial solution of placing the icon in the centre of the bounding box.

Step 2: Place 11 Horizontal and Vertical Lines

Now we superimpose a grid of 11 horizontal and 11 vertical lines over the AABB. These mark the places where we’ll be looking for intersections between the grid lines and the polygon. Again, this is easy to do, as each line has it’s X or Y co-ordinate fixed. It is therefore quite trivial, and fast, to test each line segment in the polygon for intersection with each of the 22 grid lines.

Step 3: Find All Internal Line Segments

Consider a single grid line, horizontal or vertical. In the previous step, we marked the grid line whenever it intersected the polygon. It’s plain to see that, when you move along the grid line, the first mark denotes a transition from outside to inside, the second from inside to outside, and so on. It’s therefore very quick to find a bunch of horizontal and vertical line segments that are internal to the polygon. We can also quickly calculate the mid-point for each of these.

Step 4: Score Each Line Mid-Point

Now we iterate through each of these line segments and calculate their fitness, using some criterion. Here I’m trying to find the centre of the largest internal circle, so the fitness function is based on that. If you wanted to find the best place to display a text label, which is much wider than it is high, then you’d vary the fitness function.

Step 6: Rinse and Repeat

By this stage we’ve found our first candidate solution. We can improve that by shrinking the AABB to 70% of its size, centring it on the candidate solution, and going back to step 2. We use some kind of stopping criterion to break out of this loop. Perhaps when the AABB is very small, or when the candidate solution only changes by some small epsilon value, or whatever.

The Implementation

We’ve published two Ruby gems that implement this algorithm. The first, Sangaku, is a simple, generic 2D geometry library. Nothing special, but lightweight and suited to the problem. The second, Sangaku Eyeball, uses the Gosu 2D games library to allow you to draw polygons and then find the Pole of Inaccessibility for them. It was used to generate the screenshots above. Try them out!

The performance of .to_json in Rails sucks and there’s nothing you can do about it
by Jason Hutchens

As the application we’re working evolves away from a monolithic Rails app and towards a loosely-coupled collection of services, we find that we’re dealing more and more with large JSON blobs.

Recently it became apparent that the performance of our JSON wrangling wasn’t up to snuff. Which was weird, because I proved to myself quite early on that we were most definitely using the json/ext variant of the json gem, which is implemented in C, and should be plenty fast enough.

Just to make sure, I fired up a Rails console and checked. Yep, json/ext was in play. So I then ran an experiment to make sure that it was as fast as I supposed, by writing a simple test:

#!/usr/bin/env ruby

require 'benchmark'

def profile_json(num = 10)
  data = Hash.new
  key = 'aaa'
  1000.times { data[key.succ!] = data.keys }
  times = num.times.map do
    1000 * Benchmark.realtime { data.to_json }
  end
  times.reduce(:+)/num.to_f
end

require 'json/pure'
puts "'json/pure' -> #{profile_json} ms"

require 'json/ext'
puts "'json/ext' -> #{profile_json} ms"

Running this from the command line, outside of rails, convinced me of the speed advantage:

'json/pure' -> 1651.64189338684 ms
'json/ext' -> 55.1620960235596 ms

We should have be getting reasonable JSON performance in Rails. But we weren’t. To investigate further, I ran the same test with rails runner instead (which is pretty much the equivalent of copy-pasting it into the rails console; it basically fires up Rails before executing your code). This is what I got:

'json/pure' -> 1912.54553794861 ms
'json/ext' -> 1919.99847888947 ms

Whaaaa? Abysmal performance. And changing the test to use yajl/json_gem made no difference to performance when the test was performed within our Rails app.

A bit of investigation revealed the culprit. In 2010, in order to fix a low priority bug (that ActiveModel::Errors#to_json generated invalid JSON), the following code was introduced to active_support:

# Hack to load json gem first so we can overwrite its to_json.
begin
  require 'json'
rescue LoadError
end

# The JSON gem adds a few modules to Ruby core classes containing :to_json definition, overwriting
# their default behavior. That said, we need to define the basic to_json method in all of them,
# otherwise they will always use to_json gem implementation, which is backwards incompatible in
# several cases (for instance, the JSON implementation for Hash does not work) with inheritance
# and consequently classes as ActiveSupport::OrderedHash cannot be serialized to json.
[Object, Array, FalseClass, Float, Hash, Integer, NilClass, String, TrueClass].each do |klass|
  klass.class_eval do
    # Dumps object in JSON (JavaScript Object Notation). See www.json.org for more info.
    def to_json(options = nil)
      ActiveSupport::JSON.encode(self, options)
    end
  end
end

Mmmm-mmmm, I love me my hacks.

Here, the json gem is shanghaid into loading, even if you are deliberately requiring it after rails when trying to figure out how to make your .to_json performance not suck. Then, the to_json method is patched for all common classes, including Array and Hash, effectively neutering it.

You’ll find this code in active_support/core_ext/object/to_json. If you do anything in your app to cause that file to be parsed (such as, say, using the authlogic gem, like we do, or, say, using active_record) then your JSON performance will be irrevocably poor if you use .to_json on objects to JSONify them.

Yes, I know you can generate JSON in other ways that wouldn’t fall foul of this hackery. But we use .to_json a lot, and it’s not just us; many of the gems that we use also use .to_json. Indeed, there are 40,000 public repositories on GitHub that use .to_json in Ruby code. It’s popular. And gems or frameworks shouldn’t neuter these kinds of things if they can at all help it.

I reported a bug (https://github.com/rails/rails/issues/9212), although I’m pretty sure it’ll be a hard fight to get any kind of fix through. I reckon the original bug (https://rails.lighthouseapp.com/projects/8994/tickets/4890) should have a more specific fix that doesn’t nix JSON performance for everyone. If that’s not possible, then the existing fix should be made optional, with a flag in configuration for disabling it. We’ll see how it goes.

In the meantime we’ve monkeypatched around the problem by applying a similar change as the one in active_support, but this time using multi_json and oj (which is the fastest and the most rails-compatible JSON gem I could find) in concert.

And we’re happy to report that, in initial tests, we’re seeing JSON views render in 8ms instead of 500ms. Good times!

Infinity and JSON
by Jason Hutchens

It seems that many JSON parsers/generators aren’t idempotent when it comes to Infinity.

We had a problem recently where a floating point result from a .NET server was returned in a JSON string like this:

{"name": Infinity}

That ain’t valid JSON, according to RFC 4627. Here’s what happened when we try to parse it in Ruby:

>> require 'json'
=> true
>> result = '{"name": Infinity}'
=> "{\"name\": Infinity}"
>> JSON.parse(result)
JSON::ParserError: 216: unexpected token at ' Infinity}'

Luckily, we can work-around this problem as follows:

>> result.gsub!(/Infinity/, '1e1000')
=> "{\"name\": 1e1000}"
>> data = JSON.parse(result)
=> {"name"=>Infinity}

Great! So now we can do manipulations on the Ruby hash. But, later, if we render that Hash as JSON, then we’ll get problems.

>> data.to_json
JSON::GeneratorError: 775: Infinity not allowed in JSON

What happens on the JavaScript side? Much the same thing, it turns out:

> result = '{"name": Infinity}'
  "{"name": Infinity}"
> JSON.parse(result)
  SyntaxError: Unexpected token I

Even better, the solution is exactly the same:

> result = result.replace(/Infinity/g,"1e1000")
  "{"name": 1e1000}"
> data = JSON.parse(result)
  v Object
    result: Infinity
    > __proto__: Object

JavaScript does comply with RFC 4627, however, unlike the JSON libraries that we use in Ruby (and presumably unlike whatever the .NET service was doing):

> JSON.stringify(data)
  "{"name":null}"

But the best solution, I think, would be to use the ‘1e1000’ hack to preserve the value of ‘Infinity’ from end-to-end.

Ruby’s Inspect Considered Harmful
by Jason Hutchens

If you’ve ever worked with C++ professionally then you’ve most likely read Scott Meyers’ wonderful Effective C++, a collection of specific ways of improving the code you write.

Consider, for example, Item 27: Minimise Casting. Lots of in-depth detail, followed by a few rules to follow when writing new code, or reviewing code written by your colleagues. I’ve worked at more than one place that derived their coding standards, and company hairdo, directly from Scott’s book.

I’ve often thought that there should be a similar collection of wisdom for Ruby. If there was, I’m sure it’d begin like this:

Item 1: Implement Inspect for all Non-Trivial Classes

All Ruby classes derive from Object, which provides to_s and inspect. Both return a string representation of the object instance. General wisdom is that to_s is for displaying the instance to the user, whereas inspect is for displaying the instance to the developer. After all, puts calls to_s, and irb calls inspect to display the return value of issued commands.

If you don’t implement to_s yourself, the default implementation will be used, which simply prints the class name and the address of the instance:

VALUE
rb_any_to_s(VALUE obj)
{
    const char *cname = rb_obj_classname(obj);
    VALUE str;

    str = rb_sprintf("#<%s:%p>", cname, (void*)obj);
    OBJ_INFECT(str, obj);

    return str;
}

For example:

$ irb
>> Object.new.to_s
=> "#<Object:0x104a4b898>"

If you do implement to_s, it’s probably because you want to format the instance for display to the end user. But, in general, you probably don’t normally implement to_s, do you? You normally build up the string representation of the object manually, at the point where it’s needed. Right?

What’s more interesting is the default version of inspect. It uses the default version of to_s if there aren’t any instance variables. If there are, however, then it uses the class name and the address of the instance followed by a generated string:

static VALUE
rb_obj_inspect(VALUE obj)
{
    extern int rb_obj_basic_to_s_p(VALUE);

    if (TYPE(obj) == T_OBJECT && rb_obj_basic_to_s_p(obj)) {
        int has_ivar = 0;
        VALUE *ptr = ROBJECT_IVPTR(obj);
        long len = ROBJECT_NUMIV(obj);
        long i;

        for (i = 0; i < len; i++) {
            if (ptr[i] != Qundef) {
                has_ivar = 1;
                break;
            }
        }

        if (has_ivar) {
            VALUE str;
            const char *c = rb_obj_classname(obj);

            str = rb_sprintf("-<%s:%p", c, (void*)obj);
            return rb_exec_recursive(inspect_obj, obj, str);
        }
        return rb_any_to_s(obj);
    }
    return rb_funcall(obj, rb_intern("to_s"), 0, 0);
}

The generated string results from calling inspect on all of the instance variables recursively, with care taken to short-circuit the recursion to prevent infinite loops by returning an ellipses wherever an object that’s already part of the string being built is detected:

inspect_obj(VALUE obj, VALUE str, int recur)
{
    if (recur) {
  rb_str_cat2(str, " ...");
    }
    else {
  rb_ivar_foreach(obj, inspect_i, str);
    }
    rb_str_cat2(str, ">");
    RSTRING_PTR(str)[0] = '#';
    OBJ_INFECT(str, obj);

    return str;
}

Astute readers will note that I skipped a third case: if a custom version of to_s has been defined, then inspect will use that instead. I glossed over that because this implementation is peculiar to 1.9.x, and differs from the 1.8.x behaviour which, according to Matz in this Ruby issue, is due to return in 2.x and beyond.

In either case, you can see how it works in this example (note the presence of @bar in the stringified return value of foo, the last command issued):

$ irb
>> class Foo; attr_accessor :bar; end
=> nil
>> foo = Foo.new ; foo.bar = 5 ; foo
=> #<Foo:0x1076fb398 @bar=5>

The built-in implementation of inspect may all seem well and good, and useful for debugging. It is handy to see the instance variables. But there’s a lot of custom C code in the Ruby interpreter to support it (I’ve omitted quite a bit), and you’d think it would be easy to leave inspect out of the core language altogether, given how easy it is to build a clone in pure Ruby. In fact, that’s precisely what awesome_print does.

And what about really complex classes? If the string returned by inspect is many screens long, then it’s not really all that useful for debugging, is it?

All good points. But there’s another, insidious drawback to the default inspect that leads to severe performance issues. Both NameError and NoMethodError call inspect when constructing their exception message, and there’s simply no way to monkeypatch around that. Observe:

static VALUE
name_err_mesg_to_str(VALUE obj)
{
    VALUE *ptr, mesg;
    TypedData_Get_Struct(obj, VALUE, &name_err_mesg_data_type, ptr);

    mesg = ptr[0];
    if (NIL_P(mesg)) return Qnil;
    else {
  const char *desc = 0;
  VALUE d = 0, args[NAME_ERR_MESG_COUNT];

  obj = ptr[1];
  switch (TYPE(obj)) {
    case T_NIL:
      desc = "nil";
      break;
    case T_TRUE:
      desc = "true";
      break;
    case T_FALSE:
      desc = "false";
      break;
    default:
      d = rb_protect(rb_inspect, obj, 0);
      if (NIL_P(d) || RSTRING_LEN(d) > 65) {
    d = rb_any_to_s(obj);
      }
      desc = RSTRING_PTR(d);
      break;
  }
  if (desc && desc[0] != '#') {
      d = d ? rb_str_dup(d) : rb_str_new2(desc);
      rb_str_cat2(d, ":");
      rb_str_cat2(d, rb_obj_classname(obj));
  }
  args[0] = mesg;
  args[1] = ptr[2];
  args[2] = d;
  mesg = rb_f_sprintf(NAME_ERR_MESG_COUNT, args);
    }
    if (OBJ_TAINTED(obj)) OBJ_TAINT(mesg);
    return mesg;
}

Now, that’s an awful lot of C, but the sinister lines are the following two, from around the middle of the previous code sample:

      d = rb_protect(rb_inspect, obj, 0);
      if (NIL_P(d) || RSTRING_LEN(d) > 65) {

This calls inspect on the object and THROWS AWAY THE RESULT IF IT’S LONGER THAN SIXTY-FIVE CHARACTERS SORRY FOR SHOUTING!

Now, seriously, OMGWTF. You are going to hit that 65-character limit really easily, with the simplest class that contains a couple of instance variables. And what happens when inspect exceeds 65 characters? Well, in that case, NameError and NoMethodError just fall back to using the default implementation of to_s.

Sheesh.

Exception in View Freezes Rails

Which brings us around to the insidious bug that I investigated when I first joined Agworld.

Every so often, in development and staging and production, our server would grind to a halt, chewing up CPU and allocating memory by the bucketloads. The only solution was to hard kill the process. We found a bug report that seemed relevant. and started contributing (although at the time the bug title was something that suggested WEBrick and Sprockets as the culprit, so it wasn’t as obvious as it is nowadays).

That bug was very active, and it was clear that other developers working on large-scale Ruby apps were suffering. After a few late nights, we tracked it down to NameError and NoMethodError, together with the fact that calling inspect on core Rails objects resulted in very long strings being built, especially if you’re running a large app with lots of routes and classes cached in memory.

Literally the following was happening: A hapless developer would make a typo in a view. They’d visit the erroneous page in their browser. Ruby would throw a NameError. Something up the call-chain would access the message accessor of the exception object. Ruby would call inspect on the object that threw the exception. Half an hour would pass as a massive string was constructed, as practically every object in the Ruby heap was stringified and concatenated. Ruby would calculate the length of the resulting string, determine that 193852374 or whatever is a bit more than 65, throw away the string that it had spent the last 30 minutes constructing, and finally call to_s on the object instead.

Insanity.

We monkeypatched Rails to prevent all of this from happening, as follows:

module ActionDispatch
  module Routing
    class RouteSet
      alias inspect to_s
    end
  end
end

That is, we found that ActionDispatch::Routing::RouteSet was the gateway to the explosive inspect, and nipped things in the bud by redefining inspect at that point. The result for hapless developer was an instant, useful error message and backtrace.

The fact that it took 7 months for this fix to finally make it into Rails 3.2.3 stable (and yet be mysteriously absent from the changelog) is another story.

Overall, though, the wisdom is this: you probably don’t want to use the built-in inspect. You probably should be using awesome_print when debugging (and you can even wire up irb to use it). Really long inspect strings aren’t that useful in log files. So why not patch Object::inspect with something more useful, or at least implement inspect on non-trivial classes by hand?

Even better would be if the default version of inspect imposed an upper-limit on the length of the generated string, or accepted parameters for maximum string length and recursion depth. And if it was never used internally by the interpreter at all; I really can’t fathom the wisdom of name_err_mesg_to_str preferring inspect to to_s at all.

Item 2: Prefer « to + When Concatenating Strings

Nah, I kid, I’m no Scott Meyers. But, seriously, you should hardly ever be using + for string concatenation. Think of the heap!

We dev for iOS here :)

Distinctly Uncountable
by Jason Hutchens

We use the wonderful Scout service to keep track of the performance of our servers. Alert notifications get pushed into Lighthouse for triage, as we need to make sure that all performance bottlenecks are dealt with, even if that means just changing Scout’s alert thresholds to minimise false positives.

Today I started investigating this one…

The memory usage of one of our passenger processes on staging spiked badly. We use Oink to annotate our rails log with information about memory usage and object allocations, and that allowed me to quickly track down the culprit.

... staging rails[8744]: Started GET "/widgets ...

A quick look at the code for WidgetsController::index shed no light on the issue. After all, it should only have been rendering a paginated list of widgets. That’s never been expensive before.

Object Allocations

More digging revealed that an admin user had made the request. Admins have access to all widgets in the system. Even so, retrieving the first 10 ActiveRecord objects for rendering should be cheap. What gives?

Oink was once again helpful:

Instantiation Breakdown: Total: 250099 | Widgets: 250000...

Erm… so we instantiated 250K ActiveRecord objects for Widget alone? When we should only have asked for 10? Examination of the log showed one query to retrieve the 10 widgets, followed by another to retrieve them all.

Complex Queries

A bit of debugging soon revealed the culprit. Kaminari, the “clean, powerful, customizable and sophisticated paginator for Rails 3”. It turns out that in late 2011 bug #160 had been closed by including a patch that uses “length” instead of “count” to calculate the number of records in the list being paginated.

I breathed a sigh of relief when I read that the patch only applied to “complex queries”. I then inhaled a gasp of despair when I looked at the code:

    uses_distinct_sql_statement = c.to_sql =~ /DISTINCT/i
    if uses_distinct_sql_statement
      c.length
    else
      c = c.count
      c.respond_to?(:count) ? c.count : c
    end

That’s right, dear reader. A “complex query”, according to Kaminari, is one that contains the word “distinct” somewhere in the SQL. Bah!

Count vs. Length

As you should already be aware, count will ask the database to do the heavy lifting for us. It’s nice and fast and cheap. On the other hand, length will cause the query to be run as normal, building up an array of ActiveRecord objects, and then calculating the length of this array. A middle ground not explored by the Kaminari patch would be to run the full query without instantiating ActiveRecord objects (which you really should avoid doing in lots of cases, because it’s ecksy, which means “expensive” in Australian).

The performance of all three approaches can be measured with this code (which requires the use of REE, for GC statistics):

def measure
  GC.clear_stats 
  time = Benchmark.realtime { puts "ANSWER = #{yield}" }
  usage = GC.allocated_size / ( 1024.0 ** 2 )
  puts "(allocated %.2fMB and took %.2fs)" % [usage, time]
end

GC.enable_stats

# Replace this with whichever query you want to measure
query = Widget.where(:sprocket_id => [13, 42, 137, 65538])

measure { query.count }

measure { ActiveRecord::Base.connection.select_values(query.to_sql).length }

measure { query.length }

Loading that into the Rails Console gives the following results for us:

> load '~/profile_length.rb'
ANSWER = 35958
(allocated 0.02MB and took 0.06s)
ANSWER = 35958
(allocated 20.40MB and took 1.58s)
ANSWER = 35958
(allocated 164.51MB and took 7.64s)

That is, using .count takes hardly any time and memory, the halfway house of executing the raw SQL is painfully slow, while what Kaminari is doing is downright alarming. All this just to render the paginator. Eek.

Delayed Discovery

If this patch went in six months ago, why are we only just discovering it now? Well, we’d previously been running our own fork of Kaminari, and that was made prior to the patch landing. We’ve only very recently cleaned out these stale forks from our Gemfile.

And, although it may seem strange that we’re the first ones to notice this performance problem, it ain’t the first time it’s happened to us, as you’ll find out in future posts to this blog.

Wormy Rabbit Hole

Unfortunately, we seem to have opened up a can of worms with this one. While going down a rabbit hole. The patch was submitted to Kaminari as a work-around to a bug in ActiveRecord; very possibly bug #5554. Or bug #1003.

We’ve done some investigation with a virginal Rails install, which you can clone on GitHub. When you do that, run the tests. These are the ones that fail:

test "we can count distinct posts" do
  query = Post.select('DISTINCT ON(number) *')
  count = query.count
  length = query.length
  assert count == length
end
test "we can sum distinct posts" do
  query = Post.select('DISTINCT ON(number) *')
  sum = query.sum(:number)
  reduce = query.reduce(0) { |s, p| s + p.number }
  assert sum == reduce
end

We create 1k posts for testing, and each post has a number that is generated via rand(1..10).to_i. The query therefore only selects 10 posts. The query generated to do the select looks like this:

SELECT DISTINCT ON(number) * FROM "posts"

Unfortunately, the query generated to do the count looks like this:

SELECT COUNT(*) FROM "posts"

And the query generated to do the sum looks like this:

SELECT SUM("posts"."number") AS sum_id FROM "posts"

Wonky pagination is one thing, but I shudder to think that there may be Rails users out there generating mission-critical reports using distinct queries and sum.