Video

Want to see the full-length video right now for free?

Sign In with GitHub for Free Access

Notes

To round off our discussion of Advanced ActiveRecord querying, we're going to discuss aggregations.

We'll be doing some counts, minimums, maximums, averages, and sums; how to perform these aggregations after grouping rows according to certain criteria; and how to filter based on the results of these calculations, all via the database (rather than doing it ourself with Ruby).

Domain Model

Our models will be the same as in the previous lesson, but we're going to add a salary column on the people table:

class Person < ActiveRecord::Base
  belongs_to :manager, class_name: "Person", foreign_key: :manager_id
  has_many :employees, class_name: "Person", foreign_key: :manager_id
  belongs_to :location
  belongs_to :role
end
People.all
id name role_id location_id manager_id salary
1 Eve 2 2 NULL 50000
2 Bill 2 1 NULL 40000
3 Wendell 1 1 1 35000
4 Christie 1 1 1 30000
5 Sandy 1 3 2 45000

The count method

Let's start out really simple: count the number of people.

Person.count
SELECT COUNT(*)
FROM "people";
count
5

The average method

Slightly more complicated: find the average salary across all people.

Person.average(:salary)
SELECT AVG(salary)
FROM "people";
avg
40000.000000000000

Combining aggregations with other queries

You can also combine these methods with the other queries that we've been talking about:

Person.
  joins(:role).
  where(roles: { billable: false }).
  sum(:salary)
SELECT SUM(people.salary)
FROM "people"
INNER JOIN "roles"
  ON "roles"."id" = "people"."role_id"
WHERE "roles"."billable" = 'f';
sum
90000

Aggregate by category with the group method

Let's next find the average salary by role. We often see this kind of work done in pure Ruby, which wastes database hits and a lot of memory.

Instead, a neat thing you can do if you want the count or average of a bunch of things but broken down by category, is use ActiveRecord's group method:

Person.
  joins(:role).
  group("roles.name").
  average(:salary)

which generates SQL like this:

SELECT roles.name, AVG(people.salary)
FROM "people"
INNER JOIN "roles"
  ON "roles.id" = "people"."role_id"
GROUP BY roles.name;

and retrieves data like this:

name avg
Manager 45000.000000000000
Developer 36666.666666666667

The database does all the work for us in one query.

How ActiveRecord aliases tables

Next, let's count employees for each person:

Person.
  joins(:employees).
  group("people.name").
  count("employees_people.id")

which generates SQL like this:

SELECT people.name, COUNT(employees_people.id)
FROM "people"
INNER JOIN "people" "employees_people"
  ON "employees_people"."manager_id" = "people"."id"
GROUP BY people.name;
name count
Eve 2
Bill 1

Wait, count employees_people.id? Where did that come from? Well, since the :employees association is not named conventionally (i.e., it's not named after the model), ActiveRecord makes up an alias for the table that it can guarantee won't be ambiguous.

This naming is consistent, so you can rely on it without worrying too much that it will break your queries some day.

Aggregation return values are terminal

It's important to realize that the return value when you use group is a hash, with the keys being the unique values you are grouping by, and the values are the aggregation.

count, sum, average, etc just returned scalar values.

In either case, once you go down the road of aggregating, you can no longer continue to chain further relation methods on, since you've left ActiveRecord land.

Count employees for each person

However, we notice that in our last result, we don't have entries for people who have no employees. Why is that? Well, [it's the same issue we've had previously][] -- inner joins don't return rows that have no matches in the joined set.

Our solution will be the same, a custom left join:

Person.
  joins("LEFT JOIN people employees ON employees.manager_id = people.id").
  group("people.name").
  count("employees.id")

which generates SQL like this:

SELECT people.name, COUNT(employees.id)
FROM "people"
LEFT JOIN people employees
  ON employees.manager_id = people.id GROUP BY people.name;

and retrieves the data that we want:

name count
Christie 0
Sandy 0
Wendell 0
Eve 2
Bill 1

This also has the benefit of allowing us to choose our own name for the relation, rather than relying on ActiveRecord's conventional alias.

People with lower than average salaries at their location

Let's get a little crazy and find people with lower than average salaries at their location:

Person.
  joins(
    "INNER JOIN (" +
      Person.
        select("location_id, AVG(salary) as average").
        group("location_id").
        to_sql
    ") salaries " \
    "ON salaries.location_id = people.location_id"
  ).
  where("people.salary < salaries.average")

Tricky! Let's jump into the middle of this query first,

Person.
  select("location_id, AVG(salary) as average").
  group("location_id").
  to_sql

This is essentially the same thing as Person.group("location_id").average(:salary), but this way we can call to_sql on it (since it is still a relation, rather than the scalar result).

We now have a virtual table of the average salary at each location, aliased as salaries, which we join on to the people table (using their location_id to match up rows).

Finally, with the new columns available, we can use a simple where clause to filter our results down. The resulting SQL looks like this:

SELECT "people".*
FROM "people"
INNER JOIN (
  SELECT location_id, AVG(salary) as average
  FROM "people"
  GROUP BY "people"."location_id"
) salaries
  ON salaries.location_id = people.location_id
WHERE (people.salary < salaries.average)

and retrieves data like this:

people salaries
id name role_id location_id manager_id salary location_id average
4 Christie 1 1 1 30000 1 35000.000000000000

Highest salaried people ordered by name

Finally, let's find the highest salaried people, ordered by name. It would be relatively straightforward to find the highest salaried people, ordered by salary:

Person.order("salary DESC").limit(3)

but if we try to order that by name, we'll be in trouble; and then it's tempting to fall back into Ruby and sort. But again, we want the database to do as much work as possible.

So, we can use a window function:

Person.
  joins(
    "INNER JOIN (" +
      Person.
        select("id, rank() OVER (ORDER BY salary DESC)").
        to_sql
    ") salaries " \
    "ON salaries.id = people.id"
  ).
  where("salaries.rank <= 3").
  order(:name)

Again, we're creating a virtual table that has a person's ID and rank (by salary), and then joining that onto the people table. We can then filter and order as usual. This generates the following SQL:

SELECT "people".*
FROM "people"
INNER JOIN (
  SELECT id, rank() OVER (ORDER BY salary DESC)
  FROM "people"
) salaries
  ON salaries.id = people.id
WHERE (salaries.rank <= 3)
ORDER BY "people"."name" ASC;

and retrieves the data we want:

people salaries
id name role_id location_id manager_id salary id rank
2 Bill 2 1 NULL 40000 2 3
1 Eve 2 3 NULL 50000 1 1
5 Sandy 1 3 2 45000 5 2

Success!

Conclusion

So, it turns out that for almost every common use case where you need to query your data, ActiveRecord will either have a built-in method, or it will give you a way to plug in your own SQL.

We have a final exercise for you to get a taste of aggregations, but then that's it! We hope you've enjoyed learning these advanced techniques for querying your ActiveRecord models, and we'll see you in the forums!