I wrote Octoshark a while ago but never really put any words about it on my blog. Now that I have been using it successfully in production for about a year handling millions of API requests per day and sending millions of emails per day for both GoDaddy Email Marketing and Mad Mimi, let’s do that.

Octoshark

What is Octoshark?

Octoshark is a simple ActiveRecord connection manager. It provides connection switching mechanisms that can be used in various scenarios like master-slave, sharding or multi-tenant architecture.

There are many other gems that solve the problem of accessing data from different databases and develop more complex features for the architecture they are designing for. Just to mention few: Octopus, Apartment, db-charmer, makara, but there are many many more.

I played with some of them and took the architecture of the system I was designing from monolithic, through sharding to multi-tenant. Octoshark was born on that journey as the minimal building block that I needed to migrate the system from one architecture to another.

Octoshark was written with the following goals in mind:

  • be simple and do one thing - connection switching
  • be easy to maintain and upgrade with Rails versions
  • patch ActiveRecord as little as possible

Octoshark Connection Managers

Octoshark has two connection managers: ConnectionPoolsManager for managing connection pools using persistent connections and ConnectionManager for managing non-persistent connections. It depends on the application performance and scaling requirements which one to use. Both can be combined and multiple instances of each manager can be used at the same time.

If the number of consumers (application and worker servers) is somewhat limited, ConnectionPoolsManager would be the preferred option. Standard Rails application has a single connection pool per ActiveRecord class and connection spec, and ConnectionPoolsManager makes it possible to use multiple connection pools.

For big infrastructures with lots of consumers where max connections limit on database servers is reached and horizontal scale is a need, ConnectionManager is the option to use. Because it uses non-persistent connections it comes up with a performance penalty for re-establishing connections over and over again. Some ActiveRecord plugins that depend on having an active database connection at bootup time will need a change in order to work with the non-persistent connections.

How to use Octoshark

Create a connection pools manager:

CONN_MANAGER = Octoshark::ConnectionPoolsManager.new({
  c1: config1,
  c2: config2
})

c1 and c2 are identifiers for the pools and config1 and config2 are standard ActiveRecord database configs like:

config = {
  adapter:  'mysql2',
  host:     'localhost',
  port:     3306,
  database: 'database',
  username: 'root',
  password: 'pass',
  pool:     3,
  encoding: 'utf8',
  reconnect: false
}

To switch a connection using a specific pool:

CONN_MANAGER.with_connection(:c1) do |connection|
  connection.execute("SELECT 1")
end

Multiple with_connection blocks can be nested:

CONN_MANAGER.with_connection(:c1) do
  # run queries on connection specified with :c1

  CONN_MANAGER.with_connection(:c2) do
    # run queries on connection specified with :c2
  end

  # run queries on connection specified with :c1
end

When using a single connection pool per database server, database connections can be switched by using the second and optional argument of with_connection method. This option is MySQL specific for now and it uses the USE database_name statement to switch the connection.

CONN_MANAGER.with_connection(:c1, database_name) do |connection|
  connection.execute("SELECT 1")
end

Using non-persistent connections with Octoshark::ConnectionManager has the same API:

CONN_MANAGER = Octoshark::ConnectionManager.new

Opening a new connection, executing query and closing a connection:

CONN_MANAGER.with_connection(config) do |connection|
  connection.execute("SELECT 1")
end

Using Octoshark with ActiveRecord models

ActiveRecord model can use Octoshark connection if we override the Model.connection method.

class Post < ActiveRecord::Base
  def self.connection
    CONN_MANAGER.current_connection
  end
end

Or, to change the connection of many models, define a module and include it in models.

module ShardingModel
  extend ActiveSupport::Concern

  module ClassMethods
    def connection
      CONN_MANAGER.current_connection
    end
  end
end

To use a specific database connection for a model, just open a with_connection block:

CONN_MANAGER.with_connection(:c1) do
  # run queries on c1
  Post.first
end

CONN_MANAGER.current_connection returns the active connection while the execution is in the with_connection block or raises Octoshark::Error::NoCurrentConnection outside of the with_connection block. In some designs, falling back to the default database connection in the application might be preferable and that can be done using CONN_MANAGER.current_or_default_connection.

Connection switching is usually done on the entry-point of the application. In Rails applications that is from within an around_filter for controllers and similarly for background jobs:

around_filter :select_connection

def select_connection(&block)
  CONN_MANAGER.with_connection(current_user.connection, &block)
end

Conclusion

Scaling Rails horizontally and building sharding or multi-tenant architecture is an exciting project. It involves reuse and customization of some ActiveRecord components. I wish ActiveRecord connection management with Rails was done in a way to allow for extention or replacement instead of the shutgun integration like using ActiveRecord::Base.clear_active_connections! to return current thread connections back to the pool after each request.

With that in mind, I think it’s a matter of how much of ActiveRecord we patch and how carefully we reuse its building blocks. Everything comes up with a maintenance cost, try optimize by minimizing it.