<h1>Implementing Application Layer Encryption in Ruby on Rails applications with Asherah</h1>
<p><em>Dalibor Nasevic · 2023-05-23</em></p>
<p><em>This blog post was originally published on the <a href="https://www.godaddy.com/engineering/2023/05/23/application-layer-encryption-in-ruby-on-rails-with-asherah/">GoDaddy Engineering Blog</a>.</em></p>
<p style="text-align: center">
<img src="/images/rails-application-layer-encryption-asherah/cover.jpg" alt="Blue, red and black padlocks" />
</p>
<p>The public cloud revolutionized the way we store and access data, but it also introduced new security challenges: because resources and infrastructure are shared among multiple users, there is a risk of unauthorized access and data breaches. When we migrate our web services to the public cloud, in addition to storage layer data encryption and end-to-end encryption in transit, we implement application-layer encryption to protect sensitive customer data such as Personally Identifiable Information (PII). This article explores how the <a href="https://github.com/godaddy/asherah-ruby">Asherah</a> application encryption SDK works and how we encrypt PII data in our Ruby on Rails applications.</p>
<h2 id="what-is-application-layer-encryption-and-why-do-we-need-it">What is Application Layer Encryption and why do we need it?</h2>
<p>Application Layer Encryption is the practice of encrypting data within the application that received or generated it. The data is encrypted before it is transported over a network or saved to a database, so the plaintext exists only within the application’s memory space. This differs from storage layer encryption, which protects data stored in a database when the server is powered off or the storage media is stolen. However, once the database server is running and authorized users or applications access the data, encryption at the storage layer alone is not sufficient to protect it.</p>
<h2 id="what-is-asherah-and-how-does-it-work">What is Asherah and how does it work?</h2>
<p>Asherah is an <a href="https://github.com/godaddy/asherah">application-layer encryption SDK</a> developed by GoDaddy that uses envelope encryption and has a hierarchical data encryption model. At the top of the hierarchy, the master key is managed by a Hardware Security Module (HSM) or <a href="https://github.com/godaddy/asherah/blob/master/docs/KeyManagementService.md#aws-kms">Key Management Service (KMS)</a>. Below that, there are system and intermediate keys. At the lowest level, there are data row records that represent the individual encrypted rows.</p>
<p><img src="/images/rails-application-layer-encryption-asherah/key_hierarchy.png" alt="Key Hierarchy" /></p>
<p>The following is a brief overview of how the data and encrypted keys are stored at the data layer using a few sample data structures to illustrate the encryption pattern.
Note: Go to the <a href="https://github.com/godaddy/asherah/blob/master/docs/DesignAndArchitecture.md">Asherah design and architecture page</a> for more information.</p>
<p>Let’s say we have PII data that we want to encrypt, starting at the row level (or in Ruby on Rails terminology, at the model level). The Asherah SDK generates a data row key to encrypt that row’s data. The final payload stored at the row level is called the data row record. It contains a reference to its parent key, the intermediate key, which is used to encrypt the data row key:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"Data"</span><span class="p">:</span><span class="w"> </span><span class="s2">"<base64(encrypted_data)>"</span><span class="p">,</span><span class="w">
</span><span class="nl">"Key"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"Created"</span><span class="p">:</span><span class="w"> </span><span class="mi">1534553138</span><span class="p">,</span><span class="w">
</span><span class="nl">"Key"</span><span class="p">:</span><span class="w"> </span><span class="s2">"<base64(encrypted_key)>"</span><span class="p">,</span><span class="w">
</span><span class="nl">"ParentKeyMeta"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"KeyId"</span><span class="p">:</span><span class="w"> </span><span class="s2">"_IK_123_marketing_email"</span><span class="p">,</span><span class="w">
</span><span class="nl">"Created"</span><span class="p">:</span><span class="w"> </span><span class="mi">1534553075</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>Asherah generates an intermediate key unless one already exists for the given partition. Partitions create a distinct chain of encryption keys and are a way to isolate the encrypted data and limit the blast radius. Usually, we choose the primary resource id for a partition id (i.e., <code class="language-plaintext highlighter-rouge">user_id</code>). The intermediate key envelope points to its parent key (the system key):</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"Id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"_IK_123_marketing_email"</span><span class="p">,</span><span class="w">
</span><span class="nl">"Created"</span><span class="p">:</span><span class="w"> </span><span class="mi">1534553075</span><span class="p">,</span><span class="w">
</span><span class="nl">"Key"</span><span class="p">:</span><span class="w"> </span><span class="s2">"<base64(encrypted_key)>"</span><span class="p">,</span><span class="w">
</span><span class="nl">"ParentKeyMeta"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"KeyId"</span><span class="p">:</span><span class="w"> </span><span class="s2">"_SK_marketing_email"</span><span class="p">,</span><span class="w">
</span><span class="nl">"Created"</span><span class="p">:</span><span class="w"> </span><span class="mi">1534553054</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>Asherah generates a system key unless an unexpired one already exists. By default, system keys have a lifespan of 90 days, after which Asherah generates a new key; this also triggers the creation of new intermediate keys. The <code class="language-plaintext highlighter-rouge">key_meta</code> embedded in the system key envelope identifies the master key used to encrypt it.</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"Id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"_SK_marketing_email"</span><span class="p">,</span><span class="w">
</span><span class="nl">"Created"</span><span class="p">:</span><span class="w"> </span><span class="mi">1534553054</span><span class="p">,</span><span class="w">
</span><span class="nl">"Key"</span><span class="p">:</span><span class="w"> </span><span class="s2">"<base64(key_meta)>"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>The parent key of a system key can be either:</p>
<ul>
<li>A static key (used for testing only), or</li>
<li>An HSM or KMS</li>
</ul>
<p>When using AWS KMS, Asherah first generates a data key with it. This data key is the master key used to encrypt the system keys. The data key itself is encrypted by the KMS and stored in the <code class="language-plaintext highlighter-rouge">encryptedKek</code> field. During a decrypt operation, the KMS first decrypts the data key, which in turn decrypts the system key. The system key then decrypts the intermediate key, and the intermediate key decrypts the data row key. The data key is encrypted with KMS keys in multiple AWS regions to support fallback when a region is unavailable.</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"encryptedKey"</span><span class="p">:</span><span class="w"> </span><span class="s2">"<base64(encrypted_key)>"</span><span class="p">,</span><span class="w">
</span><span class="nl">"kmsKeks"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"region"</span><span class="p">:</span><span class="w"> </span><span class="s2">"<aws_region>"</span><span class="p">,</span><span class="w">
</span><span class="nl">"arn"</span><span class="p">:</span><span class="w"> </span><span class="s2">"<arn>"</span><span class="p">,</span><span class="w">
</span><span class="nl">"encryptedKek"</span><span class="p">:</span><span class="w"> </span><span class="s2">"<base64(key_encrypted_key)>"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="err">...</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>The default cipher that Asherah uses for encryption is AES-256-GCM.</p>
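<p>To make the envelope pattern concrete, here is a minimal, hypothetical sketch of envelope encryption with AES-256-GCM using Ruby’s OpenSSL bindings. The helper names are ours for illustration only and are not part of the Asherah API:</p>

```ruby
require 'openssl'

# Encrypt plaintext with AES-256-GCM, returning everything needed to decrypt.
def aes256gcm_encrypt(key, plaintext)
  cipher = OpenSSL::Cipher.new('aes-256-gcm').encrypt
  cipher.key = key
  iv = cipher.random_iv
  data = cipher.update(plaintext) + cipher.final
  { iv: iv, data: data, tag: cipher.auth_tag }
end

def aes256gcm_decrypt(key, payload)
  cipher = OpenSSL::Cipher.new('aes-256-gcm').decrypt
  cipher.key = key
  cipher.iv = payload[:iv]
  cipher.auth_tag = payload[:tag]
  cipher.update(payload[:data]) + cipher.final
end

# Envelope encryption: a random data row key encrypts the row data,
# and a parent (intermediate) key encrypts the data row key itself.
intermediate_key = OpenSSL::Random.random_bytes(32)
data_row_key     = OpenSSL::Random.random_bytes(32)

encrypted_data = aes256gcm_encrypt(data_row_key, 'user@example.com')
encrypted_key  = aes256gcm_encrypt(intermediate_key, data_row_key)

# Decryption walks back up the chain: recover the data row key first,
# then use it to recover the row data.
recovered_key = aes256gcm_decrypt(intermediate_key, encrypted_key)
plaintext     = aes256gcm_decrypt(recovered_key, encrypted_data)
```

<p>In Asherah the same pattern repeats one level up: the intermediate key is itself encrypted by the system key, and the system key by the master key.</p>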
<h2 id="why-not-use-aws-kms-directly">Why not use AWS KMS directly?</h2>
<p>You might wonder why we don’t use AWS KMS directly for each encrypt and decrypt operation. We could, but consider the following:</p>
<ul>
<li>Performance - each encrypt or decrypt call to AWS KMS adds network latency.</li>
<li>Pricing - AWS KMS costs grow with every call, so we want to cache system and intermediate keys in memory and minimize KMS requests.</li>
</ul>
<h2 id="what-is-a-secure-memory">What is Secure Memory?</h2>
<p>Asherah implements <a href="https://github.com/godaddy/asherah/blob/master/docs/Internals.md#secure-memory">Secure Memory</a> to safely generate, store, and cache encryption keys. By using a secure memory heap, it guards against leaking secrets through swapping, core dumps, debugger memory scans, and CPU vulnerabilities like Spectre. A secure memory heap is not part of the language’s managed memory, but it can be implemented with a few well-known native calls.</p>
<p>To allocate secure memory, the following steps must be performed:</p>
<ul>
<li>check memory lock limit (getrlimit)</li>
<li>allocate memory (mmap)</li>
<li>disable swap (mlock)</li>
<li>disable core dumps (madvise)</li>
<li>write secret bytes to memory location</li>
<li>set no access (mprotect)</li>
<li>wipe secret bytes from managed memory</li>
</ul>
<p>To read from secure memory, the following steps must be performed:</p>
<ul>
<li>change memory address to read-only mode (mprotect)</li>
<li>read secret bytes from memory location</li>
<li>change memory address to no access (mprotect)</li>
<li>encrypt or decrypt with the secret</li>
<li>wipe secret bytes from managed memory</li>
</ul>
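<p>As a rough illustration of a few of the native calls listed above, here is a hypothetical Ruby sketch using Fiddle on Linux/macOS. It only demonstrates the lock, write, wipe, and unlock steps; Asherah’s real Secure Memory also uses <code class="language-plaintext highlighter-rouge">mmap</code>, <code class="language-plaintext highlighter-rouge">madvise</code>, and <code class="language-plaintext highlighter-rouge">mprotect</code>, which are omitted here for brevity:</p>

```ruby
require 'fiddle'

# check memory lock limit (getrlimit)
soft_limit, _hard_limit = Process.getrlimit(:MEMLOCK)

# Resolve mlock/munlock from libc in the current process.
libc    = Fiddle.dlopen(nil)
mlock   = Fiddle::Function.new(libc['mlock'],   [Fiddle::TYPE_VOIDP, Fiddle::TYPE_SIZE_T], Fiddle::TYPE_INT)
munlock = Fiddle::Function.new(libc['munlock'], [Fiddle::TYPE_VOIDP, Fiddle::TYPE_SIZE_T], Fiddle::TYPE_INT)

secret = 'super-secret-key-bytes'
buffer = Fiddle::Pointer.malloc(secret.bytesize)

mlock.call(buffer, secret.bytesize)                   # disable swap for this region
buffer[0, secret.bytesize] = secret                   # write secret bytes to memory location
# ... encrypt or decrypt with the secret ...
buffer[0, secret.bytesize] = "\0" * secret.bytesize   # wipe secret bytes
munlock.call(buffer, secret.bytesize)
```

<p>The key point is that the secret lives in native memory the runtime never copies or swaps, and it is zeroed as soon as it is no longer needed.</p>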
<h2 id="how-to-use-asherah-in-ruby-on-rails-applications">How to use Asherah in Ruby on Rails applications</h2>
<p><a href="https://github.com/godaddy/asherah-ruby">Asherah-Ruby</a> is a Ruby FFI wrapper around the <a href="https://github.com/godaddy/asherah/tree/master/go">Asherah Go</a> implementation of the application-layer encryption SDK. The Asherah Go implementation is exposed to Ruby via the <a href="https://github.com/godaddy/asherah-cobhan/blob/main/libasherah.go">asherah-cobhan’s Go wrapper</a> and compiled to a native shared library with <a href="https://pkg.go.dev/cmd/cgo">Cgo</a>. Currently supported platforms for Asherah Ruby are Linux and Darwin operating systems for x64 and ARM64 CPU architectures.</p>
<p>To configure the Asherah library in a Ruby on Rails application, we must first install the <a href="https://rubygems.org/gems/asherah">Asherah</a> gem. After installing the gem, we need to create the following migration for the <code class="language-plaintext highlighter-rouge">encryption_key</code> table to store the system and intermediate keys. Asherah supports MySQL and DynamoDB <a href="https://github.com/godaddy/asherah/blob/master/docs/Metastore.md">metastores</a>, and can be extended to support additional adapters. For our test, we will use MySQL.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">CreateEncryptionKey</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="p">[</span><span class="mf">7.0</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">up</span>
<span class="n">execute</span><span class="p">(</span><span class="s2">"
CREATE TABLE encryption_key (
id VARCHAR(255) NOT NULL,
created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
key_record TEXT NOT NULL,
PRIMARY KEY (id, created),
INDEX (created)
);
"</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">down</span>
<span class="n">drop_table</span> <span class="ss">:encryption_key</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>We have to create an initializer to configure Asherah. To do so, we set the <code class="language-plaintext highlighter-rouge">service_name</code> and <code class="language-plaintext highlighter-rouge">product_id</code> used for key naming. We configure the <code class="language-plaintext highlighter-rouge">metastore</code> and its <code class="language-plaintext highlighter-rouge">connection_string</code> for key storage. We need a <code class="language-plaintext highlighter-rouge">connection_string</code> separate from the default Active Record connection because Asherah Go manages its own connection for writing and reading the encrypted keys. Then we enable <code class="language-plaintext highlighter-rouge">enable_session_caching</code> for performance and specify the <code class="language-plaintext highlighter-rouge">kms</code> details. We use a static key in the development and test environments, and the AWS KMS service in production. Here is the Asherah configuration:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Asherah</span><span class="p">.</span><span class="nf">configure</span> <span class="k">do</span> <span class="o">|</span><span class="n">config</span><span class="o">|</span>
<span class="n">config</span><span class="p">.</span><span class="nf">service_name</span> <span class="o">=</span> <span class="s1">'marketing'</span>
<span class="n">config</span><span class="p">.</span><span class="nf">product_id</span> <span class="o">=</span> <span class="s1">'email'</span>
<span class="n">config</span><span class="p">.</span><span class="nf">metastore</span> <span class="o">=</span> <span class="s1">'rdbms'</span>
<span class="n">config</span><span class="p">.</span><span class="nf">enable_session_caching</span> <span class="o">=</span> <span class="kp">true</span> <span class="c1"># default: false</span>
<span class="n">c</span> <span class="o">=</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span><span class="p">.</span><span class="nf">connection_db_config</span><span class="p">.</span><span class="nf">configuration_hash</span>
<span class="n">config</span><span class="p">.</span><span class="nf">connection_string</span> <span class="o">=</span> <span class="s2">"</span><span class="si">#{</span><span class="n">c</span><span class="p">[</span><span class="ss">:username</span><span class="p">]</span><span class="si">}</span><span class="s2">:</span><span class="si">#{</span><span class="n">c</span><span class="p">[</span><span class="ss">:password</span><span class="p">]</span><span class="si">}</span><span class="s2">@tcp(</span><span class="si">#{</span><span class="n">c</span><span class="p">[</span><span class="ss">:host</span><span class="p">]</span><span class="si">}</span><span class="s2">:</span><span class="si">#{</span><span class="n">c</span><span class="p">[</span><span class="ss">:port</span><span class="p">]</span><span class="si">}</span><span class="s2">)/</span><span class="si">#{</span><span class="n">c</span><span class="p">[</span><span class="ss">:database</span><span class="p">]</span><span class="si">}</span><span class="s2">"</span>
<span class="k">if</span> <span class="no">ENV</span><span class="p">[</span><span class="s1">'ASHERAH_KMS_ENABLED'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'true'</span>
<span class="n">config</span><span class="p">.</span><span class="nf">kms</span> <span class="o">=</span> <span class="s1">'aws'</span>
<span class="n">config</span><span class="p">.</span><span class="nf">preferred_region</span> <span class="o">=</span> <span class="no">ENV</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="s1">'AWS_REGION'</span><span class="p">)</span>
<span class="n">config</span><span class="p">.</span><span class="nf">region_map</span> <span class="o">=</span> <span class="p">{</span> <span class="no">ENV</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="s1">'AWS_REGION'</span><span class="p">)</span> <span class="o">=></span> <span class="no">ENV</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="s1">'KMS_KEY_ARN'</span><span class="p">)</span> <span class="p">}</span>
<span class="k">elsif</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="p">.</span><span class="nf">development?</span> <span class="o">||</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="p">.</span><span class="nf">test?</span>
<span class="n">config</span><span class="p">.</span><span class="nf">kms</span> <span class="o">=</span> <span class="s1">'static'</span> <span class="c1"># The static key used for encryption is `thisIsAStaticMasterKeyForTesting` (defined in Asherah Go)</span>
<span class="k">else</span>
<span class="k">raise</span> <span class="s2">"Asherah client not configured for: </span><span class="si">#{</span><span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="si">}</span><span class="s2">"</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Once we have all that set, we can call the <code class="language-plaintext highlighter-rouge">encrypt</code> and <code class="language-plaintext highlighter-rouge">decrypt</code> operations with Asherah:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">partition_id</span> <span class="o">=</span> <span class="s1">'user_1'</span>
<span class="n">data</span> <span class="o">=</span> <span class="s1">'user@example.com'</span>
<span class="n">encrypted_data</span> <span class="o">=</span> <span class="no">Asherah</span><span class="p">.</span><span class="nf">encrypt</span><span class="p">(</span><span class="n">partition_id</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>
<span class="n">decrypted_data</span> <span class="o">=</span> <span class="no">Asherah</span><span class="p">.</span><span class="nf">decrypt</span><span class="p">(</span><span class="n">partition_id</span><span class="p">,</span> <span class="n">encrypted_data</span><span class="p">)</span>
</code></pre></div></div>
<h2 id="how-to-integrate-asherah-in-ruby-on-rails-models">How to integrate Asherah in Ruby on Rails models</h2>
<p>In Ruby on Rails models, we frequently use open schema columns of type <code class="language-plaintext highlighter-rouge">text</code> and leverage <a href="https://api.rubyonrails.org/classes/ActiveRecord/Store.html">ActiveRecord::Store</a> with JSON serialization. That way, we store data without having to run migrations for each new column we add. We’ll start by creating the table <code class="language-plaintext highlighter-rouge">users</code> with text column <code class="language-plaintext highlighter-rouge">params</code> to store personally identifiable information like <code class="language-plaintext highlighter-rouge">name</code> and <code class="language-plaintext highlighter-rouge">email</code>. Let’s create the migration:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">CreateUsers</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="p">[</span><span class="mf">7.0</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">change</span>
<span class="n">create_table</span> <span class="ss">:users</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
<span class="n">t</span><span class="p">.</span><span class="nf">text</span> <span class="ss">:params</span>
<span class="n">t</span><span class="p">.</span><span class="nf">timestamps</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Each model that implements application layer encryption needs to include the <code class="language-plaintext highlighter-rouge">DataEncryption</code> module we’ll define below. This module defines the <code class="language-plaintext highlighter-rouge">data_encryption</code> method used to specify the encrypted attributes (<code class="language-plaintext highlighter-rouge">name</code> and <code class="language-plaintext highlighter-rouge">email</code>) and how we reference them from the model. For the <code class="language-plaintext highlighter-rouge">partition_id</code>, we use the <code class="language-plaintext highlighter-rouge">global</code> value, but if we had a parent account model, we could partition by the <code class="language-plaintext highlighter-rouge">account_id</code>. Next, we’ll define the <code class="language-plaintext highlighter-rouge">User</code> model:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">User</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span>
<span class="kp">include</span> <span class="no">DataEncryption</span>
<span class="n">store</span> <span class="ss">:params</span><span class="p">,</span> <span class="ss">accessors: </span><span class="p">[</span><span class="ss">:enc_data</span><span class="p">],</span> <span class="ss">coder: </span><span class="no">JSON</span>
<span class="n">data_encryption</span> <span class="ss">:raw_data</span><span class="p">,</span> <span class="ss">:enc_data</span><span class="p">,</span> <span class="ss">store_name: :params</span><span class="p">,</span> <span class="ss">accessors: </span><span class="p">[</span><span class="ss">:name</span><span class="p">,</span> <span class="ss">:email</span><span class="p">]</span>
<span class="kp">private</span>
<span class="k">def</span> <span class="nf">partition_id</span>
<span class="s1">'global'</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">DataEncryption</code> module defines <code class="language-plaintext highlighter-rouge">before_save</code> and <code class="language-plaintext highlighter-rouge">after_find</code> callbacks to ensure proper encryption and decryption of data when models are saved or retrieved from the database. The models that include it must define the <code class="language-plaintext highlighter-rouge">partition_id</code> for the encryption session. The <code class="language-plaintext highlighter-rouge">data_encryption</code> method expects the following arguments:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">raw_data</code> - a virtual attribute that holds the raw data</li>
<li><code class="language-plaintext highlighter-rouge">enc_data</code> - an attribute to store the encrypted data</li>
<li><code class="language-plaintext highlighter-rouge">store_name</code> - the name of the store where <code class="language-plaintext highlighter-rouge">enc_data</code> will be stored</li>
</ul>
<p>Next, we will define the <code class="language-plaintext highlighter-rouge">DataEncryption</code> module:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">DataEncryption</span>
<span class="kp">extend</span> <span class="no">ActiveSupport</span><span class="o">::</span><span class="no">Concern</span>
<span class="no">DataEncrypt</span> <span class="o">=</span> <span class="no">Struct</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:raw_attr_name</span><span class="p">,</span> <span class="ss">:enc_attr_name</span><span class="p">,</span> <span class="ss">:store_name</span><span class="p">)</span>
<span class="n">included</span> <span class="k">do</span>
<span class="n">class_attribute</span> <span class="ss">:data_encrypt</span><span class="p">,</span> <span class="ss">default: </span><span class="kp">nil</span>
<span class="n">before_save</span> <span class="ss">:encrypt_data_callback</span>
<span class="n">after_find</span> <span class="ss">:decrypt_data_callback</span>
<span class="k">end</span>
<span class="n">class_methods</span> <span class="k">do</span>
<span class="k">def</span> <span class="nf">data_encryption</span><span class="p">(</span><span class="n">raw_attr_name</span><span class="p">,</span> <span class="n">enc_attr_name</span><span class="p">,</span> <span class="ss">store_name: </span><span class="p">,</span> <span class="ss">accessors: </span><span class="p">[])</span>
<span class="nb">self</span><span class="p">.</span><span class="nf">data_encrypt</span> <span class="o">=</span> <span class="no">DataEncrypt</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">raw_attr_name</span><span class="p">,</span> <span class="n">enc_attr_name</span><span class="p">,</span> <span class="n">store_name</span><span class="p">)</span>
<span class="n">attribute</span> <span class="n">raw_attr_name</span><span class="p">,</span> <span class="ss">default: </span><span class="o">-></span> <span class="p">{</span> <span class="no">HashWithIndifferentAccess</span><span class="p">.</span><span class="nf">new</span> <span class="p">}</span>
<span class="n">accessors</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">accessor</span><span class="o">|</span>
<span class="n">define_method</span><span class="p">(</span><span class="n">accessor</span><span class="p">)</span> <span class="k">do</span>
<span class="n">public_send</span><span class="p">(</span><span class="n">raw_attr_name</span><span class="p">)[</span><span class="n">accessor</span><span class="p">]</span>
<span class="k">end</span>
<span class="n">define_method</span><span class="p">(</span><span class="s2">"</span><span class="si">#{</span><span class="n">accessor</span><span class="si">}</span><span class="s2">="</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">value</span><span class="o">|</span>
<span class="n">public_send</span><span class="p">(</span><span class="n">raw_attr_name</span><span class="p">)[</span><span class="n">accessor</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="kp">private</span>
<span class="k">def</span> <span class="nf">encrypt_data_callback</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">public_send</span><span class="p">(</span><span class="n">data_encrypt</span><span class="p">.</span><span class="nf">raw_attr_name</span><span class="p">)</span>
<span class="k">if</span> <span class="n">data</span><span class="p">.</span><span class="nf">present?</span> <span class="o">||</span> <span class="n">public_send</span><span class="p">(</span><span class="n">data_encrypt</span><span class="p">.</span><span class="nf">enc_attr_name</span><span class="p">).</span><span class="nf">present?</span>
<span class="n">public_send</span><span class="p">(</span><span class="s2">"</span><span class="si">#{</span><span class="n">data_encrypt</span><span class="p">.</span><span class="nf">enc_attr_name</span><span class="si">}</span><span class="s2">="</span><span class="p">,</span> <span class="n">encrypt_data</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">decrypt_data_callback</span>
<span class="n">enc_data</span> <span class="o">=</span> <span class="n">public_send</span><span class="p">(</span><span class="n">data_encrypt</span><span class="p">.</span><span class="nf">enc_attr_name</span><span class="p">)</span>
<span class="k">if</span> <span class="n">enc_data</span><span class="p">.</span><span class="nf">present?</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">decrypt_data</span><span class="p">(</span><span class="n">enc_data</span><span class="p">)</span>
<span class="n">data</span> <span class="o">=</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Store</span><span class="o">::</span><span class="no">IndifferentCoder</span><span class="p">.</span><span class="nf">as_indifferent_hash</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">public_send</span><span class="p">(</span><span class="s2">"</span><span class="si">#{</span><span class="n">data_encrypt</span><span class="p">.</span><span class="nf">raw_attr_name</span><span class="si">}</span><span class="s2">="</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">encrypt_data</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="no">Asherah</span><span class="p">.</span><span class="nf">encrypt</span><span class="p">(</span><span class="n">partition_id</span><span class="p">,</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">decrypt_data</span><span class="p">(</span><span class="n">enc_data</span><span class="p">)</span>
<span class="no">JSON</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="no">Asherah</span><span class="p">.</span><span class="nf">decrypt</span><span class="p">(</span><span class="n">partition_id</span><span class="p">,</span> <span class="n">enc_data</span><span class="p">))</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<h2 id="how-to-search-encrypted-pii-data">How to search encrypted PII data</h2>
<p>Our PII data is encrypted before it is stored in the database, so we can’t query it directly: the ciphertext is neither searchable nor usefully indexable. One way to implement a search for encrypted PII data is to use a cryptographic technique called a blind index. Blind indexes are created by applying a one-way cryptographic hash function to the data, generating a unique fixed-length string that represents the data without revealing the actual content. To further enhance the security of the hashed data, we use a pepper, a secret key added to the input of the hashing function to create a peppered hash. Next, we’ll define the hashing function:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Hasher</span>
  <span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">hash</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
    <span class="no">Digest</span><span class="o">::</span><span class="no">SHA256</span><span class="p">.</span><span class="nf">hexdigest</span><span class="p">(</span><span class="n">value</span><span class="p">.</span><span class="nf">downcase</span> <span class="o">+</span> <span class="no">ENV</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="s1">'HASHING_PEPPER'</span><span class="p">))</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
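<p>To illustrate the property the blind index relies on, here is a small self-contained sketch (the pepper is hard-coded for the example; the real code above reads it from the <code class="language-plaintext highlighter-rouge">HASHING_PEPPER</code> ENV var):</p>

```ruby
require 'digest'

# Stand-in for the Hasher above, with the pepper supplied inline for the example
pepper = 'example-pepper'
blind_index = ->(value) { Digest::SHA256.hexdigest(value.downcase + pepper) }

# The hash is deterministic and case-insensitive, so equal emails always map
# to the same fixed-length index value -- this is what makes exact-match
# lookups on the hashed column possible:
puts blind_index.call('Alice@Example.com') == blind_index.call('alice@example.com')  # => true
```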
<p>To implement a blind index, we will add a column named <code class="language-plaintext highlighter-rouge">hashed_email</code> with an index to the table <code class="language-plaintext highlighter-rouge">users</code>. That way, we’ll be able to search for an exact match of the hashed email (though we still can’t do a full-text search or use LIKE queries). Next, we’ll add the migration:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">AddHashedEmailToUsers</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="p">[</span><span class="mf">7.0</span><span class="p">]</span>
  <span class="k">def</span> <span class="nf">change</span>
    <span class="n">add_column</span> <span class="ss">:users</span><span class="p">,</span> <span class="ss">:hashed_email</span><span class="p">,</span> <span class="ss">:string</span>
    <span class="n">add_index</span> <span class="ss">:users</span><span class="p">,</span> <span class="ss">:hashed_email</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>We can then add a <code class="language-plaintext highlighter-rouge">before_validation</code> callback to our model to hash the data for the PII columns and define helper class methods like <code class="language-plaintext highlighter-rouge">find_by_email</code>. Finally, we’ll extend the <code class="language-plaintext highlighter-rouge">User</code> model with the following code:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">User</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span>
  <span class="n">before_validation</span> <span class="ss">:hash_pii_columns</span>

  <span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">find_by_email</span><span class="p">(</span><span class="n">email</span><span class="p">)</span>
    <span class="n">where</span><span class="p">(</span><span class="ss">hashed_email: </span><span class="no">Hasher</span><span class="p">.</span><span class="nf">hash</span><span class="p">(</span><span class="n">email</span><span class="p">)).</span><span class="nf">take</span>
  <span class="k">end</span>

  <span class="kp">private</span>

  <span class="k">def</span> <span class="nf">hash_pii_columns</span>
    <span class="nb">self</span><span class="p">.</span><span class="nf">hashed_email</span> <span class="o">=</span> <span class="no">Hasher</span><span class="p">.</span><span class="nf">hash</span><span class="p">(</span><span class="n">email</span><span class="p">)</span> <span class="k">if</span> <span class="n">email</span><span class="p">.</span><span class="nf">present?</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<h2 id="important-considerations-for-production-deployments-of-asherah-ruby">Important considerations for production deployments of Asherah-Ruby</h2>
<p>The following are some things to consider before deploying Asherah-Ruby to production:</p>
<ul>
<li>Base64 encoding of the encrypted data adds a minimum overhead of about 33% to payload size.</li>
<li>Warm up Asherah with a dummy encrypt call to decrypt the master key with KMS and cache it in memory before handling any requests:
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Rails</span><span class="p">.</span><span class="nf">configuration</span><span class="p">.</span><span class="nf">after_initialize</span> <span class="k">do</span>
  <span class="no">Asherah</span><span class="p">.</span><span class="nf">encrypt</span><span class="p">(</span><span class="s1">'global'</span><span class="p">,</span> <span class="s1">'warmup'</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div> </div>
</li>
<li>Use glibc-based Linux distributions because the Go standard library has an incompatibility that causes C-shared builds to <a href="https://github.com/golang/go/issues/13492">fail with musl libc</a>.</li>
<li>You might need to pass ENV variables from Ruby to Go, as with the <code class="language-plaintext highlighter-rouge">AWS_CONTAINER_CREDENTIALS_RELATIVE_URI</code> ENV var when running in AWS Fargate containers. Go’s <code class="language-plaintext highlighter-rouge">os.Getenv()</code> does not see variables set by <code class="language-plaintext highlighter-rouge">C.setenv()</code>, as reported in this <a href="https://github.com/golang/go/issues/44108">issue</a> and documented in the <a href="https://github.com/golang/go/wiki/cgo#environmental-variables">wiki</a>.
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">AWS_ECS_ENV_VAR_NAME</span> <span class="o">=</span> <span class="s1">'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI'</span>
<span class="no">Asherah</span><span class="p">.</span><span class="nf">set_env</span><span class="p">(</span><span class="no">AWS_ECS_ENV_VAR_NAME</span> <span class="o">=></span> <span class="no">ENV</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="no">AWS_ECS_ENV_VAR_NAME</span><span class="p">))</span> <span class="k">if</span> <span class="no">ENV</span><span class="p">[</span><span class="no">AWS_ECS_ENV_VAR_NAME</span><span class="p">].</span><span class="nf">present?</span>
</code></pre></div> </div>
</li>
</ul>
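<p>The 33% figure in the first point above follows directly from how Base64 works: every 3 bytes of ciphertext become 4 output characters. A quick check in plain Ruby:</p>

```ruby
require 'base64'

ciphertext = "\x00".b * 300                 # pretend 300 bytes of encrypted data
encoded    = Base64.strict_encode64(ciphertext)

# 3 input bytes -> 4 output characters, so the encoded payload grows by ~33%
overhead = (encoded.bytesize - ciphertext.bytesize).to_f / ciphertext.bytesize
puts format('%.0f%%', overhead * 100)  # => 33%
```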
<h2 id="conclusion">Conclusion</h2>
<p><a href="https://github.com/godaddy/asherah">Asherah</a>’s cross-language support, secure memory management, and the granularity of its hierarchical key encryption model are some of the key features that help us minimize attack exposure and increase the security of our customer data. <a href="https://github.com/godaddy/asherah/blob/master/docs/Internals.md#ttl-and-expiredrevoked-keys">Revoking keys</a> due to a suspected compromise is also built into the key rotation model. We have been using Asherah successfully in production for a few years now. For Ruby projects specifically, we’ve iterated through a few different distributions of it: an Asherah Go sidecar, a pure Ruby implementation of Asherah, and finally <a href="https://github.com/godaddy/asherah-ruby">Asherah-Ruby</a>, which uses Asherah Go under the hood. Ruby on Rails 7 introduced built-in <a href="https://guides.rubyonrails.org/active_record_encryption.html">Active Record Encryption</a> for encrypting data at the application layer, and it’s great to see more solutions in this space, each with its own features and advantages.</p>This blog post was originally published on the GoDaddy Engineering Blog.Optimizing Email Batch API with bulk inserts2022-09-12T07:00:00+00:002022-09-12T07:00:00+00:00https://dalibornasevic.com/posts/rails-bulk-insert-mysql<p><em>This blog post was originally published on the <a href="https://www.godaddy.com/engineering/2022/09/12/rails-bulk-insert-mysql/">GoDaddy Engineering Blog</a>.</em></p>
<p><em>*) Cover Photo Attribution: Photo by <a href="https://unsplash.com/@glenncarstenspeters?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Glenn Carstens-Peters</a> on <a href="https://unsplash.com/photos/tagHjCxTHEw?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a></em></p>This blog post was originally published on the GoDaddy Engineering Blog.Optimizing Email Batch API with bulk inserts2022-09-12T07:00:00+00:002022-09-12T07:00:00+00:00https://dalibornasevic.com/posts/rails-bulk-insert-mysql<p><em>This blog post was originally published on the <a href="https://www.godaddy.com/engineering/2022/09/12/rails-bulk-insert-mysql/">GoDaddy Engineering Blog</a>.</em></p>
<p style="text-align: center">
<img src="/images/rails-bulk-insert-mysql/cover.jpg" alt="Train in motion" />
</p>
<p>Rails 6 introduced the <a href="https://api.rubyonrails.org/classes/ActiveRecord/Persistence/ClassMethods.html#method-i-insert_all">insert_all</a> ActiveRecord API for inserting multiple records into the database with a single SQL INSERT statement. It has an option to select the <code class="language-plaintext highlighter-rouge">returning</code> columns, but it is available only for PostgreSQL using its <code class="language-plaintext highlighter-rouge">RETURNING</code> SQL clause and not MySQL. This blog post explores how we optimized our Email Batch API by using Rails bulk inserts with MySQL and the details of calculating the auto-incrementing IDs for records.</p>
<h2 id="improving-our-email-api">Improving Our Email API</h2>
<p>Our Email API has a multi-tenant architecture providing a database for each customer. It accepts millions of emails daily and provides a Batch API for enqueuing up to 50 messages per single batch request. The Batch API inserts these records one by one, enqueues background workers to build and deliver the emails, and returns the message IDs to the client for an eventual status check later.</p>
<p>Our change aims to improve the Batch API performance by inserting messages in bulk while preserving the original API design and returning the message IDs to the client. Our API runs on-premise using MySQL and in AWS using Aurora MySQL, and the change must be compatible with both.</p>
<h2 id="mysql-information-functions">MySQL Information Functions</h2>
<p>Although MySQL does not support a <code class="language-plaintext highlighter-rouge">RETURNING</code> clause, it provides <code class="language-plaintext highlighter-rouge">LAST_INSERT_ID()</code> and <code class="language-plaintext highlighter-rouge">ROW_COUNT()</code> functions that can help us calculate the auto-incrementing ID values from the connection session. The <a href="https://dev.mysql.com/doc/refman/5.6/en/information-functions.html#function_last-insert-id">LAST_INSERT_ID()</a> function returns the first automatically generated value successfully inserted for an <code class="language-plaintext highlighter-rouge">AUTO_INCREMENT</code> column in a table. And the <a href="https://dev.mysql.com/doc/refman/5.6/en/information-functions.html#function_row-count">ROW_COUNT()</a> function returns the number of rows affected by the previous SQL statement.</p>
<p>So, it seems simple enough to calculate the auto-incrementing IDs based on these two values.</p>
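<p>As a sketch of the calculation (with hypothetical values in place of a real connection session):</p>

```ruby
# Values the connection session would report after a 3-row bulk insert;
# the numbers here are illustrative, not from a real MySQL session
last_insert_id = 101  # LAST_INSERT_ID(): first AUTO_INCREMENT value generated
row_count      = 3    # ROW_COUNT(): rows affected by the INSERT statement

# Assuming gapless, consecutive IDs, the full range follows directly:
ids = (last_insert_id...(last_insert_id + row_count)).to_a
puts ids.inspect  # => [101, 102, 103]
```

<p>The rest of this post covers the conditions (insert type, lock mode, replication format) under which that gapless assumption actually holds.</p>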
<h2 id="auto_increment-handling-in-innodb">AUTO_INCREMENT Handling in InnoDB</h2>
<p>Before we go any further, we need to review <a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-auto-increment-handling.html">AUTO_INCREMENT handling in InnoDB</a>, because the type of inserts, the lock mode, and the replication format determine whether the IDs will be consecutive and whether they will be the same on the replicas as on the source. We need to ensure the generated IDs have no gaps so that we can reliably calculate their values with these functions.</p>
<p>The type of multiple-row inserts we do are:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSERT</span> <span class="k">INTO</span> <span class="nv">`messages`</span> <span class="p">(</span><span class="nv">`template_id`</span><span class="p">,</span> <span class="nv">`params`</span><span class="p">,</span> <span class="nv">`created_at`</span><span class="p">,</span> <span class="nv">`processed`</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span><span class="k">NULL</span><span class="p">,</span> <span class="s1">'content'</span><span class="p">,</span> <span class="s1">'2022-02-04 15:12:24'</span><span class="p">,</span> <span class="k">FALSE</span><span class="p">),</span>
       <span class="p">(</span><span class="k">NULL</span><span class="p">,</span> <span class="s1">'content'</span><span class="p">,</span> <span class="s1">'2022-02-04 15:12:24'</span><span class="p">,</span> <span class="k">FALSE</span><span class="p">)</span>
</code></pre></div></div>
<p>These inserts fall into the category of <a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-auto-increment-handling.html#:~:text=mode%E2%80%9D%20inserts.-,%E2%80%9CSimple%20inserts%E2%80%9D,-Statements%20for%20which">simple inserts</a>:</p>
<blockquote>
<p>Statements for which the number of rows to be inserted can be determined in advance (when the statement is initially processed). This includes single-row and multiple-row INSERT and REPLACE statements that do not have a nested subquery, but not INSERT … ON DUPLICATE KEY UPDATE.</p>
</blockquote>
<p>There are three types of lock modes for MySQL: traditional (0), consecutive (1), and interleaved (2). If the only statements we execute are “simple inserts,” then there are no gaps in the numbers generated for any lock mode. We use the “consecutive” lock mode with MySQL 5.7:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">irb</span><span class="p">(</span><span class="n">main</span><span class="p">):</span><span class="mo">001</span><span class="p">:</span><span class="mi">0</span><span class="o">></span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span><span class="p">.</span><span class="nf">connection</span><span class="p">.</span><span class="nf">execute</span><span class="p">(</span><span class="s2">"SELECT @@innodb_autoinc_lock_mode;"</span><span class="p">).</span><span class="nf">to_a</span>
<span class="o">=></span> <span class="p">[[</span><span class="mi">1</span><span class="p">]]</span>
</code></pre></div></div>
<p>There are three types of binary log formats: <code class="language-plaintext highlighter-rouge">STATEMENT</code>, <code class="language-plaintext highlighter-rouge">ROW</code>, and <code class="language-plaintext highlighter-rouge">MIXED</code>. When using statement-based replication and interleaved lock mode combination, there are no guarantees for auto-increment values to be the same on the replicas as on the source. But, when using row-based or mixed-format replication and any auto-increment lock mode, auto-increment values will be the same on the replicas as on the source. We run our binary log in <code class="language-plaintext highlighter-rouge">MIXED</code> format.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mf">2.7</span><span class="o">.</span><span class="mi">4</span> <span class="p">:</span><span class="mo">001</span> <span class="o">></span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span><span class="p">.</span><span class="nf">connection</span><span class="p">.</span><span class="nf">execute</span><span class="p">(</span><span class="s2">"SHOW VARIABLES LIKE 'binlog_format';"</span><span class="p">).</span><span class="nf">to_a</span>
<span class="o">=></span> <span class="p">[[</span><span class="s2">"binlog_format"</span><span class="p">,</span> <span class="s2">"MIXED"</span><span class="p">]]</span>
</code></pre></div></div>
<p>Here’s the <code class="language-plaintext highlighter-rouge">messages</code> table schema we do bulk inserts against:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="nv">`messages`</span> <span class="p">(</span>
  <span class="nv">`id`</span> <span class="nb">int</span><span class="p">(</span><span class="mi">11</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="n">AUTO_INCREMENT</span><span class="p">,</span>
  <span class="nv">`template_id`</span> <span class="nb">int</span><span class="p">(</span><span class="mi">11</span><span class="p">)</span> <span class="k">DEFAULT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="nv">`params`</span> <span class="nb">mediumtext</span> <span class="nb">CHARACTER</span> <span class="k">SET</span> <span class="n">utf8mb4</span><span class="p">,</span>
  <span class="nv">`created_at`</span> <span class="nb">timestamp</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="nv">`processed`</span> <span class="nb">tinyint</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="k">DEFAULT</span> <span class="s1">'0'</span><span class="p">,</span>
  <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="p">(</span><span class="nv">`id`</span><span class="p">),</span>
  <span class="k">KEY</span> <span class="nv">`index_messages_on_template_id`</span> <span class="p">(</span><span class="nv">`template_id`</span><span class="p">)</span>
<span class="p">)</span> <span class="n">ENGINE</span><span class="o">=</span><span class="n">InnoDB</span> <span class="n">AUTO_INCREMENT</span><span class="o">=</span><span class="mi">1</span> <span class="k">DEFAULT</span> <span class="n">CHARSET</span><span class="o">=</span><span class="n">utf8</span> <span class="n">ROW_FORMAT</span><span class="o">=</span><span class="n">COMPRESSED</span> <span class="n">KEY_BLOCK_SIZE</span><span class="o">=</span><span class="mi">8</span>
</code></pre></div></div>
<h2 id="thread-safety-of-last_insert_id">Thread Safety of LAST_INSERT_ID()</h2>
<p>Given that <a href="https://dev.mysql.com/doc/refman/8.0/en/information-functions.html#function_last-insert-id">LAST_INSERT_ID()</a> isolation is <strong>per-connection</strong> and Rails’s <a href="https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html">ConnectionPool</a> is thread-safe, using <code class="language-plaintext highlighter-rouge">LAST_INSERT_ID()</code> is safe in our case.</p>
<blockquote>
<p>The ID that was generated is maintained in the server on a <strong><em>per-connection basis</em></strong>. This means that the value returned by the function to a given client is the first AUTO_INCREMENT value generated for the most recent statement affecting an AUTO_INCREMENT column <strong><em>by that client</em></strong>. This value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own. This behavior ensures that each client can retrieve its own ID without concern for the activity of other clients, and without the need for locks or transactions.</p>
</blockquote>
<h2 id="bulk-insert-and-calculate-auto-incrementing-ids">Bulk Insert and Calculate Auto-Incrementing IDs</h2>
<p>To convert the individual Rails model saves to a single bulk insert, we collect the attributes for all models before the final bulk insert. The following code with inline comments shows how we collect the models’ attributes.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">attributes</span> <span class="o">=</span> <span class="n">messages_params</span><span class="p">.</span><span class="nf">map</span> <span class="k">do</span> <span class="o">|</span><span class="n">message_params</span><span class="o">|</span>
  <span class="c1"># Initialize Message object</span>
  <span class="n">message</span> <span class="o">=</span> <span class="no">Message</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">message_params: </span><span class="n">message_params</span><span class="p">)</span>

  <span class="c1"># Set timestamps</span>
  <span class="no">Message</span><span class="p">.</span><span class="nf">all_timestamp_attributes_in_model</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="nb">name</span><span class="o">|</span>
    <span class="n">message</span><span class="p">.</span><span class="nf">_write_attribute</span><span class="p">(</span><span class="nb">name</span><span class="p">,</span> <span class="no">Message</span><span class="p">.</span><span class="nf">current_time_from_proper_timezone</span><span class="p">)</span>
  <span class="k">end</span>

  <span class="c1"># Run the necessary model callbacks</span>
  <span class="p">[</span><span class="ss">:validation</span><span class="p">,</span> <span class="ss">:save</span><span class="p">,</span> <span class="ss">:create</span><span class="p">].</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">kind</span><span class="o">|</span> <span class="n">message</span><span class="p">.</span><span class="nf">run_callbacks</span><span class="p">(</span><span class="n">kind</span><span class="p">)</span> <span class="p">}</span>

  <span class="c1"># Collect message attributes for bulk insert</span>
  <span class="n">attribute_names</span> <span class="o">=</span> <span class="no">Message</span><span class="p">.</span><span class="nf">column_names</span> <span class="o">-</span> <span class="p">[</span><span class="no">Message</span><span class="p">.</span><span class="nf">primary_key</span><span class="p">]</span>
  <span class="n">attribute_names</span><span class="p">.</span><span class="nf">each_with_object</span><span class="p">({})</span> <span class="k">do</span> <span class="o">|</span><span class="nb">name</span><span class="p">,</span> <span class="n">object</span><span class="o">|</span>
    <span class="n">object</span><span class="p">[</span><span class="nb">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">message</span><span class="p">.</span><span class="nf">_read_attribute</span><span class="p">(</span><span class="nb">name</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Once we’ve built the array of attributes, we can call <code class="language-plaintext highlighter-rouge">insert_all!</code>.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Message</span><span class="p">.</span><span class="nf">insert_all!</span><span class="p">(</span><span class="n">attributes</span><span class="p">)</span>
</code></pre></div></div>
<p>We use <code class="language-plaintext highlighter-rouge">insert_all!</code> instead of <code class="language-plaintext highlighter-rouge">insert_all</code> for the bulk insert so that if an issue occurs, the whole insert fails and no rows are inserted. For instance, <code class="language-plaintext highlighter-rouge">insert_all!</code> raises an <code class="language-plaintext highlighter-rouge">ActiveRecord::RecordNotUnique</code> error if any row violates a unique index on the table.</p>
<p>After inserting the records, we can calculate the auto-incrementing IDs by retrieving the <code class="language-plaintext highlighter-rouge">LAST_INSERT_ID()</code> value from the <code class="language-plaintext highlighter-rouge">Mysql2::Client</code> object using the <code class="language-plaintext highlighter-rouge">last_id</code> method:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mysql_client</span> <span class="o">=</span> <span class="no">Message</span><span class="p">.</span><span class="nf">connection</span><span class="p">.</span><span class="nf">instance_variable_get</span><span class="p">(</span><span class="ss">:@connection</span><span class="p">)</span>
<span class="n">last_id</span> <span class="o">=</span> <span class="n">mysql_client</span><span class="p">.</span><span class="nf">last_id</span>
</code></pre></div></div>
<p>When inserting multiple rows using a single INSERT statement, the <code class="language-plaintext highlighter-rouge">last_id</code> returns only the value generated for the first inserted row.</p>
<p>To get the <code class="language-plaintext highlighter-rouge">ROW_COUNT()</code> function value, we can call the <code class="language-plaintext highlighter-rouge">affected_rows</code> method on the <code class="language-plaintext highlighter-rouge">Mysql2::Client</code>, but since the IDs are consecutive numbers, we can simply add the loop <code class="language-plaintext highlighter-rouge">index</code> to the <code class="language-plaintext highlighter-rouge">last_id</code> and set the message ID:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">messages</span><span class="p">.</span><span class="nf">each_with_index</span> <span class="k">do</span> <span class="o">|</span><span class="n">message</span><span class="p">,</span> <span class="n">index</span><span class="o">|</span>
  <span class="n">message</span><span class="p">.</span><span class="nf">id</span> <span class="o">=</span> <span class="n">last_id</span> <span class="o">+</span> <span class="n">index</span>
<span class="k">end</span>
</code></pre></div></div>
<h2 id="performance-improvements">Performance Improvements</h2>
<p>By deploying this change to our Email API running with MySQL 5.7, we saw a decrease of about 35% in the average transaction duration of Batch API requests. This percentage reflects the duration of the entire Batch API request, not just the MySQL insert time.</p>
<p><img src="/images/rails-bulk-insert-mysql/bulk_insert_mysql.png" alt="Bulk inserts MySQL" /></p>
<p>And for the Email API in AWS running with Aurora MySQL 5.7, the change decreased the average transaction duration time of the Batch API request by about 65%.</p>
<p><img src="/images/rails-bulk-insert-mysql/bulk_insert_aurora.png" alt="Bulk inserts AWS Aurora MySQL" /></p>
<h2 id="summary">Summary</h2>
<p>MySQL does not support a <code class="language-plaintext highlighter-rouge">RETURNING</code> clause for getting the auto-incrementing IDs for bulk inserts, but it provides the <code class="language-plaintext highlighter-rouge">LAST_INSERT_ID()</code> information function that helps us calculate them. By introducing bulk inserts, we significantly improved the transaction duration times of our Email Batch API requests. The change had a more significant effect on AWS Aurora MySQL, presumably due to its storage engine optimizations. A simpler application model with minimal callbacks and validation logic makes introducing such a change more feasible.</p>
<p><em>*) Cover Photo Attribution: Photo by Marek Piwnicki: https://www.pexels.com/photo/train-in-motion-8991549/</em></p>This blog post was originally published on the GoDaddy Engineering Blog.Running Puma in AWS2022-01-10T19:00:00+00:002022-01-10T19:00:00+00:00https://dalibornasevic.com/posts/running-puma-in-aws<p><em>This blog post was originally published on the <a href="https://www.godaddy.com/engineering/2022/01/10/running-puma-in-aws/">GoDaddy Engineering Blog</a>.</em></p>
<p style="text-align: center">
<img src="/images/puma-aws/puma-logo.png" alt="Puma Logo" />
</p>
<p>In the past couple of years, we have been on our <a href="https://www.godaddy.com/engineering/2021/05/07/godaddys-journey-to-the-cloud/">journey to the cloud</a>, migrating our web services to AWS. In this blog post, we share what we learned about deploying the Puma web server to AWS while migrating our email delivery service, written in Ruby.</p>
<h2 id="what-is-puma">What is Puma?</h2>
<p><a href="https://puma.io/">Puma</a> is the <a href="https://www.ruby-toolbox.com/categories/web_servers">most popular</a> Ruby web server used in production per the <a href="https://rails-hosting.com/2020/#which-rails-servers-are-you-using-in-production">Ruby on Rails Community Survey Results</a>. It is a fast and reliable web server that we use for deploying containerized Ruby applications at GoDaddy.</p>
<h2 id="end-to-end-ssl">End-to-end SSL</h2>
<p>The web components of our email delivery service run on Kubernetes. The Kubernetes service is behind an ALB Ingress Controller managed by <a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/">AWS Load Balancer Controller</a>. Every web request has end-to-end encryption in transit: the Application Load Balancer (ALB) terminates TLS and re-initiates it toward the targets, and the Puma server terminates TLS again in the Kubernetes pod. The Kubernetes pod is a Docker container running a Ruby on Rails application with Puma.</p>
<h2 id="loading-certificates-from-memory">Loading certificates from memory</h2>
<p>When a container starts, the application initialization process retrieves the SSL certificates from AWS Secrets Manager and configures them with Puma on the fly. We <a href="https://github.com/puma/puma/pull/2728">contributed a change</a> to Puma’s MiniSSL C extension to allow setting <code class="language-plaintext highlighter-rouge">cert_pem</code> and <code class="language-plaintext highlighter-rouge">key_pem</code> strings without persisting them on disk for security reasons. This new functionality is available through the <code class="language-plaintext highlighter-rouge">ssl_bind</code> Puma DSL and will be available in the next Puma version (> 5.5.2).</p>
<p>With the following sample we fetch and configure the certificate for our API component:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config/puma.rb</span>
<span class="n">config</span> <span class="o">=</span> <span class="no">AwsDeploy</span><span class="o">::</span><span class="no">Config</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">ENV</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="s2">"RAILS_ENV"</span><span class="p">))</span>
<span class="n">certificate_downloader</span> <span class="o">=</span> <span class="no">AwsCertificateDownloader</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>
<span class="n">api_port</span> <span class="o">=</span> <span class="no">ENV</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="s2">"PORT_API"</span><span class="p">)</span>
<span class="n">api_cert</span> <span class="o">=</span> <span class="n">certificate_downloader</span><span class="p">.</span><span class="nf">download</span><span class="p">(</span><span class="s2">"/Cert/</span><span class="si">#{</span><span class="n">config</span><span class="p">.</span><span class="nf">api_host_name</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="n">ssl_bind</span> <span class="s1">'0.0.0.0'</span><span class="p">,</span> <span class="n">api_port</span><span class="p">,</span> <span class="p">{</span>
  <span class="ss">cert_pem: </span><span class="n">api_cert</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="ss">:cert</span><span class="p">),</span>
  <span class="ss">key_pem: </span><span class="n">api_cert</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="ss">:key</span><span class="p">),</span>
  <span class="ss">no_tlsv1: </span><span class="kp">true</span><span class="p">,</span>
  <span class="ss">no_tlsv1_1: </span><span class="kp">true</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>
<p>We run two other application components on different ports and hosts in the same Puma process using a similar config to the above.</p>
<h2 id="warmup-for-slow-clients">Warmup for slow clients</h2>
<p>We use our application-layer encryption SDK, <a href="https://github.com/godaddy/asherah">Asherah</a>, to encrypt all data with Personally Identifiable Information (PII) in the database. Each data row is encrypted with a data row key, which is in turn encrypted with an intermediate key, then a system key, and finally a master key stored in AWS Key Management Service (KMS).</p>
<p>Asherah client initialization is an expensive operation that involves HTTP requests to the AWS KMS service and database calls to retrieve the system and intermediate keys. To avoid availability issues during process restarts (deploys, daily node rotation), we warm up slow-initializing clients inside Puma’s <code class="language-plaintext highlighter-rouge">on_worker_boot</code> block.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">on_worker_boot</span> <span class="k">do</span>
<span class="no">AsherahClient</span><span class="p">.</span><span class="nf">encrypt</span><span class="p">(</span><span class="s1">'warmup'</span><span class="p">,</span> <span class="no">EncryptionPartition</span><span class="o">::</span><span class="no">GLOBAL</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Without a warmed-up Asherah client, a spike of requests during a deployment causes availability problems, as seen on the ALB monitoring graphs below. Warming up the Asherah client before targets are put in service resolves that issue.</p>
<p><img src="/images/puma-aws/alb.png" alt="ALB Monitoring" /></p>
<h2 id="alb-monitoring">ALB monitoring</h2>
<p>The AWS Load Balancer Monitoring page gives us a good overview of incoming requests and response statuses.</p>
<p>We need to distinguish between status codes returned from the targets:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">HTTP 2XXs</code> (HTTPCode_Target_2XX_Count)</li>
<li><code class="language-plaintext highlighter-rouge">HTTP 3XXs</code> (HTTPCode_Target_3XX_Count)</li>
<li><code class="language-plaintext highlighter-rouge">HTTP 4XXs</code> (HTTPCode_Target_4XX_Count)</li>
<li><code class="language-plaintext highlighter-rouge">HTTP 5XXs</code> (HTTPCode_Target_5XX_Count)</li>
</ul>
<p>and status codes generated by the load balancers:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">ELB 4XXs</code> (HTTPCode_ELB_4XX_Count)</li>
<li><code class="language-plaintext highlighter-rouge">ELB 5XXs</code> (HTTPCode_ELB_5XX_Count)</li>
<li><code class="language-plaintext highlighter-rouge">HTTP 500s</code> (HTTPCode_ELB_500_Count)</li>
<li><code class="language-plaintext highlighter-rouge">HTTP 502s</code> (HTTPCode_ELB_502_Count)</li>
<li><code class="language-plaintext highlighter-rouge">HTTP 503s</code> (HTTPCode_ELB_503_Count)</li>
<li><code class="language-plaintext highlighter-rouge">HTTP 504s</code> (HTTPCode_ELB_504_Count)</li>
</ul>
<p>Errors generated by the targets will appear in the Exception Monitoring and/or Application Performance Monitoring (APM) systems and are easier to find and resolve than <a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-troubleshooting.html#load-balancer-http-error-codes">errors generated by the AWS ELB</a> (Elastic Load Balancer).</p>
<p>In our experience, and specific to our infrastructure setup, <code class="language-plaintext highlighter-rouge">HTTP 500s</code> errors are blocks coming from the AWS Web Application Firewall (WAF), <code class="language-plaintext highlighter-rouge">HTTP 502s</code> errors are caused by TCP connection or SSL handshake issues, <code class="language-plaintext highlighter-rouge">HTTP 503s</code> errors happen when there are no targets, and <code class="language-plaintext highlighter-rouge">HTTP 504s</code> errors are due to capacity issues, i.e., when there are not enough targets.</p>
<h2 id="keep-alive-timeout">Keep-Alive timeout</h2>
<p>In our setup, ALB uses Keep-Alive connections with Puma and we noticed a small but consistent rate of <code class="language-plaintext highlighter-rouge">HTTP 502s</code> errors during quiet hours. That was happening because Puma’s <a href="https://github.com/puma/puma/blob/master/lib/puma/const.rb">default persistent timeout</a> is 20 seconds (<code class="language-plaintext highlighter-rouge">PERSISTENT_TIMEOUT = 20</code>) and the ALB <a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/application-load-balancers.html#connection-idle-timeout">connection idle timeout</a> is 60 seconds. During such quiet intervals, Puma can close the connection before the ALB does, and the ALB then serves a 502 Bad Gateway error to the client.</p>
<p><img src="/images/puma-aws/alb-keep-alive.png" alt="ALB Monitoring" /></p>
<p>By configuring <code class="language-plaintext highlighter-rouge">persistent_timeout</code> for Puma to a value greater than the ALB connection idle timeout (60 seconds) plus the ALB connect timeout (10 seconds), we resolved that issue:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">persistent_timeout</span><span class="p">(</span><span class="mi">75</span><span class="p">)</span>
</code></pre></div></div>
<h2 id="handling-blasts-of-requests">Handling blasts of requests</h2>
<p>When we send campaigns with a significant volume of recipients hosted by a single ISP, sometimes we get back a spike of web requests from the ISP that checks the links with their abuse and spam protection system. Some of the ISPs use a wide range of IPs and the <a href="https://docs.aws.amazon.com/waf/latest/developerguide/waf-rule-statement-type-rate-based.html">AWS WAF IP rate limiting per 5-minute time span</a> often does not catch the complete blast of requests and we get 504 Gateway timeout:</p>
<p><img src="/images/puma-aws/alb-blast.png" alt="Blast of Requests" /></p>
<blockquote>
<p>The load balancer failed to establish a connection to the target before the connection timeout expired (10 seconds).</p>
</blockquote>
<p>These 504 errors are the result of open timeouts from the ALB to the targets, i.e., requests from the ALB wait for 10 seconds and are unable to connect to the target socket. The cause is slow requests that saturate the queue: during the blast of requests the socket backlog fills up, and the operating system stops accepting new connections. Puma allows configuring the backlog value that determines the size of the queue for unaccepted connections.</p>
<p>We made a <a href="https://github.com/puma/puma/pull/2780">change to Puma</a> to allow setting the <code class="language-plaintext highlighter-rouge">backlog</code> value with the <code class="language-plaintext highlighter-rouge">ssl_bind</code> DSL that we use. It’s interesting that, although Puma sets the backlog size to 1024 by default, its actual value depends on the OS value for max socket connections; i.e., it is capped by the <code class="language-plaintext highlighter-rouge">net.core.somaxconn</code> sysctl value. We can check the system value with <code class="language-plaintext highlighter-rouge">sysctl net.core.somaxconn</code> or <code class="language-plaintext highlighter-rouge">cat /proc/sys/net/core/somaxconn</code>. On older Linux kernels (before linux-5.4) the default was set to 128 and on newer, it is 4096 (<a href="https://github.com/torvalds/linux/blob/ca2ef2d9f2aad7a28d346522bb4c473a0aa05249/Documentation/networking/ip-sysctl.rst#tcp-variables">reference</a>).</p>
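<p>As a sketch of that capping behavior (the helper name is ours, not Puma’s), the effective backlog is simply the requested value limited by the kernel’s <code class="language-plaintext highlighter-rouge">net.core.somaxconn</code> setting:</p>

```ruby
# Hypothetical helper illustrating the backlog cap described above.
# The effective backlog is the requested value, limited by the kernel's
# net.core.somaxconn setting (128 was the default before linux-5.4).
def effective_backlog(requested, somaxconn_path: "/proc/sys/net/core/somaxconn")
  cap = File.exist?(somaxconn_path) ? File.read(somaxconn_path).to_i : 128
  [requested, cap].min
end

effective_backlog(1024, somaxconn_path: "/nonexistent")  # => 128 (falls back to the old default)
```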
<p>To set that with Puma’s <code class="language-plaintext highlighter-rouge">ssl_bind</code> DSL, we just provide the appropriate <code class="language-plaintext highlighter-rouge">backlog</code> value with:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ssl_bind</span> <span class="s1">'0.0.0.0'</span><span class="p">,</span> <span class="n">tracking_port</span><span class="p">,</span> <span class="p">{</span>
<span class="c1"># ...</span>
<span class="ss">backlog: </span><span class="mi">4096</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>
<p>In our particular case, it makes sense to increase the <code class="language-plaintext highlighter-rouge">backlog</code> value to avoid dropping requests while the blast lasts, at the cost of slightly increased latency for that short duration. That resolution came after we first evaluated capacity increase options and optimized the endpoints by moving expensive operations to background jobs. These responses are in the range of 1-10 milliseconds, and tools like <a href="https://github.com/SamSaffron/lru_redux">lru_redux</a> for in-process memory caching are extremely helpful.</p>
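<p>The in-process caching idea can be sketched in a few lines of plain Ruby; this toy LRU cache is for illustration only and is not the <a href="https://github.com/SamSaffron/lru_redux">lru_redux</a> API:</p>

```ruby
# Toy in-process LRU cache, for illustration only (a stand-in for a gem
# like lru_redux). Ruby hashes preserve insertion order, so re-inserting
# a key on access moves it to the "most recently used" end.
class TinyLru
  def initialize(max_size)
    @max_size = max_size
    @data = {}
  end

  # Return a cached value, computing and storing it on a miss.
  def getset(key)
    if @data.key?(key)
      @data[key] = @data.delete(key)                        # refresh recency
    else
      @data[key] = yield
      @data.delete(@data.first[0]) if @data.size > @max_size # evict oldest
    end
    @data[key]
  end
end

cache = TinyLru.new(2)
cache.getset(:a) { 1 }
cache.getset(:b) { 2 }
cache.getset(:c) { 3 }             # evicts :a
cache.getset(:a) { "recomputed" }  # => "recomputed"
```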
<p>Another thing to check is whether the liveness probe is the same as the readiness probe, as that can <a href="https://srcco.de/posts/kubernetes-liveness-probes-are-dangerous.html">worsen such high-load situations</a> by restarting the pods. If it is, we can increase the <code class="language-plaintext highlighter-rouge">failureThreshold</code> for the liveness probe to a bigger value (10, for example).</p>
<p>Consider also relaxing the readiness probe in this situation by increasing its timeout. That helped us reduce errors like “SSL_read: shutdown while in init” that we were seeing for Redis connections. They seem to happen when Kubernetes takes the pod out of service due to failing readiness probes during that blast of requests, and then the ongoing requests to other Puma threads in the same process get canceled, which results in 502 errors in addition to the 504 errors.</p>
<h2 id="graceful-shutdown-and-pod-termination">Graceful shutdown and pod termination</h2>
<p>When terminating a pod, Kubernetes first sends a <code class="language-plaintext highlighter-rouge">SIGTERM</code> and, if the pod does not stop within the <code class="language-plaintext highlighter-rouge">terminationGracePeriodSeconds</code>, Kubernetes sends <code class="language-plaintext highlighter-rouge">SIGKILL</code> to forcefully stop it. When Kubernetes terminates a pod, the command to remove the endpoint from the service and the <code class="language-plaintext highlighter-rouge">SIGTERM</code> signal execute in parallel. That can cause some requests to get dropped because the pod is terminating, which results in 502/504 errors.</p>
<p>An easy way to work around that limitation and achieve a <a href="https://learnk8s.io/graceful-shutdown">graceful shutdown</a> is to add a sleep interval before the Puma process stops. To achieve that, we use a <code class="language-plaintext highlighter-rouge">preStop</code> hook and, in our testing, we landed on a sleep interval of 40 seconds, which is enough time for Kubernetes’ <code class="language-plaintext highlighter-rouge">Endpoints Controller</code> to react asynchronously and for <code class="language-plaintext highlighter-rouge">kube-proxy</code> to update <code class="language-plaintext highlighter-rouge">iptables</code> rules. We also increase the <code class="language-plaintext highlighter-rouge">terminationGracePeriodSeconds</code> to 70 seconds, which applies to the total time (preStop hook plus container stop), leaving 30 seconds for Puma to process queued requests before it receives <code class="language-plaintext highlighter-rouge">SIGKILL</code>.</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">terminationGracePeriodSeconds</span><span class="pi">:</span> <span class="m">70</span>
<span class="na">containers</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="pi">{{</span> <span class="nv">include "application.apps.name" .</span> <span class="pi">}}</span>
<span class="na">image</span><span class="pi">:</span> <span class="pi">{{</span> <span class="nv">include "container.image" .</span> <span class="pi">}}</span>
<span class="na">args</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">bundle</span><span class="nv"> </span><span class="s">exec</span><span class="nv"> </span><span class="s">puma"</span><span class="pi">]</span>
<span class="na">lifecycle</span><span class="pi">:</span>
<span class="na">preStop</span><span class="pi">:</span>
<span class="na">exec</span><span class="pi">:</span>
<span class="na">command</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">sh"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">-c"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">sleep</span><span class="nv"> </span><span class="s">40"</span><span class="pi">]</span>
</code></pre></div></div>
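<p>The timing budget above — a 40-second preStop sleep inside a 70-second grace period — can be sanity-checked with simple arithmetic (illustrative only):</p>

```ruby
# Illustrative arithmetic for the shutdown timing budget described above.
pre_stop_sleep  = 40  # seconds of sleep in the preStop hook
puma_drain_time = 30  # seconds left for Puma to finish queued requests
grace_period    = 70  # terminationGracePeriodSeconds

# The preStop sleep and the container stop share the same grace period,
# so their sum must not exceed it, or the pod gets SIGKILLed early.
budget_ok = (pre_stop_sleep + puma_drain_time) <= grace_period
```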
<h2 id="puma-stats-and-auto-scaling">Puma stats and auto-scaling</h2>
<p>Queue Time is an important metric to monitor and should feed into the auto-scaling configuration. However, AWS ALB does not provide the <a href="https://forums.aws.amazon.com/message.jspa?messageID=396283">X-Request-Start</a> header, so we cannot calculate the queue time dynamically. We can enable, download, and parse load balancer <a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html">access logs</a> to calculate the queue time after the fact:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">queue_time</span> <span class="o">=</span> <span class="n">time</span> <span class="o">-</span> <span class="n">request_creation_time</span> <span class="o">-</span> <span class="n">request_processing_time</span> <span class="o">-</span> <span class="n">response_processing_time</span> <span class="o">-</span> <span class="n">target_processing_time</span>
</code></pre></div></div>
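<p>For example, given the timestamps and durations parsed out of a single access log entry (the field names follow the ALB access log format; the helper itself is illustrative), the calculation could look like:</p>

```ruby
require "time"

# Illustrative helper: compute queue time from values parsed out of one
# ALB access log entry, following the formula above. The *_processing_time
# values are in seconds, as in the access logs.
def queue_time(time:, request_creation_time:, request_processing_time:,
               target_processing_time:, response_processing_time:)
  (Time.parse(time) - Time.parse(request_creation_time)) -
    request_processing_time - target_processing_time - response_processing_time
end

queue_time(
  time: "2021-12-27T15:19:10.500000Z",
  request_creation_time: "2021-12-27T15:19:10.000000Z",
  request_processing_time: 0.001,
  target_processing_time: 0.150,
  response_processing_time: 0.001
)
# ~0.348 seconds spent waiting in the queue
```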
<p>We need a dynamic value to use for auto-scaling, and we can calculate the Puma business metric using the following formula:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">puma_business</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">sum</span><span class="p">(</span><span class="n">pool_capacity</span><span class="p">)</span> <span class="o">/</span> <span class="n">sum</span><span class="p">(</span><span class="n">max_threads</span><span class="p">))</span> <span class="o">*</span> <span class="mi">100</span>
</code></pre></div></div>
<p>These Puma values used in the calculation are available from <a href="https://puma.io/puma/file.stats.html">Puma.stats</a>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span>
<span class="s2">"started_at"</span><span class="p">:</span> <span class="s2">"2021-12-27T15:19:09Z"</span><span class="p">,</span>
<span class="s2">"backlog"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s2">"running"</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
<span class="s2">"pool_capacity"</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span>
<span class="s2">"max_threads"</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span>
<span class="s2">"requests_count"</span><span class="p">:</span> <span class="mi">6</span>
<span class="p">}</span>
</code></pre></div></div>
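<p>As an illustrative sketch (the helper is ours, not part of Puma), aggregating those per-worker stats into the metric above could look like:</p>

```ruby
# Illustrative sketch: compute the metric above from Puma.stats-style
# data collected from each worker (cluster mode reports per-worker
# stats; here they are passed in as an array of hashes).
def puma_business(worker_stats)
  pool_capacity = worker_stats.sum { |s| s["pool_capacity"] }
  max_threads   = worker_stats.sum { |s| s["max_threads"] }
  (1 - pool_capacity.to_f / max_threads) * 100
end

puma_business([
  { "pool_capacity" => 4, "max_threads" => 5 },
  { "pool_capacity" => 1, "max_threads" => 5 },
])
# => 50.0
```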
<h2 id="load-balancing">Load balancing</h2>
<p>Regarding load balancing, we need to consider whether to run Puma in single or cluster mode. The advantage of Puma cluster mode is that it can better deal with slow, CPU-bound responses because the queue is shared between more than one worker. Puma will route requests to worker processes that have the capacity, yielding <a href="https://www.speedshop.co/2015/07/29/scaling-ruby-apps-to-1000-rpm.html">better queue time</a>.</p>
<p>AWS ALB supports the <a href="https://aws.amazon.com/about-aws/whats-new/2019/11/application-load-balancer-now-supports-least-outstanding-requests-algorithm-for-load-balancing-requests/">Least Outstanding Requests</a> algorithm for load balancing requests in addition to the default Round Robin algorithm. The Least Outstanding Requests algorithm is not ideal when there is a problematic pod that quickly returns error responses: all upcoming requests get routed to it unless we have quick health checks that react to the failing target.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Deploying Puma and tuning its performance to adequately provision resources involves lots of details to consider and analyze. Warming up slow clients, tuning keep-alive timeouts, graceful shutdowns, and optimizing the backlog queue size are essential to ensure the service can respond to high loads with minimal latency and without interruption. Loading SSL certificates directly from Secrets Manager, end-to-end SSL encryption in transit, and implementing application-layer encryption are required to secure customer data in the cloud. In addition to ALB monitoring, monitoring Puma metrics would be great to have. We’ll be exploring using <a href="https://prometheus.io/">Prometheus</a> for monitoring Puma metrics and configuring auto-scaling based on the Puma business metric. In the absence of such monitoring, analyzing access logs could bring useful insights and ideas on what to tweak next.</p>This blog post was originally published on the GoDaddy Engineering Blog.Distributed cron for Rails apps with Sidekiq Scheduler2018-10-15T21:00:00+00:002018-10-15T21:00:00+00:00https://dalibornasevic.com/posts/distributed-cron-for-rails-apps-with-sidekiq-scheduler<p><em>This blog post was originally published on the <a href="https://www.godaddy.com/engineering/2018/10/15/distributed-cron-for-rails-apps-with-sidekiq-scheduler/">GoDaddy Engineering Blog</a>.</em></p>
<p style="text-align: center">
<img src="/images/sidekiq_scheduler.png" alt="Sidekiq Scheduler" />
</p>
<p>We are heavy users of <a href="https://github.com/mperham/sidekiq">Sidekiq</a>. Sidekiq is a Ruby background jobs processing library that uses Redis for storage and is widely used in Ruby on Rails applications. It has a nice ecosystem that allows extending its functionality with plugins.</p>
<p>One such plugin that helped us run distributed cron, reduce maintenance costs and simplify our deployments is <a href="https://github.com/moove-it/sidekiq-scheduler">Sidekiq Scheduler</a>. We will discuss the motivation for migrating from OS based cron to distributed cron using Sidekiq Scheduler and the benefits we get from it.</p>
<h2 id="our-deployment-setup">Our deployment setup</h2>
<p>We maintain some legacy Ruby on Rails applications along with new Ruby on Rails microservices. We build our new microservices with the <a href="/2018/06/28/amazon-eks/">public cloud</a> in mind and deploy them on <a href="https://kubernetes.io/">Kubernetes</a>. We deploy our legacy applications with <a href="https://github.com/capistrano/capistrano">Capistrano</a> while we work on migrating them to the public cloud. We landed on a strategy for deploying cron jobs that works well for us in both scenarios.</p>
<p>With our standard Capistrano deploys, we deploy an application to web servers that handle web requests and to worker servers that process background jobs.</p>
<p>The web servers deploy is consistent and all running processes are <a href="https://www.phusionpassenger.com/">Phusion Passenger</a> instances. The workers deploy is more complex. Besides deploying the Sidekiq processes, it deploys cron jobs to a specific worker server and depending on the application it might deploy other stand-alone runner processes to specific worker servers.</p>
<h2 id="what-are-the-main-problems-with-this-setup">What are the main problems with this setup?</h2>
<p>There are two main problems with this setup that we want to resolve:</p>
<ol>
<li>
<p>Single point of failure</p>
<p>The crons and the runner processes are each deployed to a specific server. In case of an issue like a network or out-of-memory incident, we risk a partial failure in how the service operates.</p>
</li>
<li>
<p>Running tasks twice at the same time</p>
<p>If a cron job needs to run frequently and has a long processing time, there is nothing to prevent an overlap with the next cron schedule. With experimental canary deploys, human error is possible too, which could result in deploying the crons or the runner process to more than one server.</p>
</li>
</ol>
<h2 id="distributed-cron-with-sidekiq-scheduler">Distributed cron with Sidekiq Scheduler</h2>
<p>Let’s first start with a brief introduction to how Sidekiq Scheduler works and then we will discuss its benefits over OS based cron jobs and look at some of the alternatives.</p>
<p><a href="https://github.com/moove-it/sidekiq-scheduler">Sidekiq Scheduler</a> is a lightweight job scheduling extension for Sidekiq. It uses <a href="https://github.com/jmettraux/rufus-scheduler">Rufus Scheduler</a> under the hood, that is itself an in-memory scheduler.</p>
<p>Sidekiq Scheduler extends Sidekiq by starting a Rufus Scheduler thread in the same process, loading and maintaining the schedules for it. By starting Sidekiq Scheduler in all Sidekiq processes distributed across all hosts, we get a distributed cron solution that resolves the single point of failure issue.</p>
<p>Running Sidekiq Scheduler on multiple hosts could have some <a href="https://github.com/moove-it/sidekiq-scheduler#notes-about-running-on-multiple-hosts">issues</a>. Although we exclusively use the <code class="language-plaintext highlighter-rouge">cron</code> type of schedules, we still pair the cron jobs in Sidekiq Scheduler with a Sidekiq plugin for unique jobs. That covers the uniqueness goal and also guarantees that no duplicate cron job runs at the same time until the previous run finishes successfully.</p>
<p>Each Sidekiq process running Sidekiq Scheduler will first try to register the cron job to get a lock and only then enqueue it. The increased load to Redis when every single process tries to get a lock is acceptable for us because Redis capacity allows for that.</p>
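<p>A toy sketch of that register-then-enqueue behavior (a plain in-memory hash stands in for Redis, and none of these names are the actual plugin API):</p>

```ruby
# Toy illustration of "register the cron job to get a lock, then enqueue":
# an in-memory hash stands in for Redis (think SET key value NX PX <ttl>),
# and this is not the real unique-jobs plugin API. With many scheduler
# threads racing, only the first to claim the lock enqueues the job.
class ToyUniqueEnqueuer
  def initialize
    @locks = {}   # stand-in for Redis lock keys
    @queue = []
  end

  attr_reader :queue

  def enqueue_unique(job_class, lock_key)
    return false if @locks.key?(lock_key)  # another process holds the lock
    @locks[lock_key] = Time.now
    @queue << job_class
    true
  end
end

enqueuer = ToyUniqueEnqueuer.new
enqueuer.enqueue_unique("ActiveMailingsWorker", "cron:active_mailings")  # => true
enqueuer.enqueue_unique("ActiveMailingsWorker", "cron:active_mailings")  # => false (already locked)
enqueuer.queue  # => ["ActiveMailingsWorker"]
```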
<h2 id="configuring-and-using-sidekiq-scheduler">Configuring and using Sidekiq Scheduler</h2>
<p>We have a custom config for Sidekiq Scheduler that allows for more control over sharing configs between environments. In an initializer, we require <code class="language-plaintext highlighter-rouge">sidekiq-scheduler</code> and its UI component and configure the Sidekiq server:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config/initializers/sidekiq.rb</span>
<span class="nb">require</span> <span class="s1">'sidekiq'</span>
<span class="nb">require</span> <span class="s1">'sidekiq/web'</span>
<span class="nb">require</span> <span class="s1">'sidekiq-scheduler'</span>
<span class="nb">require</span> <span class="s1">'sidekiq-scheduler/web'</span>
<span class="no">Sidekiq</span><span class="p">.</span><span class="nf">configure_server</span> <span class="k">do</span> <span class="o">|</span><span class="n">config</span><span class="o">|</span>
<span class="n">config</span><span class="p">.</span><span class="nf">on</span><span class="p">(</span><span class="ss">:startup</span><span class="p">)</span> <span class="k">do</span>
<span class="no">SidekiqScheduler</span><span class="o">::</span><span class="no">Scheduler</span><span class="p">.</span><span class="nf">instance</span><span class="p">.</span><span class="nf">rufus_scheduler_options</span> <span class="o">=</span> <span class="p">{</span> <span class="ss">max_work_threads: </span><span class="mi">1</span> <span class="p">}</span>
<span class="no">Sidekiq</span><span class="p">.</span><span class="nf">schedule</span> <span class="o">=</span> <span class="no">ConfigParser</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="no">File</span><span class="p">.</span><span class="nf">join</span><span class="p">(</span><span class="no">Rails</span><span class="p">.</span><span class="nf">root</span><span class="p">,</span> <span class="s2">"config/sidekiq_scheduler.yml"</span><span class="p">),</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="p">)</span>
<span class="no">SidekiqScheduler</span><span class="o">::</span><span class="no">Scheduler</span><span class="p">.</span><span class="nf">instance</span><span class="p">.</span><span class="nf">reload_schedule!</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Rufus Scheduler starts <a href="https://github.com/moove-it/sidekiq-scheduler#notes-about-connection-pooling">28 threads</a> by default. Because its job is only to enqueue Sidekiq jobs and Sidekiq workers will do the actual execution, we can decrease the <code class="language-plaintext highlighter-rouge">max_work_threads</code> to 1.</p>
<p><code class="language-plaintext highlighter-rouge">ConfigParser.parse</code> is a small utility function that converts the YAML config to a hash:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'yaml'</span>
<span class="nb">require</span> <span class="s1">'erb'</span>
<span class="k">class</span> <span class="nc">ConfigParser</span>
<span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="n">environment</span><span class="p">)</span>
<span class="no">YAML</span><span class="p">.</span><span class="nf">load</span><span class="p">(</span><span class="no">ERB</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">IO</span><span class="p">.</span><span class="nf">read</span><span class="p">(</span><span class="n">file</span><span class="p">)).</span><span class="nf">result</span><span class="p">)[</span><span class="n">environment</span><span class="p">]</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
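<p>A self-contained usage sketch of that parser (the class is repeated so the example runs on its own; the YAML content is illustrative):</p>

```ruby
require "yaml"
require "erb"
require "tempfile"

# ConfigParser as shown above, repeated here for a runnable example.
class ConfigParser
  def self.parse(file, environment)
    YAML.load(ERB.new(IO.read(file)).result)[environment]
  end
end

# Write an illustrative schedule config and parse one environment from it.
Tempfile.create(["sidekiq_scheduler", ".yml"]) do |f|
  f.write(<<~YAML)
    production:
      active_mailings:
        class: ActiveMailingsWorker
        cron: '*/10 * * * * * America/Phoenix'
  YAML
  f.flush
  schedule = ConfigParser.parse(f.path, "production")
  schedule["active_mailings"]["class"]  # => "ActiveMailingsWorker"
end
```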
<p>Sidekiq Scheduler config looks like this:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config/sidekiq_scheduler.yml</span>
<span class="na">default</span><span class="pi">:</span> <span class="nl">&default</span>
<span class="na">active_mailings</span><span class="pi">:</span>
<span class="na">class</span><span class="pi">:</span> <span class="s">ActiveMailingsWorker</span>
<span class="na">cron</span><span class="pi">:</span> <span class="s1">'</span><span class="s">*/10</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">America/Phoenix'</span>
<span class="na">scheduled_mailings</span><span class="pi">:</span>
<span class="na">class</span><span class="pi">:</span> <span class="s">ScheduledMailingsWorker</span>
<span class="na">cron</span><span class="pi">:</span> <span class="s1">'</span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">America/Phoenix'</span>
<span class="na">development</span><span class="pi">:</span>
<span class="s"><<</span><span class="pi">:</span> <span class="nv">*default</span>
<span class="na">staging</span><span class="pi">:</span>
<span class="s"><<</span><span class="pi">:</span> <span class="nv">*default</span>
<span class="na">production</span><span class="pi">:</span>
<span class="s"><<</span><span class="pi">:</span> <span class="nv">*default</span>
</code></pre></div></div>
<p>Rufus Scheduler allows for seconds precision with an optional cron expression format consisting of a six fields time specifier where the first one is for the seconds. Per that config example, we specify a run of <code class="language-plaintext highlighter-rouge">ActiveMailingsWorker</code> every 10 seconds and a run of <code class="language-plaintext highlighter-rouge">ScheduledMailingsWorker</code> every minute.</p>
<p>By default, when no timezone is set with the cron string, it uses the Rails’ configured timezone in <code class="language-plaintext highlighter-rouge">config/application.rb</code>. We have an option to change it if we need to.</p>
<p>The scheduled tasks are standard Sidekiq workers:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">ActiveMailingsWorker</span>
<span class="kp">include</span> <span class="no">Sidekiq</span><span class="o">::</span><span class="no">Worker</span>
<span class="n">sidekiq_options</span> <span class="ss">queue: :cron</span><span class="p">,</span> <span class="ss">unique_for: </span><span class="mi">30</span><span class="p">.</span><span class="nf">minutes</span>
<span class="k">def</span> <span class="nf">perform</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<h2 id="benefits-of-using-sidekiq-scheduler-vs-os-based-cron-jobs">Benefits of using Sidekiq Scheduler vs OS based cron jobs</h2>
<p>There are some other benefits of using Sidekiq Scheduler vs OS based cron jobs that are worth discussing:</p>
<ol>
<li>
<p>No process bootup wait time</p>
<p>Each time OS based cron jobs run, it takes time for the process to boot up before it executes. Depending on the app size, it could take from seconds to minutes. That means the cron execution is always delayed. With Sidekiq Scheduler, the scheduler is an already-running thread inside the Sidekiq process, so there are no boot-up delays.</p>
</li>
<li>
<p>Seconds precision</p>
<p>The shortest interval at which an OS based cron job can run is one minute. Because Rufus Scheduler runs in-memory, it can schedule jobs every second.</p>
</li>
<li>
<p>Error monitoring</p>
<p>When OS based cron jobs fail, we can log errors to log files and remember to check them later. With Sidekiq Scheduler, the cron jobs are normal Sidekiq jobs and the standard Sidekiq UI and application error monitoring mechanisms apply.</p>
</li>
<li>
<p>Consistency</p>
<p>We can write <code class="language-plaintext highlighter-rouge">rake</code> tasks, custom scripts or rails runners and configure the OS based cron jobs to call them. While there are ways to test all these types of tasks, it’s more consistent when we define cron jobs as normal Sidekiq workers.</p>
</li>
<li>
<p>Run it everywhere</p>
<p>Cron jobs run as part of Sidekiq workers, and that makes it easy to deploy them in different environments, from production and staging to running the cron jobs locally.</p>
</li>
</ol>
<h2 id="converting-runner-processes-to-sidekiq-scheduler">Converting runner processes to Sidekiq Scheduler</h2>
<p>Our runner processes are responsible for operations like booting up scheduled mailings, throttling operations or sending mailing batches. These tasks need to run more frequently than once a minute, which is the minimum frequency for OS based cron jobs.</p>
<p>Rufus Scheduler allows for seconds-level frequency, so we can convert these runner processes into normal Sidekiq jobs scheduled and enqueued by Sidekiq Scheduler. With that, the workers deploy becomes as consistent and simple as the app deploy, and all running instances are Sidekiq workers.</p>
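<p>For example, a schedule with seconds-level frequency can be set up from an initializer. The sketch below is illustrative only: the file path and worker names are ours, not from this post, and sidekiq-scheduler equally accepts this configuration from <code class="language-plaintext highlighter-rouge">sidekiq.yml</code>:</p>

```ruby
# config/initializers/sidekiq_scheduler.rb (hypothetical path and worker names)
Sidekiq.configure_server do |config|
  config.on(:startup) do
    # Rufus Scheduler accepts 'every' durations like '10s' and
    # 6-field crontab expressions with a seconds column.
    Sidekiq.schedule = {
      'throttle_sendings' => {
        'every' => '10s',
        'class' => 'ThrottleSendingsWorker'
      },
      'boot_scheduled_mailings' => {
        'cron'  => '*/30 * * * * *', # every 30 seconds
        'class' => 'BootScheduledMailingsWorker'
      }
    }
    SidekiqScheduler::Scheduler.instance.reload_schedule!
  end
end
```

The workers referenced in the schedule are plain Sidekiq workers, so the unique-jobs plugin and error monitoring discussed above apply to them unchanged.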
<h2 id="look-at-some-alternatives">Look at some alternatives</h2>
<ul>
<li>
<p>An alternative solution is the Sidekiq Enterprise <a href="https://github.com/mperham/sidekiq/wiki/Ent-Periodic-Jobs">Periodic Jobs</a> feature. It uses the standard crontab format, which does not support seconds-level frequency, though its <a href="https://github.com/mperham/sidekiq/wiki/Ent-Leader-Election">Leader Election</a> feature can help implement a custom seconds-level schedule.</p>
</li>
<li>
<p><a href="https://github.com/ondrejbartas/sidekiq-cron">Sidekiq Cron</a> is another valid alternative. It uses Sidekiq’s internal <code class="language-plaintext highlighter-rouge">Sidekiq::Poller</code> and has fewer dependencies, but it also does not support seconds-level frequency.</p>
</li>
<li>
<p><a href="https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/">Kubernetes Cron Jobs</a> are another alternative when deploying to Kubernetes, but their documented <a href="https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations">limitations</a>, long bootup process, and lack of seconds-level frequency make them less than ideal.</p>
</li>
</ul>
<h2 id="final-thoughts">Final thoughts</h2>
<p>We have been running Sidekiq Scheduler in production for a few months and it’s working reliably. We use the <code class="language-plaintext highlighter-rouge">cron</code> type of schedules exclusively and we use a Sidekiq plugin for unique jobs that guards us against the <a href="https://github.com/moove-it/sidekiq-scheduler#notes-about-running-on-multiple-hosts">potential of duplicate jobs</a>.</p>This blog post was originally published on the GoDaddy Engineering Blog.Implementing a custom Redis and in-memory bloom filter2018-09-11T20:00:00+00:002018-09-11T20:00:00+00:00https://dalibornasevic.com/posts/redis-ruby-bloom-filter<p><em>This blog post was originally published on the <a href="https://www.godaddy.com/engineering/2018/09/11/redis-ruby-bloom-filter/">GoDaddy Engineering Blog</a>.</em></p>
<p style="text-align: center">
<img src="/images/bloom_filter.png" alt="Bloom Filter" />
</p>
<p>In our email marketing and delivery products (<a href="https://www.godaddy.com/online-marketing/email-marketing">GoDaddy Email Marketing</a> and <a href="https://madmimi.com">Mad Mimi</a>) we deal with lots of data and work with some interesting data structures like bloom filters. We made an optimization that involved replacing an old bloom filter built in-memory and stored on Amazon S3 with a combination of a Redis bloom filter and an in-memory bloom filter. In this blog post we’ll go through the reasoning for this change as well as the details of the bloom filter implementation we landed on. Let’s first start with a brief introduction to bloom filters.</p>
<h3 id="what-is-a-bloom-filter">What is a bloom filter?</h3>
<p><a href="https://en.wikipedia.org/wiki/Bloom_filter">A Bloom filter</a> is a space-efficient probabilistic data structure, designed to test whether an element is a member of a set. Because of its probabilistic nature, it can guess if an element is in a set with a certain precision or tell for sure if an element is not in a set. That is an important detail to design around as we’ll see later. If you’re curious about the math involved, check out this <a href="https://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/">blog post</a> for more details.</p>
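<p>To make this asymmetry concrete, here is a toy sketch (illustrative only, not the implementation used later in this post): a lookup that finds any unset bit is a guaranteed “no”, while a lookup where all bits are set is only a probable “yes”.</p>

```ruby
require 'zlib'

# Toy bloom filter: k hash functions set/check k bits in a fixed-size bit
# array. If any probed bit is 0 the element was definitely never added; if
# all are 1 the element was *probably* added (false positives are possible).
class ToyBloomFilter
  def initialize(size: 64, hash_count: 3)
    @size = size
    @hash_count = hash_count
    @bits = Array.new(size, 0)
  end

  def indexes_for(key)
    @hash_count.times.map { |i| Zlib.crc32("#{key}:#{i}") % @size }
  end

  def add(key)
    indexes_for(key).each { |i| @bits[i] = 1 }
  end

  def include?(key)
    indexes_for(key).all? { |i| @bits[i] == 1 }
  end
end

filter = ToyBloomFilter.new
filter.include?('user1@example.com') # => false: an empty filter rejects everything
filter.add('user1@example.com')
filter.include?('user1@example.com') # => true: members are always found
```

Note that nothing is ever removed: a standard bloom filter supports only inserts and membership checks, which is exactly what the "history of delivered emails" use case below needs.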
<h3 id="what-is-the-real-problem-we-are-solving">What is the real problem we are solving?</h3>
<p>In our email delivery products, each plan places a limit on the number of unique contacts our customers can send emails to in a billing cycle. An interesting abuse scenario happens when a customer uploads a list of email addresses, sends a campaign to that list, deletes the list, and then imports another list with different email addresses and sends another campaign. We call this scenario “deleting and replacing” and to prevent it we need to keep a history of contacts that have received emails in a billing cycle.</p>
<h3 id="the-naive-solution">The naive solution</h3>
<p>The naive solution would be to check against the history of delivered emails in a billing cycle. While that might work for smaller data sets, it causes a performance problem when dealing with billions of contacts. That is where the opportunity for using the bloom filter data structure emerges.</p>
<h3 id="initial-bloom-filter-implementation">Initial bloom filter implementation</h3>
<p>Initially, we used the C-implementation from <a href="https://github.com/igrigorik/bloomfilter-rb">bloomfilter-rb</a> by building a bloom filter in-memory and uploading it to Amazon S3.</p>
<p>There were issues with this approach, the two most important ones being:</p>
<ul>
<li>concurrency: sending multiple campaigns at the same time overrides the filter</li>
<li>slow updates / restricted to bulk updates: fetching files from S3 is slow, and updating the filter for one-off sends is too expensive to be practical</li>
</ul>
<p>With the re-design, we need a solution that will solve these problems.</p>
<h3 id="bloom-filter-implementation">Bloom filter implementation</h3>
<p>Our bloom filter will have as a dependency our <code class="language-plaintext highlighter-rouge">User</code> model. Let’s say the <code class="language-plaintext highlighter-rouge">User</code> model has three attributes: <code class="language-plaintext highlighter-rouge">id</code>, <code class="language-plaintext highlighter-rouge">max_contacts</code> and <code class="language-plaintext highlighter-rouge">billing_cycle_started_at</code>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">User</span> <span class="o">=</span> <span class="no">Struct</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:id</span><span class="p">,</span> <span class="ss">:max_contacts</span><span class="p">,</span> <span class="ss">:billing_cycle_started_at</span><span class="p">)</span>
<span class="n">user</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">500</span><span class="p">,</span> <span class="no">Time</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="mi">2018</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mo">01</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">))</span>
</code></pre></div></div>
<p>Here is our bloom filter implementation:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'zlib'</span>
<span class="k">class</span> <span class="nc">BloomFilter</span>
<span class="c1"># http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/</span>
<span class="c1"># 10 bits for 1% error approximation</span>
<span class="c1"># ~5 bits per 10 fold reduction in error approximation</span>
<span class="no">BITS_PER_ERROR_RATE</span> <span class="o">=</span> <span class="p">{</span>
<span class="mi">1</span> <span class="o">=></span> <span class="mi">10</span><span class="p">,</span>
<span class="mf">0.1</span> <span class="o">=></span> <span class="mi">15</span><span class="p">,</span>
<span class="mf">0.01</span> <span class="o">=></span> <span class="mi">20</span>
<span class="p">}</span>
<span class="no">HASH_FUNCTIONS_COEFFICIENT</span> <span class="o">=</span> <span class="mf">0.7</span> <span class="c1"># Math.log(2)</span>
<span class="nb">attr_reader</span> <span class="ss">:error_rate</span>
<span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">user</span><span class="p">,</span> <span class="ss">error_rate: </span><span class="p">)</span>
<span class="vi">@user</span> <span class="o">=</span> <span class="n">user</span>
<span class="vi">@error_rate</span> <span class="o">=</span> <span class="n">error_rate</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">indexes_for</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
<span class="n">hash_functions</span><span class="p">.</span><span class="nf">times</span><span class="p">.</span><span class="nf">map</span> <span class="p">{</span> <span class="o">|</span><span class="n">i</span><span class="o">|</span> <span class="no">Zlib</span><span class="p">.</span><span class="nf">crc32</span><span class="p">(</span><span class="s2">"</span><span class="si">#{</span><span class="n">key</span><span class="p">.</span><span class="nf">to_s</span><span class="p">.</span><span class="nf">strip</span><span class="p">.</span><span class="nf">downcase</span><span class="si">}</span><span class="s2">:</span><span class="si">#{</span><span class="n">i</span><span class="o">+</span><span class="n">seed</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span> <span class="o">%</span> <span class="n">size</span> <span class="p">}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">hash_functions</span>
<span class="vi">@hash_functions</span> <span class="o">||=</span> <span class="p">(</span><span class="n">bits</span> <span class="o">*</span> <span class="no">HASH_FUNCTIONS_COEFFICIENT</span><span class="p">).</span><span class="nf">ceil</span><span class="p">.</span><span class="nf">to_i</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">seed</span>
<span class="vi">@seed</span> <span class="o">||=</span> <span class="n">since</span><span class="p">.</span><span class="nf">to_i</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">since</span>
<span class="vi">@since</span> <span class="o">||=</span> <span class="vi">@user</span><span class="p">.</span><span class="nf">billing_cycle_started_at</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">size</span>
<span class="vi">@size</span> <span class="o">||=</span> <span class="n">bits</span> <span class="o">*</span> <span class="vi">@user</span><span class="p">.</span><span class="nf">max_contacts</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">bits</span>
<span class="vi">@bits</span> <span class="o">||=</span> <span class="no">BITS_PER_ERROR_RATE</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="n">error_rate</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">fingerprint</span>
<span class="vi">@fingerprint</span> <span class="o">||=</span> <span class="p">[</span><span class="vi">@user</span><span class="p">.</span><span class="nf">id</span><span class="p">,</span> <span class="vi">@user</span><span class="p">.</span><span class="nf">max_contacts</span><span class="p">,</span> <span class="n">seed</span><span class="p">].</span><span class="nf">join</span><span class="p">(</span><span class="s1">'.'</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The most important part of the bloom filter is the method that generates the indexes for a given key, <code class="language-plaintext highlighter-rouge">indexes_for(key)</code>.</p>
<p>Here’s an example usage:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bloom_filter</span> <span class="o">=</span> <span class="no">BloomFilter</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">user</span><span class="p">,</span> <span class="ss">error_rate: </span><span class="mi">1</span><span class="p">)</span>
<span class="n">bloom_filter</span><span class="p">.</span><span class="nf">indexes_for</span><span class="p">(</span><span class="s1">'user1@example.com'</span><span class="p">)</span>
<span class="c1"># [2872, 110, 3108, 2498, 4409, 751, 2861]</span>
<span class="n">bloom_filter</span><span class="p">.</span><span class="nf">indexes_for</span><span class="p">(</span><span class="s1">'user2@example.com'</span><span class="p">)</span>
<span class="c1"># [3992, 2262, 1788, 1970, 3185, 4135, 4957]</span>
</code></pre></div></div>
<p>As a hashing function we use <a href="https://en.wikipedia.org/wiki/Cyclic_redundancy_check">CRC32</a> with a custom seed per user that is the <code class="language-plaintext highlighter-rouge">billing_cycle_started_at</code> and the number of hashing functions based on the error rate (in this example we use an error rate of 1%).</p>
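<p>The index derivation is deterministic for a given seed, so the same normalized email always maps to the same bits within a billing cycle, while a new billing cycle (a new seed) effectively starts a fresh filter. Here is a standalone sketch of that property (the seed and size values are illustrative, not from this post):</p>

```ruby
require 'zlib'

# Standalone version of the indexes_for derivation shown above: normalize the
# key, then derive one CRC32 index per hash function, salted with the seed.
def crc32_indexes(key, seed:, size:, hash_count: 7)
  hash_count.times.map do |i|
    Zlib.crc32("#{key.to_s.strip.downcase}:#{i + seed}") % size
  end
end

cycle1 = 1_533_110_400 # e.g. billing_cycle_started_at.to_i for one cycle
cycle2 = 1_535_788_800 # a later billing cycle => a different seed

a = crc32_indexes('user1@example.com',  seed: cycle1, size: 5000)
b = crc32_indexes('User1@Example.com ', seed: cycle1, size: 5000)
c = crc32_indexes('user1@example.com',  seed: cycle2, size: 5000)
# a == b: strip + downcase normalization makes lookups whitespace- and
#         case-insensitive
# a != c: reseeding gives an independent set of indexes per billing cycle
```

This is why the seed must stay fixed for the whole billing cycle: any change to it would silently invalidate all the bits already set in the filter.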
<p>For the bloom filter to return consistent hashing indexes during a user’s billing cycle, the input parameters it depends on (<code class="language-plaintext highlighter-rouge">error_rate</code>, <code class="language-plaintext highlighter-rouge">@user.billing_cycle_started_at</code> and <code class="language-plaintext highlighter-rouge">@user.max_contacts</code>) should not change for the billing cycle until it gets reset. That is the <code class="language-plaintext highlighter-rouge">fingerprint</code> that, as we’ll see later, we’ll use as a redis key for the Redis bloom filter.</p>
<h3 id="redis-bloom-filter">Redis bloom filter</h3>
<p>Redis supports <code class="language-plaintext highlighter-rouge">getbit</code> and <code class="language-plaintext highlighter-rouge">setbit</code> operations for the <a href="https://redis.io/commands#string">String</a> type that make the individual updates simple. There is a special data type for bloom filters called <a href="https://redislabs.com/blog/rebloom-bloom-filter-datatype-redis/">rebloom</a> if you want to explore it, but here we’ll just use a standard Redis type.</p>
<p>Here is our Redis bloom filter implementation:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'redis'</span>
<span class="k">class</span> <span class="nc">RedisBloomFilter</span>
<span class="no">MAX_TTL</span> <span class="o">=</span> <span class="mi">31</span> <span class="o">*</span> <span class="mi">24</span> <span class="o">*</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">60</span> <span class="c1"># 31 days (max days in a month) in seconds</span>
<span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">user</span><span class="p">)</span>
<span class="vi">@user</span> <span class="o">=</span> <span class="n">user</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">insert</span><span class="p">(</span><span class="n">keys</span><span class="p">)</span>
<span class="n">existing_indexes</span> <span class="o">=</span> <span class="n">redis</span><span class="p">.</span><span class="nf">pipelined</span> <span class="k">do</span>
<span class="n">keys</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">key</span><span class="o">|</span>
<span class="n">bloom</span><span class="p">.</span><span class="nf">indexes_for</span><span class="p">(</span><span class="n">key</span><span class="p">).</span><span class="nf">map</span> <span class="p">{</span> <span class="o">|</span><span class="n">index</span><span class="o">|</span> <span class="n">redis</span><span class="p">.</span><span class="nf">setbit</span><span class="p">(</span><span class="n">filter_key</span><span class="p">,</span> <span class="n">index</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="p">}</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="n">new_keys_count</span> <span class="o">=</span> <span class="n">keys</span><span class="p">.</span><span class="nf">length</span><span class="p">.</span><span class="nf">times</span><span class="p">.</span><span class="nf">count</span> <span class="p">{</span> <span class="o">|</span><span class="n">i</span><span class="o">|</span>
<span class="n">existing_indexes</span><span class="p">[</span><span class="n">i</span> <span class="o">*</span> <span class="n">bloom</span><span class="p">.</span><span class="nf">hash_functions</span><span class="p">,</span> <span class="n">bloom</span><span class="p">.</span><span class="nf">hash_functions</span><span class="p">].</span><span class="nf">include?</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">total</span> <span class="o">=</span> <span class="n">redis</span><span class="p">.</span><span class="nf">incrby</span><span class="p">(</span><span class="n">counter_key</span><span class="p">,</span> <span class="n">new_keys_count</span><span class="p">)</span>
<span class="k">if</span> <span class="n">total</span> <span class="o">==</span> <span class="n">new_keys_count</span>
<span class="n">redis</span><span class="p">.</span><span class="nf">expire</span><span class="p">(</span><span class="n">filter_key</span><span class="p">,</span> <span class="no">MAX_TTL</span><span class="p">.</span><span class="nf">to_i</span><span class="p">)</span>
<span class="n">redis</span><span class="p">.</span><span class="nf">expire</span><span class="p">(</span><span class="n">counter_key</span><span class="p">,</span> <span class="no">MAX_TTL</span><span class="p">.</span><span class="nf">to_i</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">count</span>
<span class="n">redis</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">counter_key</span><span class="p">).</span><span class="nf">to_i</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">include?</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
<span class="n">values</span> <span class="o">=</span> <span class="n">redis</span><span class="p">.</span><span class="nf">pipelined</span> <span class="k">do</span>
<span class="n">bloom</span><span class="p">.</span><span class="nf">indexes_for</span><span class="p">(</span><span class="n">key</span><span class="p">).</span><span class="nf">map</span> <span class="p">{</span> <span class="o">|</span><span class="n">index</span><span class="o">|</span> <span class="n">redis</span><span class="p">.</span><span class="nf">getbit</span><span class="p">(</span><span class="n">filter_key</span><span class="p">,</span> <span class="n">index</span><span class="p">)</span> <span class="p">}</span>
<span class="k">end</span>
<span class="o">!</span><span class="n">values</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">field</span>
<span class="n">redis</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">filter_key</span><span class="p">)</span>
<span class="k">end</span>
<span class="kp">private</span>
<span class="k">def</span> <span class="nf">redis</span>
<span class="vi">@redis</span> <span class="o">||=</span> <span class="no">Redis</span><span class="p">.</span><span class="nf">new</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">bloom</span>
<span class="vi">@bloom</span> <span class="o">||=</span> <span class="no">BloomFilter</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="vi">@user</span><span class="p">,</span> <span class="ss">error_rate: </span><span class="mi">1</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">filter_key</span>
<span class="vi">@filter_key</span> <span class="o">||=</span> <span class="s2">"bloom:filter:</span><span class="si">#{</span><span class="n">key_suffix</span><span class="si">}</span><span class="s2">"</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">counter_key</span>
<span class="vi">@counter_key</span> <span class="o">||=</span> <span class="s2">"bloom:counter:</span><span class="si">#{</span><span class="n">key_suffix</span><span class="si">}</span><span class="s2">"</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">key_suffix</span>
<span class="vi">@key_suffix</span> <span class="o">||=</span> <span class="n">bloom</span><span class="p">.</span><span class="nf">fingerprint</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">RedisBloomFilter</code> uses the <code class="language-plaintext highlighter-rouge">BloomFilter</code> implementation to produce the indexes that it manipulates in Redis. It also keeps a counter of how many unique elements have been added to the filter, incrementing the count whenever it detects a unique insert. With a 1% error rate for the bloom filter, the count can be up to 1% lower than the actual count, which is fine in our case because we allow a generous grace overage on customer plans. It uses Redis <code class="language-plaintext highlighter-rouge">pipelined</code> to send operations in batches, which avoids round-trip latency and improves performance by about 5-6 times. It also sets TTLs on the keys to expire them after a month, and it exposes the field for the in-memory filter.</p>
<p>Here’s an example usage:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">redis_bloom_filter</span> <span class="o">=</span> <span class="no">RedisBloomFilter</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">user</span><span class="p">)</span>
<span class="n">redis_bloom_filter</span><span class="p">.</span><span class="nf">insert</span><span class="p">([</span><span class="s1">'user1@example.com'</span><span class="p">,</span> <span class="s1">'user2@example.com'</span><span class="p">])</span>
<span class="n">redis_bloom_filter</span><span class="p">.</span><span class="nf">count</span>
<span class="c1"># => 2</span>
<span class="n">redis_bloom_filter</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="s1">'user1@example.com'</span><span class="p">)</span>
<span class="c1"># => true</span>
<span class="n">redis_bloom_filter</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="s1">'user2@example.com'</span><span class="p">)</span>
<span class="c1"># => true</span>
<span class="n">redis_bloom_filter</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="s1">'user3@example.com'</span><span class="p">)</span>
<span class="c1"># => false</span>
</code></pre></div></div>
<h3 id="in-memory-bloom-filter">In-memory Bloom filter</h3>
<p>With the Redis implementation we solved half of the problem. We have a way to concurrently and quickly add elements to the bloom filter in Redis, but we still need a way to check if a bloom filter could accept a given set of elements without actually inserting the elements in the filter. This is useful when we want to prevent a list import before importing the list or stop a campaign from sending before starting it.</p>
<p>To achieve that, we need an in-memory filter that we can initialize with the state of the Redis bloom filter, and <a href="https://github.com/peterc/bitarray">bitarray</a> can help us with that. We have an important <a href="https://github.com/peterc/bitarray/pull/9">PR</a> that changes the storage representation, i.e., the bit order in bitarray, to match the way Redis stores bits internally, and adds a way to initialize a bitarray with a given field. To test it, you can fetch the <code class="language-plaintext highlighter-rouge">BitArray</code> that includes that patch from <a href="https://gist.github.com/dalibor/70b9f118b545880ece6381513e0123d2">here</a>.</p>
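<p>For reference, Redis treats bit 0 as the most significant bit of the first byte of the string, which is the ordering the bitarray patch has to match. The following pure-Ruby sketch (the helper names are ours, not part of the patch or of Redis) mimics the <code class="language-plaintext highlighter-rouge">SETBIT</code>/<code class="language-plaintext highlighter-rouge">GETBIT</code> semantics:</p>

```ruby
# Mimic Redis SETBIT: bit 0 is the most significant bit of byte 0, growing
# the string with zero bytes as needed. Returns the (possibly new) string.
def setbit(str, index, value)
  byte, offset = index.divmod(8)
  str = str.ljust(byte + 1, "\0") if str.bytesize <= byte
  current = str.getbyte(byte)
  mask = 1 << (7 - offset) # MSB-first, matching Redis
  str.setbyte(byte, value == 1 ? current | mask : current & ~mask)
  str
end

# Mimic Redis GETBIT: bits past the end of the string read as 0.
def getbit(str, index)
  byte, offset = index.divmod(8)
  current = str.getbyte(byte) || 0
  (current >> (7 - offset)) & 1
end

field = setbit(+"", 0, 1) # sets the MSB of byte 0 => "\x80"
```

Because the in-memory bit order matches this layout, the raw field returned by a Redis <code class="language-plaintext highlighter-rouge">GET</code> can seed the in-memory filter directly, with zero-byte padding up to the filter size.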
<p>Here is the implementation of the in-memory bloom filter:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">TemporaryBloomFilter</span>
<span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">user</span><span class="p">)</span>
<span class="vi">@user</span> <span class="o">=</span> <span class="n">user</span>
<span class="vi">@bloom</span> <span class="o">=</span> <span class="no">BloomFilter</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="vi">@user</span><span class="p">,</span> <span class="ss">error_rate: </span><span class="mi">1</span><span class="p">)</span>
<span class="vi">@redis_filter</span> <span class="o">=</span> <span class="no">RedisBloomFilter</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="vi">@user</span><span class="p">)</span>
<span class="vi">@count</span> <span class="o">=</span> <span class="vi">@redis_filter</span><span class="p">.</span><span class="nf">count</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">count</span>
<span class="vi">@count</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">insert</span><span class="p">(</span><span class="n">keys</span><span class="p">)</span>
<span class="n">keys</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">key</span><span class="o">|</span>
<span class="n">previous_indexes</span> <span class="o">=</span> <span class="vi">@bloom</span><span class="p">.</span><span class="nf">indexes_for</span><span class="p">(</span><span class="n">key</span><span class="p">).</span><span class="nf">map</span> <span class="p">{</span> <span class="o">|</span><span class="n">index</span><span class="o">|</span>
<span class="n">value</span> <span class="o">=</span> <span class="n">bit_array</span><span class="p">[</span><span class="n">index</span><span class="p">]</span>
<span class="n">bit_array</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">value</span>
<span class="p">}</span>
<span class="vi">@count</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">if</span> <span class="n">previous_indexes</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">include?</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
<span class="o">!</span><span class="vi">@bloom</span><span class="p">.</span><span class="nf">indexes_for</span><span class="p">(</span><span class="n">key</span><span class="p">).</span><span class="nf">map</span> <span class="p">{</span> <span class="o">|</span><span class="n">index</span><span class="o">|</span> <span class="n">bit_array</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="p">}.</span><span class="nf">include?</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">over_limit?</span>
<span class="n">plan_over_limit_count</span> <span class="o">></span> <span class="mi">0</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">plan_over_limit_count</span>
<span class="vi">@count</span> <span class="o">-</span> <span class="vi">@user</span><span class="p">.</span><span class="nf">plan_contacts</span>
<span class="k">end</span>
<span class="kp">private</span>
<span class="k">def</span> <span class="nf">bit_array</span>
<span class="vi">@bit_array</span> <span class="o">||=</span> <span class="n">prepare_bit_array</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">prepare_bit_array</span>
<span class="n">field</span> <span class="o">=</span> <span class="vi">@redis_filter</span><span class="p">.</span><span class="nf">field</span><span class="p">.</span><span class="nf">to_s</span>
<span class="n">current_field_length</span> <span class="o">=</span> <span class="n">field</span><span class="p">.</span><span class="nf">length</span>
<span class="n">max_field_length</span> <span class="o">=</span> <span class="p">(</span><span class="vi">@bloom</span><span class="p">.</span><span class="nf">size</span> <span class="o">/</span> <span class="mi">8</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">if</span> <span class="n">current_field_length</span> <span class="o"><</span> <span class="n">max_field_length</span>
<span class="n">field</span> <span class="o">+=</span> <span class="s2">"</span><span class="se">\0</span><span class="s2">"</span> <span class="o">*</span> <span class="p">(</span><span class="n">max_field_length</span> <span class="o">-</span> <span class="n">current_field_length</span><span class="p">)</span>
<span class="k">end</span>
<span class="no">BitArray</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="vi">@bloom</span><span class="p">.</span><span class="nf">size</span><span class="p">,</span> <span class="n">field</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>And an example usage:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">temporary_bloom_filter</span> <span class="o">=</span> <span class="no">TemporaryBloomFilter</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">user</span><span class="p">)</span>
<span class="n">temporary_bloom_filter</span><span class="p">.</span><span class="nf">insert</span><span class="p">([</span><span class="s1">'user3@example.com'</span><span class="p">,</span> <span class="s1">'user4@example.com'</span><span class="p">,</span> <span class="s1">'user5@example.com'</span><span class="p">])</span>
<span class="n">temporary_bloom_filter</span><span class="p">.</span><span class="nf">count</span>
<span class="c1"># => 5</span>
<span class="n">temporary_bloom_filter</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="s1">'user5@example.com'</span><span class="p">)</span>
<span class="c1"># => true</span>
<span class="n">temporary_bloom_filter</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="s1">'user6@example.com'</span><span class="p">)</span>
<span class="c1"># => false</span>
</code></pre></div></div>
<h3 id="performance">Performance</h3>
<p>Ruby’s in-memory implementation is a few times slower than the C implementation in <a href="https://github.com/igrigorik/bloomfilter-rb">bloomfilter-rb</a>, but it is still fast enough: it can process 1 million items in 5-10 seconds, including calculating the hash functions and doing the BitArray inserts.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">total_items</span> <span class="o">=</span> <span class="mi">1_000_000</span>
<span class="n">t1</span> <span class="o">=</span> <span class="no">Time</span><span class="p">.</span><span class="nf">now</span>
<span class="n">bf</span> <span class="o">=</span> <span class="no">BloomFilter</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">user</span><span class="p">,</span> <span class="ss">error_rate: </span><span class="mi">1</span><span class="p">)</span>
<span class="n">ba</span> <span class="o">=</span> <span class="no">BitArray</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">total_items</span><span class="p">)</span>
<span class="n">total_items</span><span class="p">.</span><span class="nf">times</span> <span class="k">do</span> <span class="o">|</span><span class="n">i</span><span class="o">|</span>
<span class="n">bf</span><span class="p">.</span><span class="nf">indexes_for</span><span class="p">(</span><span class="s2">"user</span><span class="si">#{</span><span class="n">i</span><span class="si">}</span><span class="s2">@example.com"</span><span class="p">).</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">j</span><span class="o">|</span>
<span class="n">ba</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="kp">true</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="n">t2</span> <span class="o">=</span> <span class="no">Time</span><span class="p">.</span><span class="nf">now</span>
<span class="nb">puts</span> <span class="n">t2</span><span class="o">-</span><span class="n">t1</span>
<span class="c1"># => 7.485282645</span>
</code></pre></div></div>
<p>Redis performance is pretty solid as well. It can handle around 70-80k operations per second, and when using <code class="language-plaintext highlighter-rouge">pipelined</code> mode for our batches of 350, we get 5-6 times more operations:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">$</span> <span class="n">redis</span><span class="o">-</span><span class="n">benchmark</span> <span class="o">-</span><span class="n">q</span> <span class="o">-</span><span class="n">n</span> <span class="mi">100000</span> <span class="o">-</span><span class="no">P</span> <span class="mi">350</span>
<span class="no">PING_INLINE</span><span class="p">:</span> <span class="mf">373134.31</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
<span class="no">PING_BULK</span><span class="p">:</span> <span class="mf">421940.94</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
<span class="no">SET</span><span class="p">:</span> <span class="mf">369003.69</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
<span class="no">GET</span><span class="p">:</span> <span class="mf">396825.38</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
<span class="no">INCR</span><span class="p">:</span> <span class="mf">344827.59</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
<span class="no">LPUSH</span><span class="p">:</span> <span class="mf">362318.84</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
<span class="no">LPOP</span><span class="p">:</span> <span class="mf">389105.06</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
<span class="no">SADD</span><span class="p">:</span> <span class="mf">353356.91</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
<span class="no">SPOP</span><span class="p">:</span> <span class="mf">361010.81</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
<span class="no">LPUSH</span> <span class="p">(</span><span class="n">needed</span> <span class="n">to</span> <span class="n">benchmark</span> <span class="no">LRANGE</span><span class="p">):</span> <span class="mf">370370.34</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
<span class="no">LRANGE_100</span> <span class="p">(</span><span class="n">first</span> <span class="mi">100</span> <span class="n">elements</span><span class="p">):</span> <span class="mf">61050.06</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
<span class="no">LRANGE_300</span> <span class="p">(</span><span class="n">first</span> <span class="mi">300</span> <span class="n">elements</span><span class="p">):</span> <span class="mf">17494.75</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
<span class="no">LRANGE_500</span> <span class="p">(</span><span class="n">first</span> <span class="mi">450</span> <span class="n">elements</span><span class="p">):</span> <span class="mf">11043.62</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
<span class="no">LRANGE_600</span> <span class="p">(</span><span class="n">first</span> <span class="mi">600</span> <span class="n">elements</span><span class="p">):</span> <span class="mf">7965.59</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
<span class="no">MSET</span> <span class="p">(</span><span class="mi">10</span> <span class="n">keys</span><span class="p">):</span> <span class="mf">202839.75</span> <span class="n">requests</span> <span class="n">per</span> <span class="n">second</span>
</code></pre></div></div>
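<p>To make the batching concrete, here is a sketch of how pipelining inserts in batches of 350 might look. The client classes, key name, and recorded <code>setbit</code> commands below are stand-ins so the example runs without a Redis server; with the real <code>redis-rb</code> gem you would call <code>pipelined</code> with the same block shape.</p>

```ruby
# Stand-in objects that record commands instead of talking to Redis,
# so the batching pattern can be shown without a server.
class FakePipeline
  attr_reader :commands

  def initialize
    @commands = []
  end

  def setbit(key, offset, value)
    @commands << [:setbit, key, offset, value]
  end
end

class FakeRedis
  attr_reader :batches

  def initialize
    @batches = []
  end

  # Mirrors the shape of redis-rb's pipelined block interface.
  def pipelined
    pipeline = FakePipeline.new
    yield pipeline
    @batches << pipeline.commands
  end
end

BATCH_SIZE = 350

redis = FakeRedis.new
bit_indexes = (0...1000).to_a

# One network round trip per batch instead of one per command.
bit_indexes.each_slice(BATCH_SIZE) do |batch|
  redis.pipelined do |pipeline|
    batch.each { |i| pipeline.setbit('bloom:bits', i, 1) }
  end
end

puts redis.batches.size # => 3 (batches of 350, 350 and 300 indexes)
```

Each pipelined batch is sent to the server in a single round trip, which is where the 5-6x throughput gain over one-command-per-request comes from.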
<h3 id="conclusion">Conclusion</h3>
<p>This custom implementation of a bloom filter turned out pretty solid and robust in our production environment. We have a Kibana dashboard monitoring the bloom filter updates over time, giving us much better insights than our previous implementation.</p>
Refactoring Rails configs for deploy to Kubernetes
2018-02-27T07:30:00+00:00
https://dalibornasevic.com/posts/refactoring-rails-configs-for-deploy-to-kubernetes
<p>Recently, I worked on a project to containerize one of our Rails apps. The goal was to add per pull request verification deploys to Kubernetes as part of the CICD pipeline. During that work I faced the need to re-design how we manage the configs in the application, and I will share some thoughts about the approach. But before we jump into that, let’s explain the concept of per pull request verification deploys.</p>
<h3 id="per-pull-request-verification-deploys">Per Pull Request verification deploys</h3>
<p>We use Jenkins for Continuous Integration and Continuous Delivery (CICD). Whenever we merge a pull request to the master branch (CI), the pipeline deploys the changes to the target environments (CD). It starts by deploying to staging environments and finishes with a deploy to the production environment.</p>
<p>We introduced another verification step to this pipeline. On each successful pull request build, it deploys the changes to a short-lived location. This temporary deploy is used for QA, manual testing and verification against more realistic data and environment before deploying to production. Once verified, the pull request can be merged to master, which triggers the automated production deploy.</p>
<p>We deploy the app using Capistrano to OpenStack and bare metal servers. For the short-lived verification deploys we decided to explore deploying to a Kubernetes cluster. So, my main goal for the configs refactor was to have a solution that works well for both deploy scenarios.</p>
<h3 id="config-refactor-design-goals">Config refactor design goals</h3>
<ol>
<li>Configs that work in different scenarios:
<ul>
<li>local app</li>
<li>local app using docker containers</li>
<li>local app using docker-compose</li>
<li>Capistrano deploy to OpenStack and bare metal servers</li>
<li>Kubernetes deploy to minikube and real clusters</li>
</ul>
</li>
<li>Flexibility in how configs are defined</li>
</ol>
<p>Some configs like <code class="language-plaintext highlighter-rouge">database.yml</code> and <code class="language-plaintext highlighter-rouge">redis.yml</code> are in YAML format and other are using environment variables. I wanted to keep the flexibility of using YAML configs for the more complex configurations instead of forcing environment variables for everything.</p>
<ol start="3">
<li>Keep everything but the secrets config in source control</li>
</ol>
<p>Managing many config files, especially when deploying and running the app in different ways, increases maintenance complexity. The config files that are not stored in source control need to be made visible to the app during deploy. The goal here is to have at most a single file that’s not in source control. For Capistrano deploys it’s a single shared file with secrets that gets linked during deploy. For Kubernetes deploys it’s a single Secret resource that’s updated on change.</p>
<p>By keeping as much of the configs in source control as possible, we do regular reviews on any config changes before merging to master. This is especially important for much more complex configs like the one we have for <a href="/posts/69-managing-activerecord-connections-with-octoshark">Octoshark</a>, where we connect to around 50 MySQL instances.</p>
<h3 id="using-environment-variables-with-dotenv">Using environment variables with dotenv</h3>
<p>One of the tenets of the <a href="https://12factor.net/">Twelve-Factor app</a> methodology is storing <a href="https://12factor.net/config">configs in the environment</a>. Docker, docker-compose and Kubernetes have built-in ways of passing environment variables to the containers.</p>
<p>The <a href="https://github.com/bkeepers/dotenv">dotenv</a> gem can help us replicate that by loading environment variables from config files. Once we include <code class="language-plaintext highlighter-rouge">dotenv</code> in the Gemfile, all we need to add is the following line to <code class="language-plaintext highlighter-rouge">config/application.rb</code>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Dotenv</span><span class="p">.</span><span class="nf">overload</span><span class="p">(</span><span class="s2">".env"</span><span class="p">,</span> <span class="s2">".env.</span><span class="si">#{</span><span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span> <span class="s2">".env.</span><span class="si">#{</span><span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="si">}</span><span class="s2">.secrets"</span><span class="p">)</span>
</code></pre></div></div>
<p>Here we use the overload feature of <code class="language-plaintext highlighter-rouge">Dotenv</code>. For production environment for example, it will first load the <code class="language-plaintext highlighter-rouge">.env</code> file, then <code class="language-plaintext highlighter-rouge">.env.production</code> and finally the <code class="language-plaintext highlighter-rouge">.env.production.secrets</code> file.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">.</span><span class="nf">env</span> <span class="c1"># keeps the shared variables across all environments</span>
<span class="p">.</span><span class="nf">env</span><span class="p">.</span><span class="nf">production</span> <span class="c1"># keeps the environment specific variables</span>
<span class="p">.</span><span class="nf">env</span><span class="p">.</span><span class="nf">production</span><span class="p">.</span><span class="nf">secret</span> <span class="c1"># keeps the environment specific secrets</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">.env.production.secrets</code> file is the one that’s ignored in source control and it is used to keep the secrets as well as other configuration values that change between environments.</p>
<p>In the context of containers, we have the flexibility to override any of these environment variables which makes this config strategy work in both scenarios.</p>
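<p>To make the precedence concrete, here is a minimal sketch of the overload behaviour (a simplified <code>KEY=VALUE</code> parser, not the gem itself): each file is merged over the previous ones, so a key defined in the secrets file wins over the same key in <code>.env</code> or <code>.env.production</code>.</p>

```ruby
require 'tmpdir'

# Simplified illustration of Dotenv.overload's precedence:
# files later in the list override keys from earlier files.
def load_env_files(files)
  files.each_with_object({}) do |file, merged|
    next unless File.exist?(file)
    File.readlines(file).each do |line|
      key, value = line.strip.split('=', 2)
      merged[key] = value if key && value
    end
  end
end

result = nil
Dir.mktmpdir do |dir|
  File.write(File.join(dir, '.env'), "REDIS_HOST=localhost\nLOG_LEVEL=info\n")
  File.write(File.join(dir, '.env.production'), "LOG_LEVEL=warn\n")
  File.write(File.join(dir, '.env.production.secrets'), "MYAPP_PASSWORD=s3cret\n")

  files = ['.env', '.env.production', '.env.production.secrets']
  result = load_env_files(files.map { |f| File.join(dir, f) })
end

puts result['LOG_LEVEL']      # => warn (.env.production overrides .env)
puts result['MYAPP_PASSWORD'] # => s3cret (from the secrets file)
```

The variable names here are illustrative; the real gem also handles quoting, comments and variable expansion that this sketch skips.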
<h3 id="using-yaml-configs-with-environment-variables">Using YAML configs with environment variables</h3>
<p>We can use YAML configs like <code class="language-plaintext highlighter-rouge">database.yml</code> with environment variables. We just change it to read the secrets and values that change between environments from environment variables. Where it makes sense we can also use a fallback, i.e. default values. Here’s an example <code class="language-plaintext highlighter-rouge">database.yml</code> config:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">development</span><span class="pi">:</span>
<span class="na">adapter</span><span class="pi">:</span> <span class="s">mysql2</span>
<span class="na">encoding</span><span class="pi">:</span> <span class="s">utf8</span>
<span class="na">reconnect</span><span class="pi">:</span> <span class="no">false</span>
<span class="na">pool</span><span class="pi">:</span> <span class="m">5</span>
<span class="na">database</span><span class="pi">:</span> <span class="s"><%= ENV['MYAPP_DATABASE'] || 'myapp_development' %></span>
<span class="na">username</span><span class="pi">:</span> <span class="s"><%= ENV['MYAPP_USERNAME'] %></span>
<span class="na">password</span><span class="pi">:</span> <span class="s"><%= ENV['MYAPP_PASSWORD'] %></span>
<span class="na">host</span><span class="pi">:</span> <span class="s"><%= ENV['MYAPP_HOST'] || 'localhost' %></span>
<span class="na">port</span><span class="pi">:</span> <span class="s"><%= ENV['MYAPP_PORT'] || 3306 %></span>
</code></pre></div></div>
<p>Rails parses ERB tags by default when interpreting <code class="language-plaintext highlighter-rouge">database.yml</code> config. But for the custom configs we might have in the app, like the Redis one below for example, we need to replicate that behaviour.</p>
<p>Here is a very simple <code class="language-plaintext highlighter-rouge">ConfigParser</code> class that does that:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'yaml'</span>
<span class="nb">require</span> <span class="s1">'erb'</span>
<span class="k">class</span> <span class="nc">ConfigParser</span>
<span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="n">environment</span><span class="p">)</span>
<span class="no">YAML</span><span class="p">.</span><span class="nf">load</span><span class="p">(</span><span class="no">ERB</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">IO</span><span class="p">.</span><span class="nf">read</span><span class="p">(</span><span class="n">file</span><span class="p">)).</span><span class="nf">result</span><span class="p">)[</span><span class="n">environment</span><span class="p">]</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Then for a Redis config:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s">:development:</span>
<span class="s">:host: <%= ENV['REDIS_HOST'] || 'localhost' %></span>
<span class="s">:port: <%= ENV['REDIS_PORT'] || 6379 %></span>
<span class="s">:password: <%= ENV['REDIS_PASSWORD'] %></span>
<span class="s">:db: <%= ENV['REDIS_DB'] || 10 %></span>
<span class="s">:reconnect_attempts: </span><span class="m">3</span>
<span class="s">:timeout: </span><span class="m">2</span>
</code></pre></div></div>
<p>We can use the <code class="language-plaintext highlighter-rouge">ConfigParser</code> like:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">redis_config</span> <span class="o">=</span> <span class="no">ConfigParser</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="s1">'config/redis.yml'</span><span class="p">,</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="p">.</span><span class="nf">to_sym</span><span class="p">)</span>
<span class="n">redis_conn</span> <span class="o">=</span> <span class="no">Redis</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">redis_conf</span><span class="p">)</span>
</code></pre></div></div>
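<p>For a self-contained illustration of the ERB-plus-YAML flow, the following sketch uses a temp file and hypothetical variable names, and string keys as in <code>database.yml</code>: the ERB tags are resolved from environment variables before the YAML is parsed, with the fallbacks applying for unset variables.</p>

```ruby
require 'yaml'
require 'erb'
require 'tempfile'

# The ConfigParser from above, repeated so this example runs standalone.
class ConfigParser
  def self.parse(file, environment)
    YAML.load(ERB.new(IO.read(file)).result)[environment]
  end
end

# Simulate variables provided by the environment (names are illustrative).
ENV['MYAPP_HOST'] = 'db.internal'
ENV.delete('MYAPP_PORT') # left unset, so the YAML fallback applies

file = Tempfile.new(['database', '.yml'])
file.write(<<~YAML)
  development:
    host: <%= ENV['MYAPP_HOST'] || 'localhost' %>
    port: <%= ENV['MYAPP_PORT'] || 3306 %>
YAML
file.close

config = ConfigParser.parse(file.path, 'development')
puts config['host'] # => db.internal (taken from ENV)
puts config['port'] # => 3306 (the fallback value)
```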
<h3 id="rails-52-encrypted-credentials">Rails 5.2 Encrypted Credentials</h3>
<p>With Rails 5.2 being just around the corner, and specifically the <a href="https://www.engineyard.com/blog/rails-encrypted-credentials-on-rails-5.2">Encrypted Credentials</a> feature, we have the option to keep all the secrets encrypted in source control.</p>
<p>We can put all the secrets from the different environments <code class="language-plaintext highlighter-rouge">.env.development.secrets</code>, <code class="language-plaintext highlighter-rouge">.env.test.secrets</code> and <code class="language-plaintext highlighter-rouge">.env.production.secrets</code> in <code class="language-plaintext highlighter-rouge">config/credentials.yml.enc</code> and then the only value that the deploy target will need as a dependency is the <code class="language-plaintext highlighter-rouge">config/master.key</code> encryption key.</p>
<p>This approach of storing production secrets in the codebase, although encrypted, might be a sensitive matter for some organizations.</p>
<h3 id="final-thoughts">Final thoughts</h3>
<p>I’ve considered using different Rails environments as an alternative approach. That increases complexity and does not meet some of the configs design goals. There are also Rails env checks in the codebase that behave as feature flags. So, the overloading approach with environment variables and the flexibility of using YAML for more complex configs works pretty well in all these scenarios.</p>
A Walkthrough for Handling and Testing Exceptions
2017-10-22T09:00:00+00:00
https://dalibornasevic.com/posts/handling-and-testing-exceptions
<p>In a previous blog post I wrote about the problem of <a href="/posts/52-don-t-overuse-exceptions">overusing exceptions</a>, and in this one we’ll look at some exception handling and testing practices.</p>
<p>To start with, let’s define a <code class="language-plaintext highlighter-rouge">LinkCounter</code> class. <code class="language-plaintext highlighter-rouge">LinkCounter</code> counts how many links are on a web page. It is initialized with a URL, uses the <a href="https://github.com/lostisland/faraday">Faraday</a> HTTP client to fetch the page content, and uses <a href="https://github.com/sparklemotion/nokogiri">Nokogiri</a> to parse the HTML content.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'faraday'</span>
<span class="nb">require</span> <span class="s1">'nokogiri'</span>
<span class="k">class</span> <span class="nc">LinkCounter</span>
<span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="vi">@url</span> <span class="o">=</span> <span class="n">url</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">count</span>
<span class="n">doc</span><span class="p">.</span><span class="nf">css</span><span class="p">(</span><span class="s1">'a'</span><span class="p">).</span><span class="nf">count</span>
<span class="k">end</span>
<span class="kp">private</span>
<span class="k">def</span> <span class="nf">doc</span>
<span class="no">Nokogiri</span><span class="o">::</span><span class="no">HTML</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">content</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">content</span>
<span class="n">connection</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="vi">@url</span><span class="p">).</span><span class="nf">body</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">connection</span>
<span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Then, we can use it like this:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">puts</span> <span class="no">LinkCounter</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s1">'https://example.com'</span><span class="p">).</span><span class="nf">count</span> <span class="c1"># 1</span>
</code></pre></div></div>
<p>Pretty simple so far.</p>
<h3 id="what-could-possibly-go-wrong">What could possibly go wrong?</h3>
<p>To improve the robustness of our <code class="language-plaintext highlighter-rouge">LinkCounter</code> we need to think about what could fail. We identify Faraday’s <code class="language-plaintext highlighter-rouge">connection.get</code> call, which performs the <code class="language-plaintext highlighter-rouge">GET</code> HTTP request, as the one with the highest probability of failure because it depends on the reliability of the network.</p>
<blockquote>
<p>Always rescue very specific exceptions. Never rescue <code class="language-plaintext highlighter-rouge">Exception</code> and avoid rescuing <code class="language-plaintext highlighter-rouge">StandardError</code> too because it can hide unexpected errors like <code class="language-plaintext highlighter-rouge">NameError</code> and <code class="language-plaintext highlighter-rouge">NoMethodError</code>. See ruby’s <a href="http://blog.nicksieger.com/articles/2006/09/06/rubys-exception-hierarchy/">exception hierarchy</a>.</p>
</blockquote>
<p>In order to rescue the very specific exceptions, we need to figure out all the exceptions that Faraday can raise. Good libraries usually have a separate file defining all their errors, as is the case with <a href="https://github.com/lostisland/faraday/blob/master/lib/faraday/error.rb">Faraday errors</a> or, as another example, <a href="https://github.com/redis/redis-rb/blob/master/lib/redis/errors.rb">Redis errors</a>.</p>
<p>Looking at the Faraday error definitions we can see it has the following hierarchy:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">StandardError</span>
<span class="no">Faraday</span><span class="o">::</span><span class="no">Error</span>
<span class="no">Faraday</span><span class="o">::</span><span class="no">MissingDependency</span>
<span class="no">Faraday</span><span class="o">::</span><span class="no">ClientError</span>
<span class="no">Faraday</span><span class="o">::</span><span class="no">ConnectionFailed</span>
<span class="no">Faraday</span><span class="o">::</span><span class="no">ResourceNotFound</span>
<span class="no">Faraday</span><span class="o">::</span><span class="no">ParsingError</span>
<span class="no">Faraday</span><span class="o">::</span><span class="no">TimeoutError</span>
<span class="no">Faraday</span><span class="o">::</span><span class="no">SSLError</span>
</code></pre></div></div>
<h3 id="exploring-faraday-errors">Exploring Faraday errors</h3>
<p>We need to explore and understand under what conditions each of the Faraday errors can happen.</p>
<p>So, if we define a very small open timeout, we’ll see a <code class="language-plaintext highlighter-rouge">Faraday::ConnectionFailed</code> error.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">request: </span><span class="p">{</span> <span class="ss">open_timeout: </span><span class="mf">0.1</span> <span class="p">}).</span><span class="nf">get</span><span class="p">(</span><span class="s1">'https://example.com'</span><span class="p">)</span>
<span class="c1"># Faraday::ConnectionFailed: execution expired</span>
</code></pre></div></div>
<p>If we define a small read timeout, we’ll get <code class="language-plaintext highlighter-rouge">Faraday::TimeoutError</code>.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">request: </span><span class="p">{</span> <span class="ss">open_timeout: </span><span class="mi">1</span><span class="p">,</span> <span class="ss">timeout: </span><span class="mf">0.1</span> <span class="p">}).</span>
<span class="nf">get</span><span class="p">(</span><span class="s1">'https://example.com'</span><span class="p">)</span>
<span class="c1"># Faraday::TimeoutError: Net::ReadTimeout</span>
</code></pre></div></div>
<p>Note here that if we set only the <code class="language-plaintext highlighter-rouge">timeout</code> value, the <code class="language-plaintext highlighter-rouge">open_timeout</code> will use the same value, so we wouldn’t be able to reproduce the <code class="language-plaintext highlighter-rouge">Faraday::TimeoutError</code>; we’d get the <code class="language-plaintext highlighter-rouge">Faraday::ConnectionFailed</code> error again.</p>
<p>For docs on timeouts in other popular Ruby gems, you can check out this popular <a href="https://github.com/ankane/the-ultimate-guide-to-ruby-timeouts">github repo</a>.</p>
<p>If we try <code class="language-plaintext highlighter-rouge">GET</code> request to a nonexistent host we get <code class="language-plaintext highlighter-rouge">Faraday::ConnectionFailed</code>.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Faraday</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s1">'https://example.nonexistent.com'</span><span class="p">)</span>
<span class="c1"># Faraday::ConnectionFailed: Failed to open TCP connection to example.nonexistent.com:443 (getaddrinfo: Name or service not known)</span>
</code></pre></div></div>
<p>Note that in this case we also have a nice exception message <code class="language-plaintext highlighter-rouge">getaddrinfo: Name or service not known</code> that distinguishes this error from the error that happens when a connection cannot be opened for an existing host.</p>
<p>If we request a website without SSL support, we get <code class="language-plaintext highlighter-rouge">Faraday::SSLError</code>.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Faraday</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s1">'https://ruby.mk'</span><span class="p">)</span>
<span class="c1"># Faraday::SSLError: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed</span>
</code></pre></div></div>
<p>Finally, if we configure Faraday to raise exceptions on 40x and 50x responses, we’ll see it raises <code class="language-plaintext highlighter-rouge">Faraday::ResourceNotFound</code> error for 404 response:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span> <span class="o">|</span><span class="n">faraday</span><span class="o">|</span>
<span class="n">faraday</span><span class="p">.</span><span class="nf">use</span> <span class="no">Faraday</span><span class="o">::</span><span class="no">Response</span><span class="o">::</span><span class="no">RaiseError</span>
<span class="n">faraday</span><span class="p">.</span><span class="nf">adapter</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">default_adapter</span>
<span class="k">end</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s1">'https://httpstat.us/404'</span><span class="p">)</span>
<span class="c1"># Faraday::ResourceNotFound: the server responded with status 404</span>
</code></pre></div></div>
<p>And, we’ll get <code class="language-plaintext highlighter-rouge">Faraday::ClientError</code> for 500 response:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span> <span class="o">|</span><span class="n">faraday</span><span class="o">|</span>
<span class="n">faraday</span><span class="p">.</span><span class="nf">use</span> <span class="no">Faraday</span><span class="o">::</span><span class="no">Response</span><span class="o">::</span><span class="no">RaiseError</span>
<span class="n">faraday</span><span class="p">.</span><span class="nf">adapter</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">default_adapter</span>
<span class="k">end</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s1">'https://httpstat.us/500'</span><span class="p">)</span>
<span class="c1"># Faraday::ClientError: the server responded with status 500</span>
</code></pre></div></div>
<p>Note that in the last two examples I use this handy <a href="https://httpstat.us/">httpstat.us</a> service that returns the requested status code.</p>
<h3 id="handling-exceptions">Handling exceptions</h3>
<p>Based on our previous exploration, we conclude that we will retry <code class="language-plaintext highlighter-rouge">Faraday::TimeoutError</code> and <code class="language-plaintext highlighter-rouge">Faraday::ConnectionFailed</code> errors, except when the host does not exist, i.e. when the exception message is <code class="language-plaintext highlighter-rouge">getaddrinfo: Name or service not known</code>.</p>
<p>Let’s define a general purpose <code class="language-plaintext highlighter-rouge">Retryable</code> module for that.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">Retryable</span>
<span class="no">SLEEP_INTERVAL</span> <span class="o">=</span> <span class="mf">0.4</span>
<span class="k">def</span> <span class="nf">with_retries</span><span class="p">(</span><span class="ss">retries: </span><span class="mi">3</span><span class="p">,</span> <span class="ss">retry_skip_reason: </span><span class="kp">nil</span><span class="p">,</span> <span class="ss">rescue_class: </span><span class="p">)</span>
<span class="n">tries</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">begin</span>
<span class="k">yield</span>
<span class="k">rescue</span> <span class="o">*</span><span class="n">rescue_class</span> <span class="o">=></span> <span class="n">e</span>
<span class="n">tries</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">tries</span> <span class="o"><=</span> <span class="n">retries</span> <span class="o">&&</span> <span class="p">(</span><span class="n">retry_skip_reason</span><span class="p">.</span><span class="nf">nil?</span> <span class="o">||</span> <span class="o">!</span><span class="n">e</span><span class="p">.</span><span class="nf">message</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="n">retry_skip_reason</span><span class="p">))</span>
<span class="nb">sleep</span> <span class="n">sleep_interval</span><span class="p">(</span><span class="n">tries</span><span class="p">)</span>
<span class="k">retry</span>
<span class="k">else</span>
<span class="k">raise</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="kp">private</span>
<span class="k">def</span> <span class="nf">sleep_interval</span><span class="p">(</span><span class="n">tries</span><span class="p">)</span>
<span class="p">(</span><span class="no">SLEEP_INTERVAL</span> <span class="o">+</span> <span class="nb">rand</span><span class="p">(</span><span class="mf">0.0</span><span class="o">..</span><span class="mf">1.0</span><span class="p">))</span> <span class="o">*</span> <span class="n">tries</span> <span class="o">**</span> <span class="mi">2</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>This module provides a <code class="language-plaintext highlighter-rouge">with_retries</code> method that by default retries an error up to 3 times with an exponential, randomized sleep interval. It also accepts a <code class="language-plaintext highlighter-rouge">retry_skip_reason</code> option to skip the retry when the exception message matches the skip reason.</p>
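<p>As a quick sanity check of the backoff math, here is a standalone sketch of the <code class="language-plaintext highlighter-rouge">sleep_interval</code> formula above: for try <em>n</em>, the wait is <code class="language-plaintext highlighter-rouge">(0.4 + rand(0.0..1.0)) * n ** 2</code> seconds.</p>

```ruby
# Standalone sketch of the backoff formula used by Retryable#sleep_interval.
SLEEP_INTERVAL = 0.4

def sleep_interval(tries)
  # quadratic growth with a random jitter of up to 1 extra second per "slot"
  (SLEEP_INTERVAL + rand(0.0..1.0)) * tries ** 2
end

(1..3).each do |tries|
  interval = sleep_interval(tries)
  # try 1 waits roughly 0.4-1.4s, try 2 roughly 1.6-5.6s, try 3 roughly 3.6-12.6s
  puts format('try %d: %.2fs', tries, interval)
end
```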
<p>We can now use the <code class="language-plaintext highlighter-rouge">Retryable</code> module with <code class="language-plaintext highlighter-rouge">LinkCounter</code> as follows:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">LinkCounter</span>
<span class="kp">include</span> <span class="no">Retryable</span>
<span class="c1"># the rest of the code</span>
<span class="k">def</span> <span class="nf">content</span>
<span class="n">with_retries</span><span class="p">(</span>
<span class="ss">rescue_class: </span><span class="p">[</span><span class="no">Faraday</span><span class="o">::</span><span class="no">TimeoutError</span><span class="p">,</span> <span class="no">Faraday</span><span class="o">::</span><span class="no">ConnectionFailed</span><span class="p">],</span>
<span class="ss">retry_skip_reason: </span><span class="s1">'getaddrinfo: Name or service not known'</span>
<span class="p">)</span> <span class="k">do</span>
<span class="n">connection</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="vi">@url</span><span class="p">).</span><span class="nf">body</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">connection</span>
<span class="vi">@connection</span> <span class="o">||=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span>
<span class="ss">request: </span><span class="p">{</span> <span class="ss">open_timeout: </span><span class="mi">10</span><span class="p">,</span> <span class="ss">timeout: </span><span class="mi">30</span> <span class="p">}</span>
<span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">faraday</span><span class="o">|</span>
<span class="n">faraday</span><span class="p">.</span><span class="nf">use</span> <span class="no">Faraday</span><span class="o">::</span><span class="no">Response</span><span class="o">::</span><span class="no">RaiseError</span>
<span class="n">faraday</span><span class="p">.</span><span class="nf">adapter</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">default_adapter</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The other exceptions that Faraday could raise are not temporary and we don’t want to retry them. We could either rescue and ignore them, or let them propagate and be tracked by the exception tracking system we have in place. Which option to choose depends on the use case and on whether these errors halt our running system.</p>
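<p>The “rescue and report” option can be sketched like this (<code class="language-plaintext highlighter-rouge">PermanentError</code> and <code class="language-plaintext highlighter-rouge">ErrorTracker</code> are hypothetical stand-ins for a non-retryable error and your tracking client):</p>

```ruby
# Hypothetical stand-ins: a non-temporary error and a tracking client.
class PermanentError < StandardError; end

module ErrorTracker
  def self.notify(error)
    reported << error # forward to your exception tracking system here
  end

  def self.reported
    @reported ||= []
  end
end

def fetch_content
  raise PermanentError, 'the server responded with status 400'
rescue PermanentError => e
  ErrorTracker.notify(e) # track it once, but do not retry
  nil                    # fall back to a safe default value
end
```

<p>Calling <code class="language-plaintext highlighter-rouge">fetch_content</code> records the error and returns <code class="language-plaintext highlighter-rouge">nil</code> instead of raising.</p>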
<h3 id="testing-exception-retries">Testing exception retries</h3>
<blockquote>
<p>Always provide a test / spec that documents why each exception is being handled. This is very important for future readers of the code to understand the failure context better.</p>
</blockquote>
<p>We’ll use RSpec to test the exception retries. If we focus on the <code class="language-plaintext highlighter-rouge">Faraday::TimeoutError</code>, the scenarios that we want to test are that 1) an error is retried and 2) retry is not infinite.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">describe</span> <span class="no">LinkCounter</span> <span class="k">do</span>
<span class="n">let</span><span class="p">(</span><span class="ss">:url</span><span class="p">)</span> <span class="p">{</span> <span class="s1">'http://example.com'</span> <span class="p">}</span>
<span class="n">it</span> <span class="s2">"retries read timeout errors"</span> <span class="k">do</span>
<span class="n">link_counter</span> <span class="o">=</span> <span class="no">LinkCounter</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">connection</span> <span class="o">=</span> <span class="n">link_counter</span><span class="p">.</span><span class="nf">send</span><span class="p">(</span><span class="ss">:connection</span><span class="p">)</span>
<span class="n">expect</span><span class="p">(</span><span class="n">connection</span><span class="p">).</span><span class="nf">to</span> <span class="n">receive</span><span class="p">(</span><span class="ss">:get</span><span class="p">).</span><span class="nf">once</span><span class="p">.</span><span class="nf">and_raise</span><span class="p">(</span><span class="no">Faraday</span><span class="o">::</span><span class="no">TimeoutError</span><span class="p">)</span>
<span class="n">expect</span><span class="p">(</span><span class="n">connection</span><span class="p">).</span><span class="nf">to</span> <span class="n">receive</span><span class="p">(</span><span class="ss">:get</span><span class="p">).</span><span class="nf">once</span><span class="p">.</span><span class="nf">and_return</span><span class="p">(</span><span class="n">double</span><span class="p">(</span><span class="ss">body: </span><span class="s1">'<a href="#">link</a>'</span><span class="p">))</span>
<span class="n">allow_any_instance_of</span><span class="p">(</span><span class="no">Retryable</span><span class="p">).</span><span class="nf">to</span> <span class="n">receive</span><span class="p">(</span><span class="ss">:sleep_interval</span><span class="p">).</span><span class="nf">and_return</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">expect</span><span class="p">(</span><span class="n">link_counter</span><span class="p">.</span><span class="nf">count</span><span class="p">).</span><span class="nf">to</span> <span class="n">eq</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="k">end</span>
<span class="n">it</span> <span class="s2">"re-raises read timeout error after exhausting error retries"</span> <span class="k">do</span>
<span class="n">link_counter</span> <span class="o">=</span> <span class="no">LinkCounter</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">connection</span> <span class="o">=</span> <span class="n">link_counter</span><span class="p">.</span><span class="nf">send</span><span class="p">(</span><span class="ss">:connection</span><span class="p">)</span>
<span class="n">expect</span><span class="p">(</span><span class="n">connection</span><span class="p">).</span><span class="nf">to</span> <span class="n">receive</span><span class="p">(</span><span class="ss">:get</span><span class="p">).</span><span class="nf">exactly</span><span class="p">(</span><span class="mi">4</span><span class="p">).</span><span class="nf">times</span><span class="p">.</span><span class="nf">and_raise</span><span class="p">(</span><span class="no">Faraday</span><span class="o">::</span><span class="no">TimeoutError</span><span class="p">)</span>
<span class="n">allow_any_instance_of</span><span class="p">(</span><span class="no">Retryable</span><span class="p">).</span><span class="nf">to</span> <span class="n">receive</span><span class="p">(</span><span class="ss">:sleep_interval</span><span class="p">).</span><span class="nf">and_return</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">expect</span> <span class="p">{</span>
<span class="n">expect</span><span class="p">(</span><span class="n">link_counter</span><span class="p">.</span><span class="nf">count</span><span class="p">)</span>
<span class="p">}.</span><span class="nf">to</span> <span class="n">raise_error</span><span class="p">(</span><span class="no">Faraday</span><span class="o">::</span><span class="no">TimeoutError</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>In the above example we use <a href="https://github.com/rspec/rspec-mocks">rspec-mocks</a> to set expectations for consecutive calls. In the first spec, the first <code class="language-plaintext highlighter-rouge">GET</code> request raises a timeout error and the second call returns a body with content that has one link. In the second spec, we expect 4 <code class="language-plaintext highlighter-rouge">GET</code> requests (1 + 3 retries), all of them raising a timeout error, so the exception is ultimately re-raised.</p>
<p>If you are using <a href="https://github.com/freerange/mocha">mocha</a>, you can set expectations for consecutive invocations like this:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">connection</span><span class="p">.</span><span class="nf">expects</span><span class="p">(</span><span class="ss">:get</span><span class="p">).</span>
<span class="nf">raises</span><span class="p">(</span><span class="no">Faraday</span><span class="o">::</span><span class="no">TimeoutError</span><span class="p">).</span>
<span class="nf">then</span><span class="p">.</span><span class="nf">returns</span><span class="p">(</span><span class="n">stub</span><span class="p">(</span><span class="ss">body: </span><span class="s1">'<a href="#">link</a>'</span><span class="p">))</span>
</code></pre></div></div>
<p>Let’s now cover the other two cases: 3) retrying open timeout errors and 4) not retrying unknown host errors.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">describe</span> <span class="no">LinkCounter</span> <span class="k">do</span>
<span class="c1"># the rest of the specs</span>
<span class="n">it</span> <span class="s2">"retries open timeout errors"</span> <span class="k">do</span>
<span class="n">link_counter</span> <span class="o">=</span> <span class="no">LinkCounter</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">connection</span> <span class="o">=</span> <span class="n">link_counter</span><span class="p">.</span><span class="nf">send</span><span class="p">(</span><span class="ss">:connection</span><span class="p">)</span>
<span class="n">expect</span><span class="p">(</span><span class="n">connection</span><span class="p">).</span><span class="nf">to</span> <span class="n">receive</span><span class="p">(</span><span class="ss">:get</span><span class="p">).</span><span class="nf">once</span><span class="p">.</span><span class="nf">and_raise</span><span class="p">(</span><span class="no">Faraday</span><span class="o">::</span><span class="no">ConnectionFailed</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s1">'execution expired'</span><span class="p">))</span>
<span class="n">expect</span><span class="p">(</span><span class="n">connection</span><span class="p">).</span><span class="nf">to</span> <span class="n">receive</span><span class="p">(</span><span class="ss">:get</span><span class="p">).</span><span class="nf">once</span><span class="p">.</span><span class="nf">and_return</span><span class="p">(</span><span class="n">double</span><span class="p">(</span><span class="ss">body: </span><span class="s1">'<a href="#">link</a>'</span><span class="p">))</span>
<span class="n">allow_any_instance_of</span><span class="p">(</span><span class="no">Retryable</span><span class="p">).</span><span class="nf">to</span> <span class="n">receive</span><span class="p">(</span><span class="ss">:sleep_interval</span><span class="p">).</span><span class="nf">and_return</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">expect</span><span class="p">(</span><span class="n">link_counter</span><span class="p">.</span><span class="nf">count</span><span class="p">).</span><span class="nf">to</span> <span class="n">eq</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="k">end</span>
<span class="n">it</span> <span class="s2">"does not retry unknown host errors"</span> <span class="k">do</span>
<span class="n">link_counter</span> <span class="o">=</span> <span class="no">LinkCounter</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">connection</span> <span class="o">=</span> <span class="n">link_counter</span><span class="p">.</span><span class="nf">send</span><span class="p">(</span><span class="ss">:connection</span><span class="p">)</span>
<span class="n">expect</span><span class="p">(</span><span class="n">connection</span><span class="p">).</span><span class="nf">to</span> <span class="n">receive</span><span class="p">(</span><span class="ss">:get</span><span class="p">).</span><span class="nf">once</span><span class="p">.</span><span class="nf">and_raise</span><span class="p">(</span><span class="no">Faraday</span><span class="o">::</span><span class="no">ConnectionFailed</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"Failed to open TCP connection to example.nonexistent.com:80 (getaddrinfo: Name or service not known)"</span><span class="p">))</span>
<span class="n">allow_any_instance_of</span><span class="p">(</span><span class="no">Retryable</span><span class="p">).</span><span class="nf">to</span> <span class="n">receive</span><span class="p">(</span><span class="ss">:sleep_interval</span><span class="p">).</span><span class="nf">and_return</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">expect</span> <span class="p">{</span>
<span class="n">expect</span><span class="p">(</span><span class="n">link_counter</span><span class="p">.</span><span class="nf">count</span><span class="p">)</span>
<span class="p">}.</span><span class="nf">to</span> <span class="n">raise_error</span><span class="p">(</span><span class="no">Faraday</span><span class="o">::</span><span class="no">ConnectionFailed</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<h3 id="final-notes">Final notes</h3>
<p>In this walkthrough I intentionally did not use TDD, to keep the focus on these other important details. Also, we are often surprised by exceptions we cannot predict in development: they appear in production and we handle them after the fact. The important thing is to always document the very specific exception with a spec, including the conditions in which it happens, so that others can understand, improve and refactor the code in the future.</p>In a previous blog post I wrote about the problem of overusing exceptions, and in this one we’ll look at some exception handling and testing practices.Debugging Rails Views in Production2017-06-11T10:00:00+00:002017-06-11T10:00:00+00:00https://dalibornasevic.com/posts/debugging-rails-views-in-production<p>Today I’m going to share a quick technique for debugging Rails views in production. When there is a nasty bug or performance issue, the easiest way to find the cause is to reproduce it in the environment where it’s happening with the real data and in the real context.</p>
<p>The technique involves monkey-patching production code in the Rails console: adding print statements, and defining or redefining methods that, when called, give us some insight into what’s going on. By investigating and isolating segment by segment, usually with read-only operations to prevent undesirable data side effects, we’ll eventually figure out the cause.</p>
<p>It’s easy to use this approach with small and isolated classes and methods that can be initialized and called without much setup, but we can use the same approach with the standard request-response cycle to debug views with <a href="http://api.rubyonrails.org/classes/Rails/ConsoleMethods.html">ConsoleMethods</a> from Rails.</p>
<h3 id="find-the-slow-partial">Find the slow partial</h3>
<p>Say we have a controller action and we want to investigate where exactly it’s getting slow in the views. Imagine that this is happening only for a single user in production, and the average transaction metric is not revealing any useful info.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">HomeController</span> <span class="o"><</span> <span class="no">ApplicationController</span>
<span class="n">before_filter</span> <span class="ss">:authenticate_account!</span>
<span class="k">def</span> <span class="nf">index</span>
<span class="c1"># some stuff</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>We can now use the <a href="http://api.rubyonrails.org/classes/Rails/ConsoleMethods.html#method-i-app">app</a> instance available in console to make a <code class="language-plaintext highlighter-rouge">GET</code> request to <code class="language-plaintext highlighter-rouge">/</code> path in the app.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>></span> <span class="n">app</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span>
<span class="no">Started</span> <span class="no">GET</span> <span class="s2">"/"</span> <span class="k">for</span> <span class="mf">127.0</span><span class="o">.</span><span class="mf">0.1</span> <span class="n">at</span> <span class="mi">2017</span><span class="o">-</span><span class="mo">06</span><span class="o">-</span><span class="mi">11</span> <span class="mi">08</span><span class="p">:</span><span class="mi">45</span><span class="p">:</span><span class="mi">15</span> <span class="o">+</span><span class="mo">0200</span>
<span class="no">Processing</span> <span class="n">by</span> <span class="no">HomeController</span><span class="c1">#index as HTML</span>
<span class="no">Completed</span> <span class="mi">401</span> <span class="no">Unauthorized</span> <span class="k">in</span> <span class="mi">10</span><span class="n">ms</span> <span class="p">(</span><span class="no">ActiveRecord</span><span class="p">:</span> <span class="mf">0.0</span><span class="n">ms</span><span class="p">)</span>
<span class="o">=></span> <span class="mi">302</span>
</code></pre></div></div>
<p>Oh, of course. We cannot get to the view rendering yet because of the before filter; we’ll need to authenticate first. We can either log in with another request, or just stub authentication for the duration of this console session, which also avoids logging our credentials in the production console history.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">HomeController</span>
<span class="n">skip_before_filter</span> <span class="ss">:authenticate_account!</span>
<span class="k">def</span> <span class="nf">current_account</span>
<span class="no">Account</span><span class="p">.</span><span class="nf">find</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Then, by making the <code class="language-plaintext highlighter-rouge">GET</code> request to <code class="language-plaintext highlighter-rouge">/</code> path we’ll get the details from the views rendering:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>></span> app.get<span class="o">(</span><span class="s1">'/'</span><span class="o">)</span>
Started GET <span class="s2">"/"</span> <span class="k">for </span>127.0.0.1 at 2017-06-11 00:59:44 +0200
Processing by HomeController#index as HTML
Rendered home/_view1.html.erb <span class="o">(</span>0.0ms<span class="o">)</span>
Rendered home/_view2.html.erb <span class="o">(</span>10000.2ms<span class="o">)</span>
Rendered home/index.html.erb within layouts/application <span class="o">(</span>10001.3ms<span class="o">)</span>
Rendered shared/_topnav.html.erb <span class="o">(</span>0.2ms<span class="o">)</span>
Rendered shared/_flash_messages.html.erb <span class="o">(</span>0.1ms<span class="o">)</span>
Rendered shared/_header.html.erb <span class="o">(</span>0.1ms<span class="o">)</span>
Rendered shared/_footer.html.erb <span class="o">(</span>0.0ms<span class="o">)</span>
Completed 200 OK <span class="k">in </span>10010ms <span class="o">(</span>Views: 10009.8ms | ActiveRecord: 0.0ms<span class="o">)</span>
<span class="o">=></span> 200
</code></pre></div></div>
<p>From the rendering info, we can see that most of the time, that is around 10 seconds, is spent rendering <code class="language-plaintext highlighter-rouge">home/_view2.html.erb</code> partial. We have identified that something slow is happening there but we don’t know what exactly it is.</p>
<h3 id="get-the-stacktrace">Get the stacktrace</h3>
<p>While the request is processing the slow part we can just press <code class="language-plaintext highlighter-rouge">CTRL+C</code> to stop it and get a stacktrace:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>></span> app.get<span class="o">(</span><span class="s1">'/'</span><span class="o">)</span>
Started GET <span class="s2">"/"</span> <span class="k">for </span>127.0.0.1 at 2017-06-11 01:05:30 +0200
Processing by HomeController#index as HTML
Rendered home/_view1.html.erb <span class="o">(</span>0.0ms<span class="o">)</span>
^C Rendered home/_view2.html.erb <span class="o">(</span>1309.6ms<span class="o">)</span>
Rendered home/index.html.erb within layouts/application <span class="o">(</span>1310.6ms<span class="o">)</span>
Completed 500 Internal Server Error <span class="k">in </span>1312ms <span class="o">(</span>ActiveRecord: 0.0ms<span class="o">)</span>
IRB::Abort <span class="o">(</span>abort <span class="k">then </span>interrupt!<span class="o">)</span>:
app/views/home/_view2.html.erb:1:in <span class="sb">`</span><span class="nb">sleep</span><span class="s1">'
app/views/home/_view2.html.erb:1:in `_app_views_home__view__html_erb__3830501997270886489_69842991281620'</span>
app/views/home/index.html.erb:3:in <span class="sb">`</span>_app_views_home_index_html_erb___2357466009542976056_69842998601520<span class="s1">'
Rendered /home/dalibor/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/actionpack-4.2.8/lib/action_dispatch/middleware/templates/rescues/_source.erb (5.6ms)
Rendered /home/dalibor/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/actionpack-4.2.8/lib/action_dispatch/middleware/templates/rescues/_trace.html.erb (2.2ms)
Rendered /home/dalibor/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/actionpack-4.2.8/lib/action_dispatch/middleware/templates/rescues/_request_and_response.html.erb (0.7ms)
Rendered /home/dalibor/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/actionpack-4.2.8/lib/action_dispatch/middleware/templates/rescues/diagnostics.html.erb within rescues/layout (18.0ms)
=> 500
</span></code></pre></div></div>
<p>From the stacktrace we can see that the slow call is the call to <code class="language-plaintext highlighter-rouge">sleep</code> method in <code class="language-plaintext highlighter-rouge">view2</code> partial. So, once we know the “what”, we can go and start figuring out the “why”.</p>
<p>Alternatively to this, we can use <code class="language-plaintext highlighter-rouge">TracePoint</code> as explained in <a href="/posts/51-tracing-ruby-code">tracing ruby</a> blog post to get a stacktrace and then play roulette to sample individual calls to figure out what’s slow.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">trace</span> <span class="p">{</span> <span class="n">app</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span> <span class="p">}</span>
</code></pre></div></div>
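<p>The <code class="language-plaintext highlighter-rouge">trace</code> helper comes from that blog post; the underlying idea can be sketched with a plain <code class="language-plaintext highlighter-rouge">TracePoint</code> (the class and method names below are illustrative):</p>

```ruby
# Minimal sketch: record every Ruby method call made inside a block,
# so slow spots can be sampled afterwards.
def trace_calls
  calls = []
  tp = TracePoint.new(:call) do |t|
    calls << "#{t.defined_class}##{t.method_id}"
  end
  tp.enable { yield } # trace only for the duration of the block
  calls
end

class SlowView
  def render
    helper
  end

  def helper
    :done
  end
end

calls = trace_calls { SlowView.new.render }
# calls now contains entries like "SlowView#render" and "SlowView#helper"
```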
<p>The <a href="http://api.rubyonrails.org/classes/Rails/ConsoleMethods.html">ConsoleMethods</a> module has a few handy methods that you can check out.</p>
<p>For example, we can get the response body.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>></span> <span class="n">app</span><span class="p">.</span><span class="nf">response</span><span class="p">.</span><span class="nf">body</span><span class="p">.</span><span class="nf">first</span><span class="p">(</span><span class="mi">15</span><span class="p">)</span>
<span class="o">=></span> <span class="s2">"<!DOCTYPE html>"</span>
</code></pre></div></div>
<p>We can call routes and helper methods, etc.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>></span> <span class="n">helper</span><span class="p">.</span><span class="nf">link_to</span><span class="p">(</span><span class="n">app</span><span class="p">.</span><span class="nf">root_path</span><span class="p">,</span> <span class="s1">'Home'</span><span class="p">)</span>
<span class="o">=></span> <span class="s2">"<a href=</span><span class="se">\"</span><span class="s2">Home</span><span class="se">\"</span><span class="s2">>/</a>"</span>
</code></pre></div></div>Today I’m going to share a quick technique for debugging Rails views in production. When there is a nasty bug or performance issue, the easiest way to find the cause is to reproduce it in the environment where it’s happening with the real data and in the real context.Faster CI builds using an in-memory database2017-02-21T08:10:00+00:002017-02-21T08:10:00+00:00https://dalibornasevic.com/posts/faster-ci-builds-using-an-in-memory-database<p>What if you could get some speed improvement for your database intensive tests for free?</p>
<p>In this blog post we’ll use an in-memory file storage called <a href="https://en.wikipedia.org/wiki/Tmpfs">tmpfs</a> that is available on most Unix-like operating systems. To test this out I am using Ubuntu 14.04 and MySQL 5.5.54, but the approach applies to any database that writes data to disk. Databases are very sensitive to <a href="https://en.wikipedia.org/wiki/IOPS">IOPS</a> since their job is reading and writing data, and the speed gain comes from writes to RAM being much faster than writes to disk.</p>
<p>You should expect a more significant speed improvement if your test suite is more database-intensive, for example when using a database cleaning strategy with non-transactional fixtures, truncation, etc. The gain depends on the speed difference between writing to RAM and writing to disk on your machine.</p>
<p>In my test, I got ~ <strong>32%</strong> speed improvement with build time decrease from <strong>32.96</strong> to <strong>22.37</strong> seconds:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Before</span>
Finished <span class="k">in </span>32.96 seconds <span class="o">(</span>files took 3.38 seconds to load<span class="o">)</span>
1316 examples, 0 failures, 1 pending
<span class="c"># After</span>
Finished <span class="k">in </span>22.37 seconds <span class="o">(</span>files took 3.29 seconds to load<span class="o">)</span>
1316 examples, 0 failures, 1 pending
</code></pre></div></div>
<p>To test this out yourself and see what speed improvement you get, this is what to do.</p>
<h3 id="create-a-ram-disk">Create a RAM disk</h3>
<p>Create a new directory <code class="language-plaintext highlighter-rouge">/mnt/testdisk</code> and then use the <code class="language-plaintext highlighter-rouge">mount</code> command to create a disk using <code class="language-plaintext highlighter-rouge">tmpfs</code> file storage with size of 300 megabytes.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo mkdir</span> /mnt/testdisk
<span class="nb">sudo </span>mount <span class="nt">-t</span> tmpfs <span class="nt">-o</span> <span class="nv">size</span><span class="o">=</span>300m tmpfs /mnt/testdisk
</code></pre></div></div>
<p>In case you need to unmount and remove that directory later, you can do that with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>umount /mnt/testdisk
<span class="nb">sudo rm</span> <span class="nt">-rf</span> /mnt/testdisk
</code></pre></div></div>
<h3 id="run-mysql-in-docker-container">Run MySQL in Docker container</h3>
<p>If you’re not familiar with Docker you can skip this section. If you are familiar or you want to get familiar, first <a href="https://docs.docker.com/engine/installation/">install it</a>, and then just set up a MySQL container with the following command:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>docker run <span class="se">\</span>
<span class="nt">--detach</span> <span class="se">\</span>
<span class="nt">--name</span><span class="o">=</span>mysql-test <span class="se">\</span>
<span class="nt">--env</span><span class="o">=</span><span class="s2">"MYSQL_ROOT_PASSWORD=pass"</span> <span class="se">\</span>
<span class="nt">--volume</span><span class="o">=</span>/mnt/testdisk:/var/lib/mysql <span class="se">\</span>
mysql:5.5.54
</code></pre></div></div>
<p>That command creates a new MySQL container named <code class="language-plaintext highlighter-rouge">mysql-test</code> using version <code class="language-plaintext highlighter-rouge">5.5.54</code>. It sets the MySQL root password to <code class="language-plaintext highlighter-rouge">pass</code> and mounts the RAM disk we previously created at <code class="language-plaintext highlighter-rouge">/mnt/testdisk</code> to <code class="language-plaintext highlighter-rouge">/var/lib/mysql</code>, which is the default MySQL datadir.</p>
<p>If the container was set up successfully, find its IP address and check that MySQL is running inside:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>docker inspect mysql-test | <span class="nb">grep</span> <span class="s2">"IPAddress"</span>
<span class="c"># "IPAddress": "172.17.0.2",</span>
mysql <span class="nt">-uroot</span> <span class="nt">-ppass</span> <span class="nt">-h</span> 172.17.0.2 <span class="nt">-P</span> 3306
</code></pre></div></div>
<p>If you can connect, all is good and you are ready to change the host in <code class="language-plaintext highlighter-rouge">database.yml</code> for the test environment and measure the build time.</p>
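<p>For example, a hypothetical <code class="language-plaintext highlighter-rouge">database.yml</code> test section pointing at the container could look like this; the host IP and password come from the commands above, while the database name is illustrative:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>test:
  adapter: mysql2
  database: myapp_test
  username: root
  password: pass
  host: 172.17.0.2
  port: 3306
</code></pre></div></div>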
<p>If something goes wrong during container bootup, you can start debugging by checking the container logs. I had an issue where the container creation failed when I tried to use a smaller partition size for the tmpfs disk.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>docker logs mysql-test
</code></pre></div></div>
<p>In case you want to remove the container, you can do that with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>docker <span class="nb">rm</span> <span class="nt">-f</span> mysql-test
</code></pre></div></div>
<h3 id="configure-local-mysql">Configure local MySQL</h3>
<p>If you’re not familiar with Docker, the other option is to manually change the local MySQL config. The inconvenience is that you’ll need to keep changing the config when switching between development and test, because the RAM disk data will not persist after reboots. Here are the setup steps.</p>
<p>Stop MySQL service:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>service mysql stop
</code></pre></div></div>
<p>Change MySQL data directory:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># sudo vi /etc/mysql/my.cnf</span>
datadir <span class="o">=</span> /mnt/testdisk
</code></pre></div></div>
<p>Add apparmor (Linux kernel security module) alias for the new MySQL path:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># sudo vi /etc/apparmor.d/tunables/alias</span>
<span class="nb">alias</span> /var/lib/mysql/ -> /mnt/testdisk,
</code></pre></div></div>
<p>Restart apparmor service:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>service apparmor restart
</code></pre></div></div>
<p>Re-configure MySQL to set things up properly in the new data directory:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>dpkg-reconfigure mysql-server-5.5
</code></pre></div></div>
<p>The last command will auto-start MySQL and then you should be ready to measure the test time.</p>
<p>Share in the comments what speed improvements you get.</p>What if you could get some speed improvement for your database intensive tests for free?Auto-reconnect for ActiveRecord connections2017-01-20T08:00:00+00:002017-01-20T08:00:00+00:00https://dalibornasevic.com/posts/auto-reconnect-for-active-record-connections<p>ActiveRecord has a special config option <code class="language-plaintext highlighter-rouge">reconnect: true</code> for native auto-reconnect when using a MySQL database. With that option in <code class="language-plaintext highlighter-rouge">database.yml</code>, it will try to reconnect only once as per the <a href="http://dev.mysql.com/doc/refman/5.7/en/auto-reconnect.html">manual</a> before it fails:</p>
<blockquote>
<p>The MySQL client library can perform an automatic reconnection to the server if it finds that the connection is down when you attempt to send a statement to the server to be executed. If auto-reconnect is enabled, the library tries once to reconnect to the server and send the statement again.</p>
</blockquote>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>></span> Post.count
<span class="o">(</span>0.7ms<span class="o">)</span> SELECT COUNT<span class="o">(</span><span class="k">*</span><span class="o">)</span> FROM <span class="sb">`</span>posts<span class="sb">`</span>
ActiveRecord::StatementInvalid: Mysql2::Error: Can<span class="s1">'t connect to local MySQL server through socket '</span>/var/run/mysqld/mysqld.sock<span class="s1">' (2): SELECT COUNT(*) FROM `posts`
</span></code></pre></div></div>
<p>Often we want more control over the reconnect strategy, to give the connection more than one chance to recover. Imagine performing a master-slave fail-over, or a database server that is unstable and needs about 10 seconds of downtime before it becomes available again. To keep the service reliable, we’ll need to avoid dropping requests during that interval.</p>
<p>One way to do that would be to patch ActiveRecord to auto-reconnect with custom wait intervals like:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">Mysql2AdapterPatch</span>
<span class="k">def</span> <span class="nf">execute</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
<span class="c1"># During `reconnect!`, `Mysql2Adapter` first disconnects and sets the</span>
<span class="c1"># @connection to nil, and then tries to connect. When connecting fails,</span>
<span class="c1"># @connection will be left as a nil value, which will cause issues later.</span>
<span class="n">connect</span> <span class="k">if</span> <span class="vi">@connection</span><span class="p">.</span><span class="nf">nil?</span>
<span class="k">begin</span>
<span class="k">super</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
<span class="k">rescue</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">StatementInvalid</span> <span class="o">=></span> <span class="n">e</span>
<span class="k">if</span> <span class="n">e</span><span class="p">.</span><span class="nf">message</span> <span class="o">=~</span> <span class="sr">/server has gone away/i</span>
<span class="n">in_transaction</span> <span class="o">=</span> <span class="n">transaction_manager</span><span class="p">.</span><span class="nf">current_transaction</span><span class="p">.</span><span class="nf">open?</span>
<span class="n">try_reconnect</span>
<span class="n">in_transaction</span> <span class="p">?</span> <span class="k">raise</span> <span class="p">:</span> <span class="k">retry</span>
<span class="k">else</span>
<span class="k">raise</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="kp">private</span>
<span class="k">def</span> <span class="nf">try_reconnect</span>
<span class="n">sleep_times</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">8</span><span class="p">]</span>
<span class="k">begin</span>
<span class="n">reconnect!</span>
<span class="k">rescue</span> <span class="no">Mysql2</span><span class="o">::</span><span class="no">Error</span> <span class="o">=></span> <span class="n">e</span>
<span class="n">sleep_time</span> <span class="o">=</span> <span class="n">sleep_times</span><span class="p">.</span><span class="nf">shift</span>
<span class="k">if</span> <span class="n">sleep_time</span> <span class="o">&&</span> <span class="n">e</span><span class="p">.</span><span class="nf">message</span> <span class="o">=~</span> <span class="sr">/can't connect/i</span>
<span class="nb">warn</span> <span class="s2">"Server timed out, retrying in </span><span class="si">#{</span><span class="n">sleep_time</span><span class="si">}</span><span class="s2"> sec."</span>
<span class="nb">sleep</span> <span class="n">sleep_time</span>
<span class="k">retry</span>
<span class="k">else</span>
<span class="k">raise</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="nb">require</span> <span class="s1">'active_record/connection_adapters/mysql2_adapter'</span>
<span class="no">ActiveRecord</span><span class="o">::</span><span class="no">ConnectionAdapters</span><span class="o">::</span><span class="no">Mysql2Adapter</span><span class="p">.</span><span class="nf">prepend</span> <span class="no">Mysql2AdapterPatch</span>
</code></pre></div></div>
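<p>The shift-from-a-list backoff in <code class="language-plaintext highlighter-rouge">try_reconnect</code> can be exercised in isolation. Here is a hypothetical plain-Ruby sketch of the same pattern; the <code class="language-plaintext highlighter-rouge">with_backoff</code> name and the simulated failure are illustrative, not part of ActiveRecord:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Retry a block using the same fixed backoff schedule as the patch above.
# When the schedule is exhausted, the last error is re-raised.
def with_backoff(sleep_times = [0.1, 0.5, 1, 2, 4, 8])
  schedule = sleep_times.dup
  begin
    yield
  rescue StandardError
    sleep_time = schedule.shift
    raise if sleep_time.nil?
    warn "Server timed out, retrying in #{sleep_time} sec."
    sleep sleep_time
    retry
  end
end

# Simulate a call that fails three times and then succeeds.
attempts = 0
result = with_backoff([0, 0, 0, 0]) do
  attempts += 1
  raise "can't connect" unless attempts == 4
  :connected
end
# result is :connected and attempts is 4
</code></pre></div></div>
<p>Note that the real patch only retries on connection-related errors; anything else is re-raised immediately.</p>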
<p>When the connection goes down, the patch starts trying to reconnect and finally succeeds once the server is back up.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>></span> Post.count
<span class="o">(</span>0.6ms<span class="o">)</span> SELECT COUNT<span class="o">(</span><span class="k">*</span><span class="o">)</span> FROM <span class="sb">`</span>posts<span class="sb">`</span>
Server timed out, retrying <span class="k">in </span>0.1 sec.
Server timed out, retrying <span class="k">in </span>0.5 sec.
Server timed out, retrying <span class="k">in </span>1 sec.
Server timed out, retrying <span class="k">in </span>2 sec.
Server timed out, retrying <span class="k">in </span>4 sec.
<span class="o">(</span>1.1ms<span class="o">)</span> SELECT COUNT<span class="o">(</span><span class="k">*</span><span class="o">)</span> FROM <span class="sb">`</span>posts<span class="sb">`</span>
<span class="o">=></span> 0
</code></pre></div></div>
<p>What’s interesting to note here is that if the connection goes down and reconnects during a transaction block, it will continue executing the subsequent queries and silently lose the queries issued from the start of the transaction up to the moment the connection dropped. That’s why, when reconnecting while <code class="language-plaintext highlighter-rouge">in_transaction</code> as the patch above does, it’s safer to re-raise the connection error.</p>
<p>Here’s an example to demonstrate that edge-case:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Post</span><span class="p">.</span><span class="nf">transaction</span> <span class="k">do</span>
<span class="no">Post</span><span class="p">.</span><span class="nf">create</span>
<span class="nb">sleep</span> <span class="mi">5</span>
<span class="no">Post</span><span class="p">.</span><span class="nf">count</span>
<span class="k">end</span>
</code></pre></div></div>
<p>If the connection drops during the <code class="language-plaintext highlighter-rouge">sleep</code> call and then reconnects, the patch re-raises the connection error to stop executing the remaining queries, because the record from <code class="language-plaintext highlighter-rouge">Post.create</code> will have been lost.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="o">(</span>0.3ms<span class="o">)</span> BEGIN
SQL <span class="o">(</span>0.2ms<span class="o">)</span> INSERT INTO <span class="sb">`</span>posts<span class="sb">`</span> <span class="o">(</span><span class="sb">`</span>created_at<span class="sb">`</span>, <span class="sb">`</span>updated_at<span class="sb">`</span><span class="o">)</span> VALUES <span class="o">(</span><span class="s1">'2017-01-18 20:18:14'</span>, <span class="s1">'2017-01-18 20:18:14'</span><span class="o">)</span>
<span class="o">(</span>0.2ms<span class="o">)</span> SELECT COUNT<span class="o">(</span><span class="k">*</span><span class="o">)</span> FROM <span class="sb">`</span>posts<span class="sb">`</span>
Server timed out, retrying <span class="k">in </span>0.1 sec.
Server timed out, retrying <span class="k">in </span>0.5 sec.
Server timed out, retrying <span class="k">in </span>1 sec.
Server timed out, retrying <span class="k">in </span>2 sec.
<span class="o">(</span>0.1ms<span class="o">)</span> ROLLBACK
ActiveRecord::StatementInvalid: Mysql2::Error: MySQL server has gone away: SELECT COUNT<span class="o">(</span><span class="k">*</span><span class="o">)</span> FROM <span class="sb">`</span>posts<span class="sb">`</span>
</code></pre></div></div>
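<p>This edge-case can be modeled without a database. The following toy simulation (plain Ruby; the <code class="language-plaintext highlighter-rouge">ToyServer</code> class is purely illustrative) shows why retrying only the failed statement is unsafe: statements buffered before the drop are silently lost.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A toy server: statements issued inside a transaction are buffered and
# applied only on COMMIT. A dropped connection discards the buffer,
# much like a real server rolls back an open transaction.
class ToyServer
  attr_reader :rows

  def initialize
    @rows = []
    @buffer = []
  end

  def execute(statement)
    @buffer.push(statement)
  end

  def drop_connection!
    @buffer = []
  end

  def commit
    @rows.concat(@buffer)
    @buffer = []
  end
end

server = ToyServer.new
server.execute("INSERT post")  # succeeds, but is only buffered
server.drop_connection!        # connection dies mid-transaction
server.execute("SELECT count") # a naive retry continues as if nothing happened
server.commit
# server.rows contains only ["SELECT count"]; the INSERT silently vanished
</code></pre></div></div>
<p>Re-raising the error while <code class="language-plaintext highlighter-rouge">in_transaction</code>, as the patch does, surfaces the loss instead of committing a partial transaction.</p>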
<p>I hope you find this info useful; please share your thoughts in the comments.</p>