A Walkthrough for Handling and Testing Exceptions
In a previous blog post I wrote about the problem of overusing exceptions. In this one we’ll look at some exception handling and testing practices.
To start with, let’s define a LinkCounter class. LinkCounter counts how many links are on a web page. It is initialized with a URL, uses the Faraday HTTP client to fetch the page content, and uses Nokogiri to parse the HTML.
require 'faraday'
require 'nokogiri'

class LinkCounter
  def initialize(url)
    @url = url
  end

  def count
    doc.css('a').count
  end

  private

  def doc
    Nokogiri::HTML.parse(content)
  end

  def content
    connection.get(@url).body
  end

  def connection
    Faraday.new
  end
end
Then, we can use it like this:
puts LinkCounter.new('https://example.com').count # 1
Pretty simple so far.
What could possibly go wrong?
To improve the robustness of our LinkCounter, we need to think about what could fail. We identify Faraday’s connection.get call, which performs the HTTP GET request, as the one with the highest probability of failure because it depends on the reliability of the network.
Always rescue very specific exceptions. Never rescue Exception, and avoid rescuing StandardError too, because it can hide unexpected errors like NameError and NoMethodError. See Ruby’s exception hierarchy.
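To see why a broad rescue is risky, here is a minimal sketch in plain Ruby. The method names and the misspelled `conection` are hypothetical and only serve the illustration, and `Timeout::Error` stands in for whatever specific network error we actually expect.

```ruby
require 'timeout'

# Hypothetical helper with a typo: `conection` is undefined,
# so calling this method raises NameError.
def broken_fetch
  conection.get('https://example.com')
end

# Rescuing StandardError swallows the NameError,
# so the typo is reported as if it were a network failure.
def fetch_with_broad_rescue
  broken_fetch
rescue StandardError
  :network_failure
end

# Rescuing only the error we expect lets the NameError surface.
def fetch_with_specific_rescue
  broken_fetch
rescue Timeout::Error
  :network_failure
end

puts fetch_with_broad_rescue.inspect # prints :network_failure

begin
  fetch_with_specific_rescue
rescue NameError => e
  puts "bug surfaced: #{e.class}" # prints bug surfaced: NameError
end
```

With the broad rescue the bug could stay hidden until someone wonders why every request "fails"; with the specific rescue it shows up the first time the code runs.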
In order to rescue very specific exceptions, we need to figure out all the exceptions that Faraday can raise. Good libraries usually have a separate file defining all their errors, as is the case with Faraday errors or, as another example, Redis errors.
Looking at the Faraday error definitions we can see it has the following hierarchy:
StandardError
  Faraday::Error
    Faraday::MissingDependency
    Faraday::ClientError
      Faraday::ConnectionFailed
      Faraday::ResourceNotFound
      Faraday::ParsingError
      Faraday::TimeoutError
      Faraday::SSLError
Exploring Faraday errors
We need to explore and understand under what conditions each of the Faraday errors can happen.
If we define a very small open timeout, we’ll see a Faraday::ConnectionFailed error.
Faraday.new(request: { open_timeout: 0.1 }).get('https://example.com')
# Faraday::ConnectionFailed: execution expired
If we define a small read timeout, we’ll get Faraday::TimeoutError.
Faraday.new(request: { open_timeout: 1, timeout: 0.1 }).
  get('https://example.com')
# Faraday::TimeoutError: Net::ReadTimeout
Note that if we set only the timeout value, open_timeout will use the same value, so we won’t be able to reproduce Faraday::TimeoutError; we’ll get Faraday::ConnectionFailed again.
For docs on timeouts in other popular Ruby gems, you can check out this popular GitHub repo.
If we try a GET request to a nonexistent host, we get Faraday::ConnectionFailed.
Faraday.get('https://example.nonexistent.com')
# Faraday::ConnectionFailed: Failed to open TCP connection to example.nonexistent.com:443 (getaddrinfo: Name or service not known)
Note that in this case we also have a helpful exception message, getaddrinfo: Name or service not known, that distinguishes this error from the one raised when a connection cannot be opened to an existing host.
If we request a website without SSL support, we get Faraday::SSLError.
Faraday.get('https://ruby.mk')
# Faraday::SSLError: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
Finally, if we configure Faraday to raise exceptions on 4xx and 5xx responses, we’ll see it raises a Faraday::ResourceNotFound error for a 404 response:
Faraday.new do |faraday|
  faraday.use Faraday::Response::RaiseError
  faraday.adapter Faraday.default_adapter
end.get('https://httpstat.us/404')
# Faraday::ResourceNotFound: the server responded with status 404
And we’ll get Faraday::ClientError for a 500 response:
Faraday.new do |faraday|
  faraday.use Faraday::Response::RaiseError
  faraday.adapter Faraday.default_adapter
end.get('https://httpstat.us/500')
# Faraday::ClientError: the server responded with status 500
Note that in the last two examples I use this handy httpstat.us service that returns the requested status code.
Handling exceptions
Based on our exploration, we conclude that we will retry Faraday::TimeoutError and Faraday::ConnectionFailed errors, except when the host does not exist, i.e. when the exception message includes getaddrinfo: Name or service not known.
Let’s define a general-purpose Retryable module for that.
module Retryable
  SLEEP_INTERVAL = 0.4

  def with_retries(retries: 3, retry_skip_reason: nil, rescue_class:)
    tries = 0
    begin
      yield
    rescue *rescue_class => e
      tries += 1
      if tries <= retries && (retry_skip_reason.nil? || !e.message.include?(retry_skip_reason))
        sleep sleep_interval(tries)
        retry
      else
        raise
      end
    end
  end

  private

  def sleep_interval(tries)
    (SLEEP_INTERVAL + rand(0.0..1.0)) * tries ** 2
  end
end
From this module we can use the with_retries method, which by default retries the error 3 times with a randomized sleep interval that grows with the square of the attempt number. It also accepts a retry_skip_reason option to skip the retry when the exception message includes the given reason.
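To get a feel for how long the retries can take, here is a small sketch that computes the shortest and longest possible wait for each attempt, using the same formula as sleep_interval. The `sleep_bounds` helper is hypothetical and not part of the Retryable module.

```ruby
SLEEP_INTERVAL = 0.4

# Hypothetical helper: the range of waits (in seconds) that
# (SLEEP_INTERVAL + rand(0.0..1.0)) * tries ** 2 can produce for one attempt.
def sleep_bounds(tries)
  factor = tries**2
  (SLEEP_INTERVAL * factor)..((SLEEP_INTERVAL + 1.0) * factor)
end

(1..3).each do |tries|
  bounds = sleep_bounds(tries)
  puts format('retry %d: between %.1f and %.1f seconds', tries, bounds.begin, bounds.end)
end
# retry 1: between 0.4 and 1.4 seconds
# retry 2: between 1.6 and 5.6 seconds
# retry 3: between 3.6 and 12.6 seconds
```

So with the default of 3 retries, the sleeps alone can add up to roughly 20 seconds in the worst case, on top of the request timeouts themselves. That is worth keeping in mind when choosing the retries value.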
We can now use the Retryable module with LinkCounter as follows:
class LinkCounter
  include Retryable

  # the rest of the code

  def content
    with_retries(
      rescue_class: [Faraday::TimeoutError, Faraday::ConnectionFailed],
      retry_skip_reason: 'getaddrinfo: Name or service not known'
    ) do
      connection.get(@url).body
    end
  end

  def connection
    @connection ||= Faraday.new(
      request: { open_timeout: 10, timeout: 30 }
    ) do |faraday|
      faraday.use Faraday::Response::RaiseError
      faraday.adapter Faraday.default_adapter
    end
  end
end
The other exceptions that Faraday can raise are not temporary, so we don’t want to retry them. We could either rescue and ignore them, or let them propagate and be tracked by the exception tracking system we have in place. It depends on the use case and on whether they stop our running system.
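As a rough sketch of the "rescue and ignore" option, the wrapper below turns one specific, non-temporary error into a nil result and lets everything else propagate. Both `PageNotFound` and `count_or_nil` are hypothetical names; `PageNotFound` is only a stand-in for an error like Faraday::ResourceNotFound so that the example runs without any gems.

```ruby
# Stand-in for a non-temporary error such as a 404 (hypothetical, not a Faraday class).
class PageNotFound < StandardError; end

# Hypothetical wrapper: runs the block and treats a missing page
# as "no result" instead of crashing the caller.
def count_or_nil
  yield
rescue PageNotFound => e
  warn "skipping: #{e.message}"
  nil
end

puts count_or_nil { 3 }.inspect # prints 3
puts count_or_nil { raise PageNotFound, 'the server responded with status 404' }.inspect # prints nil
```

Any other exception raised inside the block, a NoMethodError for example, still propagates, so the exception tracking system keeps seeing real bugs.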
Testing exception retries
Always provide a test / spec that documents why each exception is being handled. This is very important for future readers of the code to understand the failure context better.
We’ll use RSpec to test the exception retries. If we focus on Faraday::TimeoutError, the scenarios that we want to test are that 1) an error is retried and 2) the retry is not infinite.
describe LinkCounter do
  let(:url) { 'http://example.com' }

  it "retries read timeout errors" do
    link_counter = LinkCounter.new(url)
    connection = link_counter.send(:connection)
    expect(connection).to receive(:get).once.and_raise(Faraday::TimeoutError)
    expect(connection).to receive(:get).once.and_return(double(body: '<a href="#">link</a>'))
    allow_any_instance_of(Retryable).to receive(:sleep_interval).and_return(0)

    expect(link_counter.count).to eq(1)
  end

  it "re-raises read timeout error after exhausting error retries" do
    link_counter = LinkCounter.new(url)
    connection = link_counter.send(:connection)
    expect(connection).to receive(:get).exactly(4).times.and_raise(Faraday::TimeoutError)
    allow_any_instance_of(Retryable).to receive(:sleep_interval).and_return(0)

    expect {
      link_counter.count
    }.to raise_error(Faraday::TimeoutError)
  end
end
In the above example we use rspec-mocks to set expectations for the consecutive calls. In the first spec, we expect a timeout error for the first GET request, and then for the second call we return a body with one link. In the second spec, we expect 4 GET requests (1 + 3 retries), all of them raising a timeout error, which results in a final exception being raised.
If you are using mocha, you can set expectations for consecutive invocations like this:
connection.expects(:get).
  raises(Faraday::TimeoutError).
  then.returns(stub(body: '<a href="#">link</a>'))
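If you prefer not to depend on a mocking library for this, a hand-rolled fake can play the same role. The sketch below is a different technique from the rspec-mocks and mocha expectations above: `FlakyConnection` is a hypothetical class, and `Timeout::Error` stands in for Faraday::TimeoutError so that the example runs on plain Ruby.

```ruby
require 'timeout'

# Hypothetical test double: raises the given error for the first `failures`
# calls to #get, then returns a response-like object that responds to #body.
class FlakyConnection
  Response = Struct.new(:body)

  attr_reader :calls

  def initialize(failures:, error:, body:)
    @failures = failures
    @error = error
    @body = body
    @calls = 0
  end

  def get(_url)
    @calls += 1
    raise @error if @calls <= @failures

    Response.new(@body)
  end
end

connection = FlakyConnection.new(
  failures: 1,
  error: Timeout::Error, # stand-in for Faraday::TimeoutError
  body: '<a href="#">link</a>'
)

begin
  connection.get('http://example.com')
rescue Timeout::Error
  puts 'first call timed out'
end
puts connection.get('http://example.com').body # prints <a href="#">link</a>
```

The `calls` counter lets a spec assert how many requests were made, which covers the "retry is not infinite" scenario as well.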
Let’s now cover the other two cases: 3) retrying open timeout errors and 4) not retrying unknown host errors.
describe LinkCounter do
  # the rest of the specs

  it "retries open timeout errors" do
    link_counter = LinkCounter.new(url)
    connection = link_counter.send(:connection)
    expect(connection).to receive(:get).once.and_raise(Faraday::ConnectionFailed.new('execution expired'))
    expect(connection).to receive(:get).once.and_return(double(body: '<a href="#">link</a>'))
    allow_any_instance_of(Retryable).to receive(:sleep_interval).and_return(0)

    expect(link_counter.count).to eq(1)
  end

  it "does not retry unknown host errors" do
    link_counter = LinkCounter.new(url)
    connection = link_counter.send(:connection)
    expect(connection).to receive(:get).once.and_raise(Faraday::ConnectionFailed.new("Failed to open TCP connection to example.nonexistent.com:80 (getaddrinfo: Name or service not known)"))
    allow_any_instance_of(Retryable).to receive(:sleep_interval).and_return(0)

    expect {
      link_counter.count
    }.to raise_error(Faraday::ConnectionFailed)
  end
end
Final notes
In this walkthrough I intentionally did not use TDD, so that we could focus on these other important details. Also, we are often surprised by exceptions that we cannot predict in development; they appear in production and we handle them after the fact. The important thing is to always document with a spec the very specific exception that happens and the conditions under which it happens, so that others can understand, improve and refactor the code in the future.