Encapsulation is a Lie

by Cameron Dutro on May 28, 2021

In this post I respond to another of Jason Swett’s recent articles, Don’t wrap instance variables in attr_reader unless necessary. Jason, if you’re reading this please know this blog isn’t only about critiquing your writing, which I find insightful and thought-provoking. You’ve really gotten me thinking lately, and I’ve been meaning to start a blog for a long time anyway. Seemed like a good opportunity to finally get one going.

What is `attr_reader`?

It’s common to see Ruby classes expose instance variables using a special class method called attr_reader, eg:

class Email
  attr_reader :subject, :body

  def initialize(subject, body)
    @subject = subject
    @body = body
  end
end

email = Email.new('Check this out', 'Ruby rocks')
puts email.subject  # => prints "Check this out"

As you can see, attr_reader :name defines the #name method on our Email class. The #name method simply returns the value of the @name instance variable.

Ruby also features two other class methods, attr_writer and attr_accessor. The former defines a method for assigning a value to an instance variable (eg. #name=) while the latter defines both the getter and the setter (i.e. both #name and #name=).

Why would you ever do this? The basic premise is that instance variables are private, meaning nobody outside the class can get or set them. By wrapping ivars with attr_reader and friends, they are now available to the outside world.

Why Methods are Almost Always Better

In my opinion, by far the biggest benefit of attr_reader is that it exposes instance variables as methods. Why are methods better? In a word, inheritance.

There’s no way to override instance variables in Ruby. Consider an ivar-only version of our Email class from earlier (notice the addition of the #deliver_to method):

require 'mail'

class Email
  def initialize(subject, body)
    @subject = subject
    @body = body
  end

  def deliver_to(address)
    mail = Mail.new
    mail[:from] = 'no-reply@camerondutro.com'
    mail[:to] = address
    mail[:subject] = @subject
    mail[:body] = @body
    mail.deliver!
  end
end

Let’s say I want to add a signature at the end of the email body. To do so, I’ll create a new EmailWithSignature class that inherits from Email:

class EmailWithSignature < Email
end

Somewhere in my new class, I need to append the signature to the body. I only really have two options:

Override #initialize and append the signature to @body.
Override #deliver_to so it appends the signature on send.

Neither of these options feel particularly “clean.” Both #initialize and #deliver_to accept arguments, and those arguments can change over time. If I override one of them, I have to make sure my derived EmailWithSignature class changes whenever Email does.

If I choose to override #deliver_to, I have to copy/paste the logic into EmailWithSignature in order to change the content of the body, or potentially reassign @body before calling super. Yuck.

Instead, let’s wrap our instance variables in attr_readers (see above). Now the derived class can simply override the body method, which will always accept zero arguments:

class EmailWithSignature < Email
  def body
    super + "\n\n-Cameron Dutro\nInternational Man of Mystery"
  end
end

Much easier, much cleaner.

The limitations of the ivar approach have bitten me numerous times in my professional career. In most cases, requirements had changed and I found myself needing to extend an existing class. Using attr_reader would have given me the “hooks” I needed to non-intrusively modify the class’s behavior.

Remember that derived classes can call your private methods. If you’re worried about someone else messing with your encapsulated data, just make your attr_readers private:

class Email
  attr_reader :subject, :body
  private :subject, :body

  def initialize(subject, body)
    @subject = subject
    @body = body
  end
end

email = Email.new('Check this out', 'Ruby rocks')
puts email.subject  # => raises a NoMethodError

However, I prefer to make all of mine public. That’s because I believe encapsulation doesn’t exist.

The Problem with Encapsulation

If you’ve gone through any sort of formal computer science training, chances are you’ve been taught the four main pillars of object-oriented programming: inheritance, abstraction, polymorphism, and encapsulation.

As you probably already know, encapsulation prevents direct access to an object’s state. In Ruby, that means preventing direct access to an object’s instance variables. The idea is that the object and the object alone should be able to mutate its state, perhaps to maintain some invariants, etc.

However in the vast majority of object-oriented systems, much of what we think of as the internal, encapsulated state of an object is actually shared state. Let’s take a look at a class (granted, a very naïve one) that represents an email address:

class EmailAddress
  def initialize(address)
    unless address.include?('@')
      raise InvalidAddressError, "the address '#{address}' is invalid"
    end

    @address = address
  end

  def user
    @address.split('@')[0]
  end

  def host
    @address.split('@')[1]
  end
end

Our class nicely encapsulates @address. Nobody else should be able to mess with it right?

Wrong!

address_str = 'foo@bar.com'
address = EmailAddress.new(address_str)
address.host  # => "bar.com"
address_str.replace('woops!')
address.host  # => nil

Woops! Since everything in Ruby is passed by reference, it’s entirely possible for the data given to an object to change without the object’s knowledge. In other words, encapsulation can be easily bypassed.

It doesn’t matter if we use ivars or make the attr_reader private. There’s always the chance someone else is holding on to a reference to our “private” data and can mutate it at will.

To me, that’s a pretty big deal. It means encapsulation is kind of a lie. If you assign ivars in your class’s constructor from passed-in data, you might as well expose them with attr_readers. They’re basically public anyway.

Encapsulating Better

Hold on. If encapsulation doesn’t exist, then doesn’t that call all of object-oriented programming into question?

Maybe, but I’m not the right person to say one way or the other. I happen to really enjoy object-oriented programming. It fits the way my brain works.

But just like a number of aspects of software development, programming well requires discipline. I posit that large object-oriented systems stay afloat because programmers, mostly unconsciously, develop an understanding of the nuances of encapsulation and evolve habits to avoid the major pitfalls. That’s certainly been the case for me. It was only in thinking deeply about this article that I began to ask myself why, for example, reassigning instance variables feels wrong to me.

To that end, I like to follow these rules:

Only set instance variables once. After they are set, treat them like constants.
Copy objects before mutating them.

The goal here is to prevent our objects from changing except at well-known points in time.

Only Set Instance Variables Once

In most cases, I think reassigning instance variables is a code smell. If your object needs to change what data it references, it should have asked for different data at initialization.

NOTE: Methods that use instance variables to memoize the result of a lazily evaluated expression are ok because, although the object didn’t ask for the data at initialization, it still only sets the instance variable once.

The less your object’s state changes after initialization, the less uncertainty you introduce into the system.

Copy Objects Before Mutating Them

Consider this classic example of pass-by-reference mayhem:

def compliment(name)
  puts name << ', you rock!'
end

name = 'Cameron'
compliment(name)
name  # => "Cameron, you rock!"

The caller probably isn’t expecting compliment to mutate name.

The same principle applies to data referenced by an object’s instance variables. Mutating only copies of your data prevents these sorts of surprises.

Back to `attr_reader`

Ok, but what does all this have to do with attr_reader?

In his usual erudite manner, Jason lays out the case against attr_reader with the following statement:

Adding a public attr_reader throws away the benefits of encapsulation

Private instance variables are useful for the same reason as private methods: because you know they’re not depended on by outside clients.

If I have a class that has an instance variable called @price, I know that I can rename that instance variable to @cost or change it to @price_cents (changing the whole meaning of the value) or even kill @price altogether. What I want to do with @price is 100% my business. This is great.

Hopefully I’ve shown in this article why encapsulation is more or less a myth. There’s no way to know who else may reference the same data as your instance variables, so it very well may not be “100% your business.” This is especially true if one of your methods returns it somehow, perhaps wrapped by another object.

I do agree that adding attr_reader :price changes the class’s public interface, making it more difficult to change in the future. Unless you’re developing library code or a gem however, I think the public interface argument is fairly weak. As Jason says a few paragraphs earlier, if a naming issue causes a problem in your application, your application probably isn’t tested well enough.

Conclusion

My takeaway message is this: encapsulation doesn’t provide the guarantees we were taught it does. There’s no real benefit to “hiding” data inside an object, since that data may be referenced - privately or publicly - by any number of other objects. You might as well expose your instance variables with attr_reader for the inheritance benefits.

Disagree? Hit me up on Twitter.

What is attr_reader?