If you ran this code snippet, what would you expect the output to be? Take a moment to think about the answer before reading the following paragraph.
If you had asked me a few months ago, I likely would have answered with '{:bar=>"car"}', the rationale being something like: “When we pass an object to puts, to_s gets called to do string conversion, and '{:bar=>"car"}' is the string representation of the value returned by to_s.” Seems reasonable, right?
When we actually run the code, we see the following output:
#<Foo:0x00007f7f7f160bb0>
Counterintuitive, right? We typically see the object’s class name and address (i.e. #<Foo:0x00007f7f7f160bb0>) in an interpolated string like that when we haven’t explicitly defined to_s, but we did define that method.
Why are we printing out a reference to the Foo instance itself, rather than something related to our {:bar=>"car"} hash?
Let’s dig in further and look at a slightly different code example:
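A minimal sketch of such an example, assuming a Foo class whose to_s returns the hash {:bar=>"car"}. The class body and the name foo_instance are a reconstruction for illustration, inferred from the output and discussion that follow, and may differ from the exact original code:

```ruby
# Assumed reconstruction: Foo#to_s deliberately returns a Hash, not a String.
class Foo
  def to_s
    { bar: "car" }
  end
end

foo_instance = Foo.new

# Implicit conversion: puts performs the string conversion itself.
puts foo_instance

# Explicit conversion: we call to_s ourselves and hand puts the resulting Hash.
# (Ruby 3.4 and later print this hash as {bar: "car"}.)
puts foo_instance.to_s
```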
If we run the code snippet above, our terminal prints this:
#<Foo:0x00007f843b0f1268>
{:bar=>"car"}
Counterintuitive indeed. When we pass an instance of Foo to puts, aren’t we expecting to_s to be called under the hood? Why are we getting a different result when we explicitly call to_s?
puts is a Ruby method that’s implemented purely in C, so we can’t just step into it with a debugger like pry or byebug to find out more; puts doesn’t have Ruby code to step into! But we can read through the Ruby source code on GitHub: the io.c file sounds like a promising place to read about puts, and we find this definition of rb_io_puts there:
C can be difficult to read compared to Ruby code, but one line looks promising: line = rb_obj_as_string(argv[i]); So, let’s read the definition of rb_obj_as_string, which is found in the string.c file:
str = rb_funcall(obj, idTo_s, 0) calls our object’s to_s method (this idTo_type naming pattern is also found elsewhere in Ruby’s C source code for other built-in Ruby types, such as Array and Symbol). We then pass the result of to_s into rb_obj_as_string_result. How is rb_obj_as_string_result defined in string.c?
And this explains it! In the underlying implementation of puts, rb_obj_as_string_result explicitly checks whether to_s has returned a string. If it hasn’t, that value is discarded, and we use the return value of rb_any_to_s instead (i.e. the function that returns a class name / address string like #<Foo:0x00007f7f7f160bb0>).
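To make that check concrete, here is a rough Ruby-level sketch of the logic described above. It is only an approximation for illustration, not Ruby's actual implementation (which is written in C): the method name obj_as_string is hypothetical, and Kernel's default to_s stands in for rb_any_to_s.

```ruby
# Hypothetical Ruby approximation of rb_obj_as_string / rb_obj_as_string_result.
def obj_as_string(obj)
  str = obj.to_s                   # roughly rb_funcall(obj, idTo_s, 0)
  return str if str.is_a?(String)  # to_s returned a String: keep its result

  # to_s returned a non-String: discard it and fall back to the default
  # "#<ClassName:0x...>" representation (standing in for rb_any_to_s).
  Kernel.instance_method(:to_s).bind(obj).call
end

class Foo
  def to_s
    { bar: "car" } # deliberately not a String
  end
end

fallback = obj_as_string(Foo.new) # Foo#to_s result is discarded
normal = obj_as_string(42)        # Integer#to_s returns a String, so it is kept

puts fallback
puts normal
```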
This is why we’re printing a reference to Foo, and not anything to do with the actual hash: the value returned by to_s is discarded because it’s not a string! This also explains the discrepancy between puts foo_instance and puts foo_instance.to_s. In the second case, we pass a hash to rb_obj_as_string, and Hash does have a definition of to_s that returns a string, so the string '{:bar=>"car"}' is what gets passed into rb_obj_as_string_result, where it passes the check.
The way Ruby’s puts overrides a value we explicitly return from to_s can be unexpected if you haven’t seen it before, but upon reflection, I do think that this is sensible language design. The alternative would be for Ruby to recursively call rb_obj_as_string on the value returned by to_s until we get a string, but this introduces additional complexity for little benefit. At the end of the day, if we want to write clean code, any to_s methods that we write should, well, return a string 🙂
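As a small illustration of that advice, here is one possible fix. It is a sketch that reuses the assumed Foo class; exposing the raw hash through to_h is our own illustrative choice, not something the example above requires.

```ruby
class Foo
  # Expose the raw data separately (an illustrative design choice).
  def to_h
    { bar: "car" }
  end

  # to_s now returns a String, so puts and interpolation use it as-is.
  def to_s
    to_h.to_s
  end
end

foo_instance = Foo.new
implicit = "#{foo_instance}" # interpolation now keeps the result of to_s

puts foo_instance
puts implicit
```

With to_s returning a string, the implicit and explicit conversions agree, and callers who need the underlying data can still ask for it directly.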