Ruby OpenURI open() returns StringIO & Tempfile

Ahh, the little things in life. I was hacking out some code the other day and doing something like…

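require 'open-uri'
require 'fastercsv'

report_data = open(report_url)                # report_url: the report's URL
data_set = FasterCSV.read(report_data.path)   # FasterCSV.read expects a file path
data_set.each { |row| coolness(row) }         # coolness: my per-row handler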

And I ran into an error coming out of FasterCSV:

TypeError: can't convert nil into String

After a quick headache or two, I realized that calling .path on an object whose class I didn't actually know was the problem. In my test code and production code, open() had always returned a Tempfile, but the particular use case I was now running through got a StringIO back instead. A StringIO lives entirely in memory, so there is no file behind it, and .path gave FasterCSV nothing it could open (here, nil, hence the TypeError). Digging into the implementation of open-uri.rb in Ruby 1.8.7 showed why this was happening:

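  class Buffer # :nodoc:
    def initialize
      @io = StringIO.new   # every response starts out in memory
      @size = 0
    end
    attr_reader :size

    StringMax = 10240      # the 10 KB threshold
    def <<(str)
      @io << str
      @size += str.length  # String#length counts bytes in Ruby 1.8
      # Once the in-memory buffer grows past StringMax, move it into a Tempfile
      if StringIO === @io && StringMax < @size
        require 'tempfile'
        io = Tempfile.new('open-uri')
        io.binmode
        Meta.init io, @io if @io.respond_to? :meta
        io << @io.string
        @io = io
      end
    end

    # ... (remaining Buffer methods omitted)
  end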

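The Buffer in open-uri starts every response in a StringIO and only copies the data into a Tempfile once more than StringMax (10240 bytes) has been written. So a response of 10 KB or less comes back as a StringIO, and anything larger comes back as a Tempfile.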
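To make the threshold concrete, here is a simplified, hypothetical re-creation of that Buffer logic. `SizeSwitchingBuffer` is my own name, not open-uri's, and the `Meta` handling is left out. It counts with `bytesize`, which matches what `str.length` measured in Ruby 1.8.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical, simplified re-creation of open-uri 1.8.7's Buffer (no Meta support).
# Data starts in a StringIO and moves to a Tempfile once more than STRING_MAX bytes arrive.
class SizeSwitchingBuffer
  STRING_MAX = 10240 # same value as open-uri's StringMax

  attr_reader :size, :io

  def initialize
    @io = StringIO.new
    @size = 0
  end

  def <<(str)
    @io << str
    @size += str.bytesize
    if @io.is_a?(StringIO) && @size > STRING_MAX
      tmp = Tempfile.new('buffer-sketch')
      tmp.binmode
      tmp << @io.string # copy everything buffered so far
      @io = tmp
    end
    self # allow chaining, e.g. buf << a << b
  end
end

exact = SizeSwitchingBuffer.new << ('x' * 10240)
over  = SizeSwitchingBuffer.new << ('x' * 10240) << 'y'
puts exact.io.class # StringIO
puts over.io.class  # Tempfile
```

Writing exactly 10240 bytes keeps the StringIO; the 10241st byte triggers the copy into a Tempfile.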

Fortunately, FasterCSV will operate on an IO object, so the change was small…

# from
FasterCSV.read(report_data.path)
# to
FasterCSV.read(report_data)
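Since Ruby 1.9, FasterCSV has shipped in the standard library as `CSV`. Below is a minimal sketch of reading through the IO object itself with `CSV.new`, which accepts either a String or an IO; `read_rows` is a hypothetical helper name. Because it never calls `.path`, a StringIO and a Tempfile go through exactly the same code.

```ruby
require 'csv'
require 'stringio'
require 'tempfile'

# Hypothetical helper: parse CSV rows from any IO-like object
# (StringIO, Tempfile, File) without asking it for a filesystem path.
def read_rows(io)
  io.rewind if io.respond_to?(:rewind) # make sure we read from the start
  CSV.new(io).read                     # CSV.new takes a String or an IO
end

small = StringIO.new("name,score\nann,3\n")
p read_rows(small) # [["name", "score"], ["ann", "3"]]
```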

It was still a little startling to find this kind of behavior (an optimization, I assume) going on in open-uri.rb. Pretty cool, but it reminded me that I need a few more test cases covering different data set sizes, particularly on either side of that 10 KB threshold.
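A rough sketch of the kind of size-boundary test I mean, in plain Ruby with no test framework. Hitting a real server is beside the point here, so `fake_open_uri_result` is a hypothetical stand-in that returns a StringIO up to 10240 bytes and a Tempfile beyond that, mirroring the StringMax check; `count_lines` is a hypothetical placeholder for whatever code consumes the result.

```ruby
require 'stringio'
require 'tempfile'

STRING_MAX = 10240     # open-uri 1.8.7's StringMax
LINE = "123456789\n"   # exactly 10 bytes per row

# Hypothetical stand-in for open-uri's return value: StringIO for small bodies,
# Tempfile once the body is larger than STRING_MAX bytes.
def fake_open_uri_result(data)
  return StringIO.new(data) if data.bytesize <= STRING_MAX

  tmp = Tempfile.new('fake-open-uri')
  tmp.binmode
  tmp.write(data)
  tmp.flush
  tmp.rewind
  tmp
end

# Hypothetical code under test: only uses IO methods, never #path.
def count_lines(io)
  io.each_line.count
end

# Both sides of the threshold: 1024 rows = 10240 bytes, 1025 rows = 10250 bytes.
{ 0 => StringIO, 1024 => StringIO, 1025 => Tempfile, 10_240 => Tempfile }.each do |rows, klass|
  io = fake_open_uri_result(LINE * rows)
  raise "#{rows} rows: expected #{klass}, got #{io.class}" unless io.is_a?(klass)
  raise "#{rows} rows: expected #{rows} lines" unless count_lines(io) == rows
  puts "#{rows} rows via #{klass}: ok"
end
```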
