Ruby Spreadsheets

A spreadsheet, as defined by Wikipedia, is

A sample spreadsheet

[…] a computer application that simulates a paper, accounting worksheet. It displays multiple cells that together make up a grid consisting of rows and columns, each cell containing either alphanumeric text or numeric values. A spreadsheet cell may alternatively contain a formula that defines how the contents of that cell is to be calculated from the contents of any other cell (or combination of cells) each time any cell is updated […]

Now what does it take to implement the scaffold of a spreadsheet application? Turns out, that task can be accomplished in moderate time.

Requirements

First, let’s state the range of functions to be implemented and those not to be implemented.

The spreadsheet scaffold needs to feature

  • Unlimited number of cells index on a two-dimensional grid.
  • Read and write explicit values from/to cells.
  • Specify and evaluate formulas for each cell that depends on values or formulas from other cells.

The spreadsheet scaffold won’t feature

  • A graphical representation of a spreadsheet.
  • Detect cyclic dependencies between cell formulas.

Design

The main entity is the Spreadsheet. A Spreadsheet contains cells and methods to work with those cells (read, write). One of the requirements states that the number of cells is unlimited, which leads to a design where cells are only allocated when they are needed (i.e. a value is assigned to them). Cells are therefore organized in sparse data structure (e.g a hash) and can be retrieved by a two-dimensional index.

An index, in this case, models the concept of a two-dimensional point. Not only numeric pairs such as [1,2] [3,4] are valid indices, but also alphanumeric combinations like [1,'A'] [2, 'AA']. However, to calculate the number of elements between two such pairs, a range must be definable on them.

One of the requirements allows a formula to be assigned to a cell. A formula itself is, based on Wikipedia, a concise way to express information symbolically. If you think about it, most of the code you produced uses symbols (variables that can take on any value) to express information. Instead of limiting ourself to predefined formulas by the spreadsheet system, we will use code itself to represent a formula. As such code is required to be self-contained, assignable to variables and evaluable at any time. Formulas will be functions of higher order that take on variables and output a new function.

Update munificent kindly pointed out that my previous conclusion to insist on closures was incorrect and that the correct term is higher order function)

Implementation

Based on the design requirements we need to choose the tools. Although a lot of languages will do the job (even C++ does, use function pointers or Boost.Function as a replacement for closures and Boost.Any to allow any value to be assigned to cells), I’ve chosen Ruby. Ruby simply because it requires no infrastructure (like projects in C++) to get started and is interpreted which allows interactive evaluation of ruby code inside Ruby.

The script example/workbook_a.rb contains an acceptance test illustrating the fundamental interaction with the spreadsheet application. Here is the listing:

# New spreadsheet, default 0 for empty cells
s  = Spreadsheet.new(0)

# Set values
s[2, 'AA'] = "Hello World!"
s[1, 'A'] = 10
s[1, 'B'] = 30

# Add a 'formula' that is evaluated
# when the cell is read.
s[2, 'A'] = Proc.new do
  s[1, 'A'] + s[1, 'B'] + s[1, 'C']
end

# Read cell
puts "Content of cell (2,'A') is #{s[2, 'A']}" # => 40
s[1, 'C'] = 40
puts "Content of cell (2,'A') is #{s[2, 'A']}" # => 80

The code, in lib/spreadsheet.rb, to bring spreadsheets alive fits a single page easily. Here’s the listing.

# A spread-sheet class with lazy-evaluation.
class Spreadsheet
  
  # Include methods for 2d block based iteration (see below in this post)
  include BlockEnumeration
  
  def initialize(default = 0)
    @cells = Hash.new(default)
  end
  
  # Read the content of the cell(r,c)
  def [](r, c)
    read_internal(r, c)
  end
  
  # Assign the value, v, to the cell(r,c)
  def []=(r, c, v)
    write_internal(r, c, v)
  end
  
  private
  
  # Internal read cell(r,c)
  def read_internal(r, c)
    v = @cells[ make_key(r, c) ]
    # If cell contains closure, evaluate
    # closure and all possibly nested closures
    if v.instance_of?(Proc)
      begin
        v = v.call(self)
      end while (v.instance_of?(Proc))
    end
    v
  end
  
  # Internal write cell(r,c)
  def write_internal(r, c, v)
    @cells[ make_key(r, c) ] = v
  end
  
  # Make key from row, column
  def make_key(r, c)
    [r, c]
  end
end

Here are a few short notes on the code: First, the hash value is simply a Ruby Array of the row and column index. In Ruby an array has the nice properties that the hash value calculated from the array depends on its elements, which is why we can use it to index multi-dimensional coordinates. Next, there is no difference in assigning an explicit value or closure to a cell. The difference is made when reading a cell. In case it is a closure the value is retrieved by evaluating the closure and possibly nested closures.

The module BlockEnumeration, see file lib/block_enum.rb, contains methods to enumerate elements in a block, specified by an upper-left and lower-right index, as well as methods to count the number elements in such a block. Ruby ranges are used to provide a convenient way to iterate over elements specified in a block. In Ruby ranges can be build from various data types. For Example

(0..3).to_a => [0, 1, 2, 3]
('a'..'d').to_a => ["a", "b", "c", "d"]

Convenience Functions

Convenience functions, such as calculating the average of elements inside a block, are provided by mixins. When necessary the spreadsheet is extended by these functionality

require 'lib/spreadsheet'
require 'lib/statistics'

a = Spreadsheet.new(0)
a[0,0] = 1.0
a[1,0] = lambda { |s| s[0,0] + s[0,1] }
a[0,1] = 2.0

a.extend(Statistics)

s = Spreadsheet.new(0)
s[0,0] = a.avg([0,0], [1,1])

Here, two the second spreadsheet calculates the average value of the block elements specified from the first spreadsheet on request (when it is read). In case you want to keep the block size dependent on other cells, you simply nest the average inside a formula

s = Spreadsheet.new(0)
s[0,0] = Proc.new do 
  a.avg([0,0], [s[1,1] ,s[1,2]])
end
s[1,1] = 1
s[1,2] = 2

puts s[0,0]

As always complete code can be found at my code-blog repository. See the link below.

Links

  • rspreadsheets – Ruby spreadsheet implementation as discussed in this blog.
Advertisements

One thought on “Ruby Spreadsheets

  1. Pingback: uberVU - social comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s