Ruby Hacking Guide

Translated by Sebastian Krause

Chapter 1: Introduction

A Minimal Introduction to Ruby

This chapter explains the minimum of Ruby that one needs to know in order to understand the first section. I won’t point out programming techniques or points one should be careful about, so don’t think you’ll be able to write Ruby programs just because you read this chapter. Readers who have prior experience with Ruby can skip this chapter.

We will talk about grammar extensively in the second section, hence I won’t delve into the finer points of grammar here. For hash literals and the like I’ll show only the most widely used notations. As a matter of principle I won’t omit things even where they could be omitted, because that keeps the syntax rules simpler. I also won’t say “we can omit this” every time.

Objects

Strings

Everything that can be manipulated in a Ruby program is an object. There are no primitives like Java’s int and long. For instance, the expression below denotes a string object whose content is content.

"content"

I casually called it a string object, but to be precise this is an expression which generates a string object. Therefore, if we write it several times, a new string object is generated each time.

"content"
"content"
"content"

Here three string objects with content content are generated.
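
The claim that each literal generates a new object can be checked with a small sketch. It uses local variables and the methods == and equal?, none of which have been introduced yet, and it assumes the file does not enable the frozen_string_literal setting, which would make identical literals share one object.

```ruby
# Assumption: no "frozen_string_literal: true" magic comment is in effect.
a = "content"
b = "content"

p(a == b)        # Shows true: the contents are the same
p(a.equal?(b))   # Shows false: they are two different objects
```

Here == compares the contents while equal? asks whether both sides are one and the same object.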

By the way, objects that merely exist are invisible to the programmer. Let’s see how to print them on the terminal.

p("content")   # Shows "content"

Everything after a # is a comment. From now on, I’ll show the result of an expression in a comment after it.

p(……) calls the function p. It displays arbitrary objects “as such”. It’s basically a debugging function.

Precisely speaking, there are no functions in Ruby, but just for now we can think of it as a function. Such functions can be called from anywhere in a program.

Various Literals

Now let’s look a little further at the expressions which directly generate objects, the so-called literals. First come the integers and floating point numbers.

# Integer
1
2
100
9999999999999999999999999   # Arbitrarily big integers

# Float
1.0
99.999
1.3e4     # 1.3×10^4

Don’t forget that these are all expressions which generate objects. I’m repeating myself but there are no primitives in Ruby.

Below an array object is generated.

[1, 2, 3]

This program generates an array which consists of the three integers 1, 2 and 3 in that order. As the elements of an array can be arbitrary objects the following is also possible.

[1, "string", 2, ["nested", "array"]]

And finally, a hash table is generated by the expression below.

{"key"=>"value", "key2"=>"value2", "key3"=>"value3"}

A hash table is a structure which expresses one-to-one relationships between arbitrary objects. The above line creates a table which stores the following relationships.

"key"   →  "value"
"key2"  →  "value2"
"key3"  →  "value3"

If we ask a hash table created in this way “What’s corresponding to key?”, it’ll answer “That’s value.” How can we ask? We use methods.

Method Calls

We can call methods on an object. In C++ jargon they are member functions. I don’t think it’s necessary to explain what a method is. I’ll just explain the notation.

"content".upcase()

Here the upcase method is called on a string object (with content content). As upcase is a method which returns a new string with the lowercase letters replaced by capital letters, we get the following result.

p("content".upcase())   # Shows "CONTENT"

Method calls can be chained.

"content".upcase().downcase()

Here the method downcase is called on the return value of "content".upcase().

There are no public fields (member variables) as in Java or C++. The object interface consists of methods only.
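
With the method call notation in hand, the earlier question to the hash table can be asked. The sketch below is only an illustration: it uses Hash#fetch, one of several lookup methods, and a local variable named table, which is a name made up for this example.

```ruby
# table is a hypothetical variable name used only for this illustration.
table = {"key"=>"value", "key2"=>"value2", "key3"=>"value3"}

p(table.fetch("key"))     # Shows "value"
p(table.fetch("key2"))    # Shows "value2"
p(table.length())         # Shows 3
```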

The Program

Top Level

In Ruby we can just write expressions and it becomes a program. One doesn’t need to define a main() as in C++ or Java.

p("content")

This is a complete Ruby program. If we put this into a file called first.rb we can execute it from the command line as follows.

% ruby first.rb
"content"

With the -e option of the ruby program we don’t even need to create a file.

% ruby -e 'p("content")'
"content"

By the way, the place where p is written is the lowest nesting level of the program, which is the highest level from the program’s standpoint; hence it’s called the “top level”. Having a top level is a characteristic trait of Ruby as a scripting language.

In Ruby, one line is usually one statement. A semicolon at the end isn’t necessary. Therefore the program below is interpreted as three statements.

p("content")
p("content".upcase())
p("CONTENT".downcase())

When we execute it, it looks like this.

% ruby second.rb
"content"
"CONTENT"
"content"

Local Variables

In Ruby all variables and constants store references to objects. Assigning one variable to another therefore copies the reference, not the content. It helps to think of variables of type Object in Java or of pointers to objects in C++. However, you can’t manipulate the value of the pointer itself.

In Ruby one can tell the classification (scope) of a variable by the beginning of the name. Local variables start with a small letter or an underscore. One can write assignments by using “=”.

str = "content"
arr = [1,2,3]

The first assignment serves as the declaration; an explicit declaration is not necessary. Because variables don’t have types, we can assign any kind of object to them. The program below is completely legal.

lvar = "content"
lvar = [1,2,3]
lvar = 1

But just because we can doesn’t mean we should. If different kinds of objects are put into one variable, the program tends to become difficult to read. In a real-world Ruby program one doesn’t do this kind of thing without a good reason. The above was just an example for the sake of illustration.

Referencing a variable also has a perfectly ordinary notation.

str = "content"
p(str)           # Shows "content"

In addition, let’s confirm with an example that a variable holds a reference.

a = "content"
b = a
c = b

After we execute this program, all three local variables a, b and c point to the same object, the string object with content "content" created on the first line (Figure 1).

figure 1: Ruby variables store references to objects
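
A minimal sketch of what sharing a reference means in practice. It relies on three methods not introduced in the text: String#<<, which modifies a string in place, dup, which makes an explicit copy, and equal?, which tests whether two references point to the same object.

```ruby
a = "content".dup   # dup gives a string that is certainly modifiable
b = a
c = b

c << "!"            # modify the object through c
p(a)                # Shows "content!"
p(a.equal?(c))      # Shows true

d = a.dup           # an explicit copy is a different object
d << "?"
p(a)                # Shows "content!"
p(d)                # Shows "content!?"
```

The change made through c is visible through a because both names refer to one object, while d refers to a separate copy.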

By the way, as these variables are called local, they should be local to somewhere, but we cannot talk about this scope without reading a bit further. Let’s say for now that the top level is one local scope.

Constants

Constants start with a capital letter. They can only be assigned once (at their creation).

Const = "content"
PI = 3.1415926535

p(Const)   # Shows "content"

I’d like to say that an error occurs if we assign twice, but in fact there is only a warning. This is so that no error is raised when the same file is loaded twice by applications that manipulate Ruby programs themselves, such as development environments. In other words, it is allowed for practical reasons because there is no other choice; ideally it should be an error. In fact, up until version 1.1 it really was an error.

C = 1
C = 2   # There is a warning but ideally there should be an error.

A lot of people are fooled by the word constant. A constant merely keeps pointing to the same object once it is assigned; that does not mean the object it points to won’t change. The term “read only” might capture the concept better than “constant”.

By the way, to indicate that an object itself shouldn’t be changed another means is used: freeze.

figure 2: constant means read only
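
The difference between a constant and a frozen object can be sketched as follows. The constant names Greeting and Fixed are made up for this illustration, and the begin ... rescue construct for catching exceptions has not been introduced in the text.

```ruby
# Greeting and Fixed are hypothetical names used only for this example.
Greeting = "content".dup
Greeting << "!"          # no reassignment: same object, new content
p(Greeting)              # Shows "content!"

Fixed = "content".freeze
begin
  Fixed << "!"           # modifying a frozen object raises an exception
rescue RuntimeError => e
  p(e.class)             # FrozenError on Ruby 2.5 and later
end
p(Fixed)                 # Shows "content"
p(Fixed.frozen?())       # Shows true
```

Neither constant is ever reassigned, so no warning appears; only freeze prevents the object itself from changing.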

The scope of constants cannot be described yet either. It will be discussed in the next section, together with classes.

Control Structures

Ruby has such an abundance of control structures that merely listing them would be a huge task. For now, I’ll just mention that there are if and while.

if i < 10 then
  # body
end

while i < 10 do
  # body
end

In a conditional expression, only the two objects false and nil are false; all other objects are true. 0 and the empty string are of course also true.

It would be odd if there were only false, so there is also true. And it is of course true.
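
The truth rule can be tried out with a small helper. The method truth_of is a name invented for this sketch, and both method definitions and the else clause are used here before the text introduces them.

```ruby
# truth_of is a hypothetical helper, defined only for this illustration.
def truth_of(obj)
  if obj then
    return "true"
  else
    return "false"
  end
end

p(truth_of(false))   # Shows "false"
p(truth_of(nil))     # Shows "false"
p(truth_of(0))       # Shows "true"
p(truth_of(""))      # Shows "true"
p(truth_of(true))    # Shows "true"
```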

Classes and Methods

Classes

In an object-oriented system, methods essentially belong to objects. That can hold only in an ideal world, though. In a normal program there are a lot of objects which have the same set of methods, and it would be an enormous amount of work if each object had to remember its own set of callable methods. Usually a mechanism like classes or multimethods is used to get rid of the duplication of definitions.

Ruby uses classes, the traditional way to bind objects and methods together. That is, every object belongs to a class, and the methods which can be called on it are determined by that class. Such an object is called “an instance of the XX class”.

For example the string "str" is an instance of the String class. And on this String class the methods upcase, downcase, strip and many others are defined. So it looks as if each string object can respond to all these methods.

# They all belong to the String class,
# hence the same methods are defined
       "content".upcase()
"This is a pen.".upcase()
    "chapter II".upcase()

       "content".length()
"This is a pen.".length()
    "chapter II".length()

By the way, what happens if the called method isn’t defined? In a static language a compiler error occurs, but in Ruby there is a runtime exception. Let’s try it out. For this kind of program the -e option is handy.

% ruby -e '"str".bad_method()'
-e:1: undefined method `bad_method' for "str":String (NoMethodError)

When the method isn’t found there’s apparently a NoMethodError.
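
Because the failure is an ordinary runtime exception, a program can ask beforehand or recover afterwards. The following sketch uses respond_to? and begin ... rescue, neither of which the text has introduced; the variable name caught is made up for the example.

```ruby
p("str".respond_to?("upcase"))       # Shows true
p("str".respond_to?("bad_method"))   # Shows false

caught = nil                         # hypothetical name, for illustration
begin
  "str".bad_method()
rescue NoMethodError => e
  caught = e.class                   # the program continues after the rescue
end
p(caught)                            # Shows NoMethodError
```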

Always saying “the upcase method of String” and such is cumbersome. Let’s introduce a special notation: String#upcase refers to the method upcase defined in the class String.

By the way, if we write String.upcase it has a completely different meaning in the Ruby world. What could that be? I’ll explain it in the next section.

Class Definition

Up to now we talked about already defined classes. We can of course also define our own classes. To define classes we use the class statement.

class C
end

This is the definition of a new class C. After we defined it we can use it as follows.

class C
end
c = C.new()   # create an instance of C and assign it to the variable c

Note that the notation for creating a new instance is not new C. The astute reader might think: Hmm, this C.new() really looks like a method call. In Ruby the object generating expressions are indeed just methods.

In Ruby class names and constant names are the same. Then, what is stored in the constant whose name is the same as a class name? In fact, it’s the class. In Ruby all things which a program can manipulate are objects. So of course classes are also expressed as objects. Let’s call these class objects. Every class is an instance of the class Class.

In other words, a class statement creates a new class object and assigns it to a constant named after the class. Creating an instance, on the other hand, references this constant and calls a method on that object (usually new). If we look at the example below, it’s pretty obvious that the creation of an instance doesn’t differ from a normal method call.

S = "content"
class C
end

S.upcase()  # Get the object the constant S points to and call upcase
C.new()     # Get the object the constant C points to and call new

So new is not a reserved word in Ruby.
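
Since a class is just an object referenced by a constant, the reference can also be copied into a local variable and used in the same way. This sketch uses Object#class, which is introduced later in this chapter, and the variable name klass is made up for the example.

```ruby
class C
end

p(C.class())          # Shows Class
klass = C             # copy the reference to the class object
obj = klass.new()     # the same as C.new()
p(obj.class())        # Shows C
p(klass.equal?(C))    # Shows true
```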

And we can also use p for an instance of a class even immediately after its creation.

class C
end

c = C.new()
p(c)       # #<C:0x2acbd7e4>

It won’t display as nicely as a string or an integer, but it shows its class and its internal ID. This ID is the pointer value which points to the object.

Oh, I completely forgot to mention the notation for method names: Object.new means the method new called on the class object Object itself. So Object#new and Object.new are completely different things, and we have to distinguish them strictly.

obj = Object.new()   # Object.new
obj.new()            # Object#new

In practice a method Object#new is almost never defined, so the second line will raise an error. Please regard this as an example of the notation.

Method Definition

Even if we can define classes, it is useless if we cannot define methods. Let’s define a method for our class C.

class C
  def myupcase( str )
    return str.upcase()
  end
end

To define a method we use the def statement. In this example we defined the method myupcase. The name of the only parameter is str. As with variables, it’s not necessary to write parameter types or the return type. And we can use any number of parameters.

Let’s use the method we defined. By default, methods can be called from outside the object.

c = C.new()
result = c.myupcase("content")
p(result)   # Shows "CONTENT"

Of course, once you get used to it you don’t need to assign the result to a variable every time. The line below gives the same result.

p(C.new().myupcase("content"))   # Also shows "CONTENT"

self

During the execution of a method, the information about which object is “itself” (the instance on which the method was called) is always kept and can be obtained through self. It is like this in C++ or Java. Let’s check this out.

class C
  def get_self()
    return self
  end
end

c = C.new()
p(c)              # #<C:0x40274e44>
p(c.get_self())   # #<C:0x40274e44>

As we see, the above two expressions return the exact same object. This confirms that self is c during the method call on c.

Then what is the way to call a method on itself? What first comes to mind is calling via self.

class C
  def my_p( obj )
    self.real_my_p(obj)   # calls a method on oneself
  end

  def real_my_p( obj )
    p(obj)
  end
end

C.new().my_p(1)   # Output 1

But always adding self when calling one’s own method is tedious. Hence, it is designed so that one can omit the object on which the method is called (the receiver) whenever one calls a method on self.

class C
  def my_p( obj )
    real_my_p(obj)   # You can call without specifying the receiver
  end

  def real_my_p( obj )
    p(obj)
  end
end

C.new().my_p(1)   # Output 1

Instance Variables

As the saying goes, “objects are data and code”; being able to define methods alone would not be very useful. Each object must also be able to store data, in other words instance variables, or in C++ jargon member variables.

In keeping with Ruby’s variable naming convention, the kind of a variable can be determined from its first few characters. For instance variables it’s an @.

class C
  def set_i(value)
    @i = value
  end

  def get_i()
    return @i
  end
end

c = C.new()
c.set_i("ok")
p(c.get_i())   # Shows "ok"

Instance variables differ a bit from the variables seen before: We can reference them without assigning (defining) them. To see what happens we add the following lines to the code above.

c = C.new()
p(c.get_i())   # Shows nil

Calling get without set gives nil. nil is the object which indicates “nothing”. It’s mysterious that there’s really an object but it means nothing, but that’s just the way it is.
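
That an instance variable comes into existence only on assignment, and separately for each object, can be observed with Object#instance_variables. That method is not mentioned in the text, and the variable name other is made up for this sketch.

```ruby
class C
  def set_i(value)
    @i = value
  end

  def get_i()
    return @i
  end
end

c = C.new()
p(c.instance_variables())   # Shows []
p(c.get_i())                # Shows nil
c.set_i("ok")
p(c.instance_variables())   # Shows [:@i]

other = C.new()             # a second instance has its own variables
p(other.get_i())            # Shows nil
```

Merely reading @i in get_i does not define it; only the assignment in set_i does.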

We can use nil like a literal as well.

p(nil)   # Shows nil

initialize

As we saw before, when we call new on a freshly defined class, we can create an instance. That’s fine, but sometimes we want instantiation to do something specific to the class. In this case we don’t change the new method; we define the initialize method instead. When we do, it gets called from within new.

class C
  def initialize()
    @i = "ok"
  end
  def get_i()
    return @i
  end
end
c = C.new()
p(c.get_i())   # Shows "ok"

Strictly speaking, this is part of the specification of the new method, not of the language itself.
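
The new method also hands its arguments over to initialize, which is how an instance receives its initial data. A minimal sketch, in which the class Point and its method to_a are invented for illustration:

```ruby
# Point is a hypothetical class used only for this example.
class Point
  def initialize(x, y)   # receives the arguments given to new
    @x = x
    @y = y
  end

  def to_a()
    return [@x, @y]
  end
end

pt = Point.new(1, 2)
p(pt.to_a())   # Shows [1, 2]
```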

Inheritance

Classes can inherit from other classes. For instance String inherits from Object. In this book, we’ll indicate this relation by a vertical arrow as in Fig.3.

figure 3: Inheritance

In the case of this illustration, the inherited class (Object) is called the superclass or superior class. The inheriting class (String) is called the subclass or inferior class. Be careful: this terminology differs from C++ jargon, but it’s the same as in Java.

Anyway, let’s try it out. Let our created class inherit from another class. To inherit from another class (or designate a superclass) write the following.

class C < SuperClassName
end

When we leave out the superclass, as in the earlier examples, the class Object tacitly becomes the superclass.
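
The tacit superclass can be confirmed with Class#superclass, a method that returns the superclass of a class.

```ruby
class C
end

class Sub < C
end

p(C.superclass())     # Shows Object
p(Sub.superclass())   # Shows C
```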

Now, why should we want to inherit? Of course to hand over methods. Handing over means that the methods which were defined in the superclass also work in the subclass as if they were defined in there once more. Let’s check it out.

class C
  def hello()
    return "hello"
  end
end

class Sub < C
end

sub = Sub.new()
p(sub.hello())   # Shows "hello"

hello was defined in the class C but we could call it on an instance of the class Sub as well. Of course we don’t need to assign variables. The above is the same as the line below.

p(Sub.new().hello())

By defining a method with the same name, we can overwrite the method. In C++ and Object Pascal (Delphi) it’s only possible to overwrite functions explicitly defined with the keyword virtual but in Ruby every method can be overwritten unconditionally.

class C
  def hello()
    return "Hello"
  end
end

class Sub < C
  def hello()
    return "Hello from Sub"
  end
end

p(Sub.new().hello())   # Shows "Hello from Sub"
p(C.new().hello())     # Shows "Hello"

We can inherit over several steps. For instance, as in Fig.4, Fixnum inherits every method from Object, Numeric and Integer. When there are methods with the same name, the nearer class takes precedence. As there is no overloading by type at all, the rules are extremely straightforward.

figure 4: Inheritance over multiple steps
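
The chain can be walked by hand with Class#superclass. One caveat: since Ruby 2.4 the class Fixnum has been merged into Integer, so on a current interpreter the chain for the integer 1 starts at Integer rather than Fixnum. The sketch below assumes such a version.

```ruby
# Assumption: Ruby 2.4 or later, where Fixnum is unified into Integer.
p(1.class())                                 # Shows Integer
p(Integer.superclass())                      # Shows Numeric
p(Numeric.superclass())                      # Shows Object
p(1.class().ancestors().include?(Object))    # Shows true
```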

In C++ it’s possible to create a class which inherits nothing, whereas in Ruby a class has to inherit from the Object class either directly or indirectly. In other words, when we draw the inheritance relations they form a single tree with Object at the top. For example, a tree of the inheritance relations among the important classes of the basic library would look like Fig.5.

figure 5: Ruby's class tree

Once the superclass is appointed (in the definition statement) it’s impossible to change it. In other words, one can add a new class to the class tree but cannot change a position or delete a class.

Inheritance of Variables……?

In Ruby, (instance) variables aren’t inherited. Even if one wanted to inherit them, a class holds no information about which variables are going to be used.

But when an inherited method is called (on an instance of a subclass), the assignment to its instance variables is executed, which means they become defined. And since the namespace of instance variables is completely flat within each instance, they can be accessed by a method of any class.

class A
  def initialize()   # called while new() is being processed
    @i = "ok"
  end
end

class B < A
  def print_i()
    p(@i)
  end
end

B.new().print_i()   # Shows "ok"

If this behavior is hard to accept, forget about classes and inheritance for a moment. When there’s an instance obj of the class C, think of all the methods of the superclass of C as being defined in C itself. Of course we keep the overwrite rule in mind. Then the methods of C get attached to the instance obj (Fig.6). This strong sense of tangibility is a specialty of Ruby’s object orientation.

figure 6: A conception of a Ruby object

Modules

Only a single superclass can be designated, so Ruby looks like a single-inheritance language. But thanks to modules it has in practice a capability equivalent to multiple inheritance. Let’s explain these modules next.

In short, modules are classes for which a superclass cannot be designated and instances cannot be created. For the definition we write as follows.

module M
end

Here the module M was defined. Methods are defined exactly the same way as for classes.

module M
  def myupcase( str )
    return str.upcase()
  end
end

But because we cannot create instances, we cannot call them directly. To call them, we use the module by “including” it into a class. We can then treat it as if the class had inherited from the module.

module M
  def myupcase( str )
    return str.upcase()
  end
end

class C
  include M
end

p(C.new().myupcase("content"))  # "CONTENT" is shown

Even though no method was defined in the class C we can call the method myupcase. It means it “inherited” the method of the module M. Inclusion is functionally completely the same as inheritance. There’s no limit on defining methods or accessing instance variables.

I said we cannot specify any superclass of a module, but other modules can be included.

module M
end

module M2
  include M
end

In other words it’s functionally the same as appointing a superclass. But a class cannot come above a module. Only modules are allowed above modules.

The example below also contains the inheritance of methods.

module OneMore
  def method_OneMore()
    p("OneMore")
  end
end

module M
  include OneMore

  def method_M()
    p("M")
  end
end

class C
  include M
end

C.new().method_M()         # Output "M"
C.new().method_OneMore()   # Output "OneMore"

As with classes, when we sketch the inheritance it looks like Fig.7.

figure 7: multilevel inclusion

Besides, the class C also has a superclass. How does it relate to the modules? For instance, let’s think of the following case.

# modcls.rb

class Cls
  def test()
    return "class"
  end
end

module Mod
  def test()
    return "module"
  end
end

class C < Cls
  include Mod
end

p(C.new().test())   # "class"? "module"?

C inherits from Cls and includes Mod. Which will be shown in this case, "class" or "module"? In other words, which one is “closer”, class or module? We’d better ask Ruby about Ruby, thus let’s execute it:

% ruby modcls.rb
"module"

Apparently a module takes precedence over the superclass.

In general, when a module is included in Ruby, it is inserted into the inheritance chain between the class and its superclass. As a picture it might look like Fig.8.

figure 8: The relation between modules and classes
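
The position of an included module can also be inspected directly. Module#ancestors, which the text does not mention, returns the classes and modules in the order in which methods are searched; only the first three entries are shown because the rest of the list depends on the Ruby version.

```ruby
class Cls
end

module Mod
end

class C < Cls
  include Mod
end

p(C.ancestors().take(3))   # Shows [C, Mod, Cls]
p(C.superclass())          # Shows Cls
```

Note that superclass still answers Cls: the module sits in between for method lookup without becoming the superclass.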

And if we also take the modules included in the module into account, it would look like Fig.9.

figure 9: The relation between modules and classes (2)

The Program revisited

Caution. This section is extremely important, and it explains things that programmers who have only used static languages will find hard to get used to. For the other parts skimming is sufficient, but I’d like you to read this part carefully. The explanation will also be relatively thorough.

Nesting of Constants

First, a review of constants. As a constant begins with a capital letter, the definition goes as follows.

Const = 3

Now we reference the constant in this way.

p(Const)   # Shows 3

Actually we can also write this.

p(::Const)   # Shows 3 in the same way.

The :: in front shows that it’s a constant defined at the top level. Think of a path in a filesystem. Assume there is a file vmunix in the root directory. Being at / one can write vmunix to access the file. One can also write /vmunix as its full path. It’s the same with Const and ::Const. At top level it’s okay to write only Const or to write the full path ::Const.

And what corresponds to a filesystem’s directories in Ruby? That would be class and module definition statements. However, mentioning both every time is cumbersome, so I’ll just subsume them under class definition. When one enters a class definition, the level for constants goes one level deeper (as if entering a directory).

class SomeClass
  Const = 3
end

p(::SomeClass::Const)   # Shows 3
p(  SomeClass::Const)   # The same. Shows 3

SomeClass is defined at toplevel. Hence one can reference it by writing either SomeClass or ::SomeClass. And as the constant Const nested in the class definition is a Const “inside SomeClass”, it becomes ::SomeClass::Const.

As we can create a directory in a directory, we can create a class inside a class. For instance like this:

class C        # ::C
  class C2     # ::C::C2
    class C3   # ::C::C2::C3
    end
  end
end

By the way, for a constant defined in a class definition statement, should we always write its full name? Of course not. As with the filesystem, if one is inside the same class definition one can skip the ::. It looks like this:

class SomeClass
  Const = 3
  p(Const)   # Shows 3.
end

“What?” you might think. Surprisingly, even if it is in a class definition statement, we can write a program which is going to be executed. People who are used to only static languages will find this quite exceptional. I was also flabbergasted the first time I saw it.

Let’s add that we can of course also reference a constant inside a method. The reference rules are the same as within the class definition (outside the method).

class C
  Const = "ok"
  def test()
    p(Const)
  end
end

C.new().test()   # Shows "ok"

Everything is executed

Looking at the big picture, I want to add one more thing. In Ruby almost every part of a program is “executed”. Constant definitions, class definitions, method definitions and almost everything else are executed in the order in which they appear.

Look for instance at the following code. I used various constructions which have been used before.

 1:  p("first")
 2:
 3:  class C < Object
 4:    Const = "in C"
 5:
 6:    p(Const)
 7:
 8:    def myupcase(str)
 9:       return str.upcase()
10:    end
11:  end
12:
13:  p(C.new().myupcase("content"))

This program is executed in the following order:

line  code                      description
1     p("first")                Shows "first"
3     < Object                  The constant Object is referenced and the class object Object is obtained
3     class C                   A new class object with superclass Object is generated and assigned to the constant C
4     Const = "in C"            Assigns the value "in C" to the constant ::C::Const
6     p(Const)                  Shows the constant ::C::Const, hence "in C"
8     def myupcase(...)...end   Defines C#myupcase
13    C.new().myupcase(...)     Refers to the constant C, calls the method new on it, and then myupcase on the return value
9     return str.upcase()       Returns "CONTENT"
13    p(...)                    Shows "CONTENT"
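
The execution order can also be observed by recording each step as it happens. In the sketch below the constant TRACE, the class Traced and the method run are names made up for illustration; the array is held in a constant so that it is visible from the class body and the method body alike.

```ruby
# TRACE, Traced and run are hypothetical names for this illustration.
TRACE = []

TRACE << "toplevel, before class"

class Traced
  TRACE << "class body"      # executed while the class statement runs

  def run()
    TRACE << "method body"   # executed only when the method is called
  end
end

TRACE << "toplevel, after class"
Traced.new().run()

p(TRACE)
# Shows ["toplevel, before class", "class body", "toplevel, after class", "method body"]
```

The class body runs in place, in the middle of the program, while the method body runs only at the point of the call.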

The Scope of Local Variables

At last we can talk about the scope of local variables.

The toplevel, the interior of a class definition, the interior of a module definition and a method body each have a completely independent local variable scope. In other words, the lvar variables in the following program are all different variables, and they do not influence each other.

lvar = 'toplevel'

class C
  lvar = 'in C'
  def method()
    lvar = 'in C#method'
  end
end

p(lvar)   # Shows "toplevel"

module M
  lvar = 'in M'
end

p(lvar)   # Shows "toplevel"
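
The independence of the scopes also works in the other direction: a method body cannot read a local variable of the toplevel. A minimal sketch, in which read_lvar and caught are names made up for the example and begin ... rescue is used without having been introduced:

```ruby
lvar = "toplevel"

# read_lvar is a hypothetical method used only for this illustration.
def read_lvar()
  return lvar   # there is no local variable lvar in this scope
end

caught = nil
begin
  read_lvar()
rescue NameError => e
  caught = e.class
end

p(caught)   # Shows NameError
p(lvar)     # Shows "toplevel"
```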

self as context

Previously, I said that during method execution oneself (the object on which the method was called) becomes self. That’s true but only half true. Actually, during the execution of a Ruby program self is always set, wherever we are. It means there’s a self also at the top level or in a class definition statement.

For instance the self at the toplevel is main. It’s an instance of the Object class which is nothing special. main is provided to set up self for the time being. There’s no deeper meaning attached to it.

Hence the toplevel’s self i.e. main is an instance of Object, such that one can call the methods of Object there. And in Object the module Kernel is included. In there the function-flavor methods like p and puts are defined (Fig.10). That’s why one can call puts and p also at the toplevel.

figure 10: main, Object and Kernel

Thus p isn’t a function, it’s a method. It merely can be called like a function, as one’s “own” method, wherever we are and whatever the class of self is, because it is defined in Kernel. Therefore, there are no functions in the true sense; there are only methods.

By the way, besides p and puts there are function-flavor methods such as print, printf, sprintf, gets, fork and exec, and many more with somewhat familiar names. Looking at the choice of names you might be able to imagine Ruby’s character.
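
The claim that these are methods provided through Kernel can be checked from within Ruby. The sketch uses Module#include? and Module#private_method_defined?, which the text does not cover, and the method show is a name made up for the example.

```ruby
p(Object.include?(Kernel))               # Shows true
p(Kernel.private_method_defined?("p"))   # Shows true

class C
  # show is a hypothetical method used only for this illustration.
  def show(obj)
    p(obj)                        # self is an instance of C, yet p works
    return sprintf("%03d", obj)   # so does sprintf
  end
end

p(C.new().show(7))   # Shows 7, then "007"
```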

Well, since self is set up everywhere, there should also be a self in a class definition. The self in a class definition is the class itself (the class object). Hence it would look like this.

class C
  p(self)   # C
end

What is this good for? In fact, we’ve already seen an example in which it is very useful. This one.

module M
end
class C
  include M
end

This include is actually a method call on the class object C. I haven’t mentioned it yet, but the parentheses around arguments can be omitted for method calls. I omitted the parentheses around the argument of include so that it wouldn’t look like a method call, because we had not yet finished the discussion of class definition statements.
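
Written with its parentheses, the same include reads like any other method call. In this sketch the constant SelfInBody is a name invented to record what self was inside the class body, and Module#include? is used to check the result.

```ruby
module M
  def hello()
    return "hello"
  end
end

class C
  SelfInBody = self   # hypothetical constant: remembers self in the class body
  include(M)          # the same call, with the parentheses written out
end

p(C::SelfInBody.equal?(C))   # Shows true
p(C.include?(M))             # Shows true
p(C.new().hello())           # Shows "hello"
```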

Loading

In Ruby the loading of libraries also happens at runtime. Normally one writes this.

require("library_name")

If this looks like a method call, that impression is right: require is a method. It’s not even a reserved word. When it is written this way, loading is executed on the line where it is written, and execution is handed over to (the code of) the library. As there is no concept like Java packages in Ruby, when we’d like to separate namespaces, it is done by putting files into a directory.

require("somelib/file1")
require("somelib/file2")

And in the library usually classes and such are defined with class statements or module statements. The constant scope of the top level is flat without the distinction of files, so one can see classes defined in another file without any special preparation. To partition the namespace of class names one has to explicitly nest modules as shown below.

# example of the namespace partition of net library
module Net
  class SMTP
    # ...
  end
  class POP
    # ...
  end
  class HTTP
    # ...
  end
end

More about Classes

The talk about Constants still goes on

Up to now we used the filesystem metaphor for the scope of constants, but I want you to completely forget that.

There is more about constants. Firstly one can also see constants in the “outer” class.

Const = "ok"
class C
  p(Const)   # Shows "ok"
end

It is designed this way because it becomes useful when modules are used as namespaces. Let’s explain this by adding a few things to the previous example of the net library.

module Net
  class SMTP
    # Uses Net::SMTPHelper in the methods
  end
  class SMTPHelper   # Supports the class Net::SMTP
  end
end

In such a case, it’s convenient if we can refer to it from the SMTP class just by writing SMTPHelper, isn’t it? Hence the conclusion that “it’s convenient if we can see the outer classes”.
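
A runnable sketch of this situation follows. To avoid clashing with the real net library the module is called MyNet, and the methods greeting and start as well as the returned string are invented purely for illustration.

```ruby
module MyNet                  # hypothetical name, not the real net library
  class SMTPHelper
    def greeting()
      return "EHLO"           # made-up value for the example
    end
  end

  class SMTP
    def start()
      # SMTPHelper is found in the outer module MyNet
      return SMTPHelper.new().greeting()
    end
  end
end

p(MyNet::SMTP.new().start())   # Shows "EHLO"
```

From outside the module the helper has to be written as MyNet::SMTPHelper; inside, the short name is enough.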

An outer class can be referenced no matter how deeply the classes are nested. When the same name is defined on different levels, the one found first when searching from the inside outward is used.

Const = "far"
class C
  Const = "near" # This one is closer than the one above
  class C2
    class C3
      p(Const)   # "near" is shown
    end
  end
end

There’s another way of searching for constants. If the toplevel is reached when going further and further outward, then the class’s own superclass is searched for the constant.

class A
  Const = "ok"
end
class B < A
  p(Const)   # "ok" is shown
end

Really, that’s pretty complicated.

Let’s summarize. When looking up a constant, first the outer classes are searched, then the superclasses. This is quite contrived, but let’s assume a class hierarchy as follows.

class A1
end
class A2 < A1
end
class A3 < A2
  class B1
  end
  class B2 < B1
  end
  class B3 < B2
    class C1
    end
    class C2 < C1
    end
    class C3 < C2
      p(Const)
    end
  end
end

When the constant Const in C3 is referenced, it’s looked up in the order depicted in Fig.11.

figure 11: Search order for constants

Be careful about one point. The superclasses of the outer classes, for instance A1 and B2, aren’t searched at all. Once the search has gone outward it only ever goes outward, and once it has gone to a superclass it only ever goes to superclasses. Otherwise the number of classes searched would become too large, and the behavior of such a complicated mechanism would become unpredictable.
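
The search order can be checked with a much smaller hierarchy. This is only a sketch; every class and constant name in it is hypothetical.

```ruby
class Upper                # superclass of the outer class
  Hidden = "Upper"
end

class Parent               # superclass of the inner class
  Near = "Parent"
  Far  = "Parent"
end

class Outer < Upper
  Near = "Outer"

  class Inner < Parent
    def near()
      return Near          # the outer class is searched before the superclass
    end

    def far()
      return Far           # not found outside, so the own superclass is searched
    end

    def hidden()
      begin
        return Hidden      # defined only in Upper, which is never searched
      rescue NameError
        return "not found"
      end
    end
  end
end

inner = Outer::Inner.new()
p(inner.near())     # Shows "Outer"
p(inner.far())      # Shows "Parent"
p(inner.hidden())   # Shows "not found"
```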

Metaclasses

I said that a method can be called on anything that is an object. I also said that the methods that can be called are determined by the class of the object. Then shouldn’t there be a class for class objects, too? (Fig.12)

figure 12: A class of classes?

In Ruby, this kind of question can be checked in practice, because there’s “a method which returns the class (class object) to which an object itself belongs”, Object#class.

p("string".class())   # String is shown
p(String.class())     # Class is shown
p(Object.class())     # Class is shown

Apparently String belongs to the class named Class. Then what’s the class of Class?

p(Class.class())      # Class is shown

Again Class. In other words, whatever the object is, following .class().class().class() ... eventually reaches Class, and from there on it goes around in a loop (Fig.13).

figure 13: The class of the class of the class...

Class is the class of classes. Something with a recursive structure of the form “X of X” is called a meta-X. Hence Class is a metaclass.
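
The walk along .class() can also be written as a small loop. This is just a sketch; the variable names are arbitrary.

```ruby
obj = "string"
chain = [obj.class()]

# Keep asking for the class until the answer stops changing.
while chain.last().class() != chain.last()
  chain.push(chain.last().class())
end

p(chain)   # Shows [String, Class]
```

The loop stops at Class because Class is the only class that is its own class.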

Metaobjects

Let’s change the target and think about modules. As modules are also objects, there also should be a class for them. Let’s see.

module M
end
p(M.class())   # Module is shown

The class of a module seems to be Module. And what should be the class of the class Module?

p(Module.class())   # Class

It’s again Class.

Now we change the direction and examine the inheritance relationships. What’s the superclass of Class and Module? In Ruby, we can find it out with Class#superclass.

p(Class.superclass())    # Module
p(Module.superclass())   # Object
p(Object.superclass())   # nil (BasicObject in Ruby 1.9 and later)

So Class is a subclass of Module. Based on these facts, Figure 14 shows the relationships between the important classes of Ruby.

figure 14: The class relationship between the important Ruby classes

Up to now we have used new and include without any explanation, but now I can finally explain what they really are. new is a method defined in the class Class. Therefore new can be used on any class right away, because every class is an instance of Class. But new isn’t defined in Module, hence it’s not possible to create instances of a module. And since include is defined in the Module class, it can be called on both modules and classes.

These three classes, Object, Module and Class, are the objects that support the foundation of Ruby. We can say that these three objects describe Ruby’s object world itself. Namely, they are objects which describe objects. Hence Object, Module and Class are Ruby’s “meta-objects”.
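
Where new and include are defined can be confirmed from within Ruby. The following is a minimal sketch; the module M is a throwaway example.

```ruby
module M
end

# Ask each method which class it is defined in.
new_owner     = Class.instance_method(:new).owner()
include_owner = Module.instance_method(:include).owner()
p(new_owner)       # Shows Class
p(include_owner)   # Shows Module

# A class is an instance of Class and therefore also a kind of Module.
p(String.is_a?(Class))    # Shows true
p(String.is_a?(Module))   # Shows true
p(M.is_a?(Class))         # Shows false

# A module is not an instance of Class, so it has no new.
module_new_failed = false
begin
  M.new()
rescue NoMethodError
  module_new_failed = true
end
p(module_new_failed)      # Shows true
```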

Singleton Methods

I said that methods can be called on anything that is an object. I also said that the methods that can be called are determined by the object’s class. However, I think I also said that ideally methods belong to objects. Classes are just a means of avoiding the effort of defining the same method more than once.

Actually, in Ruby there’s also a means of defining methods for individual objects (instances) independently of their class. It is written like this.

obj = Object.new()
def obj.my_first()
  puts("My first singleton method")
end
obj.my_first()   # Shows My first singleton method

As you already know, Object is the root of every class. It’s very unlikely that a method with a name as weird as my_first is defined in such an important class. And obj is an instance of Object. Nevertheless, the method my_first can be called on obj. Hence we have without doubt created a method which has nothing to do with the class the object belongs to. Such methods, which are defined for each object individually, are called singleton methods.
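
That the method really belongs to one object only can be checked as follows. This is a small sketch; other is a second, unrelated instance created just for the comparison.

```ruby
obj   = Object.new()
other = Object.new()

def obj.my_first()
  return "My first singleton method"
end

on_obj    = obj.respond_to?(:my_first)        # defined for this object
on_other  = other.respond_to?(:my_first)      # another instance of the same class
in_object = Object.method_defined?(:my_first) # the class itself is untouched

p(on_obj)      # Shows true
p(on_other)    # Shows false
p(in_object)   # Shows false
```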

When are singleton methods used? First, they are used to define something like the static methods of Java or C++, in other words methods which can be used without creating an instance. In Ruby these are expressed as singleton methods of a class object.
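
As a sketch of such a “static” method, the hypothetical class Temperature below gets a singleton method that can be called without an existing instance. Neither the class nor its method names come from any library.

```ruby
# Temperature is a hypothetical class used only for illustration.
class Temperature
  def initialize(celsius)
    @celsius = celsius
  end

  def celsius()
    return @celsius
  end
end

# A singleton method of the class object Temperature.
def Temperature.from_fahrenheit(fahrenheit)
  return Temperature.new((fahrenheit - 32) * 5 / 9.0)
end

boiling = Temperature.from_fahrenheit(212)
p(boiling.celsius())   # Shows 100.0
```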

For example, UNIX has a system call named unlink, which deletes a file entry from the filesystem. In Ruby it can be used directly as the singleton method unlink of the File class. Let’s try it out.

File.unlink("core")  # deletes the coredump

It’s cumbersome to say “the singleton method unlink of the object File”, so we simply write File.unlink. Don’t mix the two notations up: don’t write File#unlink, and conversely don’t write File.write for the instance method write defined in File.

▼ A summary of the method notation

notation      the target object        example
File.unlink   the File class itself    File.unlink("core")
File#write    an instance of File      f.write("str")
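
Both notations can be tried side by side. The sketch below works in a scratch directory created with Dir.mktmpdir from the standard tmpdir library, so that no real file is touched; the file name core is only an example.

```ruby
require "tmpdir"

dir  = Dir.mktmpdir()            # scratch directory, removed again below
path = File.join(dir, "core")

f = File.open(path, "w")         # f is an instance of File
f.write("str")                   # File#write: called on the instance
f.close()

before = File.exist?(path)
File.unlink(path)                # File.unlink: called on the class itself
after = File.exist?(path)
Dir.rmdir(dir)

p(before)   # Shows true
p(after)    # Shows false
```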

Class Variables

Class variables were added in Ruby 1.6, so they are a relatively new mechanism. As with constants, they belong to a class, and they can be referenced and assigned from both the class and its instances. Their names begin with @@. Let’s look at an example.

class C
  @@cvar = "ok"
  p(@@cvar)      # "ok" is shown

  def print_cvar()
    p(@@cvar)
  end
end

C.new().print_cvar()  # "ok" is shown

As the first assignment serves as the definition, a reference before any assignment, like the one shown below, leads to a runtime error. The name starts with an @, but the behavior differs completely from that of instance variables.

% ruby -e '
class C
  @@cvar
end
'
-e:3: uninitialized class variable @@cvar in C (NameError)

Here I was a bit lazy and used the -e option. The program is the three lines between the single quotes.
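
The difference from instance variables can also be observed inside a single program by catching the error. This is a minimal sketch; the class name CvarDemo and its methods are made up for the purpose.

```ruby
# CvarDemo is a hypothetical class; neither variable is ever assigned.
class CvarDemo
  def read_cvar()
    begin
      return @@cvar
    rescue NameError => e
      return e.message()     # the text of the NameError
    end
  end

  def read_ivar()
    return @ivar             # an unassigned instance variable is simply nil
  end
end

demo = CvarDemo.new()
cvar_result = demo.read_cvar()
ivar_result = demo.read_ivar()

p(cvar_result)   # Shows the message of the NameError
p(ivar_result)   # Shows nil
```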

Class variables are inherited. In other words, a variable defined in a superclass can be referenced and assigned in its subclasses.

class A
  @@cvar = "ok"
end

class B < A
  p(@@cvar)            # Shows "ok"
  def print_cvar()
    p(@@cvar)
  end
end

B.new().print_cvar()   # Shows "ok"
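
Assignment goes through inheritance as well: the subclass does not get a copy of its own, it shares the variable with the superclass. A small sketch with the hypothetical classes Base and Derived:

```ruby
class Base
  @@cvar = "ok"

  def cvar()
    return @@cvar
  end
end

class Derived < Base
  def change_cvar()
    @@cvar = "changed"   # assigns to the variable defined in Base
  end
end

Derived.new().change_cvar()
seen_from_base = Base.new().cvar()
p(seen_from_base)   # Shows "changed"
```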

Global Variables

Finally, there are also global variables. They can be referenced and assigned from anywhere. The first character of the name is a $.

$gvar = "global variable"
p($gvar)   # Shows "global variable"

As with instance variables, every possible global variable name can be thought of as already defined before any assignment. In other words, a reference before an assignment yields nil and doesn’t raise an error.
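
Both properties can be seen in a few lines. In this sketch the variable name $rhg_example and the method set_example are arbitrary.

```ruby
before = $rhg_example          # never assigned so far, so this is nil

def set_example()
  $rhg_example = "assigned inside a method"
end

set_example()
after = $rhg_example           # the assignment is visible outside the method

p(before)   # Shows nil
p(after)    # Shows "assigned inside a method"
```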


Copyright (c) 2002-2004 Minero Aoki, All rights reserved.

English Translation: Sebastian Krause skra@pantolog.de