Ruby Hacking Guide

Translated by Sebastian Krause & ocha-

Chapter 5: Garbage Collection

A conception of an executing program

It’s all of a sudden but at the beginning of this chapter, we’ll learn about the memory space of an executing program. In this chapter we’ll step inside the lower level parts of a computer quite a bit, so without preliminary knowledge it’ll be hard to follow. And it’ll be also necessary for the following chapters. Once we finish this here, the rest will be easier.

Memory Segments

A general C program has the following parts in the memory space:

The text area is where the code lies. Obviously the second area holds static and global variables. Arguments and local variables of functions are piling up in the machine stack. The heap is the place where allocated by malloc().

Let’s talk a bit more about number three, the machine stack. Since it is called the machine “stack”, obviously it has a stack structure. In other words, new stuff is piled on top of it one after another. When we actually pushes values on the stack, each value would be a tiny piece such as int. But logically, there are a little larger pieces. They are called stack frames.

One stack frame corresponds to one function call. Or in other words when there is a function call, one stack frame is pushed. When doing return, one stack frame will be popped. Figure 1 shows the really simplified appearance of the machine stack.

figure 1: Machine Stack
figure 1: Machine Stack

In this picture, “above” is written above the top of the stack, but this it is not necessarily always the case that the machine stack goes from low addresses to high addresses. For instance, on the x86 machine the stack goes from high to low addresses.

alloca()

By using malloc(), we can get an arbitrarily large memory area of the heap. alloca() is the machine stack version of it. But unlike malloc() it’s not necessary to free the memory allocated with alloca(). Or one should say: it is freed automatically at the same moment of return of each function. That’s why it’s not possible to use an allocated value as the return value. It’s the same as “You must not return the pointer to a local variable.”

There’s been not any difficulty. We can consider it something to locally allocate an array whose size can be changed at runtime.

However there exist environments where there is no native alloca(). There are still many who would like to use alloca() even if in such environment, sometimes a function to do the same thing is written in C. But in that case, only the feature that we don’t have to free it by ourselves is implemented and it does not necessarily allocate the memory on the machine stack. In fact, it often does not. If it were possible, a native alloca() could have been implemented in the first place.

How can one implement alloca() in C? The simplest implementation is: first allocate memory normally with malloc(). Then remember the pair of the function which called alloca() and the assigned addresses in a global list. After that, check this list whenever alloca() is called, if there are the memories allocated for the functions already finished, free them by using free().

figure 2: The behavior of an `alloca(
figure 2: The behavior of an `alloca(

The missing/alloca.c of ruby is an example of an emulated alloca() .

Overview

From here on we can at last talk about the main subject of this chapter: garbage collection.

What is GC?

Objects are normally on top of the memory. Naturally, if a lot of objects are created, a lot of memory is used. If memory were infinite there would be no problem, but in reality there is always a memory limit. That’s why the memory which is not used anymore must be collected and recycled. More concretely the memory received through malloc() must be returned with free().

However, it would require a lot of efforts if the management of malloc() and free() were entirely left to programmers. Especially in object oriented programs, because objects are referring each other, it is difficult to tell when to release memory.

There garbage collection comes in. Garbage Collection (GC) is a feature to automatically detect and free the memory which has become unnecessary. With garbage collection, the worry “When should I have to free() ??” has become unnecessary. Between when it exists and when it does not exist, the ease of writing programs differs considerably.

By the way, in a book about something that I’ve read, there’s a description “the thing to tidy up the fragmented usable memory is GC”. This task is called “compaction”. It is compaction because it makes a thing compact. Because compaction makes memory cache more often hit, it has effects for speed-up to some extent, but it is not the main purpose of GC. The purpose of GC is to collect memory. There are many GCs which collect memories but don’t do compaction. The GC of ruby also does not do compaction.

Then, in what kind of system is GC available? In C and C++, there’s Boehm GC\footnote{Boehm GC http://www.hpl.hp.com/personal/Hans_Boehm/gc} which can be used as an add-on. And, for the recent languages such as Java and Perl, Python, C#, Eiffel, GC is a standard equipment. And of course, Ruby has its GC. Let’s follow the details of ruby’s GC in this chapter. The target file is gc.c.

What does GC do?

Before explaining the GC algorithm, I should explain “what garbage collection is”. In other words, what kind of state of the memory is “the unnecessary memory”?

To make descriptions more concrete, let’s simplify the structure by assuming that there are only objects and links. This would look as shown in Figure 3.

figure 3: Objects
figure 3: Objects

The objects pointed to by global variables and the objects on the stack of a language are surely necessary. And objects pointed to by instance variables of these objects are also necessary. Furthermore, the objects that are reachable by following links from these objects are also necessary.

To put it more logically, the necessary objects are all objects which can be reached recursively via links from the “surely necessary objects” as the start points. This is depicted in figure 4. What are on the left of the line are all “surely necessary objects”, and the objects which can be reached from them are colored black. These objects colored black are the necessary objects. The rest of the objects can be released.

figure 4: necessary objects and unnecessary objects
figure 4: necessary objects and unnecessary objects

In technical terms, “the surely necessary objects” are called “the roots of GC”. That’s because they are the roots of tree structures that emerges as a consequence of tracing necessary objects.

Mark and Sweep

GC was first implemented in Lisp. The GC implemented in Lisp at first, it means the world’s first GC, is called mark&sweep GC. The GC of ruby is one type of it.

The image of Mark-and-Sweep GC is pretty close to our definition of “necessary object”. First, put “marks” on the root objects. Setting them as the start points, put “marks” on all reachable objects. This is the mark phase.

At the moment when there’s not any reachable object left, check all objects in the object pool, release (sweep) all objects that have not marked. “Sweep” is the “sweep” of Minesweeper.

There are two advantages.

There are also two disadvantages.

When using the emacs editor, there sometimes appears Garbage collecting… and it completely stops reacting. That is an example of the second disadvantage. But this point can be alleviated by modifying the algorithm (it is called incremental GC).

Stop and Copy

Stop and Copy is a variation of Mark and Sweep. First, prepare several object areas. To simplify this description, assume there are two areas A and B here. And put an “active” mark on the one of the areas. When creating an object, create it only in the “active” one. (Figure 5)

figure 5: Stop and Copy (1)
figure 5: Stop and Copy (1

When the GC starts, follow links from the roots in the same manner as mark-and-sweep. However, move objects to another area instead of marking them (Figure 6). When all the links have been followed, discard the all elements which remain in A, and make B active next.

figure 6: Stop and Copy (2)
figure 6: Stop and Copy (2

Stop and Copy also has two advantages:

And also two disadvantages:

It seems what exist in this world are not only positive things.

Reference counting

Reference counting differs a bit from the aforementioned GCs, the reach-check code is distributed in several places.

First, attach an integer count to each element. When referring via variables or arrays, the counter of the referenced object is increased. When quitting to refer, decrease the counter. When the counter of an object becomes zero, release the object. This is the method called reference counting (Figure 7).

figure 7: Reference counting
figure 7: Reference counting

This method also has two advantages:

And also two disadvantages.

I’ll explain about the second point just in case. A cycle is a cycle of references as shown in Figure 8. If this is the case the counters will never decrease and the objects will never be released.

figure 8: Cycle
figure 8: Cycle

By the way, latest Python(2.2) uses reference counting GC but it can free cycles. However, it is not because of the reference counting itself, but because it sometimes invokes mark and sweep GC to check.

Object Management

Ruby’s garbage collection is only concerned with ruby objects. Moreover, it only concerned with the objects created and managed by ruby. Conversely speaking, if the memory is allocated without following a certain procedure, it won’t be taken care of. For instance, the following function will cause a memory leak even if ruby is running.

void not_ok()
{
    malloc(1024);  /* receive memory and discard it */
}

However, the following function does not cause a memory leak.

void this_is_ok()
{
    rb_ary_new();  /* create a ruby array and discard it */
}

Since rb_ary_new() uses Ruby’s proper interface to allocate memory, the created object is under the management of the GC of ruby, thus ruby will take care of it.

struct RVALUE

Since the substance of an object is a struct, managing objects means managing that structs. Of course the non-pointer objects like Fixnum Symbol nil true false are exceptions, but I won’t always describe about it to prevent descriptions from being redundant.

Each struct type has its different size, but probably in order to keep management simpler, a union of all the structs of built-in classes is declared and the union is always used when dealing with memory. The declaration of that union is as follows.

RVALUE

 211  typedef struct RVALUE {
 212      union {
 213          struct {
 214              unsigned long flags;   /* 0 if not used */
 215              struct RVALUE *next;
 216          } free;
 217          struct RBasic  basic;
 218          struct RObject object;
 219          struct RClass  klass;
 220          struct RFloat  flonum;
 221          struct RString string;
 222          struct RArray  array;
 223          struct RRegexp regexp;
 224          struct RHash   hash;
 225          struct RData   data;
 226          struct RStruct rstruct;
 227          struct RBignum bignum;
 228          struct RFile   file;
 229          struct RNode   node;
 230          struct RMatch  match;
 231          struct RVarmap varmap;
 232          struct SCOPE   scope;
 233      } as;
 234  } RVALUE;

(gc.c)

struct RVALUE is a struct that has only one element. I’ve heard that the reason why union is not directly used is to enable to easily increase its members when debugging or when extending in the future.

First, let’s focus on the first element of the union free.flags. The comment says “0 if not used”, but is it true? Is there not any possibility for free.flags to be 0 by chance?

As we’ve seen in Chapter 2: Objects, all object structs have struct RBasic as its first element. Therefore, by whichever element of the union we access, obj->as.free.flags means the same as it is written as obj->as.basic.flags. And objects always have the struct-type flag (such as T_STRING), and the flag is always not 0. Therefore, the flag of an “alive” object will never coincidentally be 0. Hence, we can confirm that setting their flags to 0 is necessity and sufficiency to represent “dead” objects.

Object heap

The memory for all the object structs has been brought together in global variable heaps. Hereafter, let’s call this an object heap.

▼ Object heap

 239  #define HEAPS_INCREMENT 10
 240  static RVALUE **heaps;
 241  static int heaps_length = 0;
 242  static int heaps_used   = 0;
 243
 244  #define HEAP_MIN_SLOTS 10000
 245  static int *heaps_limits;
 246  static int heap_slots = HEAP_MIN_SLOTS;

(gc.c)

heaps is an array of arrays of struct RVALUE. Since it is heapS, the each contained array is probably each heap. Each element of heap is each slot (Figure 9).

figure 9: `heaps`, `heap`, `slot`
figure 9: heaps, heap, slot

The length of heaps is heap_length and it can be changed. The number of the slots actually in use is heaps_used. The length of each heap is in the corresponding heaps_limits[index]. Figure 10 shows the structure of the object heap.

figure 10: conceptual diagram of `heaps` in memory
figure 10: conceptual diagram of heaps in memory

This structure has a necessity to be this way. For instance, if all structs are stored in an array, the memory space would be the most compact, but we cannot do realloc() because it could change the addresses. This is because VALUEs are mere pointers.

In the case of an implementation of Java, the counterpart of VALUEs are not addresses but the indexes of objects. Since they are handled through a pointer table, objects are movable. However in this case, indexing of the array comes in every time an object access occurs and it lowers the performance in some degree.

On the other hand, what happens if it is an one-dimensional array of pointers to RVALUEs (it means VALUEs)? This seems to be able to go well at the first glance, but it does not when GC. That is, as I’ll describe in detail, the GC of ruby needs to know the integers “which seems VALUE (the pointers to RVALUE). If all RVALUE are allocated in addresses which are far from each other, it needs to compare all address of RVALUE with all integers “which could be pointers”. This means the time for GC becomes the order more than O(n^2), and not acceptable.

According to these requirements, it is good that the object heap form a structure that the addresses are cohesive to some extent and whose position and total amount are not restricted at the same time.

freelist

Unused RVALUEs are managed by being linked as a single line which is a linked list that starts with freelist. The as.free.next of RVALUE is the link used for this purpose.

freelist

 236  static RVALUE *freelist = 0;

(gc.c)

add_heap()

As we understood the data structure, let’s read the function add_heap() to add a heap. Because this function contains a lot of lines not part of the main line, I’ll show the one simplified by omitting error handlings and castings.

add_heap() (simplified)

static void
add_heap()
{
    RVALUE *p, *pend;

    /* extend heaps if necessary */
    if (heaps_used == heaps_length) {
        heaps_length += HEAPS_INCREMENT;
        heaps        = realloc(heaps,        heaps_length * sizeof(RVALUE*));
        heaps_limits = realloc(heaps_limits, heaps_length * sizeof(int));
    }

    /* increase heaps by 1 */
    p = heaps[heaps_used] = malloc(sizeof(RVALUE) * heap_slots);
    heaps_limits[heaps_used] = heap_slots;
    pend = p + heap_slots;
    if (lomem == 0 || lomem > p) lomem = p;
    if (himem < pend) himem = pend;
    heaps_used++;
    heap_slots *= 1.8;

    /* link the allocated RVALUE to freelist */
    while (p < pend) {
        p->as.free.flags = 0;
        p->as.free.next = freelist;
        freelist = p;
        p++;
    }
}

Please check the following points.

Plus, since lomem and himem are modified only by this function, only by this function you can understand the mechanism. These variables hold the lowest and the highest addresses of the object heap. These values are used later when determining the integers “which seems VALUE”.

rb_newobj()

Considering all of the above points, we can tell the way to create an object in a second. If there is at least a RVALUE linked from freelist, we can use it. Otherwise, do GC or increase the heaps. Let’s confirm this by reading the rb_newobj() function to create an object.

rb_newobj()

 297  VALUE
 298  rb_newobj()
 299  {
 300      VALUE obj;
 301
 302      if (!freelist) rb_gc();
 303
 304      obj = (VALUE)freelist;
 305      freelist = freelist->as.free.next;
 306      MEMZERO((void*)obj, RVALUE, 1);
 307      return obj;
 308  }

(gc.c)

If freelest is 0, in other words, if there’s not any unused structs, invoke GC and create spaces. Even if we could not collect not any object, there’s no problem because in this case a new space is allocated in rb_gc(). And take a struct from freelist, zerofill it by MEMZERO(), and return it.

Mark

As described, ruby’s GC is Mark & Sweep. Its “mark” is, concretely speaking, to set a FL_MARK flag: look for unused VALUE, set FL_MARK flags to found ones, then look at the object heap after investigating all and free objects that FL_MARK has not been set.

rb_gc_mark()

rb_gc_mark() is the function to mark objects recursively.

rb_gc_mark()

 573  void
 574  rb_gc_mark(ptr)
 575      VALUE ptr;
 576  {
 577      int ret;
 578      register RVALUE *obj = RANY(ptr);
 579
 580      if (rb_special_const_p(ptr)) return; /* special const not marked */
 581      if (obj->as.basic.flags == 0) return;       /* free cell */
 582      if (obj->as.basic.flags & FL_MARK) return;  /* already marked */
 583
 584      obj->as.basic.flags |= FL_MARK;
 585
 586      CHECK_STACK(ret);
 587      if (ret) {
 588          if (!mark_stack_overflow) {
 589              if (mark_stack_ptr - mark_stack < MARK_STACK_MAX) {
 590                  *mark_stack_ptr = ptr;
 591                  mark_stack_ptr++;
 592              }
 593              else {
 594                  mark_stack_overflow = 1;
 595              }
 596          }
 597      }
 598      else {
 599          rb_gc_mark_children(ptr);
 600      }
 601  }

(gc.c)

The definition of RANY() is as follows. It is not particularly important.

RANY()

 295  #define RANY(o) ((RVALUE*)(o))

(gc.c)

There are the checks for non-pointers or already freed objects and the recursive checks for marked objects at the beginning,

obj->as.basic.flags |= FL_MARK;

and obj (this is the ptr parameter of this function) is marked. Then next, it’s the turn to follow the references from obj and mark. rb_gc_mark_children() does it.

The others, what starts with CHECK_STACK() and is written a lot is a device to prevent the machine stack overflow. Since rb_gc_mark() uses recursive calls to mark objects, if there is a big object cluster, it is possible to run short of the length of the machine stack. To counter that, if the machine stack is nearly overflow, it stops the recursive calls, piles up the objects on a global list, and later it marks them once again. This code is omitted because it is not part of the main line.

rb_gc_mark_children()

Now, as for rb_gc_mark_children(), it just lists up the internal types and marks one by one, thus it is not just long but also not interesting. Here, it is shown but the simple enumerations are omitted:

rb_gc_mark_children()

 603  void
 604  rb_gc_mark_children(ptr)
 605      VALUE ptr;
 606  {
 607      register RVALUE *obj = RANY(ptr);
 608
 609      if (FL_TEST(obj, FL_EXIVAR)) {
 610          rb_mark_generic_ivar((VALUE)obj);
 611      }
 612
 613      switch (obj->as.basic.flags & T_MASK) {
 614        case T_NIL:
 615        case T_FIXNUM:
 616          rb_bug("rb_gc_mark() called for broken object");
 617          break;
 618
 619        case T_NODE:
 620          mark_source_filename(obj->as.node.nd_file);
 621          switch (nd_type(obj)) {
 622            case NODE_IF:         /* 1,2,3 */
 623            case NODE_FOR:
 624            case NODE_ITER:
                /* ………… omitted ………… */
 749          }
 750          return;   /* not need to mark basic.klass */
 751      }
 752
 753      rb_gc_mark(obj->as.basic.klass);
 754      switch (obj->as.basic.flags & T_MASK) {
 755        case T_ICLASS:
 756        case T_CLASS:
 757        case T_MODULE:
 758          rb_gc_mark(obj->as.klass.super);
 759          rb_mark_tbl(obj->as.klass.m_tbl);
 760          rb_mark_tbl(obj->as.klass.iv_tbl);
 761          break;
 762
 763        case T_ARRAY:
 764          if (FL_TEST(obj, ELTS_SHARED)) {
 765              rb_gc_mark(obj->as.array.aux.shared);
 766          }
 767          else {
 768              long i, len = obj->as.array.len;
 769              VALUE *ptr = obj->as.array.ptr;
 770
 771              for (i=0; i < len; i++) {
 772                  rb_gc_mark(*ptr++);
 773              }
 774          }
 775          break;

            /* ………… omitted ………… */

 837        default:
 838          rb_bug("rb_gc_mark(): unknown data type 0x%x(0x%x) %s",
 839                 obj->as.basic.flags & T_MASK, obj,
 840                 is_pointer_to_heap(obj) ? "corrupted object"
                                             : "non object");
 841      }
 842  }

(gc.c)

It calls rb_gc_mark() recursively, is only what I’d like you to confirm. In the omitted part, NODE and T_xxxx are enumerated respectively. NODE will be introduced in Part 2.

Additionally, let’s see the part to mark T_DATA (the struct used for extension libraries) because there’s something we’d like to check. This code is extracted from the second switch statement.

rb_gc_mark_children() - T_DATA

 789        case T_DATA:
 790          if (obj->as.data.dmark) (*obj->as.data.dmark)(DATA_PTR(obj));
 791          break;

(gc.c)

Here, it does not use rb_gc_mark() or similar functions, but the dmark which is given from users. Inside it, of course, it might use rb_gc_mark() or something, but not using is also possible. For example, in an extreme situation, if a user defined object does not contain VALUE, there’s no need to mark.

rb_gc()

By now, we’ve finished to talk about each object. From now on, let’s see the function rb_gc() that presides the whole. The objects marked here are “objects which are obviously necessary”. In other words, “the roots of GC”.

rb_gc()

1110  void
1111  rb_gc()
1112  {
1113      struct gc_list *list;
1114      struct FRAME * volatile frame; /* gcc 2.7.2.3 -O2 bug??  */
1115      jmp_buf save_regs_gc_mark;
1116      SET_STACK_END;
1117
1118      if (dont_gc || during_gc) {
1119          if (!freelist) {
1120              add_heap();
1121          }
1122          return;
1123      }

          /* …… mark from the all roots …… */

1183      gc_sweep();
1184  }

(gc.c)

The roots which should be marked will be shown one by one after this, but I’d like to mention just one point here.

In ruby the CPU registers and the machine stack are also the roots. It means that the local variables and arguments of C are automatically marked. For example,

static int
f(void)
{
    VALUE arr = rb_ary_new();

    /* …… do various things …… */
}

like this way, we can protect an object just by putting it into a variable. This is a very significant trait of the GC of ruby. Because of this feature, ruby’s extension libraries are insanely easy to write.

However, what is on the stack is not only VALUE. There are a lot of totally unrelated values. How to resolve this is the key when reading the implementation of GC.

The Ruby Stack

First, it marks the (ruby’s) stack frames used by the interpretor. Since you will be able to find out who it is after reaching Part 3, you don’t have to think so much about it for now.

▼ Marking the Ruby Stack

1130      /* mark frame stack */
1131      for (frame = ruby_frame; frame; frame = frame->prev) {
1132          rb_gc_mark_frame(frame);
1133          if (frame->tmp) {
1134              struct FRAME *tmp = frame->tmp;
1135              while (tmp) {
1136                  rb_gc_mark_frame(tmp);
1137                  tmp = tmp->prev;
1138              }
1139          }
1140      }
1141      rb_gc_mark((VALUE)ruby_class);
1142      rb_gc_mark((VALUE)ruby_scope);
1143      rb_gc_mark((VALUE)ruby_dyna_vars);

(gc.c)

ruby_frame ruby_class ruby_scope ruby_dyna_vars are the variables to point to each top of the stacks of the evaluator. These hold the frame, the class scope, the local variable scope, and the block local variables at that time respectively.

Register

Next, it marks the CPU registers.

▼ marking the registers

1148      FLUSH_REGISTER_WINDOWS;
1149      /* Here, all registers must be saved into jmp_buf. */
1150      setjmp(save_regs_gc_mark);
1151      mark_locations_array((VALUE*)save_regs_gc_mark,
                               sizeof(save_regs_gc_mark) / sizeof(VALUE *));

(gc.c)

FLUSH_REGISTER_WINDOWS is special. We will see it later.

setjmp() is essentially a function to remotely jump, but the content of the registers are saved into the argument (which is a variable of type jmp_buf) as its side effect. Making use of this, it attempts to mark the content of the registers. Things around here really look like secret techniques.

However only djgpp and Human68k are specially treated. djgpp is a gcc environment for DOS. Human68k is an OS of SHARP X680x0 Series. In these two environments, the whole registers seem to be not saved only by the ordinary setjmp(), setjmp() is redefined as follows as an inline-assembler to explicitly write out the registers.

▼ the original version of `setjmp`

1072  #ifdef __GNUC__
1073  #if defined(__human68k__) || defined(DJGPP)
1074  #if defined(__human68k__)
1075  typedef unsigned long rb_jmp_buf[8];
1076  __asm__ (".even\n\                   2-byte alignment
1077  _rb_setjmp:\n\                       the label of rb_setjmp() function
1078          move.l  4(sp),a0\n\          load the first argument to the a0 register
1079          movem.l d3-d7/a3-a5,(a0)\n\  copy the registers to where a0 points to
1080          moveq.l #0,d0\n\             set 0 to d0 (as the return value)
1081          rts");                       return
1082  #ifdef setjmp
1083  #undef setjmp
1084  #endif
1085  #else
1086  #if defined(DJGPP)
1087  typedef unsigned long rb_jmp_buf[6];
1088  __asm__ (".align 4\n\                order 4-byte alignment
1089  _rb_setjmp:\n\                       the label for rb_setjmp() function
1090          pushl   %ebp\n\              push ebp to the stack
1091          movl    %esp,%ebp\n\         set the stack pointer to ebp
1092          movl    8(%ebp),%ebp\n\      pick up the first argument and set to ebp
1093          movl    %eax,(%ebp)\n\       in the followings, store each register
1094          movl    %ebx,4(%ebp)\n\        to where ebp points to
1095          movl    %ecx,8(%ebp)\n\
1096          movl    %edx,12(%ebp)\n\
1097          movl    %esi,16(%ebp)\n\
1098          movl    %edi,20(%ebp)\n\
1099          popl    %ebp\n\              restore ebp from the stack
1100          xorl    %eax,%eax\n\         set 0 to eax (as the return value)
1101          ret");                       return
1102  #endif
1103  #endif
1104  int rb_setjmp (rb_jmp_buf);
1105  #define jmp_buf rb_jmp_buf
1106  #define setjmp rb_setjmp
1107  #endif /* __human68k__ or DJGPP */
1108  #endif /* __GNUC__ */

(gc.c)

Alignment is the constraint when putting variables on memories. For example, in 32-bit machine int is usually 32 bits, but we cannot always take 32 bits from anywhere of memories. Particularly, RISC machine has strict constraints, it is decided like “from a multiple of 4 byte” or “from even byte”. When there are such constraints, memory access unit can be more simplified (thus, it can be faster). When there’s the constraint of “from a multiple of 4 byte”, it is called “4-byte alignment”.

Plus, in cc of djgpp or Human68k, there’s a rule that the compiler put the underline to the head of each function name. Therefore, when writing a C function in Assembler, we need to put the underline (_) to its head by ourselves. This type of constraints are techniques in order to avoid the conflicts in names with library functions. Also in UNIX, it is said that the underline had been attached by some time ago, but it almost disappears now.

Now, the content of the registers has been able to be written out into jmp_buf, it will be marked in the next code:

▼ mark the registers (shown again)

1151      mark_locations_array((VALUE*)save_regs_gc_mark,
                               sizeof(save_regs_gc_mark) / sizeof(VALUE *));

(gc.c)

This is the first time that mark_locations_array() appears. I’ll describe it in the next section.

mark_locations_array()

▼ `mark_locations_array()`

 500  static void
 501  mark_locations_array(x, n)
 502      register VALUE *x;
 503      register long n;
 504  {
 505      while (n--) {
 506          if (is_pointer_to_heap((void *)*x)) {
 507              rb_gc_mark(*x);
 508          }
 509          x++;
 510      }
 511  }

(gc.c)

This function is to mark the all elements of an array, but it slightly differs from the previous mark functions. Until now, each place to be marked is where we know it surely holds a VALUE (a pointer to an object). However this time, where it attempts to mark is the register space, it is enough to expect that there’re also what are not VALUE. To counter that, it tries to detect whether or not the value is a VALUE (a pointer), then if it seems, the value will be handled as a pointer. This kind of methods are called “conservative GC”. It seems that it is conservative because it “tentatively inclines things to the safe side”

Next, we’ll look at the function to check if “it looks like a VALUE”, it is is_pointer_to_heap().

is_pointer_to_heap()

▼ `is_pointer_to_heap()`

 480  static inline int
 481  is_pointer_to_heap(ptr)
 482      void *ptr;
 483  {
 484      register RVALUE *p = RANY(ptr);
 485      register RVALUE *heap_org;
 486      register long i;
 487
 488      if (p < lomem || p > himem) return Qfalse;
 489
 490      /* check if there's the possibility that p is a pointer */
 491      for (i=0; i < heaps_used; i++) {
 492          heap_org = heaps[i];
 493          if (heap_org <= p && p < heap_org + heaps_limits[i] &&
 494              ((((char*)p)-((char*)heap_org))%sizeof(RVALUE)) == 0)
 495              return Qtrue;
 496      }
 497      return Qfalse;
 498  }

(gc.c)

If I briefly explain it, it would look like the followings:

Since the mechanism is like this, it’s obviously possible that a non-VALUE value is mistakenly handled as a VALUE. But at least, it will never fail to find out the used VALUEs. And, with this amount of tests, it may rarely pick up a non-VALUE value unless it intentionally does. Therefore, considering about the benefits we can obtain by GC, it’s sufficient to compromise.

Register Window

This section is about FLUSH_REGISTER_WINDOWS() which has been deferred.

Register windows are the mechanism to enable to put a part of the machine stack into inside the CPU. In short, it is a cache whose purpose of use is narrowed down. Recently, it exists only in Sparc architecture. It’s possible that there are also VALUEs in register windows, and it’s also necessary to get down them into memory.

The content of the macro is like this:

▼ `FLUSH_REGISTER_WINDOWS`

 125  #if defined(sparc) || defined(__sparc__)
 126  # if defined(linux) || defined(__linux__)
 127  #define FLUSH_REGISTER_WINDOWS  asm("ta  0x83")
 128  # else /* Solaris, not sparc linux */
 129  #define FLUSH_REGISTER_WINDOWS  asm("ta  0x03")
 130  # endif
 131  #else /* Not a sparc */
 132  #define FLUSH_REGISTER_WINDOWS
 133  #endif

(defines.h)

asm(...) is a built-in assembler. However, even though I call it assembler, this instruction named ta is the call of a privileged instruction. In other words, the call is not of the CPU but of the OS. That’s why the instruction is different for each OS. The comments describe only about Linux and Solaris, but actually FreeBSD and NetBSD are also works on Sparc, so this comment is wrong.

Plus, if it is not Sparc, it is unnecessary to flush, thus FLUSH_REGISTER_WINDOWS is defined as nothing. Like this, the method to get a macro back to nothing is very famous technique that is also convenient when debugging.

Machine Stack

Then, let’s go back to the rest of rb_gc(). This time, it marks VALUESs in the machine stack.

▼ mark the machine stack

1152      rb_gc_mark_locations(rb_gc_stack_start, (VALUE*)STACK_END);
1153  #if defined(__human68k__)
1154      rb_gc_mark_locations((VALUE*)((char*)rb_gc_stack_start + 2),
1155                           (VALUE*)((char*)STACK_END + 2));
1156  #endif

(gc.c)

rb_gc_stack_start seems the start address (the end of the stack) and STACK_END seems the end address (the top). And, rb_gc_mark_locations() practically marks the stack space.

There are rb_gc_mark_locations() two times in order to deal with the architectures which are not 4-byte alignment. rb_gc_mark_locations() tries to mark for each portion of sizeof(VALUE), so if it is in 2-byte alignment environment, sometimes not be able to properly mark. In this case, it moves the range 2 bytes then marks again.

Now, rb_gc_stack_start, STACK_END, rb_gc_mark_locations(), let’s examine these three in this order.

Init_stack()

The first thing is rb_gc_starck_start. This variable is set only during Init_stack(). As the name Init_ might suggest, this function is called at the time when initializing the ruby interpretor.

▼ `Init_stack()`

1193  void
1194  Init_stack(addr)
1195      VALUE *addr;
1196  {
1197  #if defined(__human68k__)
1198      extern void *_SEND;
1199      rb_gc_stack_start = _SEND;
1200  #else
1201      VALUE start;
1202
1203      if (!addr) addr = &start;
1204      rb_gc_stack_start = addr;
1205  #endif
1206  #ifdef HAVE_GETRLIMIT
1207      {
1208          struct rlimit rlim;
1209
1210          if (getrlimit(RLIMIT_STACK, &rlim) == 0) {
1211              double space = (double)rlim.rlim_cur*0.2;
1212
1213              if (space > 1024*1024) space = 1024*1024;
1214              STACK_LEVEL_MAX = (rlim.rlim_cur - space) / sizeof(VALUE);
1215          }
1216      }
1217  #endif
1218  }

(gc.c)

What is important is only the part in the middle. It defines an arbitrary local variable (it is allocated on the stack) and it sets its address to rb_gc_stack_start. The _SEND inside the code for __human68k__ is probably the variable defined by a library of compiler or system. Naturally, you can presume that it is the contraction of Stack END.

Meanwhile, the code after that bundled by HAVE_GETRLIMIT appears to check the length of the stack and do mysterious things. This is also in the same context of what is done at rb_gc_mark_children() to prevent the stack overflow. We can ignore this.

STACK_END

Next, we’ll look at the STACK_END which is the macro to detect the end of the stack.

▼ `STACK_END`

 345  #ifdef C_ALLOCA
 346  # define SET_STACK_END VALUE stack_end; alloca(0);
 347  # define STACK_END (&stack_end)
 348  #else
 349  # if defined(__GNUC__) && defined(USE_BUILTIN_FRAME_ADDRESS)
 350  #  define SET_STACK_END  VALUE *stack_end = __builtin_frame_address(0)
 351  # else
 352  #  define SET_STACK_END  VALUE *stack_end = alloca(1)
 353  # endif
 354  # define STACK_END (stack_end)
 355  #endif

(gc.c)

As there are three variations of SET_STACK_END, let’s start with the bottom one. alloca() allocates a space at the end of the stack and returns it, so the return value and the end address of the stack should be very close. Hence, it considers the return value of alloca() as an approximate value of the end of the stack.

Let’s go back and look at the one at the top. When the macro C_ALLOCA is defined, alloca() is not natively defined, … in other words, it indicates a compatible function is defined in C. I mentioned that in this case alloca() internally allocates memory by using malloc(). However, it does not help to get the position of the stack at all. To deal with this situation, it determines that the local variable stack_end of the currently executing function is close to the end of the stack and uses its address (&stack_end).

Plus, this code contains alloca(0) whose purpose is not easy to see. This has been a feature of the alloca() defined in C since early times, and it means “please check and free the unused space”. Since this is used when doing GC, it attempts to free the memory allocated with alloca() at the same time. But I think it’s better to put it in another macro instead of mixing into such place …

And at last, in the middle case, it is about __builtin_frame_address(). __GNUC__ is a symbol defined in gcc (the compiler of GNU C). Since this is used to limit, it is a built-in instruction of gcc. You can get the address of the n-times previous stack frame with __builtin_frame_address(n). As for __builtin_frame_adress(0), it provides the address of the current frame.

rb_gc_mark_locations()

The last one is the rb_gc_mark_locations() function that actually marks the stack.

▼ `rb_gc_mark_locations()`

 513  void
 514  rb_gc_mark_locations(start, end)
 515      VALUE *start, *end;
 516  {
 517      VALUE *tmp;
 518      long n;
 519
 520      if (start > end) {
 521          tmp = start;
 522          start = end;
 523          end = tmp;
 524      }
 525      n = end - start + 1;
 526      mark_locations_array(start,n);
 527  }

(gc.c)

Basically, delegating to the function mark_locations_array() which marks a space is sufficient. What this function does is properly adjusting the arguments. Such adjustment is required because in which direction the machine stack extends is undecided. If the machine stack extends to lower addresses, end is smaller, if it extends to higher addresses, start is smaller. Therefore, so that the smaller one becomes start, they are adjusted here.

The other root objects

Finally, it marks the built-in VALUE containers of the interpretor.

▼ The other roots

1159      /* mark the registered global variables */
1160      for (list = global_List; list; list = list->next) {
1161          rb_gc_mark(*list->varptr);
1162      }
1163      rb_mark_end_proc();
1164      rb_gc_mark_global_tbl();
1165
1166      rb_mark_tbl(rb_class_tbl);
1167      rb_gc_mark_trap_list();
1168
1169      /* mark the instance variables of true, false, etc if exist */
1170      rb_mark_generic_ivar_tbl();
1171
          /* mark the variables used in the ruby parser (only while parsing) */
1172      rb_gc_mark_parser();

(gc.c)

When putting a VALUE into a global variable of C, it is required to register its address by user via rb_gc_register_address(). As these objects are saved in global_List, all of them are marked.

rb_mark_end_proc() is to mark the procedural objects which are registered via kind of END statement of Ruby and executed when a program finishes. (END statements will not be described in this book).

rb_gc_mark_global_tbl() is to mark the global variable table rb_global_tbl. (See also the next chapter “Variables and Constants”)

rb_mark_tbl(rb_class_tbl) is to mark rb_class_tbl which was discussed in the previous chapter.

rb_gc_mark_trap_list() is to mark the procedural objects which are registered via the Ruby’s function-like method trap. (This is related to signals and will also not be described in this book.)

rb_mark_generic_ivar_tbl() is to mark the instance variable table prepared for non-pointer VALUE such as true.

rb_gc_mark_parser() is to mark the semantic stack of the parser. (The semantic stack will be described in Part 2.)

Until here, the mark phase has been finished.

Sweep

The special treatment for NODE

The sweep phase is the procedures to find out and free the not-marked objects. But, for some reason, the objects of type T_NODE are specially treated. Take a look at the next part:

▼ at the beggining of `gc_sweep()`

 846  static void
 847  gc_sweep()
 848  {
 849      RVALUE *p, *pend, *final_list;
 850      int freed = 0;
 851      int i, used = heaps_used;
 852
 853      if (ruby_in_compile && ruby_parser_stack_on_heap()) {
 854          /* If the yacc stack is not on the machine stack,
 855             do not collect NODE while parsing */
 856          for (i = 0; i < used; i++) {
 857              p = heaps[i]; pend = p + heaps_limits[i];
 858              while (p < pend) {
 859                  if (!(p->as.basic.flags & FL_MARK) &&
                                          BUILTIN_TYPE(p) == T_NODE)
 860                      rb_gc_mark((VALUE)p);
 861                  p++;
 862              }
 863          }
 864      }

(gc.c)

NODE is a object to express a program in the parser. NODE is put on the stack prepared by a tool named yacc while compiling, but that stack is not always on the machine stack. Concretely speaking, when ruby_parser_stack_on_heap() is false, it indicates it is not on the machine stack. In this case, a NODE could be accidentally collected in the middle of its creation, thus the objects of type T_NODE are unconditionally marked and protected from being collected while compiling (ruby_in_compile) .

Finalizer

After it has reached here, all not-marked objects can be freed. However, there’s one thing to do before freeing. In Ruby the freeing of objects can be hooked, and it is necessary to call them. This hook is called “finalizer”.

▼ `gc_sweep()` Middle

 869      freelist = 0;
 870      final_list = deferred_final_list;
 871      deferred_final_list = 0;
 872      for (i = 0; i < used; i++) {
 873          int n = 0;
 874
 875          p = heaps[i]; pend = p + heaps_limits[i];
 876          while (p < pend) {
 877              if (!(p->as.basic.flags & FL_MARK)) {
 878  (A)             if (p->as.basic.flags) {
 879                      obj_free((VALUE)p);
 880                  }
 881  (B)             if (need_call_final && FL_TEST(p, FL_FINALIZE)) {
 882                      p->as.free.flags = FL_MARK; /* remains marked */
 883                      p->as.free.next = final_list;
 884                      final_list = p;
 885                  }
 886                  else {
 887                      p->as.free.flags = 0;
 888                      p->as.free.next = freelist;
 889                      freelist = p;
 890                  }
 891                  n++;
 892              }
 893  (C)         else if (RBASIC(p)->flags == FL_MARK) {
 894                  /* the objects that need to finalize */
 895                  /* are left untouched */
 896              }
 897              else {
 898                  RBASIC(p)->flags &= ~FL_MARK;
 899              }
 900              p++;
 901          }
 902          freed += n;
 903      }
 904      if (freed < FREE_MIN) {
 905          add_heap();
 906      }
 907      during_gc = 0;

(gc.c)

This checks all over the object heap from the edge, and frees the object on which FL_MARK flag is not set by using obj_free() (A). obj_free() frees, for instance, only char[] used by String objects or VALUE[] used by Array objects, but it does not free the RVALUE struct and does not touch basic.flags at all. Therefore, if a struct is manipulated after obj_free() is called, there’s no worry about going down.

After it frees the objects, it branches based on FL_FINALIZE flag (B). If FL_FINALIZE is set on an object, since it means at least a finalizer is defined on the object, the object is added to final_list. Otherwise, the object is immediately added to freelist. When finalizing, basic.flags becomes FL_MARK. The struct-type flag (such as T_STRING) is cleared because of this, and the object can be distinguished from alive objects.

Then, this phase completes by executing the all finalizers. Notice that the hooked objects have already died when calling the finalizers. It means that while executing the finalizers, one cannot use the hooked objects.

▼ `gc_sweep()` the rest

 910      if (final_list) {
 911          RVALUE *tmp;
 912
 913          if (rb_prohibit_interrupt || ruby_in_compile) {
 914              deferred_final_list = final_list;
 915              return;
 916          }
 917
 918          for (p = final_list; p; p = tmp) {
 919              tmp = p->as.free.next;
 920              run_final((VALUE)p);
 921              p->as.free.flags = 0;
 922              p->as.free.next = freelist;
 923              freelist = p;
 924          }
 925      }
 926  }

(gc.c)

The for in the last half is the main finalizing procedure. The if in the first half is the case when the execution could not be moved to the Ruby program for various reasons. The objects whose finalization is deferred will be appear in the route ==(C)== of the previous list.

rb_gc_force_recycle()

I’ll talk about a little different thing at the end. Until now, the ruby’s garbage collector decides whether or not it collects each object, but there’s also a way that users explicitly let it collect a particular object. It’s rb_gc_force_recycle().

▼ `rb_gc_force_recycle()`

 928  void
 929  rb_gc_force_recycle(p)
 930      VALUE p;
 931  {
 932      RANY(p)->as.free.flags = 0;
 933      RANY(p)->as.free.next = freelist;
 934      freelist = RANY(p);
 935  }

(gc.c)

Its mechanism is not so special, but I introduced this because you’ll see it several times in Part 2 and Part 3.

Discussions

To free spaces

The space allocated by an individual object, say, char[] of String, is freed during the sweep phase, but the code to free the RVALUE struct itself has not appeared yet. And, the object heap also does not manage the number of structs in use and such. This means that if the ruby’s object space is once allocated it would never be freed.

For example, the mailer what I’m creating now temporarily uses the space almost 40M bytes when constructing the threads for 500 mails, but if most of the space becomes unused as the consequence of GC it will keep occupying the 40M bytes. Because my machine is also kind of modern, it does not matter if just the 40M bytes are used. But, if this occurs in a server which keeps running, there’s the possibility of becoming a problem.

However, one also need to consider that free() does not always mean the decrease of the amount of memory in use. If it does not return memory to OS, the amount of memory in use of the process never decrease. And, depending on the implementation of malloc(), although doing free() it often does not cause returning memory to OS.

… I had written so, but just before the deadline of this book, RVALUE became to be freed. The attached CD-ROM also contains the edge ruby, so please check by diff. … what a sad ending.

Generational GC

Mark & Sweep has an weak point, it is “it needs to touch the entire object space at least once”. There’s the possibility that using the idea of Generational GC can make up for the weak point.

The fundamental of Generational GC is the experiential rule that “Most objects are lasting for either very long or very short time”. You may be convinced about this point by thinking for seconds about the programs you write.

Then, thinking based on this rule, one may come up with the idea that “long-lived objects do not need to be marked or swept each and every time”. Once an object is thought that it will be long-lived, it is treated specially and excluded from the GC target. Then, for both marking and sweeping, it can significantly decrease the number of target objects. For example, if half of the objects are long-lived at a particular GC time, the number of the target objects is half.

There’s a problem, though. Generational GC is very difficult to do if objects can’t be moved. It is because the long-lived objects are, as I just wrote, needed to “be treated specially”. Since generational GC decreases the number of the objects dealt with and reduces the cost, if which generation a object belongs to is not clearly categorized, as a consequence it is equivalent to dealing with both generations. Furthermore, the ruby’s GC is also a conservative GC, so it also has to be created so that is_pointer_to_heap() work. This is particularly difficult.

How to solve this problem is … By the hand of Mr. Kiyama Masato, the implementation of Generational GC for ruby has been published. I’ll briefly describe how this patch deals with each problem. And this time, by courtesy of Mr. Kiyama, this Generational GC patch and its paper are contained in attached CD-ROM. (See also doc/generational-gc.html)

Then, I shall start the explanation. In order to ease explaining, from now on, the long-lived objects are called as “old-generation objects”, the short-lived objects are called as “new-generation objects”,

First, about the biggest problem which is the special treatment for the old-generation objects. This point is resolved by linking only the new-generation objects into a list named newlist. This list is substantialized by increasing RVALUE’s elements.

Second, about the way to detect the old-generation objects. It is very simply done by just removing the newlist objects which were not garbage collected from the newlist. In other words, once an object survives through GC, it will be treated as an old-generation object.

Third, about the way to detect the references from old-generation objects to new-generation objects. In Generational GC, it’s sort of, the old-generation objects keep being in the marked state. However, when there are links from old-generation to new-generation, the new-generation objects will not be marked. (Figure 11)

figure 11: reference over generations
figure 11: reference over generations

This is not good, so at the moment when an old-generational object refers to a new-generational object, the new-generational object must be turned into old-generational. The patch modifies the libraries and adds checks to where there’s possibility that this kind of references happens.

This is the outline of its mechanism. It was scheduled that this patch is included ruby 1.7, but it has not been included yet. It is said that the reason is its speed, There’s an inference that the cost of the third point “check all references” matters, but the precise cause has not figured out.

Compaction

Could the ruby’s GC do compaction? Since VALUE of ruby is a direct pointer to a struct, if the address of the struct are changed because of compaction, it is necessary to change the all VALUEs that point to the moved structs.

However, since the ruby’s GC is a conservative GC, “the case when it is impossible to determine whether or not it is really a VALUE” is possible. Changing the value even though in this situation, if it was not VALUE something awful will happen. Compaction and conservative GC are really incompatible.

But, let’s contrive countermeasures in one way or another. The first way is to let VALUE be an object ID instead of a pointer. (Figure 12) It means sandwiching a indirect layer between VALUE and a struct. In this way, as it’s not necessary to rewrite VALUE, structs can be safely moved. But as trade-offs, accessing speed slows down and the compatibility of extension libraries is lost.

figure 12: reference through the object ID
figure 12: reference through the object ID

Then, the next way is to allow moving the struct only when they are pointed from only the pointers that “is surely VALUE” (Figure 13). This method is called Mostly-copying garbage collection. In the ordinary programs, there are not so many objects that is_pointer_to_heap() is true, so the probability of being able to move the object structs is quite high.

figure 13: Mostly-copying garbage collection
figure 13: Mostly-copying garbage collection

Moreover and moreover, by enabling to move the struct, the implementation of Generational GC becomes simple at the same time. It seems to be worth to challenge.

volatile to protect from GC

I wrote that GC takes care of VALUE on the stack, therefore if a VALUE is located as a local variable the VALUE should certainly be marked. But in reality due to the effects of optimization, it’s possible that the variables disappear. For example, there’s a possibility of disappearing in the following case:

VALUE str;
str = rb_str_new2("...");
printf("%s\n", RSTRING(str)->ptr);

Because this code does not access the str itself, some compilers only keeps str->ptr in memory and deletes the str. If this happened, the str would be collected and the process would be down. There’s no choice in this case

volatile VALUE str;

we need to write this way. volatile is a reserved word of C, and it has an effect of forbidding optimizations that have to do with this variable. If volatile was attached in the code relates to Ruby, you could assume almost certainly that its exists for GC. When I read K & R, I thought “what is the use of this?”, and totally didn’t expect to see the plenty of them in ruby.

Considering these aspects, the promise of the conservative GC “users don’t have to care about GC” seems not always true. There was once a discussion that “the Scheme’s GC named KSM does not need volatile”, but it seems it could not be applied to ruby because its algorithm has a hole.

When to invoke

Inside gc.c

When to invoke GC? Inside gc.c, there are three places calling rb_gc() inside of gc.c,

As for ruby_xmalloc() and ruby_xrealloc(), it is when failing to allocate memory. Doing GC may free memories and it’s possible that a space becomes available again. rb_newobj() has a similar situation, it invokes when freelist becomes empty.

Inside the interpritor

There’s several places except for gc.c where calling rb_gc() in the interpretor.

First, in io.c and dir.c, when it runs out of file descriptors and could not open, it invokes GC. If IO objects are garbage collected, it’s possible that the files are closed and file descriptors become available.

In ruby.c, rb_gc() is sometimes done after loading a file. As I mentioned in the previous Sweep section, it is to compensate for the fact that NODE cannot be garbage collected while compiling.

Object Creation

We’ve finished about GC and come to be able to deal with the Ruby objects from its creation to its freeing. So I’d like to describe about object creations here. This is not so related to GC, rather, it is related a little to the discussion about classes in the previous chapter.

Allocation Framework

We’ve created objects many times. For example, in this way:

class C
end
C.new()

At this time, how does C.new create a object?

First, C.new is actually Class#new. Its actual body is this:

▼ `rb_class_new_instance()`

 725  VALUE
 726  rb_class_new_instance(argc, argv, klass)
 727      int argc;
 728      VALUE *argv;
 729      VALUE klass;
 730  {
 731      VALUE obj;
 732
 733      obj = rb_obj_alloc(klass);
 734      rb_obj_call_init(obj, argc, argv);
 735
 736      return obj;
 737  }

(object.c)

rb_obj_alloc() calls the allocate method against the klass. In other words, it calls C.allocate in this example currently explained. It is Class#allocate by default and its actual body is rb_class_allocate_instance().

▼ `rb_class_allocate_instance()`

 708  static VALUE
 709  rb_class_allocate_instance(klass)
 710      VALUE klass;
 711  {
 712      if (FL_TEST(klass, FL_SINGLETON)) {
 713          rb_raise(rb_eTypeError,
                       "can't create instance of virtual class");
 714      }
 715      if (rb_frame_last_func() != alloc) {
 716          return rb_obj_alloc(klass);
 717      }
 718      else {
 719          NEWOBJ(obj, struct RObject);
 720          OBJSETUP(obj, klass, T_OBJECT);
 721          return (VALUE)obj;
 722      }
 723  }

(object.c)

rb_newobj() is a function that returns a RVALUE by taking from the freelist. NEWOBJ() is just a rb_newobj() with type-casting. The OBJSETUP() is a macro to initialize the struct RBasic part, you can think that this exists only in order not to forget to set the FL_TAINT flag.

The rest is going back to rb_class_new_instance(), then it calls rb_obj_call_init(). This function calls initialize on the just created object, and the initialization completes.

This is summarized as follows:

SomeClass.new            = Class#new (rb_class_new_instance)
    SomeClass.allocate       = Class#allocate (rb_class_allocate_instance)
    SomeClass#initialize     = Object#initialize (rb_obj_dummy)

I could say that the allocate class method is to physically initialize, the initialize is to logically initialize. The mechanism like this, in other words the mechanism that an object creation is divided into allocate / initialize and new presides them, is called the “allocation framework”.

Creating User Defined Objects

Next, we’ll examine about the instance creations of the classes defined in extension libraries. As it is called user-defined, its struct is not decided, without telling how to allocate it, ruby don’t understand how to create its object. Let’s look at how to tell it.

Data_Wrap_Struct()

Whichever it is user-defined or not, its creation mechanism itself can follow the allocation framework. It means that when defining a new SomeClass class in C, we overwrite both SomeClass.allocate and SomeClass#initialize.

Let’s look at the allocate side first. Here, it does the physical initialization. What is necessary to allocate? I mentioned that the instance of the user-defined class is a pair of struct RData and a user-prepared struct. We’ll assume that the struct is of type struct my. In order to create a VALUE based on the struct my, you can use Data_Wrap_Struct(). This is how to use:

struct my *ptr = malloc(sizeof(struct my));  /* arbitrarily allocate in the heap */
VALUE val = Data_Wrap_Struct(data_class, mark_f, free_f, ptr);

data_class is the class that val belongs to, ptr is the pointer to be wrapped. mark_f is (the pointer to) the function to mark this struct. However, this does not mark the ptr itself and is used when the struct pointed by ptr contains VALUE. On the other hand, free_f is the function to free the ptr itself. The argument of the both functions is ptr. Going back a little and reading the code to mark may help you to understand things around here in one shot.

Let’s also look at the content of Data_Wrap_Struct().

▼ `Data_Wrap_Struct()`

 369  #define Data_Wrap_Struct(klass, mark, free, sval) \
 370      rb_data_object_alloc(klass, sval,             \
                               (RUBY_DATA_FUNC)mark,    \
                               (RUBY_DATA_FUNC)free)

 365  typedef void (*RUBY_DATA_FUNC) _((void*));

(ruby.h)

Most of it is delegated to rb_object_alloc().

▼ `rb_data_object_alloc()`

 310  VALUE
 311  rb_data_object_alloc(klass, datap, dmark, dfree)
 312      VALUE klass;
 313      void *datap;
 314      RUBY_DATA_FUNC dmark;
 315      RUBY_DATA_FUNC dfree;
 316  {
 317      NEWOBJ(data, struct RData);
 318      OBJSETUP(data, klass, T_DATA);
 319      data->data = datap;
 320      data->dfree = dfree;
 321      data->dmark = dmark;
 322
 323      return (VALUE)data;
 324  }

(gc.c)

This is not complicated. As the same as the ordinary objects, it prepares a RVALUE by using NEWOBJ() OBJSETUP(), and sets the members.

Here, let’s go back to allocate. We’ve succeeded to create a VALUE by now, so the rest is putting it in an arbitrary function and defining the function on a class by rb_define_singleton_method().

Data_Get_Struct()

The next thing is initialize. Not only for initialize, the methods need a way to pull out the struct my* from the previously created VALUE. In order to do it, you can use the Data_Get_Struct() macro.

▼ `Data_Get_Struct()`

 378  #define Data_Get_Struct(obj,type,sval) do {\
 379      Check_Type(obj, T_DATA); \
 380      sval = (type*)DATA_PTR(obj);\
 381  } while (0)

 360  #define DATA_PTR(dta) (RDATA(dta)->data)

(ruby.h)

As you see, it just takes the pointer (to struct my) from a member of RData. This is simple. Check_Type() just checks the struct type.

The Issues of the Allocation Framework

So, I’ve explained innocently until now, but actually the current allocation framework has a fatal issue. I just described that the object created with allocate appears to the initialize or the other methods, but if the passed object that was created with allocate is not of the same class, it must be a very serious problem. For example, if the object created with the default Objct.allocate (Class#allocate) is passed to the method of String, this cause a serious problem. That is because even though the methods of String are written based on the assumption that a struct of type struct RString is given, the given object is actually a struct RObject. In order to avoid such situation, the object created with C.allocate must be passed only to the methods of C or its subclasses.

Of course, this is always true when things are ordinarily done. As C.allocate creates the instance of the class C, it is not passed to the methods of the other classes. As an exception, it is possible that it is passed to the method of Object, but the methods of Object does not depend on the struct type.

However, what if it is not ordinarily done? Since C.allocate is exposed at the Ruby level, though I’ve not described about them yet, by making use of alias or super or something, the definition of allocate can be moved to another class. In this way, you can create an object whose class is String but whose actual struct type is struct RObject. It means that you can freely let ruby down from the Ruby level. This is a problem.

The source of the issue is that allocate is exposed to the Ruby level as a method. Conversely speaking, a solution is to define the content of allocate on the class by using a way that is anything but a method. So,

rb_define_allocator(rb_cMy, my_allocate);

an alternative like this is currently in discussion.