Ruby Hacking Guide

Chapter 17: Dynamic evaluation

Overview

The previous chapter finished describing the mechanism of the evaluator. In this chapter we add the parser to the picture and examine the whole as “the evaluator in a broad sense”. There are three targets: eval, Module#module_eval and Object#instance_eval.

eval

I’ve already described eval, but here I’ll introduce a few more details about it.

With eval, you can compile and evaluate a string at runtime, on the spot. Its return value is the value of the last expression of the program.

p eval("1 + 1")   # 2

You can also refer to a variable in its scope from inside the string passed to eval.

lvar = 5
@ivar = 6
p eval("lvar + @ivar")   # 11

Readers who have come this far cannot simply pass over the words “its scope”. For instance, aren’t you curious about what the “scope” of constants is? I am. To state the conclusion first, you can basically think of it as directly inheriting the environment outside eval.

You can also define methods and classes.
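The question about constants can be checked with a small experiment. This is only a sketch: the names Outer, LIMIT and check are made up for illustration, and it assumes a Ruby where eval without a second argument uses the caller's environment.

```ruby
# Hypothetical names (Outer, LIMIT, check), chosen only for this illustration.
module Outer
  LIMIT = 10

  def self.check
    # The string is compiled at this point, so LIMIT is looked up as if
    # the expression had been written directly in this method body.
    eval("LIMIT * 2")
  end
end

p Outer.check   # 20
```

The constant is found without qualifying it as Outer::LIMIT, which is what we would expect if the string inherits the surrounding environment.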

def a
  eval('class C;  def test() puts("ok") end   end')
end

a()          # define class C and C#test
C.new.test   # shows ok

Moreover, as mentioned briefly in the previous chapter, when you pass a Proc as the second argument, the string is evaluated in that Proc’s environment.

def new_env
  n = 5
  Proc.new { nil }   # turn the environment of this method into an object and return it
end

p eval('n * 3', new_env())   # 15
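In more recent Ruby versions the second argument of eval is expected to be a Binding rather than a Proc. Under that assumption, the same idea can be sketched as follows (new_env is the method name from the example above, redefined here to return a Binding):

```ruby
def new_env
  n = 5
  binding   # capture the environment of this method as a Binding object
end

env_result = eval('n * 3', new_env)
p env_result   # 15
```

Either way the point is the same: the environment of a method that has already returned is kept alive as an object, and the string is evaluated inside it.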

module_eval and instance_eval

When a Proc is passed as the second argument of eval, the evaluation is done in its environment. module_eval and instance_eval are limited (or shortcut) versions of this. With module_eval, you can evaluate in an environment that behaves as if it were inside a module statement or a class statement.

lvar = "toplevel lvar"   # a local variable to confirm this scope

module M
end
M.module_eval(<<'EOS')   # a suitable situation to use here-document
    p lvar   # referable
    p self   # shows M
    def ok   # define M#ok
      puts 'ok'
    end
EOS

With instance_eval, you can evaluate in an environment like that of a singleton class statement, except that self is the object itself.

lvar = "toplevel lvar"   # a local variable to confirm this scope

obj = Object.new
obj.instance_eval(<<'EOS')
    p lvar   # referable
    p self   # shows #<Object:0x40274f5c>
    def ok   # define obj.ok
      puts 'ok'
    end
EOS

Additionally, module_eval and instance_eval can also be used as iterators; in that case the block is evaluated in the respective environment. For instance,

obj = Object.new
p obj                 # #<Object:0x40274fac>
obj.instance_eval {
    p self            # #<Object:0x40274fac>
}

Like this.

However, the behavior around local variables differs between the string form and the block form. For example, when a block is created in method a and passed to instance_eval in method b, the block refers to the local variables of a. When a string is created in method a and passed to instance_eval in method b, the code inside the string refers to the local variables of b. The scope of local variables is decided “at compile time”, and the results differ because a string is compiled each time it is evaluated, whereas a block is compiled when the file is loaded.
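The difference between the two forms can be made concrete with the following sketch. The method names (make_block, make_string, run_block, run_string) are hypothetical; make_* plays the role of method a and run_* the role of method b.

```ruby
def make_block
  v = "local of make_block"
  proc { v }                  # compiled together with this method
end

def make_string
  v = "local of make_string"
  "v"                         # just a string; nothing is compiled yet
end

def run_block(obj, blk)
  v = "local of run_block"
  obj.instance_eval(&blk)     # the block still sees make_block's v
end

def run_string(obj, src)
  v = "local of run_string"
  obj.instance_eval(src)      # compiled here, so it sees run_string's v
end

obj = Object.new
block_result  = run_block(obj, make_block)
string_result = run_string(obj, make_string)
p block_result    # "local of make_block"
p string_result   # "local of run_string"
```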

eval

eval()

Ruby’s eval branches many times depending on which parameters are present. Let’s assume the call is limited to the following form:

eval(prog_string, some_block)

Since this makes the actual interface function rb_f_eval() almost meaningless, we’ll start with eval(), the function one step lower. Its prototype is:

static VALUE
eval(VALUE self, VALUE src, VALUE scope, char *file, int line);

scope is the Proc given as the second parameter. file and line are the file name and line number where the string being evaluated is supposed to be located. Now let’s see the content:

▼ `eval()` (simplified)

4984  static VALUE
4985  eval(self, src, scope, file, line)
4986      VALUE self, src, scope;
4987      char *file;
4988      int line;
4989  {
4990      struct BLOCK *data = NULL;
4991      volatile VALUE result = Qnil;
4992      struct SCOPE * volatile old_scope;
4993      struct BLOCK * volatile old_block;
4994      struct RVarmap * volatile old_dyna_vars;
4995      VALUE volatile old_cref;
4996      int volatile old_vmode;
4997      volatile VALUE old_wrapper;
4998      struct FRAME frame;
4999      NODE *nodesave = ruby_current_node;
5000      volatile int iter = ruby_frame->iter;
5001      int state;
5002
5003      if (!NIL_P(scope)) {  /* always true now */
5009          Data_Get_Struct(scope, struct BLOCK, data);
5010          /* push BLOCK from data */
5011          frame = data->frame;
5012          frame.tmp = ruby_frame; /* to prevent from GC */
5013          ruby_frame = &(frame);
5014          old_scope = ruby_scope;
5015          ruby_scope = data->scope;
5016          old_block = ruby_block;
5017          ruby_block = data->prev;
5018          old_dyna_vars = ruby_dyna_vars;
5019          ruby_dyna_vars = data->dyna_vars;
5020          old_vmode = scope_vmode;
5021          scope_vmode = data->vmode;
5022          old_cref = (VALUE)ruby_cref;
5023          ruby_cref = (NODE*)ruby_frame->cbase;
5024          old_wrapper = ruby_wrapper;
5025          ruby_wrapper = data->wrapper;
5032          self = data->self;
5033          ruby_frame->iter = data->iter;
5034      }
5045      PUSH_CLASS();
5046      ruby_class = ruby_cbase;  /* == ruby_frame->cbase */
5047
5048      ruby_in_eval++;
5049      if (TYPE(ruby_class) == T_ICLASS) {
5050          ruby_class = RBASIC(ruby_class)->klass;
5051      }
5052      PUSH_TAG(PROT_NONE);
5053      if ((state = EXEC_TAG()) == 0) {
5054          NODE *node;
5055
5056          result = ruby_errinfo;
5057          ruby_errinfo = Qnil;
5058          node = compile(src, file, line);
5059          if (ruby_nerrs > 0) {
5060              compile_error(0);
5061          }
5062          if (!NIL_P(result)) ruby_errinfo = result;
5063          result = eval_node(self, node);
5064      }
5065      POP_TAG();
5066      POP_CLASS();
5067      ruby_in_eval--;
5068      if (!NIL_P(scope)) {  /* always true now */
5069          int dont_recycle = ruby_scope->flags & SCOPE_DONT_RECYCLE;
5070
5071          ruby_wrapper = old_wrapper;
5072          ruby_cref  = (NODE*)old_cref;
5073          ruby_frame = frame.tmp;
5074          ruby_scope = old_scope;
5075          ruby_block = old_block;
5076          ruby_dyna_vars = old_dyna_vars;
5077          data->vmode = scope_vmode; /* save the modification of the visibility scope */
5078          scope_vmode = old_vmode;
5079          if (dont_recycle) {
                  /* ……copy SCOPE BLOCK VARS…… */
5097          }
5098      }
5104      if (state) {
5105          if (state == TAG_RAISE) {
                  /* ……prepare an exception object…… */
5121              rb_exc_raise(ruby_errinfo);
5122          }
5123          JUMP_TAG(state);
5124      }
5125
5126      return result;
5127  }

(eval.c)

If this function were shown without any preamble, you would probably wince. But we have beaten many functions of eval.c by now, so this one is no match for us. It just keeps saving and restoring the stacks. The points we need to care about are only the following three:

And the main parts are compile() and eval_node(), located around the middle. You may have forgotten eval_node() by now: it is the function that starts the evaluation of the node given as its parameter. It was also used in ruby_run().
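The split into a compile step and an evaluation step can also be observed from Ruby itself. The sketch below assumes a current MRI, where RubyVM::InstructionSequence is available; it produces bytecode rather than the node tree discussed in this chapter, so it is only an analogy for the compile() and eval_node() pair, not the same mechanism.

```ruby
src = "1 + 1"

# Step 1: compile the string (analogous in spirit to compile()).
iseq = RubyVM::InstructionSequence.compile(src)

# Step 2: evaluate the compiled result (analogous in spirit to eval_node()).
compiled_result = iseq.eval
p compiled_result   # 2
```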

Here is compile().

▼ `compile()`

4968  static NODE*
4969  compile(src, file, line)
4970      VALUE src;
4971      char *file;
4972      int line;
4973  {
4974      NODE *node;
4975
4976      ruby_nerrs = 0;
4977      Check_Type(src, T_STRING);
4978      node = rb_compile_string(file, src, line);
4979
4980      if (ruby_nerrs == 0) return node;
4981      return 0;
4982  }

(eval.c)

ruby_nerrs is the variable incremented in yyerror(). In other words, if this variable is non-zero, at least one parse error occurred. rb_compile_string() was already discussed in Part 2. It is the function that compiles a Ruby string into a syntax tree.

One thing that becomes a problem here is local variables. As we saw in Chapter 12: Syntax tree construction, local variables are managed with lvtbl. However, since a SCOPE (and possibly also VARS) already exists, the parser has to overwrite and add to it. This is in fact the heart of eval(), and the hardest part. Let’s go back to parse.y once more and complete this investigation.
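Seen from the Ruby side, a parse error counted this way surfaces as a SyntaxError raised from eval. A minimal sketch (try_eval is a hypothetical helper written for this illustration):

```ruby
# Hypothetical helper: returns the value of the string, or :syntax_error
# when the string does not parse.
def try_eval(src)
  eval(src)
rescue SyntaxError
  :syntax_error
end

ok  = try_eval("1 + 1")
bad = try_eval("1 +")     # incomplete expression, so parsing fails
p ok    # 2
p bad   # :syntax_error
```

Note that SyntaxError is not a StandardError, so a bare rescue would not catch it; it has to be named explicitly as above.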

top_local

I’ve mentioned that the functions local_push() and local_pop() are used to push struct local_vars, the management table of local variables, but there is actually one more pair of functions that pushes the management table: top_local_init() and top_local_setup(). They are called like this.

▼ How `top_local_init()` is called

program :   { top_local_init(); }
          compstmt
            { top_local_setup(); }

Of course, in reality various other things are also done, but they are all cut here because they are not important. And here is its content:

▼ `top_local_init()`

5273  static void
5274  top_local_init()
5275  {
5276      local_push(1);
5277      lvtbl->cnt = ruby_scope->local_tbl?ruby_scope->local_tbl[0]:0;
5278      if (lvtbl->cnt > 0) {
5279          lvtbl->tbl = ALLOC_N(ID, lvtbl->cnt+3);
5280          MEMCPY(lvtbl->tbl, ruby_scope->local_tbl, ID, lvtbl->cnt+1);
5281      }
5282      else {
5283          lvtbl->tbl = 0;
5284      }
5285      if (ruby_dyna_vars)
5286          lvtbl->dlev = 1;
5287      else
5288          lvtbl->dlev = 0;
5289  }

(parse.y)

This means that local_tbl is copied from ruby_scope to lvtbl. As for block local variables, since it’s better to see them all at once later, we’ll focus on ordinary local variables for the time being. Next, here is top_local_setup().

▼ `top_local_setup()`

5291  static void
5292  top_local_setup()
5293  {
5294      int len = lvtbl->cnt;  /* the number of local variables after parsing */
5295      int i;                 /* the number of local variables before parsing */
5296
5297      if (len > 0) {
5298          i = ruby_scope->local_tbl ? ruby_scope->local_tbl[0] : 0;
5299
5300          if (i < len) {
5301              if (i == 0 || (ruby_scope->flags & SCOPE_MALLOC) == 0) {
5302                  VALUE *vars = ALLOC_N(VALUE, len+1);
5303                  if (ruby_scope->local_vars) {
5304                      *vars++ = ruby_scope->local_vars[-1];
5305                      MEMCPY(vars, ruby_scope->local_vars, VALUE, i);
5306                      rb_mem_clear(vars+i, len-i);
5307                  }
5308                  else {
5309                      *vars++ = 0;
5310                      rb_mem_clear(vars, len);
5311                  }
5312                  ruby_scope->local_vars = vars;
5313                  ruby_scope->flags |= SCOPE_MALLOC;
5314              }
5315              else {
5316                  VALUE *vars = ruby_scope->local_vars-1;
5317                  REALLOC_N(vars, VALUE, len+1);
5318                  ruby_scope->local_vars = vars+1;
5319                  rb_mem_clear(ruby_scope->local_vars+i, len-i);
5320              }
5321              if (ruby_scope->local_tbl &&
                      ruby_scope->local_vars[-1] == 0) {
5322                  free(ruby_scope->local_tbl);
5323              }
5324              ruby_scope->local_vars[-1] = 0;  /* NODE is not necessary anymore */
5325              ruby_scope->local_tbl = local_tbl();
5326          }
5327      }
5328      local_pop();
5329  }

(parse.y)

Since local_vars can be either on the stack or in the heap, the code is somewhat complex. However, it is just updating local_tbl and local_vars of ruby_scope. (When SCOPE_MALLOC is set, local_vars was allocated with malloc().) And because there is no point in using alloca() here, the allocation method is forcibly changed to malloc().
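The effect of growing an existing scope can be observed at the Ruby level by evaluating several strings against one environment. The sketch below uses a Binding object and assumes a current Ruby, where the internal mechanism is not the local_tbl and local_vars update shown above; it only illustrates the visible behavior that a variable introduced by one eval is added to the scope and remains there for the next one.

```ruby
env = binding

eval("counter = 1", env)    # a new local variable is added to env's scope
eval("counter += 1", env)   # a later eval finds it in the same scope
counter_value = eval("counter", env)

p counter_value                            # 2
p env.local_variable_defined?(:counter)    # true
```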

Block Local Variable

By the way, what about block local variables? To think about this, we first have to go back to the entry point of the parser, yycompile().

▼ setting `ruby_dyna_vars` aside

static NODE*
yycompile(f, line)
{
    struct RVarmap *vars = ruby_dyna_vars;
         :
    n = yyparse();
         :
    ruby_dyna_vars = vars;
}

This looks like a mere save and restore, but the point is that it does not clear ruby_dyna_vars. This means that the parser, too, adds elements directly to the RVarmap link created in the evaluator.

However, according to the earlier description, the structure of ruby_dyna_vars differs between the parser and the evaluator. How is the difference in the way the header (the RVarmap whose id=0) is attached dealt with?

What helps here is the “1” of local_push(1) in top_local_init(). When the argument of local_push() is true, the first header of ruby_dyna_vars is not attached. That is, it looks like Figure 1. This ensures that we can refer to the block local variables of the outer scope from inside a string passed to eval.

figure 1: `ruby_dyna_vars` inside `eval`

Well, it’s true that we can refer to them, but wasn’t ruby_dyna_vars said to be entirely freed in the parser? What happens if the link created in the evaluator is freed? … Readers who noticed this can be relieved by reading the next part.

▼ `yycompile()` − freeing `ruby_dyna_vars`

2386      vp = ruby_dyna_vars;
2387      ruby_dyna_vars = vars;
2388      lex_strterm = 0;
2389      while (vp && vp != vars) {
2390          struct RVarmap *tmp = vp;
2391          vp = vp->next;
2392          rb_gc_force_recycle((VALUE)tmp);
2393      }

(parse.y)

It is designed so that the loop would stop when it reaches the link created at the evaluator (vars).
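That block local variables of the enclosing scope are visible from a string passed to eval, and survive the parse, can be confirmed with a short sketch (the variable names are arbitrary):

```ruby
results = [1, 2, 3].map do |i|
  scaled = i * 10        # a block local variable
  eval("scaled + i")     # both scaled and i are visible inside the string
end

p results   # [11, 22, 33]
```

The block runs three times, and each eval sees the block local variables of that particular iteration.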

instance_eval

The Whole Picture

The substance of Module#module_eval is rb_mod_module_eval(), and the substance of Object#instance_eval is rb_obj_instance_eval().

▼ `rb_mod_module_eval() rb_obj_instance_eval()`

5316  VALUE
5317  rb_mod_module_eval(argc, argv, mod)
5318      int argc;
5319      VALUE *argv;
5320      VALUE mod;
5321  {
5322      return specific_eval(argc, argv, mod, mod);
5323  }

5298  VALUE
5299  rb_obj_instance_eval(argc, argv, self)
5300      int argc;
5301      VALUE *argv;
5302      VALUE self;
5303  {
5304      VALUE klass;
5305
5306      if (rb_special_const_p(self)) {
5307          klass = Qnil;
5308      }
5309      else {
5310          klass = rb_singleton_class(self);
5311      }
5312
5313      return specific_eval(argc, argv, klass, self);
5314  }

(eval.c)

These two methods share a common part as “methods that replace self and class”, and that part is defined as specific_eval(). Figure 2 shows it together with what will be described below. The names in parentheses are calls through function pointers.

figure 2: Call Graph

Both instance_eval and module_eval can accept either a block or a string, so the processing branches into yield and eval respectively. However, most of that is again common, and this part is extracted as exec_under().

But for the reader, having to face 2 times 2 = 4 paths simultaneously is not a good plan. Therefore, here we assume only the case where instance_eval is called with a string. We’ll expand all the functions under rb_obj_instance_eval() inline, fold the constants, and read the result.

After being absorbed

As a result, it becomes much easier to follow than before being absorbed.

▼ `specific_eval()` − `instance_eval`, `eval`, string

static VALUE
instance_eval_string(self, src, file, line)
    VALUE self, src;
    const char *file;
    int line;
{
    VALUE sclass;
    VALUE result;
    int state;
    int mode;

    sclass = rb_singleton_class(self);

    PUSH_CLASS();
    ruby_class = sclass;
    PUSH_FRAME();
    ruby_frame->self       = ruby_frame->prev->self;
    ruby_frame->last_func  = ruby_frame->prev->last_func;
    ruby_frame->last_class = ruby_frame->prev->last_class;
    ruby_frame->argc       = ruby_frame->prev->argc;
    ruby_frame->argv       = ruby_frame->prev->argv;
    if (ruby_frame->cbase != sclass) {
        ruby_frame->cbase = rb_node_newnode(NODE_CREF, sclass, 0,
                                            ruby_frame->cbase);
    }
    PUSH_CREF(sclass);

    mode = scope_vmode;
    SCOPE_SET(SCOPE_PUBLIC);
    PUSH_TAG(PROT_NONE);
    if ((state = EXEC_TAG()) == 0) {
        result = eval(self, src, Qnil, file, line);
    }
    POP_TAG();
    SCOPE_SET(mode);

    POP_CREF();
    POP_FRAME();
    POP_CLASS();
    if (state) JUMP_TAG(state);

    return result;
}

We can see that this pushes the singleton class of the object onto CLASS, CREF and ruby_frame->cbase. The main processing is a single call to eval(). It is unusual that FRAME is not initialized by a whole struct-copy, but this does not make much difference either.
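One visible consequence of pushing the singleton class as the class in effect is where a def inside the string ends up. The following sketch assumes a current Ruby; the names obj and plain are arbitrary, and the module_eval half is included only for contrast.

```ruby
obj = Object.new
obj.instance_eval("def ok; 'ok'; end")   # class in effect: singleton class of obj
singleton_names = obj.singleton_methods

plain = Class.new
plain.module_eval("def ok; 'ok'; end")   # class in effect: plain itself
instance_names = plain.instance_methods(false)

p singleton_names                  # [:ok]
p instance_names                   # [:ok]
p Object.new.respond_to?(:ok)      # false, only obj received the method
```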

Before being absorbed

Although I said it became easier to read, it may have been simple even before being absorbed, so let’s check what was simplified in comparison with the original.

The first one is specific_eval(). Since this function exists to share the code of the interface to Ruby, almost all of it is parsing the parameters. Here is the result of cutting all of that.

▼ `specific_eval()` (simplified)

5258  static VALUE
5259  specific_eval(argc, argv, klass, self)
5260      int argc;
5261      VALUE *argv;
5262      VALUE klass, self;
5263  {
5264      if (rb_block_given_p()) {

5268          return yield_under(klass, self);
5269      }
5270      else {

5294          return eval_under(klass, self, argv[0], file, line);
5295      }
5296  }

(eval.c)

As you can see, this branches cleanly in two ways depending on whether there is a block, and neither route influences the other. Therefore, when reading, we should read them one at a time. This is the first point on which the absorbed version is an improvement.

Also, file and line are irrelevant when reading yield_under(), so if the yield route had been absorbed into the main body, it would become obvious that we don’t have to think about parsing these parameters at all.
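The file and line parameters that specific_eval() parses for the string route are visible from Ruby as the optional trailing arguments of instance_eval. A small sketch, where "fake.rb" and 42 are arbitrary values chosen for illustration:

```ruby
obj = Object.new

# The string is treated as if it were located at fake.rb, line 42.
reported_file = obj.instance_eval("__FILE__", "fake.rb", 42)
reported_line = obj.instance_eval("__LINE__", "fake.rb", 42)

p reported_file   # "fake.rb"
p reported_line   # 42
```

The same location is what appears in backtraces when the evaluated string raises an exception, which is the practical reason for passing these arguments.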

Next, we’ll look at eval_under() and eval_under_i().

▼ `eval_under()`

5222  static VALUE
5223  eval_under(under, self, src, file, line)
5224      VALUE under, self, src;
5225      const char *file;
5226      int line;
5227  {
5228      VALUE args[4];
5229
5230      if (ruby_safe_level >= 4) {
5231          StringValue(src);
5232      }
5233      else {
5234          SafeStringValue(src);
5235      }
5236      args[0] = self;
5237      args[1] = src;
5238      args[2] = (VALUE)file;
5239      args[3] = (VALUE)line;
5240      return exec_under(eval_under_i, under, under, args);
5241  }

5214  static VALUE
5215  eval_under_i(args)
5216      VALUE *args;
5217  {
5218      return eval(args[0], args[1], Qnil, (char*)args[2], (int)args[3]);
5219  }

(eval.c)

This function packs its arguments into the args array and passes that, so that there is a single argument. We can guess that args exists as a temporary container to pass data from eval_under() to eval_under_i(), but we can’t be sure that is truly so. It’s possible that args is modified inside exec_under().

As a way to share code, this is perfectly proper. But for the reader, this kind of indirect passing is hard to follow. In particular, because there are extra casts on file and line to fool the compiler, it is hard to tell what their actual types were. All of this disappeared in the absorbed version, so there is no risk of getting lost.
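The shape of this indirect passing can be imitated in Ruby. Everything in the sketch below is hypothetical: exec_under_sketch, eval_under_i_sketch and the global stack only mimic the pattern of “push a context, call a callback with one packed argument, pop the context”, and are not a translation of the real exec_under().

```ruby
# Hypothetical stand-in for the CLASS / CREF stacks.
$class_stack_sketch = []

def exec_under_sketch(func, under, args)
  $class_stack_sketch.push(under)   # stand-in for PUSH_CLASS / PUSH_CREF
  func.call(args)                   # the callback receives one packed argument
ensure
  $class_stack_sketch.pop           # always restored, even on an exception
end

# The callback has to know how the caller packed the array.
eval_under_i_sketch = lambda do |args|
  recv, src = args
  recv.instance_eval(src)
end

sketch_result = exec_under_sketch(eval_under_i_sketch, String, ["abc", "upcase"])
p sketch_result   # "ABC"
```

The reader of exec_under_sketch alone cannot tell what args contains, which is exactly the difficulty described above.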

However, it’s too much to say that absorbing and expanding always makes things easier to understand. For example, when exec_under() is called, under is passed as both the second and third arguments, but is it really all right to merge both of exec_under()’s parameter variables into under? The second and third arguments of exec_under() actually indicate the CLASS and the CREF that should be pushed. CLASS and CREF are “different things”, so it might be better to use different variables. In the absorbed version above, too, for this point alone,

VALUE sclass = .....;
VALUE cbase = sclass;

I thought about writing it this way, but I also thought it would look strange if only these variables were suddenly left over, so they were merged into sclass. In other words, this was done only for the flow of the text.

By now I have expanded arguments and functions many times, and each time I have explained the reason for expanding. They are

Of course, I’m not saying that expanding various things always makes things simpler, whatever the situation.

In any case, the first priority is comprehensibility for ourselves, not adherence to a methodology. When expanding makes things simpler, expand. When we feel that not expanding, or conversely bundling things into a procedure, makes things easier to understand, let us do that. As for ruby, I often expanded because the original is written properly, but if the source code were written by a poor programmer, aggressively bundling into functions would often be a good choice.