Emerald - A Ruby VM

Emerald is a Ruby compiler & virtual machine written in Go.

Building Emerald

Run the following from your command line:

git clone git@github.com:mathiashsteffensen/emerald.git
cd emerald && ./scripts/install

This will build an emerald & an iem executable in the current directory.

To run a source file of Ruby code:

./emerald main.rb

To start the Emerald REPL:

./iem

Architecture

Lexer

The lexer/tokenizer is located in the ./parser/lexer package. I have tried to keep it simple, but string templates ("This is a #{template}") leave basically only 2 options:

  1. Don't have a separate lexer & parser.
  2. Turn the lexer into a stack machine, since templates can be nested arbitrarily deep (or as deep as the stack allows).

Since I prefer the performance of keeping the parser & lexer separate (they run in parallel), I went with option 2, at the cost of a significantly more complex lexer.
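
To illustrate option 2, here is a minimal sketch of a lexer driven by a mode stack. It is not Emerald's actual lexer: the Token type, the token kind names and the Lex function are all made up for this example, and it only understands double-quoted strings, #{...} interpolation and whitespace-separated words.

```go
package main

import (
	"fmt"
	"strings"
)

// mode is what the lexer is currently inside of.
type mode int

const (
	modeCode   mode = iota // ordinary code
	modeString             // inside a double-quoted string
)

// Token is a hypothetical token shape, not Emerald's real token type.
type Token struct {
	Kind string
	Text string
}

// Lex tokenizes a tiny subset of Ruby. The mode stack is what lets
// templates nest: "#{" pushes modeCode, a quote pushes modeString,
// and the matching "}" or closing quote pops back to the outer mode.
func Lex(src string) []Token {
	var tokens []Token
	stack := []mode{modeCode}
	i := 0
	for i < len(src) {
		switch stack[len(stack)-1] {
		case modeString:
			switch {
			case src[i] == '"':
				tokens = append(tokens, Token{"STRING_END", `"`})
				stack = stack[:len(stack)-1]
				i++
			case strings.HasPrefix(src[i:], "#{"):
				tokens = append(tokens, Token{"INTERP_START", "#{"})
				stack = append(stack, modeCode)
				i += 2
			default:
				// Literal text up to the next quote or interpolation.
				start := i
				for i < len(src) && src[i] != '"' && !strings.HasPrefix(src[i:], "#{") {
					i++
				}
				tokens = append(tokens, Token{"STRING_PART", src[start:i]})
			}
		case modeCode:
			switch {
			case src[i] == '"':
				tokens = append(tokens, Token{"STRING_START", `"`})
				stack = append(stack, modeString)
				i++
			case src[i] == '}':
				if len(stack) > 1 {
					// Closes an interpolation: return to the enclosing string.
					tokens = append(tokens, Token{"INTERP_END", "}"})
					stack = stack[:len(stack)-1]
				} else {
					tokens = append(tokens, Token{"RBRACE", "}"})
				}
				i++
			case src[i] == ' ':
				i++
			default:
				// Everything else is lumped into one WORD token for brevity.
				start := i
				for i < len(src) && !strings.ContainsRune(` "}`, rune(src[i])) {
					i++
				}
				tokens = append(tokens, Token{"WORD", src[start:i]})
			}
		}
	}
	return tokens
}

func main() {
	for _, tok := range Lex(`"a #{b "c #{d}"} e"`) {
		fmt.Printf("%-12s %q\n", tok.Kind, tok.Text)
	}
}
```

A real lexer would also have to push a mode for every "{" it sees in code (hash literals, blocks), so that their closing braces are not mistaken for the end of an interpolation. The sketch leaves that out.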

Parser

The parser is located in the ./parser package. It is a Pratt parser: essentially a recursive descent parser that resolves operator precedence as it goes.
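
For readers unfamiliar with the technique, the core loop of a Pratt parser can be sketched as below. This is a toy, not the code in ./parser: the Parser type, the precedence table and the Parse function are invented for the example. It expects whitespace-separated tokens and returns a fully parenthesized string instead of an AST.

```go
package main

import (
	"fmt"
	"strings"
)

// Binding powers: higher binds tighter. The values are illustrative only.
const (
	lowest  = iota
	sum     // + -
	product // * /
	prefix  // -x
)

var precedences = map[string]int{"+": sum, "-": sum, "*": product, "/": product}

// Parser walks a pre-split token slice.
type Parser struct {
	tokens []string
	pos    int
}

// peek returns the current token, or "" at end of input.
func (p *Parser) peek() string {
	if p.pos < len(p.tokens) {
		return p.tokens[p.pos]
	}
	return ""
}

// next returns the current token and advances.
func (p *Parser) next() string {
	tok := p.peek()
	p.pos++
	return tok
}

// parseExpression parses a prefix expression, then keeps consuming infix
// operators for as long as they bind tighter than minPrec.
func (p *Parser) parseExpression(minPrec int) string {
	var left string
	switch tok := p.next(); tok {
	case "-":
		left = "(-" + p.parseExpression(prefix) + ")"
	case "(":
		left = p.parseExpression(lowest)
		p.next() // consume ")"
	default:
		left = tok
	}
	for {
		prec, ok := precedences[p.peek()]
		if !ok || prec <= minPrec {
			return left
		}
		op := p.next()
		right := p.parseExpression(prec)
		left = "(" + left + " " + op + " " + right + ")"
	}
}

// Parse splits src on whitespace and parses it as a single expression.
func Parse(src string) string {
	p := &Parser{tokens: strings.Fields(src)}
	return p.parseExpression(lowest)
}

func main() {
	fmt.Println(Parse("1 + 2 * 3"))           // (1 + (2 * 3))
	fmt.Println(Parse("1 - 2 - 3"))           // ((1 - 2) - 3)
	fmt.Println(Parse("- a * ( b + c ) - d")) // (((-a) * (b + c)) - d)
}
```

Comparing with `prec <= minPrec` makes the binary operators left-associative. A right-associative operator such as ** would recurse with a slightly lower precedence instead.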

Compiler

The compiler is located in the ./compiler package. It transforms the AST generated by the parser into compact bytecode. The bytecode is, as the name suggests, simply an array of bytes representing Opcodes & their operands. All Opcode definitions can be found in the ./compiler/bytecode.go file.
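
As a rough picture of what "an array of bytes" means here, the sketch below encodes and decodes a few instructions. The opcode names, operand widths and the Make/Disassemble helpers are assumptions made up for illustration. The real definitions live in ./compiler/bytecode.go and may look quite different.

```go
package main

import "fmt"

// Opcode is a single byte. These opcodes are hypothetical, not Emerald's.
type Opcode byte

const (
	OpConstant Opcode = iota // push constants[operand]; one 2-byte operand
	OpAdd                    // pop two values, push their sum; no operands
	OpPop                    // discard the top of the stack; no operands
)

var names = map[Opcode]string{OpConstant: "OpConstant", OpAdd: "OpAdd", OpPop: "OpPop"}

// operandWidths lists the width in bytes of each operand an opcode takes.
var operandWidths = map[Opcode][]int{
	OpConstant: {2},
	OpAdd:      {},
	OpPop:      {},
}

// Make encodes one instruction: the opcode byte followed by its operands,
// each written big-endian in the width the opcode declares.
func Make(op Opcode, operands ...int) []byte {
	ins := []byte{byte(op)}
	for i, width := range operandWidths[op] {
		for shift := 8 * (width - 1); shift >= 0; shift -= 8 {
			ins = append(ins, byte(operands[i]>>uint(shift)))
		}
	}
	return ins
}

// Disassemble decodes a byte slice back into one readable line per
// instruction, prefixed with the instruction's byte offset.
func Disassemble(code []byte) []string {
	var out []string
	for i := 0; i < len(code); {
		op := Opcode(code[i])
		line := fmt.Sprintf("%04d %s", i, names[op])
		i++
		for _, width := range operandWidths[op] {
			val := 0
			for _, b := range code[i : i+width] {
				val = val<<8 | int(b)
			}
			line += fmt.Sprintf(" %d", val)
			i += width
		}
		out = append(out, line)
	}
	return out
}

func main() {
	// Bytecode a compiler might emit for the expression statement `1 + 2`,
	// assuming 1 and 2 sit at indexes 0 and 1 of a constant pool.
	var code []byte
	code = append(code, Make(OpConstant, 0)...)
	code = append(code, Make(OpConstant, 1)...)
	code = append(code, Make(OpAdd)...)
	code = append(code, Make(OpPop)...)

	fmt.Println(code)
	for _, line := range Disassemble(code) {
		fmt.Println(line)
	}
}
```

Keeping operand widths in a table means the encoder and the disassembler never need per-opcode special cases.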

Virtual Machine

The virtual machine is located in the ./vm package. It is a stack machine and does not make use of any registers. This keeps the implementation & Opcode definitions simple, but it does mean that we need more instructions than a register machine to perform the same operation. This ultimately means more execution cycles for an equivalent result.
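
The trade-off can be seen in a minimal stack machine like the one sketched here. It is not the code in ./vm: the three opcodes, their encoding and the Run function are assumptions for this example, and the values are plain Go ints rather than Ruby objects.

```go
package main

import "fmt"

// Hypothetical opcodes, not Emerald's real instruction set.
const (
	OpConstant byte = iota // push constants[operand]; 2-byte big-endian operand
	OpAdd                  // pop right and left, push left + right
	OpMul                  // pop right and left, push left * right
)

// Run executes code against a constant pool and returns whatever is left
// on top of the stack. Every operation works only on the stack.
func Run(code []byte, constants []int) (int, error) {
	var stack []int
	for ip := 0; ip < len(code); ip++ {
		switch op := code[ip]; op {
		case OpConstant:
			if ip+2 >= len(code) {
				return 0, fmt.Errorf("truncated operand at %d", ip)
			}
			idx := int(code[ip+1])<<8 | int(code[ip+2])
			if idx >= len(constants) {
				return 0, fmt.Errorf("constant %d out of range at %d", idx, ip)
			}
			stack = append(stack, constants[idx])
			ip += 2 // skip the operand bytes
		case OpAdd, OpMul:
			if len(stack) < 2 {
				return 0, fmt.Errorf("stack underflow at %d", ip)
			}
			right, left := stack[len(stack)-1], stack[len(stack)-2]
			stack = stack[:len(stack)-2]
			if op == OpAdd {
				stack = append(stack, left+right)
			} else {
				stack = append(stack, left*right)
			}
		default:
			return 0, fmt.Errorf("unknown opcode %d at %d", op, ip)
		}
	}
	if len(stack) == 0 {
		return 0, fmt.Errorf("empty stack")
	}
	return stack[len(stack)-1], nil
}

func main() {
	// 1 + 2 * 3: three pushes, then OpMul, then OpAdd.
	constants := []int{1, 2, 3}
	code := []byte{
		OpConstant, 0, 0,
		OpConstant, 0, 1,
		OpConstant, 0, 2,
		OpMul,
		OpAdd,
	}
	result, err := Run(code, constants)
	fmt.Println(result, err) // 7 <nil>
}
```

Evaluating 1 + 2 * 3 takes five instructions here, three of which only move values onto the stack. A register machine could name its operands directly in the arithmetic instructions and skip most of those pushes, at the price of wider, more complicated instructions.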

Supported language features

Emerald is still quite far from being a complete implementation. Below is a list of the features on the roadmap, including those that have already been implemented.

NOTE: A feature being listed as implemented does not guarantee that it is compatible with the reference Ruby implementation.

  • everything is an object
    • allow method calls on everything
    • operators are method calls
  • full UTF8 support
    • Unicode identifiers
    • Unicode symbols
  • method definitions
    • with parens
    • without parens
    • without parens with args
    • return keyword
    • default values for parameters
    • keyword arguments
    • block arguments
    • yield
  • method calls
    • with parens
    • without parens
    • without parens with args
    • with block arguments
    • keyword arguments
  • conditionals
    • if
    • if modifier
    • if/else
    • if/elsif/else
    • ternary ? :
    • unless
    • unless modifier
    • unless/else
    • case
    • ||
    • &&
  • control flow
    • for loop
    • while loop
    • until loop
    • break
    • next
    • redo
    • flip flop
  • numbers
    • integers
      • integer arithmetic
      • integers 1234
      • integers with underscores 1_234
      • decimal numbers 0d170, 0D170
      • octal numbers 0252, 0o252, 0O252
      • hexadecimal numbers 0xaa, 0xAa, 0xAA, 0Xaa, 0XAa, 0XaA
      • binary numbers 0b10101010, 0B10101010
    • floats
      • float arithmetic
      • 12.34
      • 1234e-2
      • 1.234E1
      • floats with underscores 2.2_22
  • booleans
  • strings
    • double quoted
    • single quoted
    • %q{}
    • %Q{}
    • heredoc
      • without indentation (<<EOF)
      • indented (<<-EOF)
      • “squiggly” heredoc <<~
      • quoted heredoc
        • single quotes <<-'HEREDOC'
        • double quotes <<-"HEREDOC"
        • backticks <<-`HEREDOC`
    • escaped characters
      • \a bell, ASCII 07h (BEL)
      • \b backspace, ASCII 08h (BS)
      • \t horizontal tab, ASCII 09h (TAB)
      • \n newline (line feed), ASCII 0Ah (LF)
      • \v vertical tab, ASCII 0Bh (VT)
      • \f form feed, ASCII 0Ch (FF)
      • \r carriage return, ASCII 0Dh (CR)
      • \s space, ASCII 20h (SPC)
      • \\ backslash, \
      • \nnn octal bit pattern, where nnn is 1-3 octal digits ([0-7])
      • \xnn hexadecimal bit pattern, where nn is 1-2 hexadecimal digits ([0-9a-fA-F])
      • \unnnn Unicode character, where nnnn is exactly 4 hexadecimal digits ([0-9a-fA-F])
      • \u{nnnn ...} Unicode character(s), where each nnnn is 1-6 hexadecimal digits ([0-9a-fA-F])
      • \cx or \C-x control character, where x is an ASCII printable character
      • \M-x meta character, where x is an ASCII printable character
      • \M-\C-x meta control character, where x is an ASCII printable character
      • \M-\cx same as above
      • \c\M-x same as above
      • \c? or \C-? delete, ASCII 7Fh (DEL)
    • interpolation #{}
  • arrays
    • array literal [1,2]
    • array indexing arr[2]
    • splat
    • array decomposition
    • implicit array assignment
    • array of strings %w{}
    • array of symbols %i{}
  • nil
  • hashes
    • literal with => notation
    • literal with key: notation
    • indexing hash[:foo]
    • every Ruby Object can be a hash key
  • symbols
    • :symbol
    • :"symbol"
    • :"symbol" with interpolation
    • :'symbol'
    • %s{symbol}
    • singleton symbols
  • regexp
    • /regex/
    • %r{regex}
  • ranges
    • .. inclusive
    • ... exclusive
  • procs ->
  • variables
    • variable assignments
    • globals
    • Ruby globals ($ notation)
  • operators
    • +
    • -
    • /
    • *
    • !
    • <
    • >
    • ** (pow)
    • % (modulus)
    • & (AND)
    • ^ (XOR)
    • >> (right shift)
    • << (left shift, append)
    • == (equal)
    • != (not equal)
    • === (case equality)
    • =~ (pattern match)
    • !~ (does not match)
    • <=> (comparison or spaceship operator)
    • <= (less or equal)
    • >= (greater or equal)
    • assignment operators
      • +=
      • -=
      • /=
      • *=
      • %=
      • **=
      • &=
      • |=
      • ^=
      • <<=
      • >>=
      • ||=
      • &&=
  • error handling
    • begin
    • rescue
    • ensure
    • retry
  • constants
  • scope operator ::
    • Constant access MyMod::MyClass
    • Method access String::new
  • classes
    • class objects
    • class Class
    • instance variables
    • class variables
    • class methods
    • instance methods
    • method overrides
    • private
    • protected
    • public
    • inheritance
    • constructors
    • new
    • self
    • singleton classes (also known as the metaclass or eigenclass) class << self
    • singleton methods def self.method
    • assignment methods
    • super in methods
  • modules
  • object main
  • comments '#'
  • backtraces (unwinding of the VM call stack for debugging)
  • C extension compatibility
