Emerald is a Ruby compiler & virtual machine written in Go.
Run the following from your command line:
git clone [email protected]:mathiashsteffensen/emerald.git
cd emerald && ./scripts/install
This will build an emerald & an iem executable in the current directory.
To run a source file of Ruby code:
./emerald main.rb
To start the Emerald REPL:
./iem
The lexer/tokenizer is located in the ./parser/lexer package. I have tried to keep it simple, but when implementing string templates (This is a #{template}), there are basically only 2 options.
- Don't have a separate lexer & parser.
- Turn the lexer into a stack machine since templates can be infinitely nested (or as deep as the stack allows)
Since I prefer the performance of keeping the parser & lexer separate (they run in parallel), I went with option 2, but this significantly increases the complexity of the lexer.
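To illustrate option 2, here is a minimal sketch of a lexer that keeps a stack of modes so that templates can nest inside strings inside templates. It is not Emerald's actual lexer: the `Lex` function, the token names, and the two modes are all hypothetical, and it assumes code inside a template contains no braces of its own.

```go
package main

import (
	"fmt"
	"strings"
)

type mode int

const (
	modeCode   mode = iota // lexing ordinary code
	modeString             // lexing the inside of a double-quoted string
)

func isIdent(c byte) bool {
	return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
}

// Lex is a hypothetical, heavily simplified lexer. The mode stack records
// whether we are in code or in a string, and what we return to afterwards.
func Lex(src string) []string {
	var tokens []string
	stack := []mode{modeCode}
	var text strings.Builder // string segment collected so far

	flush := func() {
		if text.Len() > 0 {
			tokens = append(tokens, "STRING("+text.String()+")")
			text.Reset()
		}
	}

	for i := 0; i < len(src); i++ {
		c := src[i]
		top := stack[len(stack)-1]
		switch {
		case top == modeString && c == '"':
			// End of the string: go back to the enclosing mode.
			flush()
			tokens = append(tokens, "STRING_END")
			stack = stack[:len(stack)-1]
		case top == modeString && c == '#' && i+1 < len(src) && src[i+1] == '{':
			// Start of a template: lex code until the matching '}'.
			flush()
			tokens = append(tokens, "TEMPLATE_START")
			stack = append(stack, modeCode)
			i++ // skip the '{'
		case top == modeString:
			text.WriteByte(c)
		case c == '"':
			tokens = append(tokens, "STRING_START")
			stack = append(stack, modeString)
		case c == '}' && len(stack) > 1:
			// A code frame above the bottom of the stack is always a template.
			tokens = append(tokens, "TEMPLATE_END")
			stack = stack[:len(stack)-1]
		case c == ' ':
			// Skip whitespace between code tokens.
		case isIdent(c):
			start := i
			for i+1 < len(src) && isIdent(src[i+1]) {
				i++
			}
			tokens = append(tokens, "IDENT("+src[start:i+1]+")")
		default:
			tokens = append(tokens, "SYMBOL("+string(c)+")")
		}
	}
	return tokens
}

func main() {
	fmt.Println(strings.Join(Lex(`"a #{b "c #{d}"} e"`), "\n"))
}
```

The nesting depth is limited only by how far the mode stack can grow, which is the trade-off described above: the lexer stays separate from the parser, but it has to track state that a plain tokenizer would not need.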
The parser is located in the ./parser package. It is a Pratt parser, i.e. a recursive descent parser that uses operator precedence to parse expressions.
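For readers unfamiliar with the technique, the following is a minimal sketch of Pratt parsing for integer arithmetic. It is not taken from the ./parser package: the `Parse` function, the precedence table, and the string output (a fully parenthesized expression instead of an AST) are assumptions made to keep the example short.

```go
package main

import (
	"fmt"
	"unicode"
)

// Binding powers, from weakest to strongest.
const (
	lowest  = iota
	sum     // + -
	product // * /
	prefix  // -x
)

var precedences = map[string]int{"+": sum, "-": sum, "*": product, "/": product}

type parser struct {
	tokens []string
	pos    int
}

// tokenize splits the input into numbers and single-character symbols.
func tokenize(src string) []string {
	var tokens []string
	for i := 0; i < len(src); {
		c := rune(src[i])
		switch {
		case unicode.IsSpace(c):
			i++
		case unicode.IsDigit(c):
			start := i
			for i < len(src) && unicode.IsDigit(rune(src[i])) {
				i++
			}
			tokens = append(tokens, src[start:i])
		default:
			tokens = append(tokens, string(c))
			i++
		}
	}
	return tokens
}

func (p *parser) peek() string {
	if p.pos < len(p.tokens) {
		return p.tokens[p.pos]
	}
	return ""
}

func (p *parser) next() string {
	tok := p.peek()
	p.pos++
	return tok
}

// parseExpression parses one prefix expression, then keeps consuming infix
// operators for as long as they bind tighter than minPrec.
func (p *parser) parseExpression(minPrec int) string {
	var left string
	switch tok := p.next(); tok {
	case "-":
		left = "(-" + p.parseExpression(prefix) + ")"
	case "(":
		left = p.parseExpression(lowest)
		p.next() // consume ")"
	default:
		left = tok
	}
	for {
		op := p.peek()
		prec, ok := precedences[op]
		if !ok || prec <= minPrec {
			break
		}
		p.next()
		right := p.parseExpression(prec)
		left = "(" + left + " " + op + " " + right + ")"
	}
	return left
}

// Parse returns the expression with its grouping made explicit.
func Parse(src string) string {
	p := &parser{tokens: tokenize(src)}
	return p.parseExpression(lowest)
}

func main() {
	fmt.Println(Parse("1 + 2 * 3")) // prints (1 + (2 * 3))
	fmt.Println(Parse("1 - 2 - 3")) // prints ((1 - 2) - 3)
}
```

The recursion handles nesting, while the `prec <= minPrec` check is what makes `*` bind tighter than `+` and keeps operators of equal precedence left-associative.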
The compiler is located in the ./compiler package. It transforms the AST generated by the parser into compact bytecode. The bytecode is, as the name may suggest, simply an array of bytes representing Opcodes & their operands. All Opcode definitions can be found in the ./compiler/bytecode.go file.
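As a rough illustration of what "an array of bytes" means here, the sketch below compiles a tiny arithmetic AST into bytecode. The opcode names, the 2-byte big-endian operand, and the `Node`, `Make` and `Compiler` types are all hypothetical; Emerald's real definitions live in ./compiler/bytecode.go and may look quite different.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type Opcode byte

// Hypothetical opcodes for this sketch only.
const (
	OpPushConst Opcode = iota // operand: 2-byte index into the constant pool
	OpAdd
	OpMul
)

// operandWidths lists the byte width of each operand an opcode takes.
var operandWidths = map[Opcode][]int{
	OpPushConst: {2},
	OpAdd:       {},
	OpMul:       {},
}

// Make encodes one instruction: the opcode byte followed by its operands.
func Make(op Opcode, operands ...int) []byte {
	widths := operandWidths[op]
	size := 1
	for _, w := range widths {
		size += w
	}
	ins := make([]byte, size)
	ins[0] = byte(op)
	offset := 1
	for i, w := range widths {
		switch w {
		case 1:
			ins[offset] = byte(operands[i])
		case 2:
			binary.BigEndian.PutUint16(ins[offset:], uint16(operands[i]))
		}
		offset += w
	}
	return ins
}

// Node is a minimal AST: a literal when Op is empty, otherwise a binary operation.
type Node struct {
	Op          string
	Value       int
	Left, Right *Node
}

type Compiler struct {
	Constants []int
	Code      []byte
}

// Compile walks the tree in post-order so operands are emitted before their operator.
func (c *Compiler) Compile(n *Node) {
	if n.Op == "" {
		c.Constants = append(c.Constants, n.Value)
		c.Code = append(c.Code, Make(OpPushConst, len(c.Constants)-1)...)
		return
	}
	c.Compile(n.Left)
	c.Compile(n.Right)
	switch n.Op {
	case "+":
		c.Code = append(c.Code, Make(OpAdd)...)
	case "*":
		c.Code = append(c.Code, Make(OpMul)...)
	}
}

func main() {
	// AST for 1 + 2 * 3
	ast := &Node{Op: "+", Left: &Node{Value: 1},
		Right: &Node{Op: "*", Left: &Node{Value: 2}, Right: &Node{Value: 3}}}
	c := &Compiler{}
	c.Compile(ast)
	fmt.Println(c.Code)      // prints [0 0 0 0 0 1 0 0 2 2 1]
	fmt.Println(c.Constants) // prints [1 2 3]
}
```

Keeping large values in a separate constant pool and referring to them by index is one common way to keep the instruction stream compact.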
The virtual machine is located in the ./vm package. It is a virtual stack machine and does not make use of any registers. This keeps the implementation & Opcode definitions simple, but it does mean that we need more bytecode to perform the same operations a register machine would. This ultimately means more execution cycles for the equivalent result.
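The trade-off is easier to see with a small example. The sketch below is a hypothetical stack machine, not the code in ./vm: the three opcodes and the `Run` function are invented for illustration, and operands are assumed to be 2-byte big-endian indexes into a constant pool.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical opcodes for this sketch only.
const (
	OpPushConst byte = iota // operand: 2-byte index into the constant pool
	OpAdd
	OpMul
)

// Run executes the bytecode and returns the value left on top of the stack.
func Run(code []byte, constants []int) int {
	var stack []int
	pop := func() int {
		v := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		return v
	}
	for ip := 0; ip < len(code); ip++ {
		switch code[ip] {
		case OpPushConst:
			idx := binary.BigEndian.Uint16(code[ip+1:])
			stack = append(stack, constants[idx])
			ip += 2 // skip the operand bytes
		case OpAdd:
			right, left := pop(), pop()
			stack = append(stack, left+right)
		case OpMul:
			right, left := pop(), pop()
			stack = append(stack, left*right)
		}
	}
	return pop()
}

func main() {
	// 1 + 2 * 3 as: push 1, push 2, push 3, mul, add
	code := []byte{
		OpPushConst, 0, 0,
		OpPushConst, 0, 1,
		OpPushConst, 0, 2,
		OpMul,
		OpAdd,
	}
	fmt.Println(Run(code, []int{1, 2, 3})) // prints 7
}
```

Every operator finds its inputs implicitly on the stack, so `OpAdd` and `OpMul` need no operands at all. The cost is that each input has to be pushed by a separate instruction first, which is where the extra execution cycles come from.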
This is still quite far away from being a real implementation. Below is a list of the features on the roadmap and the ones that have already been implemented.
NOTE: A feature being listed as implemented does not guarantee that it is compatible with the reference Ruby implementation.
- everything is an object
  - allow method calls on everything
  - operators are method calls
- full UTF8 support
  - Unicode identifier
  - Unicode symbols
- method definitions
  - with parens
  - without parens
  - without parens with args
  - return keyword
  - default values for parameters
  - keyword arguments
  - block arguments
  - yield
- method calls
  - with parens
  - without parens
  - without parens with args
  - with block arguments
  - keyword arguments
- conditionals
  - if
  - if modifier
  - if/else
  - if/elsif/else
  - ternary `? :`
  - unless
  - unless modifier
  - unless/else
  - case
  - `||`
  - `&&`
- control flow
  - for loop
  - while loop
  - until loop
  - break
  - next
  - redo
  - flip flop
- numbers
  - integers
    - integer arithmetics
    - integers `1234`
    - integers with underscores `1_234`
    - decimal numbers `0d170`, `0D170`
    - octal numbers `0252`, `0o252`, `0O252`
    - hexadecimal numbers `0xaa`, `0xAa`, `0xAA`, `0Xaa`, `0XAa`, `0XaA`
    - binary numbers `0b10101010`, `0B10101010`
  - floats
    - float arithmetics
    - `12.34`
    - `1234e-2`
    - `1.234E1`
    - floats with underscores `2.2_22`
- booleans
- strings
  - double quoted
  - single quoted
  - `%q{}`
  - `%Q{}`
  - heredoc
    - without indentation (`<<EOF`)
    - indented (`<<-EOF`)
    - “squiggly” heredoc `<<~`
    - quoted heredoc
      - single quotes `<<-'HEREDOC'`
      - double quotes `<<-"HEREDOC"`
      - backticks `` <<-`HEREDOC` ``
  - escaped characters
    - `\a` bell, ASCII 07h (BEL)
    - `\b` backspace, ASCII 08h (BS)
    - `\t` horizontal tab, ASCII 09h (TAB)
    - `\n` newline (line feed), ASCII 0Ah (LF)
    - `\v` vertical tab, ASCII 0Bh (VT)
    - `\f` form feed, ASCII 0Ch (FF)
    - `\r` carriage return, ASCII 0Dh (CR)
    - `\s` space, ASCII 20h (SPC)
    - `\\` backslash, `\`
    - `\nnn` octal bit pattern, where nnn is 1-3 octal digits ([0-7])
    - `\xnn` hexadecimal bit pattern, where nn is 1-2 hexadecimal digits ([0-9a-fA-F])
    - `\unnnn` Unicode character, where nnnn is exactly 4 hexadecimal digits ([0-9a-fA-F])
    - `\u{nnnn ...}` Unicode character(s), where each nnnn is 1-6 hexadecimal digits ([0-9a-fA-F])
    - `\cx` or `\C-x` control character, where x is an ASCII printable character
    - `\M-x` meta character, where x is an ASCII printable character
    - `\M-\C-x` meta control character, where x is an ASCII printable character
    - `\M-\cx` same as above
    - `\c\M-x` same as above
    - `\c?` or `\C-?` delete, ASCII 7Fh (DEL)
  - interpolation `#{}`
- arrays
  - array literal `[1,2]`
  - array indexing `arr[2]`
  - splat
  - array decomposition
  - implicit array assignment
  - array of strings `%w{}`
  - array of symbols `%i{}`
- nil
- hashes
  - literal with `=>` notation
  - literal with `key:` notation
  - indexing `hash[:foo]`
  - every Ruby Object can be a hash key
- symbols
  - `:symbol`
  - `:"symbol"`
  - `:"symbol"` with interpolation
  - `:'symbol'`
  - `%s{symbol}`
  - singleton symbols
- regexp
  - `/regex/`
  - `%r{regex}`
- ranges
  - `..` inclusive
  - `...` exclusive
- procs `->`
- variables
- variable assignments
- globals
- Ruby globals ($ notation)
- operators
  - `+`
  - `-`
  - `/`
  - `*`
  - `!`
  - `<`
  - `>`
  - `**` (pow)
  - `%` (modulus)
  - `&` (AND)
  - `^` (XOR)
  - `>>` (right shift)
  - `<<` (left shift, append)
  - `==` (equal)
  - `!=` (not equal)
  - `===` (case equality)
  - `=~` (pattern match)
  - `!~` (does not match)
  - `<=>` (comparison or spaceship operator)
  - `<=` (less or equal)
  - `>=` (greater or equal)
  - assignment operators
    - `+=`
    - `-=`
    - `/=`
    - `*=`
    - `%=`
    - `**=`
    - `&=`
    - `|=`
    - `^=`
    - `<<=`
    - `>>=`
    - `||=`
    - `&&=`
- error handling
  - begin
  - rescue
  - ensure
  - retry
- constants
- scope operator `::`
  - Constant access `MyMod::MyClass`
  - Method access `String::new`
- classes
  - class objects
  - class Class
  - instance variables
  - class variables
  - class methods
  - instance methods
  - method overrides
  - private
  - protected
  - public
  - inheritance
  - constructors
  - new
  - `self`
  - singleton classes (also known as the metaclass or eigenclass) `class << self`
  - singleton methods `def self.method`
  - assignment methods
  - super in methods
- modules
- object main
- comments '#'
- backtraces (unwinding of the VM call stack for debugging)
- C extension compatibility