65.9K
CodeProject 正在变化。 阅读更多。
Home

使用 Lua 和 Mewa 编写 LLVM 的编译器前端

starIconstarIconstarIconstarIconstarIcon

5.00/5 (3投票s)

2021年5月26日

MIT

30分钟阅读

viewsIcon

12238

downloadIcon

151

在本文中,我们将了解如何使用 Mewa 在 Lua 中编写一个非常原始的编译器,以及如何在 shell 中编译和运行一个简单的演示程序。

引言

Mewa 编译器-编译器是一种用于在 Lua 中编写编译器前端脚本的工具。使用 Mewa 编写的编译器将源文件作为输入,并打印出中间表示作为下一个编译步骤的输入。本文中的示例使用 LLVM IR 的文本格式作为输出。可执行文件是使用 Clangllc 命令构建的。

Mewa 不支持对不同代码生成路径的评估。其思想是将程序结构一对一地映射到代码,并将所有分析优化步骤留给后端处理。

要使用 Mewa 实现编译器,您需要定义一个带有 Lua 函数调用属性的语法。程序 mewa 将生成一个 Lua 脚本,该脚本会将任何通过指定语法的源代码转换为一个 AST,并将 Lua 函数调用附加到节点上。编译器将调用 AST 顶层节点的函数,这些函数将控制编译器前端的进一步处理。以其 AST 节点为参数调用的函数会触发进一步的树遍历。在此过程中,程序的类型和数据结构被构建,并打印输出。

目标

我们首先通过一些独立的示例来学习 Mewatypedb 库在 Lua 中的使用教程。所谓独立,是指除了该库之外不使用任何其他东西。这些示例也相互独立。这使您即使没有完全理解,也可以继续阅读,并在以后返回来学习。

之后,本文将指导您实现一种缺少许多功能的简单编程语言的编译器。我们将要编译的示例程序如下:

extern function printf void( string, double);
// ... Variable argument list functions are beyond the scope of this intro

class Employee
{
  string name;
  int age;
  double salary;

  function setSalary void( double salary_)
  {
    salary = salary_;
  }
}

function salarySum double( Employee[10] list)
{
  var int idx = 0;
  var double sum = 0.0;
  while (idx < 10 && list[idx].age)
  {
    sum = sum + list[idx].salary;
    idx = idx + 1;
  }
  return sum;
}

function salaryRaise void( Employee[10] list, double percentRaise)
{
  var int idx = 0;
  var double factor = 1 + (percentRaise / 100);
  while (idx < 10 && list[idx].age)
  {
    list[idx].setSalary( list[idx].salary * factor);
    idx = idx + 1;
  }
}

main
{
  var Employee[ 10] list = {
    {"John Doe", 36, 60000},
    {"Suzy Sample", 36, 63400},
    {"Doe Joe", 36, 64400},
    {"Sandra Last", 36, 67400}
  };
  salaryRaise( list, 10);          // Give them all a 10% raise
  printf( "Salary sum: %.2f\n", salarySum( list));  // Expected result = 280720
}

我们将用我们的示例编译器编译这个程序,并在 shell 中运行它。我们还将研究包括该语言语法在内的不同部分。最后,我将讨论该语言中缺少的功能,以便为您提供关于如何实现一种完整编程语言编译器的展望。

目标读者

要理解本文,了解一些关于形式语言和解析器生成器的知识会很有帮助。要理解这些示例,您应该熟悉一些具有类似 Lua 的闭包概念的脚本语言。如果您读过 LLVM IR 的介绍,您将能够掌握示例中的代码生成。

深入探索

要进行更深入的探索,您需要查看 Mewa 项目本身以及主要示例语言的实现,这是一种静态类型的多范式编程语言,支持结构体、类、接口、自由函数、泛型、lambda 表达式和异常。还有一个 FAQ,它也试图回答解决问题方面的问题。

准备工作

Mewa 程序和 Lua 模块是用 C++ 编写的。请按照说明进行安装。安装说明是为 Ubuntu 编写的。您需要查找其他平台的软件包名称。

安装

要安装 Mewa,您需要一个支持 C++17 的 C++ 编译器。Mewa 已在 clanggcc 上测试过。

必需的系统软件包

对于 Lua 5.2
lua5.2 liblua5.2-dev luarocks llvm llvm-runtime
对于 Lua 5.1
lua5.1 liblua5.1-0-dev luarocks llvm llvm-runtime

必需的 luarocks 软件包

luarocks install bit32
luarocks install penlight 
luarocks install LuaBcd

从源代码构建 LuaBcd(如果 luarocks install LuaBcd 失败)

如果使用 luarocks 构建 LuaBcd 失败,您可以从 github 获取源代码并进行构建

git clone https://github.com/patrickfrey/luabcd.git
cd LuaBcd
./configure
make
make install

获取最新发布版本的源代码

git clone https://github.com/patrickfrey/mewa
cd mewa
git checkout -b 0.3

配置以查找 Lua 头文件并写入由 make 包含的 Lua.inc 文件

./configure

使用 GNU C/C++ 构建

make COMPILER=gcc RELEASE=YES

使用 Clang C/C++ 构建

make COMPILER=clang RELEASE=YES

运行测试

make test

Install

make install

Lua 环境

从命令行运行 Lua 脚本时,不要忘记正确设置 Lua 环境(LUA_CPATH, LUA_PATH)。您会在本文的归档文件中找到一个 luaenv.sh。用以下命令加载它。

. examples/intro/luaenv.sh

示例路径

本文中的所有示例都是 Mewa 项目的一部分。这里讨论的示例编译器位于相对于 Mewa 项目路径的 examples/intro 目录中。这些文件也以 tar 格式上传到在 codeproject.com 上发布的文章的附件中。但本文引用的示例文件路径始终是相对于 Mewa 项目根目录的。

教程

在本教程中,我们将通过一些独立的示例学习如何使用 typedb 库。

声明变量

让我们从一个复杂的例子开始,这是一个实质性的进步。我们将打印出将一个变量的值赋给另一个变量所需的 LLVM 代码。为此,我们需要引入类型(types)归约(reductions)

引言

Types

类型是以整数表示的项。它们通过类型名称和上下文类型进行声明和检索。上下文类型本身也是一个类型或 0。例如,全局自由变量没有关联的上下文类型,并以 0 作为上下文类型进行声明。类型与构造函数相关联。构造函数是一个值、结构或函数,描述了该类型的构造方式。可选地,类型可以附加参数。参数是类型/构造函数对的列表。类型用以下方式声明:

typeid = typedb:def_type( contextType, name, constructor, parameters)

返回的 typeid 是代表 typedb 类型的整数。

归约

归约是从一种类型派生出另一种类型的路径。您可以将类型系统想象成一个由顶点(类型)和边(归约)组成的有向图。我们稍后将介绍一些概念,允许对该图进行部分视图。现在,请将其想象成一个图。归约也有一个关联的构造函数。该构造函数描述了从其源头沿着归约方向构造类型的方式。这里有一个例子:

typedb:def_reduction( destType, srcType, constructor, 1)

作为参数的 1 是一个整数值,我们稍后会解释。

解析类型

类型可以通过其名称和单个或多个上下文类型来解析,其中一个上下文类型具有到被解析类型上下文类型的归约路径。

派生类型

类型可以通过查询从一种类型到另一种类型的归约路径来构建,并通过将该路径上的构造函数列表应用于源类型构造函数来构造类型。

我们来看一下这个例子

文件 intro/tutorial/variable.lua

mewa = require "mewa"
typedb = mewa.typedb()

-- [1] Define some helper functions
-- Map template for LLVM IR Code synthesis
--   substitute identifiers in curly brackets '{' '}' with the values in a table
function constructor_format( fmt, argtable)
  return (fmt:gsub("[{]([_%d%w]*)[}]",
    function( match) return argtable[ match] or "" end))
end

-- A register allocator as a generator function counting from 1,
-- returning the LLVM register identifiers:
function register_allocator()
  local i = 0
  return function ()
    i = i + 1
    return "%R" .. i
  end
end

-- Build a constructor by applying a constructor function on some arguments
--   or returning the first argument in case of a constructor function as nil
function applyConstructor( constructor, this, arg)
  if constructor then
    if (type(constructor) == "function") then
      return constructor( this, arg)
    else
      return constructor
    end
  else
    return this
  end
end

-- [2] Define the LLVM format strings
local fmt_def_local = "{out} = alloca i32, align 4 ; variable allocation\n"
local fmt_load = "{out} = load i32, i32* {this} ; reduction int <- int&\n"
local fmt_assign = "store i32 {arg1}, i32* {this} ; variable assignment\n"

-- [3] Define an integer type with assignment operator
local register = register_allocator()
local intValType = typedb:def_type( 0, "int")  -- integer value type
local intRefType = typedb:def_type( 0, "int&")  -- integer reference type
typedb:def_reduction( intValType, intRefType,  -- get value from reference type
  function(this)
    local out = register()
    local code = constructor_format( fmt_load, {out=out,this=this.out})
    return {out=out, code=code}
  end, 1)
typedb:def_type( intRefType, "=",    -- assignment operator
  function( this, arg)
    local code = (this.code or "") .. (arg[1].code or "")
      .. constructor_format( fmt_assign, {this=this.out,arg1=arg[1].out})
    return {code=code, out=this.out}
  end, {intValType})

-- [4] Define a variable "a" initialized with 1:
local register_a = register()
local variable_a = typedb:def_type( 0, "a", {out=register_a})
typedb:def_reduction( intRefType, variable_a, nil, 1)
io.stdout:write( "; SOURCE int a = 1;\n")
io.stdout:write( constructor_format( fmt_def_local, {out=register_a}))
io.stdout:write( constructor_format( fmt_assign, {this=register_a,arg1=1}))

-- [5] Define a variable b and assign the value of a to it:
io.stdout:write( "; SOURCE int b = a;\n")

-- [5.1] Define a variable "b":
local register_b = register()
local variable_b = typedb:def_type( 0, "b", {out=register_b})
typedb:def_reduction( intRefType, variable_b, nil, 1)

io.stdout:write( constructor_format( fmt_def_local, {out=register_b}))

-- [5.2] Assign the value of "a" to "b":
-- [5.2.1] Resolve the operator "b = .."
local resolveTypeId, reductions, items = typedb:resolve_type( variable_b, "=")
if not resolveTypeId then
  error( "Not found")
elseif type( resolveTypeId) == "table" then
  error( "Ambiguous")
end

-- [5.2.2] For all candidates of "b = ..", get the first one with one parameter
--   and match the argument to this parameter
local constructor = typedb:type_constructor( variable_b)
for _,redu in ipairs(reductions or {}) do
  constructor = applyConstructor( redu.constructor, constructor)
end
for _,item in ipairs(items) do
  local parameters = typedb:type_parameters( item)
  if #parameters == 1 then
    local reductions,weight,altpath
      = typedb:derive_type( parameters[1].type, variable_a)
    if altpath then error( "Ambiguous parameter") end
    if weight then
      -- [5.3] Synthesize the code for "b = a;"
      local param_constructor = typedb:type_constructor( variable_a)
      for _,redu in ipairs(reductions or {}) do
        param_constructor = applyConstructor( redu.constructor, param_constructor)
      end
      constructor = typedb:type_constructor(item)( constructor, {param_constructor})
      break
    end
  end
end
-- [5.3] Print the code of "b = a;"
io.stdout:write( constructor.code)

输出

; SOURCE int a = 1;
%R1 = alloca i32, align 4 ; variable allocation
store i32 1, i32* %R1 ; variable assignment
; SOURCE int b = a;
%R2 = alloca i32, align 4 ; variable allocation
%R3 = load i32, i32* %R1 ; reduction int <- int&
store i32 %R3, i32* %R2 ; variable assignment

结论

如果您读到这里,您已经取得了很大的进展。我们看到了一个带有参数的运算符('=')的应用。可以想象应用一个带有一个以上参数的函数。运算符的第一个匹配是我们的候选匹配,被选为结果。但是可以想象通过其他标准来选择匹配。我们声明了一个带名称的变量,这里缺少作用域的概念。我们将在下一个例子中讨论作用域。

范围

现在让我们看看作用域在 typedb 中是如何表示的。

引言

Mewa 中,作用域表示为一对非负整数。第一个数字是开始,即属于该作用域的第一个作用域步骤;第二个数字是结束,即不属于该作用域的第一个作用域步骤。作用域步骤由一个计数器生成,其增量在您的语言解析器的语法中声明。typedb 中的所有声明都绑定到一个作用域,所有对 typedb 的查询都绑定到当前的作用域步骤。如果一个声明的作用域覆盖了查询的作用域步骤,则认为该声明对结果有贡献。

设置当前作用域
scope_old = typedb:scope( scope_new )
获取当前作用域
current_scope = typedb:scope()

这个例子相当人为,但它展示了其工作原理:

文件 intro/tutorial/scope.lua

mewa = require "mewa"
typedb = mewa.typedb()

function defineVariable( name)
  local scopestr = mewa.tostring( (typedb:scope()))
  typedb:def_type( 0, name, string.format( "instance of scope %s", scopestr))
end
function findVariable( name)
  local res,reductions,items = typedb:resolve_type( 0, name)
  if not res then return "!NOTFOUND" end
  if type(res) == "table" then return "!AMBIGUOUS" end
  for _,item in ipairs(items) do
    if typedb:type_nof_parameters(item) == 0 then
      return typedb:type_constructor(item)
    end
  end
  return "!NO MATCH"
end

typedb:scope( {1,100} )
defineVariable( "X")
typedb:scope( {10,50} )
defineVariable( "X")
typedb:scope( {20,30} )
defineVariable( "X")
typedb:scope( {70,90} )
defineVariable( "X")

typedb:step( 25)
print( "Retrieve X " .. findVariable("X"))
typedb:step( 45)
print( "Retrieve X " .. findVariable("X"))
typedb:step( 95)
print( "Retrieve X " .. findVariable("X"))
typedb:step( 100)
print( "Retrieve X " .. findVariable("X"))

输出

Retrieve X instance of scope {20,30}
Retrieve X instance of scope {10,50}
Retrieve X instance of scope {1,100}
Retrieve X !NOTFOUND

结论

这很简单,不是吗?现在缺少的是如何设置当前的作用域步骤和作用域。我选择将其作为一个独立的方面来处理。用于 AST 树遍历的函数设置当前的作用域步骤和作用域。这在大多数情况下都有效。有时您必须在实现不同作用域声明的节点中手动设置作用域,例如带有函数体的函数声明,其函数体在自己的作用域中。

归约权重

现在我们必须再来看一些有点令人费解的东西。我们需要为归约分配一个权重。问题是,如果我们不引入一个能记忆偏好的权重方案,即使是简单的类型查询例子也会导致歧义。typedb:resolve_typetypedb:derive_type 函数被实现为最短路径搜索,它选择结果并实现这种偏好。我们希望将真正的歧义检测为错误。当出现两个或更多具有相同权重和的结果时,就表示存在歧义。我得出的结论是,最好在源代码的一个中心位置声明所有归约权重,并对其进行详细记录。

让我们看一个没有归约权重的例子,它会导致歧义。

失败的例子

文件 intro/tutorial/weight1.lua

mewa = require "mewa"
typedb = mewa.typedb()

-- Build a constructor
function applyConstructor( constructor, this, arg)
  if constructor then
    if (type(constructor) == "function") then
      return constructor( this, arg)
    else
      return constructor
    end
  else
    return this
  end
end

-- Type list as string
function typeListString( tpList, separator)
  if not tpList or #tpList == 0 then return "" end
  local tp = type(tpList[ 1]) == "table" and tpList[ 1].type or tpList[ 1]
  local rt = typedb:type_string( tp)
  for ii=2,#tpList do
    local tp = type(tpList[ ii]) == "table" and tpList[ ii].type or tpList[ ii]
    rt = rt .. separator .. typedb:type_string( tp)
  end
  return rt
end

-- Define a type with all its variations having different qualifiers
function defineType( name)
  local valTypeId = typedb:def_type( 0, name)
  local refTypeId = typedb:def_type( 0, name .. "&")
  local c_valTypeId = typedb:def_type( 0, "const " .. name)
  local c_refTypeId = typedb:def_type( 0, "const " .. name .. "&")
  typedb:def_reduction( valTypeId, refTypeId,
    function( arg) return "load ( " .. arg .. ")"  end, 1)
  typedb:def_reduction( c_valTypeId, c_refTypeId,
    function( arg) return "load ( " .. arg .. ")"  end, 1)
  typedb:def_reduction( c_valTypeId, valTypeId,
    function( arg) return "make const ( " .. arg .. ")"  end, 1)
  typedb:def_reduction( c_refTypeId, refTypeId,
    function( arg) return "make const ( " .. arg .. ")"  end, 1)
  return {val=valTypeId, ref=refTypeId, c_val=c_valTypeId, c_ref=c_refTypeId}
end

local t_int = defineType( "int")
local t_long = defineType( "long")

typedb:def_reduction( t_int.val, t_long.val,
  function( arg) return "convert int ( " .. arg .. ")"  end, 1)
typedb:def_reduction( t_int.c_val, t_long.c_val,
  function( arg) return "convert const int ( " .. arg .. ")"  end, 1)
typedb:def_reduction( t_long.val, t_int.val,
  function( arg) return "convert long ( " .. arg .. ")"  end, 1)
typedb:def_reduction( t_long.c_val, t_int.c_val,
  function( arg) return "convert const long ( " .. arg .. ")" end, 1)

local reductions,weight,altpath = typedb:derive_type( t_int.c_val, t_long.ref)
if not weight then
  print( "Not found!")
elseif altpath then
  print( "Ambiguous: "
    .. typeListString( reductions, " -> ")
    .. " | " .. typeListString( altpath, " -> "))
else
  local constructor = typedb:type_name( t_long.ref)
  for _,redu in ipairs(reductions or {}) do
    constructor = applyConstructor( redu.constructor, constructor)
  end
  print( constructor)
end

输出

Ambiguous: long -> const long -> const int | long -> int -> const int

添加权重

文件 intro/tutorial/weight2.lua

这是修复问题所需的编辑差异

3a4,12
> -- Centralized list of the ordering of the reduction rule weights:
> local weight_conv=  1.0    -- weight of a conversion
> local weight_const_1=  0.5 / (1*1)  -- make const of type with 1 qualifier
> local weight_const_2=  0.5 / (2*2)  -- make const of type with 2 qualifiers
> local weight_const_3=  0.5 / (3*3)  -- make const of type with 3 qualifiers
> local weight_ref_1=  0.25 / (1*1)  -- strip reference of type with 1 qualifier
> local weight_ref_2=  0.25 / (2*2)  -- strip reference of type with 2 qualifiers
> local weight_ref_3=  0.25 / (3*3)  -- strip reference of type with 3 qualifiers
> 
36c45
<     function( arg) return "load ( " .. arg .. ")"  end, 1)
---
>     function( arg) return "load ( " .. arg .. ")"  end, 1, weight_ref_2)
38c47
<     function( arg) return "load ( " .. arg .. ")"  end, 1)
---
>     function( arg) return "load ( " .. arg .. ")"  end, 1, weight_ref_1)
40c49
<     function( arg) return "make const ( " .. arg .. ")"  end, 1)
---
>     function( arg) return "make const ( " .. arg .. ")"  end, 1, weight_const_1)
42c51
<     function( arg) return "make const ( " .. arg .. ")"  end, 1)
---
>     function( arg) return "make const ( " .. arg .. ")"  end, 1, weight_const_2)
50c59
<   function( arg) return "convert int ( " .. arg .. ")"  end, 1)
---
>   function( arg) return "convert int ( " .. arg .. ")"  end, 1, weight_conv)
52c61
<   function( arg) return "convert const int ( " .. arg .. ")"  end, 1)
---
>   function( arg) return "convert const int ( " .. arg .. ")"  end, 1, weight_conv)
54c63
<   function( arg) return "convert long ( " .. arg .. ")"  end, 1)
---
>   function( arg) return "convert long ( " .. arg .. ")"  end, 1, weight_conv)
56c65
<   function( arg) return "convert const long ( " .. arg .. ")" end, 1)
---
>   function( arg) return "convert const long ( " .. arg .. ")" end, 1, weight_conv)

带权重的输出

convert const int ( load ( make const ( long&)))

结论

我们引入了 typedb:def_reduction 命令的第 5 个参数,如果未指定,则为 0。这个新参数旨在以一种记忆解决方案路径偏好的方式声明。权重应通过我们在中心位置声明的常量来引用,并在该位置记录导致权重方案的思路。

归约标签

将类型系统显示为单个图并不适用于所有情况。有些类型派生路径在一种情况下适用,在其他情况下则不适用。下面的例子声明了一个派生自基类的类的对象,该对象调用一个无参数的构造函数。构造函数只为基类声明。但是当调用一个对象构造函数时,如果该类不存在该构造函数,则应报告错误。在这种情况下,与方法调用相同的行为是不好的。

我们先来看一个失败的例子

失败的例子

文件 intro/tutorial/tags1.lua

mewa = require "mewa"
typedb = mewa.typedb()

-- Define some helper functions, that we discussed already before

function constructor_format( fmt, argtable)
  return (fmt:gsub("[{]([_%d%w]*)[}]",
    function( match) return argtable[ match] or "" end))
end

function register_allocator()
  local i = 0
  return function ()
    i = i + 1
    return "%R" .. i
  end
end

function applyConstructor( constructor, this, arg)
  if constructor then
    if (type(constructor) == "function") then
      return constructor( this, arg)
    else
      return constructor
    end
  else
    return this
  end
end

-- Define some format strings for the LLVM code
local fmt_load = "{out} = load %{name}, %{name}* {this}\n"
local fmt_member = "{out} = getelementptr inbounds %{name}, "
      .. "%{name}* {this}, i32 0, i32 {index}\n"
local fmt_init = "call void @__ctor_init_{name}( %{name}* {this})\n"

-- Define a type with variations having different qualifiers
function defineType( name)
  local valTypeId = typedb:def_type( 0, name)
  local refTypeId = typedb:def_type( 0, name .. "&")
  typedb:def_reduction( valTypeId, refTypeId,
    function(this)
      local out = register()
      local code = constructor_format( fmt_load, {name=name,out=out,this=this.out})
      return {out=out, code=code}
    end, 1)
  return {val=valTypeId, ref=refTypeId}
end

local register = register_allocator()

-- Constructor for loading a member reference
function loadMemberConstructor( classname, index)
  return function( this)
    local out = register()
    local subst = {out=out,this=this.out,name=classname,index=index}
    local code = (this.code or "") .. constructor_format( fmt_member, subst)
    return {code=code, out=out}
  end
end
-- Constructor for calling the default ctor, assuming that it exists
function callDefaultCtorConstructor( classname)
  return function(this)
    local code = (this.code or "") .. (arg[1].code or "")
      .. constructor_format( fmt_init, {this=this.out,name=classname})
    return {code=code, out=out}
  end
end

local t_base = defineType("baseclass")
local t_class = defineType("class")

-- Define the default ctor of baseclass
local ctor = callDefaultCtorConstructor("baseclass")
local constructor_base = typedb:def_type( t_base.ref, "constructor", ctor)

-- Define the inheritance of class from baseclass
local load_base = loadMemberConstructor( "class", 1)
typedb:def_reduction( t_base.ref, t_class.ref, load_base, 1)

-- Search for the constructor of class (that does not exist)
local resolveTypeId, reductions, items
  = typedb:resolve_type( t_class.ref, "constructor")
if not resolveTypeId then
  print( "Not found")
elseif type( resolveTypeId) == "table" then
  print( "Ambiguous")
end

for _,item in ipairs(items or {}) do
  print( "Found " .. typedb:type_string( item) .. "\n")
end

输出

Found baseclass& constructor

添加标签

文件 intro/tutorial/tags2.lua

这是修复问题所需的编辑差异

3a4,17
> -- [1] Define all tags and tag masks
> -- Tags attached to reductions.
> -- When resolving a type or deriving a type,
> --  we select reductions by specifying a set of valid tags
> tag_typeDeclaration = 1  -- Type declaration relation (e.g. variable to data type)
> tag_typeDeduction   = 2  -- Type deduction (e.g. inheritance)
> 
> -- Sets of tags used for resolving a type or deriving a type
> tagmask_declaration
>   = typedb.reduction_tagmask( tag_typeDeclaration)
> tagmask_resolveType
>   = typedb.reduction_tagmask( tag_typeDeduction, tag_typeDeclaration)
> 
> 
46c60
<     end, 1)
---
>     end, tag_typeDeduction)
79c93
< typedb:def_reduction( t_base.ref, t_class.ref, load_base, 1)
---
> typedb:def_reduction( t_base.ref, t_class.ref, load_base, tag_typeDeduction)
83c97
<   = typedb:resolve_type( t_class.ref, "constructor")
---
>   = typedb:resolve_type( t_class.ref, "constructor", tagmask_declaration)

带标签的输出

Not found

结论

我们现在解释了 typedb:def_reduction 的第 4 个参数,在第一个例子中定义为 1。它是分配给归约的强制性标签。命令 typedb.reduction_tagmask 用于声明为解析和派生类型选择的命名标签集。

备注

typedb:derive_type 有第二个可选的标签掩码参数,它选择要计数并限制到指定数量(默认为1)的归约。其目的是允许对某些类别的归约施加限制。大多数静态类型编程语言都对参数的转换次数施加限制。第二个标签掩码可以帮助您实现这些限制。

带作用域的对象

在编译器中,我们有绑定到作用域的构建块。例如函数。这些构建块最好表示为对象。当我们在 AST 树遍历的深处时,我们希望有一种方法可以在不使用复杂的结构作为参数传递的情况下处理这些对象。那样会非常容易出错。尤其是在像 Lua 这样的非严格类型语言中。

引言

在当前作用域中定义对象实例
typedb:set_instance( name, obj)
获取此作用域或最近的封闭作用域的对象实例
obj = typedb:get_instance( name)
获取在当前作用域中声明的对象实例
obj = typedb:this_instance( name)

mewa = require "mewa"
typedb = mewa.typedb()

-- Attach a data structure for a callable to its scope
function defineCallableEnvironment( name)
  local env = {name=name, scope=(typedb:scope())}
  if typedb:this_instance( "callenv") then
    error( "Callable environment defined twice")
  end
  typedb:set_instance( "callenv", env)
  return env
end
-- Get the active callable name
function getFunctionName()
  return typedb:get_instance( "callenv").name
end

typedb:scope( {1,100} )
defineCallableEnvironment( "function X")
typedb:scope( {20,30} )
defineCallableEnvironment( "function Y")
typedb:scope( {70,90} )
defineCallableEnvironment( "function Z")

typedb:step( 10)
print( "We are in " .. getFunctionName())
typedb:step( 25)
print( "We are in " .. getFunctionName())
typedb:step( 85)
print( "We are in " .. getFunctionName())

输出

We are in function X
We are in function Y
We are in function Z

结论

将命名对象附加到作用域的能力对于可读性来说非常重要,这是原型设计的一个重要属性。它在解释器上下文中为系统带来了结构,而在这种环境中我们几乎没有可能通过契约来确保安全。

控制结构,实现一个 IF

下一步是实现一个 IF 语句并引入一类新的类型。大多数编程语言要求布尔表达式的求值在其结果确定后立即停止。像 if ptr && ptr->value() 这样的表达式在 ptr 为 NULL 指针的情况下不应该崩溃。

引言

为了评估布尔表达式和作为条件语句的条件类型,我们定义了两种新类型。

控制真值类型 (Control True Type)

该类型有一个字段 code,其中包含只要其求值为真就执行的代码;还有一个字段 out,其中包含一个未绑定的标签,当求值为假时控制流会跳转到该标签。

控制假值类型 (Control False Type)

该类型是控制真值类型的镜像类型。它有一个字段 code,其中包含只要其求值为假就执行的代码;还有一个字段 out,其中包含一个未绑定的标签,当求值为真时控制流会跳转到该标签。

IF 语句

IF 语句接受条件参数并将其转换为控制真值类型。生成的构造函数的代码与在条件为真时要评估的语句的构造函数代码连接起来。未绑定的标签在代码的末尾被绑定和声明。

mewa = require "mewa"
typedb = mewa.typedb()

-- Define some helper functions, that we discussed already before

function constructor_format( fmt, argtable)
  return (fmt:gsub("[{]([_%d%w]*)[}]",
    function( match) return argtable[ match] or "" end))
end

function register_allocator()
  local i = 0
  return function ()
    i = i + 1
    return "%R" .. i
  end
end

-- Label allocator is the analogon to the register_allocator but for labels
function label_allocator()
  local i = 0
  return function ()
    i = i + 1
    return "L" .. i
  end
end

function applyConstructor( constructor, this, arg)
  if constructor then
    if (type(constructor) == "function") then
      return constructor( this, arg)
    else
      return constructor
    end
  else
    return this
  end
end

local register = register_allocator()
local label = label_allocator()

local controlTrueType = typedb:def_type( 0, " controlTrueType")
local controlFalseType = typedb:def_type( 0, " controlFalseType")
local scalarBooleanType = typedb:def_type( 0, "bool")

local fmt_conditional = "br i1 {inp}, label %{on}, label %{out}\n{on}:\n"
local fmt_bindlabel = "br label {out}\n{out}:\n"

-- Build a control true type from a boolean value
local function booleanToControlTrueType( constructor)
  local true_label = label()
  local false_label = label()
  local subst = {inp=constructor.out, out=false_label, on=true_label}
  return {code = constructor.code .. constructor_format( fmt_conditional, subst),
    out = false_label}
end
-- Build a control false type from a boolean value
local function booleanToControlFalseType( constructor)
  local true_label = label()
  local false_label = label()
  local subst = {inp=constructor.out, out=true_label, on=false_label}
  return {code = constructor.code .. constructor_format( fmt_conditional, subst),
    out = true_label}
end

typedb:def_reduction( controlTrueType, scalarBooleanType,
      booleanToControlTrueType, 1)
typedb:def_reduction( controlFalseType, scalarBooleanType,
      booleanToControlFalseType, 1)

-- Implementation of an 'if' statement
--   condition is a type/constructor pair,
--   block a constructor, return value is a constructor
function if_statement( condition, block)
  local reductions,weight,altpath
    = typedb:derive_type( controlTrueType, condition.type)
  if not weight then error( "Type not usable as conditional") end
  if altpath then error("Ambiguous") end
  local constructor = condition.constructor
  for _,redu in ipairs(reductions or {}) do
    constructor = applyConstructor( redu.constructor, constructor)
  end
  local code = constructor.code .. block.code
      .. constructor_format( fmt_bindlabel, {out=constructor.out})
  return {code=code}
end

local condition_in = register()
local condition_out = register()
local condition = {
  type=scalarBooleanType,
  constructor={code = constructor_format( "{out} = icmp ne i32 {this}, 0\n",
        {out=condition_out,this=condition_in}), out=condition_out}
}
local block = {code = "... this code is executed if the value in "
      .. condition_in .. " is not 0 ...\n"}

print( if_statement( condition, block).code)

输出

%R2 = icmp ne i32 %R1, 0
br i1 %R2, label %L1, label %L2
L1:
... this code is executed if the value in %R1 is not 0 ...
br label L2
L2:

结论

控制结构的实现是通过构建所需的控制布尔类型来完成的。像逻辑 AND 或逻辑 OR 这样的布尔运算符是通过将控制布尔类型与其他类型连接起来构建的。在这个例子中没有这样做,但在构建了一个 IF 之后,这是可以想象的。用归约规则构建类型的过程并未就此结束。控制结构并非完全是不同的东西。

我们现在已经看完了 typedb 库的全部函数。尚未解释的其余函数是用于设置和获取类型属性的便利函数。关于 API,虽然没有实质性的内容需要解释,但关于最佳实践以及如何使用此 API 还有很多可以讨论的地方。

我想我们现在已经准备好整体地看一下我们的示例编译器了。

示例语言语法

现在我们来看一下示例语言的语法。

解析器生成器

Mewa 实现了一个 LALR(1) 解析器生成器。带属性语法的源文件有 3 个部分。

  • 一些以前缀 '%' 标记的配置。
  • 词法单元(tokens)定义为名称/模式对。模式定义为匹配词素(lexeme)的正则表达式。可选地,可以指定一个索引来选择一个子匹配作为词法单元的值。
  • 一个带属性的语法,其中关键字为字符串,词法单元名称或产生式名称为标识符。

产生式属性运算符

  • 产生式属性中右侧圆括号内的运算符 >> 会增加作用域步骤。
  • 产生式属性中的运算符 {} 定义了一个作用域。

产生式头部属性

属性 L1, L2, ... 定义产生式为左结合,并指定一个数字作为优先级。属性 R1, R2, ... 定义产生式为右结合,并指定一个数字作为优先级。优先级数字越大,优先级越高。产生式优先级用于定义运算符的优先级。

% LANGUAGE intro;
% TYPESYSTEM "intro/typesystem";
% CMDLINE "cmdlinearg";
% COMMENT "/*" "*/";
% COMMENT "//";

BOOLEAN : '((true)|(false))';
IDENT    : '[a-zA-Z_]+[a-zA-Z_0-9]*';
DQSTRING: '["]((([^\\"\n]+)|([\\][^"\n]))*)["]' 1;
UINTEGER: '[0123456789]+';
FLOAT    : '[0123456789]*[.][0123456789]+';
FLOAT    : '[0123456789]*[.][0123456789]+[Ee][+-]{0,1}[0123456789]+';

program
          = extern_deflist free_deflist main_proc   (program)
          ;
extern_deflist
          = extern_def extern_deflist
          |
          ;
free_deflist
          = free_def free_deflist
          |
          ;
inclass_deflist
          = inclass_def inclass_deflist
          |
          ;
extern_def
          = "extern" "function" IDENT typespec
               "(" extern_paramlist ")" ";"         (extern_funcdef)
          ;
extern_paramdecl
          = typespec IDENT                          (extern_paramdef)
          | typespec                                (extern_paramdef)
          ;
extern_parameters
          = extern_paramdecl "," extern_parameters
          | extern_paramdecl
          ;
extern_paramlist
          = extern_parameters                       (extern_paramdeflist)
          |                                         (extern_paramdeflist)
          ;
inclass_def
          = variabledef ";"                         (definition 2)
          | classdef                                (definition 1)
          | functiondef                             (definition_2pass 4)
          ;
free_def
          = variabledef ";"                         (definition 1)
          | classdef                                (definition 1)
          | functiondef                             (definition 1)
          ;
typename/L1
          = IDENT
          | IDENT "::" typename
          ;
typehdr/L1
          = typename                                (typehdr)
          ;
typegen/L1
          = typehdr
          | typegen "[" UINTEGER "]"                (arraytype)
          ;
typespec/L1
          = typegen                                 (typespec)
          ;
classdef
          = "class" IDENT "{" inclass_deflist "}"   (>>classdef)
          ;
functiondef
          = "function" IDENT typespec callablebody  (funcdef)
          ;
callablebody
          = "(" impl_paramlist ")"
                  "{" statementlist "}"             ({}callablebody)
          ;
main_proc
          = "main" codeblock                        (main_procdef)
          |
          ;
impl_paramlist
          = impl_parameters                         (paramdeflist)
          |                                         (paramdeflist)
          ;
impl_parameters
          = impl_paramdecl "," impl_parameters
          | impl_paramdecl
          ;
impl_paramdecl
          = typespec IDENT                          (paramdef)
          ;
codeblock/L1
          = "{" statementlist "}"                   ({}codeblock)
          ;
statementlist/L1
          = statement statementlist                 (>>)
          |
          ;
elseblock/L1
          = "elseif" "(" expression ")"
                  codeblock elseblock               (conditional_elseif)
          | "elseif" "(" expression ")" codeblock   (conditional_elseif)
          | "else" codeblock                        (conditional_else)
          ;
statement/L1
          = classdef                                (definition 1)
          | functiondef                             (definition 1)
          | "var" variabledef ";"                   (>>definition 1)
          | expression ";"                          (free_expression)
          | "return" expression ";"                 (>>return_value)
          | "return" ";"                            (>>return_void)
          | "if" "(" expression ")"
               codeblock elseblock            (conditional_if)
          | "if" "(" expression ")" codeblock       (conditional_if)
          | "while" "(" expression ")" codeblock    (conditional_while)
          | codeblock
          ;
variabledef
          = typespec IDENT                          (>>vardef)
          | typespec IDENT "=" expression           (>>vardef)
          ;
expression/L1
          = "{" expressionlist "}"                  (>>structure)
          | "{" "}"                                 (>>structure)
          ;
expression/L2
          = IDENT                                   (variable)
          | BOOLEAN                                 (constant "constexpr bool")
          | UINTEGER                                (constant "constexpr int")
          | FLOAT                                   (constant "constexpr double")
          | DQSTRING                                (constant "constexpr string")
          | "(" expression ")"
          ;
expression/L3
          = expression  "="  expression             (>>binop "=")
          ;
expression/L4
          = expression  "||"  expression            (>>binop "||")
          ;
expression/L5
          = expression  "&&"  expression            (>>binop "&&")
          ;
expression/L8
          = expression  "=="  expression            (>>binop "==")
          | expression  "!="  expression            (>>binop "!=")
          | expression  "<="  expression            (>>binop "<=")
          | expression  "<"  expression             (>>binop "<")
          | expression  ">="  expression            (>>binop ">=")
          | expression  ">"  expression             (>>binop ">")
          ;
expression/L9
          = expression  "+"  expression             (>>binop "+")
          | expression  "-"  expression             (>>binop "-")
          | "-"  expression                         (>>unop "-")
          | "+"  expression                         (>>unop "+")
          ;
expression/L10
          = expression  "*"  expression             (>>binop "*")
          | expression  "/"  expression             (>>binop "/")
          ;
expression/L12
          = expression "." IDENT                    (member)
          ;
expression/L13
          = expression  "(" expressionlist ")"      (>>operator "()")
          | expression  "(" ")"                     (>>operator "()")
          | expression  "[" expressionlist "]"      (>>operator "[]")
          ;
expressionlist/L0
          = expression "," expressionlist
          | expression
          ;

类型系统

现在让我们来概览一下生成代码的 typesystem 模块的实现。现在展示的代码将更有条理、更完整,但与教程相反,它不再是自包含的了。

与示例 language1 不同,typesystem 模块被分成了几个部分,每部分最多 100 行代码,以便于理解。实现辅助函数的代码片段位于 examples/intro/sections 目录中。实现附加到 AST 节点的函数的代码片段位于 examples/intro/ast 目录中。

由于这部分代码量较大,我们不会像教程中那样仔细地检查它。但我希望在学习了教程之后,您仍然能够理解所展示的代码。

标题

我们先来看看 typesystem.lua 的头部。

子模块 llvmir

子模块 llvmir 实现了所有用于 LLVM IR 代码生成的模板。我们在教程的例子中看到过像 {out} = load i32, i32* {this} 这样的模板,其中花括号内是替代项。在示例编译器中,这些模板都在模块 llvmir 中声明,并通过名称引用。模块 llvmir 有一个子模块 llvmir_scalar,它是根据我们语言的标量类型描述生成的。

llvmir.lua

您在这里看到了文件 llvmir.lua 的一个示例定义。花括号 '{' '}' 中的数字会被寄存器分配器自动生成的值替换。标识符必须在表中明确提供。

llvmir.structTemplate = {
    symbol = "{symbol}",
    def_local = "{out} = alloca %{symbol}, align 8\n",
    def_global = "{out} = internal global %{symbol} zeroinitializer, align 8\n",
    llvmtype = "%{symbol}",
    scalar = false,
    class = "struct",
    align = 8,
    assign = "store %{symbol} {arg1}, %{symbol}* {this}\n",
    load = "{out} = load %{symbol}, %{symbol}* {this}\n",
    loadelemref = "{out} = getelementptr inbounds %{symbol},"
                .. " %{symbol}* {this}, i32 0, i32 {index}\n",
    loadelem = "{1} = getelementptr inbounds %{symbol},"
                .. " %{symbol}* {this}, i32 0, i32 {index}\n"
                .. "{out} = load {type}, {type}* {1}\n",
    typedef = "%{symbol} = type { {llvmtype} }\n"
}

子模块 utils

子模块 typesystem_utils 实现了与语言无关的函数。例如,在教程中实例化 llvmir 代码模板的函数 constructor_format。它在 typesystem_utils 模块中以更强大的形式实现。字符串编码、名称修饰和 AST 树遍历函数是那里提供的其他函数示例。

全局变量

Mewa 的方法并不纯粹。如果能提供帮助,事物就会存储在 typedb 中。对于大多数其他事物,我们使用全局变量。typesystem.lua 的头部声明了大约十几个全局变量。在示例 language1 中,定义了大约 50 个变量。例如 typeDescriptionMap,这是一个将每种类型与一个包含该类型代码生成模板的表关联起来的映射。

mewa = require "mewa"
local utils = require "typesystem_utils"
llvmir = require "llvmir"
typedb = mewa.typedb()

-- Global variables
localDefinitionContext = 0    -- Context type of local definitions
seekContextKey = "seekctx"    -- Key for seek context types defined for a scope
callableEnvKey = "env"        -- Key for current callable environment
typeDescriptionMap = {}       -- Map of type ids to their description
referenceTypeMap = {}         -- Map of value type to reference type
dereferenceTypeMap = {}       -- Map of reference type to value type
arrayTypeMap = {}             -- Map array key to its type
stringConstantMap = {}        -- Map of string constant value to its attributes
scalarTypeMap = {}            -- Map of scalar type name to its value type

dofile( "examples/intro/sections.inc")

-- AST Callbacks:
typesystem = {}
dofile( "examples/intro/ast.inc")
return typesystem

输出

我们的编译器前端的输出是通过编译器定义的函数 printprint_section 来打印的。函数 print 被重定向到一个附加到输出的 Code 部分的函数。这些部分是在目标文件中用于替换的变量。函数 print_section 附加到作为第一个参数指定的输出部分。

AST 遍历

所有 AST 节点函数都参与遍历,因为调用 AST 的子节点是它们的责任。我已在教程中提到,当前作用域是作为一个独立的方面实现的,并由 AST 遍历函数设置。遍历函数在 typesystem_utils 模块中实现。遍历函数有两种变体:

  • 函数 traverse( typedb, node, ...)
  • 函数 traverseRange( typedb, node, range, ...)

range 是一对 Lua 数组索引 {范围的第一个元素, 最后一个元素},用于指定要处理的子节点,以进行部分遍历。部分遍历用于选择性地处理子节点。例如,您可以先遍历函数名和参数,然后声明函数,再遍历函数体以启用递归。可变参数 "..." 会被转发给调用的 AST 节点函数。通过这种方式,您可以将参数向下传递给子节点。Mewa 的示例广泛使用参数。例如,传递声明上下文,以决定一个变量声明是结构体成员、局部变量还是全局变量。参数还用于实现多遍遍历或 AST,以特定顺序解析子树的声明,例如一个类。您将要处理的遍的索引作为参数向下传递给一个 AST 节点函数。但请注意,这种通信方式容易出错。您应该将其限制在最低限度,并改用作用域绑定的数据(typedb:set_instance, typedb:get_instance)甚至全局变量。

来自 typesystem_utils 的源代码
-- Tree traversal helper function setting the current scope/step and
--    calling the function of one node, and returning its result:
local function processSubnode( typedb, subnode, ...)
    local rt
    if subnode.call then
        if (subnode.scope) then
            local scope_bk,step_bk = typedb:scope( subnode.scope)
            typedb:set_instance( "node", subnode)
            if subnode.call.obj then
                rt = subnode.call.proc( subnode, subnode.call.obj, ...)
            else
                rt = subnode.call.proc( subnode, ...)
            end
            typedb:scope( scope_bk,step_bk)
        elseif (subnode.step) then
            local step_bk = typedb:step( subnode.step)
            if subnode.call.obj then
                rt = subnode.call.proc( subnode, subnode.call.obj, ...)
            else
                rt = subnode.call.proc( subnode, ...)
            end
            typedb:step( step_bk)
        else
            if subnode.call.obj then
                rt = subnode.call.proc( subnode, subnode.call.obj, ...)
            else
                rt = subnode.call.proc( subnode, ...)
            end
        end
    else
        rt = subnode.value
    end
    return rt
end
-- Tree traversal or a subrange of argument nodes, define scope/step and
--    process the AST node:
function utils.traverseRange( typedb, node, range, ...)
    if node.arg then
        local rt = {}
        local start,last = table.unpack(range)
        local lasti,ofs = last-start+1,start-1
        for ii=1,lasti do
    rt[ ii] = processSubnode( typedb, node.arg[ ofs+ii], ...)
  end
        return rt
    else
        return node.value
    end
end
-- Tree traversal, define scope/step and process the AST node:
function utils.traverse( typedb, node, ...)
    if node.arg then
        local rt = {}
        for ii,subnode in ipairs(node.arg) do
            rt[ii] = processSubnode( typedb, subnode, ...)
        end
        return rt
    else
        return node.value
    end
end

AST 节点函数

现在我们将访问附加到 AST 节点的函数。我将它们分成了涵盖不同方面的代码片段。大部分代码只是委托给我们将在下一节中检查的函数。

常量

在源文件中定义原子和结构体常量。

结构体,例如初始化列表

一个结构体有一个类型/构造函数对的列表作为其构造函数。这类似于函数的参数列表,而它本身就是。为了从初始化列表中递归地初始化对象,我们声明了一个从 constexprStructureType 类型到任何可以通过常量结构初始化的对象的归约。构造函数使用 typedb 来查找一个 ctor 类型,其列表与参数列表匹配。如果失败,构造函数返回 nil 来表示失败,并且依赖于此归约的解决方案应该被放弃。这种封装方式帮助我们递归地映射初始化列表。

function typesystem.constant( node, decl)
    local typeId = scalarTypeMap[ decl]
    local constructor = createConstexprValue( typeId, node.arg[1].value)
    return {type=typeId,constructor=constructor}
end
function typesystem.structure( node)
    local args = utils.traverse( typedb, node)
    return {type=constexprStructureType, constructor={list=args}}
end

变量

本节包含定义和查询变量的节点函数。这些 AST 节点函数只是委托给了我们稍后将重新审视的函数。

function typesystem.vardef( node, context)
    local datatype,varnam,initval
        = table.unpack( utils.traverse( typedb, node, context))
    return defineVariable( node, context, datatype, varnam, initval)
end
function typesystem.variable( node)
    local varname = node.arg[ 1].value
    return getVariable( node, varname)
end

外部函数声明

这里处理的是外部函数声明及其参数。函数 typesystem.extern_funcdef 调用树遍历来获取函数名和参数。参数的收集可以通过不同方式实现。这里,一个表由声明参数列表的节点定义,作为参数向下传递给递归列表声明,并由参数声明节点 typesystem.extern_paramdef 填充。使用 defineFunctionCall 进行的函数调用定义与自由函数和类方法的定义相同。

function typesystem.extern_funcdef( node)
    local context = {domain="global"}
    local name,ret,param = table.unpack( utils.traverse( typedb, node, context))
    if ret == voidType then ret = nil end
    -- ... void type as return value means we are in a procedure without return
    local descr = {
            name = name,
            symbolname = name,
            func ='@' .. name,
            ret =ret,
            param = param,
            signature="" }
    descr.rtllvmtype = ret and typeDescriptionMap[ ret].llvmtype or "void"
    defineFunctionCall( node, descr, context)
    local fmt = llvmir.control.extern_functionDeclaration
    print_section( "Typedefs", utils.constructor_format( fmt, descr))
end
function typesystem.extern_paramdef( node, param)
    local typeId = table.unpack( utils.traverseRange( typedb, node, {1,1}, context))
    local descr = typeDescriptionMap[ typeId]
    if not descr then
        utils.errorMessage( node.line,
            "Type '%s' not allowed as parameter of extern function",
            typedb:type_string( typeId))
    end
    table.insert( param, {type=typeId,llvmtype=descr.llvmtype})
end
function typesystem.extern_paramdeflist( node)
    local param = {}
    utils.traverse( typedb, node, param)
    return param
end

数据类型

本节定义了类、数组和原子数据类型。我们这里需要深入讨论的函数是 typesystem.classdef,即类的定义。

  1. 上下文已更改。无论将什么作为上下文传递给此 AST 节点函数,我们都会定义一个新上下文(classContext),在遍历中向下传递给子节点。该上下文是一个结构体,其中有一个字段 domain,用于指示我们是否在类定义内部("member")、函数体内部("local")或不在任何结构内部("global")。根据上下文,它还有其他字段。它也用于在类中收集数据(structsizemembers)从其子节点中收集。
  2. 子节点的遍历有 4 个遍。遍的索引作为参数向下传递,供子节点决定要做什么。
    • 第 1 遍:类型定义
    • 第 2 遍:成员变量声明
    • 第 3 遍:方法声明(遍历调用函数声明的第 1 遍)
    • 第 4 遍:方法实现(遍历调用函数声明的第 2 遍)

类节点定义了一个多遍求值,节点 typesystem.definition_2passtypesystem.definition 实现了封装何时处理何种内容的决策的门控。

function typesystem.typehdr( node, decl)
    return resolveTypeFromNamePath( node, utils.traverse( typedb, node))
end
function typesystem.arraytype( node)
    local dim = tonumber( node.arg[2].value)
    local elem = table.unpack( utils.traverseRange( typedb, node, {1,1}))
    return {type=getOrCreateArrayType( node, expectDataType( node, elem), dim)}
end
function typesystem.typespec( node)
    return expectDataType( node, table.unpack( utils.traverse( typedb, node)))
end
function typesystem.classdef( node, context)
    local typnam = node.arg[1].value
    local declContextTypeId = getDeclarationContextTypeId( context)
    pushSeekContextType( declContextTypeId)
    local tpl = llvmir.structTemplate
    local typeId,descr = defineStructureType( node, declContextTypeId, typnam, tpl)
    local classContext = {domain="member", decltype=typeId, members={}, descr=descr}
    utils.traverse( typedb, node, classContext, 1)    -- 1st pass: type definitions
    utils.traverse( typedb, node, classContext, 2)    -- 2nd pass: member variables
    descr.size = classContext.structsize
    descr.members = classContext.members
    defineClassStructureAssignmentOperator( node, typeId)
    utils.traverse( typedb, node, classContext, 3)    -- 3rd pass: method decl
    utils.traverse( typedb, node, classContext, 4)    -- 4th pass: method impl
    local subst = {llvmtype=classContext.llvmtype}
    print_section( "Typedefs", utils.constructor_format( descr.typedef, subst))
    popSeekContextType( declContextTypeId)
end
function typesystem.definition( node, pass, context, pass_selected)
    if not pass_selected or pass == pass_selected then    -- pass as in the grammar
        local statement = table.unpack(
            utils.traverse( typedb, node, context or {domain="local"}))
        return statement and statement.constructor or nil
    end
end
function typesystem.definition_2pass( node, pass, context, pass_selected)
    if not pass_selected then
        return typesystem.definition( node, pass, context, pass_selected)
    elseif pass == pass_selected+1 then
        utils.traverse( typedb, node, context, 1)     -- 3rd pass: method decl
    elseif pass == pass_selected then
        utils.traverse( typedb, node, context, 2)    -- 4th pass: method impl
    end
end

函数声明

现在我们来看一下在被处理的源文件中实现的函数。正如数据类型部分已经提到的,遍历分为两遍:声明和实现。一个辅助函数 utils.allocNodeData 将第一遍中创建的函数描述附加到节点上。函数 utils.getNodeData 在第二遍中引用它。在第二遍结束时,LLVM 函数声明被打印到输出。函数调用和带有参数的函数类型的构造函数已经在第一遍中声明了。

还值得一提的是 defineCallableEnvironment 的调用,它创建了一个称为可调用环境的结构,并将其绑定到函数体的作用域。我们在教程中已经看到了这是如何工作的(带作用域的对象)。可调用环境结构包含了所有与函数相关的数据,如返回类型、寄存器分配器等。它通过函数 getCallableEnvironment 进行访问。

AST 节点函数 typesystem.callablebody 在可调用体的作用域内收集自由函数或类方法描述的元素。它以与我们已经见过的外部函数声明类似的方式收集参数列表。

function typesystem.funcdef( node, context, pass)
    local typnam = node.arg[1].value
    if not pass or pass == 1 then    -- 1st pass: function declaration
        local rtype = table.unpack(
                        utils.traverseRange( typedb, node, {2,2}, context))
        if rtype == voidType then rtype = nil end -- void return => procedure
        local symbolname = (context.domain == "member")
                    and (typedb:type_name(context.decltype) .. "__" .. typnam)
                    or typnam
        local rtllvmtype = rtype and typeDescriptionMap[ rtype].llvmtype or "void"
        local descr = {
                lnk = "internal",
                name = typnam,
                symbolname = symbolname,
                func = '@'..symbolname,
                ret = rtype,
                rtllvmtype = rtllvmtype,
                attr = "#0"}
        utils.traverseRange( typedb, node, {3,3}, context, descr, 1)
        utils.allocNodeData( node, localDefinitionContext, descr)
        defineFunctionCall( node, descr, context)
    end
    if not pass or pass == 2 then     -- 2nd pass: function implementation
        local descr = utils.getNodeData( node, localDefinitionContext)
        utils.traverseRange( typedb, node, {3,3}, context, descr, 2)
        if descr.ret then
            local rtdescr = typeDescriptionMap[descr.ret]
            local subst = {type=rtdescr.llvmtype,this=rtdescr.default}
            descr.body = descr.body
                .. utils.constructor_format( llvmir.control.returnStatement, subst)
        else
            descr.body = descr.body
                .. utils.constructor_format( llvmir.control.returnVoidStatement)
        end
        print( utils.constructor_format( llvmir.control.functionDeclaration, descr))
    end
end
function typesystem.callablebody( node, context, descr, selectid)
    local rt
    local context_ = {domain="local"}
    if selectid == 1 then -- parameter declarations
        local envname = "body " .. descr.name
        descr.env = defineCallableEnvironment( node, envname, descr.ret)
        descr.param = table.unpack(
                            utils.traverseRange( typedb, node, {1,1}, context_))
        descr.paramstr = getDeclarationLlvmTypeRegParameterString( descr, context)
    elseif selectid == 2 then -- statements in body
        if context.domain == "member" then
            expandMethodEnvironment( node, context, descr, descr.env)
        end
        local stmlist = utils.traverseRange( typedb, node, {2,#node.arg}, context_)
        local code = ""
        for _,statement in ipairs(stmlist) do code = code .. statement.code end
        descr.body = code
    end
    return rt
end
function typesystem.paramdef( node, param)
    local datatype,varname = table.unpack( utils.traverse( typedb, node, param))
    table.insert( param, defineParameter( node, datatype, varname))
end
function typesystem.paramdeflist( node, param)
    local param = {}
    utils.traverse( typedb, node, param)
    return param
end

运算符

本节定义了实现运算符的 AST 节点的函数。运算符、成员引用、函数调用的构造被重定向到我们稍后将访问的函数 applyCallable。函数调用被实现为可调用类型的 "()" 运算符。这里使用的函数 expectValueTypeexpectDataType 是断言。

function typesystem.binop( node, operator)
    local this,operand = table.unpack( utils.traverse( typedb, node))
    expectValueType( node, this)
    expectValueType( node, operand)
    return applyCallable( node, this, operator, {operand})
end
function typesystem.unop( node, operator)
    local this = table.unpack( utils.traverse( typedb, node))
    expectValueType( node, operand)
    return applyCallable( node, this, operator, {})
end
function typesystem.member( node)
    local struct,name = table.unpack( utils.traverse( typedb, node))
    return applyCallable( node, struct, name)
end
function typesystem.operator( node, operator)
    local args = utils.traverse( typedb, node)
    local this = table.remove( args, 1)
    return applyCallable( node, this, operator, args)
end

控制结构

正如我们在教程中学到的,条件语句是通过将一个控制布尔类型与一些代码块连接起来实现的。控制布尔类型是通过 getRequiredTypeConstructor 从条件类型派生出来的。

返回节点函数也使用 getRequiredTypeConstructor 从其参数派生返回值。返回类型可在可调用环境结构中访问。

function typesystem.conditional_if( node)
    local env = getCallableEnvironment()
    local exitLabel = env.label()
    local condition,yesblock,noblock
        = table.unpack( utils.traverse( typedb, node, exitLabel))
    local rt = conditionalIfElseBlock(
            node, condition, yesblock, noblock, exitLabel)
    rt.code = rt.code
        .. utils.constructor_format( llvmir.control.label, {inp=exitLabel})
    return rt
end
function typesystem.conditional_else( node, exitLabel)
    return table.unpack( utils.traverse( typedb, node))
end
function typesystem.conditional_elseif( node, exitLabel)
    local env = getCallableEnvironment()
    local condition,yesblock,noblock
        = table.unpack( utils.traverse( typedb, node, exitLabel))
    return conditionalIfElseBlock(
            node, condition, yesblock, noblock, exitLabel)
end
function typesystem.conditional_while( node)
    local env = getCallableEnvironment()
    local condition,yesblock = table.unpack( utils.traverse( typedb, node))
    local cond_constructor
        = getRequiredTypeConstructor(
            node, controlTrueType, condition,
            tagmask_matchParameter, tagmask_typeConversion)
    if not cond_constructor then
        utils.errorMessage( node.line, "Can't use type '%s' as a condition",
                    typedb:type_string(condition.type))
    end
    local start = env.label()
    local startcode = utils.constructor_format( llvmir.control.label, {inp=start})
    local subst = {out=start,inp=cond_constructor.out}
    return {code = startcode .. cond_constructor.code .. yesblock.code
            .. utils.constructor_format( llvmir.control.invertedControlType, subst)}
end
function typesystem.return_value( node)
    local operand = table.unpack( utils.traverse( typedb, node))
    local env = getCallableEnvironment()
    local rtype = env.returntype
    local descr = typeDescriptionMap[ rtype]
    expectValueType( node, operand)
    if rtype == 0 then
        utils.errorMessage( node.line, "Can't return value from procedure")
    end
    local this,code = constructorParts(
                getRequiredTypeConstructor(
                    node, rtype, operand,
                    tagmask_matchParameter, tagmask_typeConversion))
    local subst = {this=this, type=descr.llvmtype}
    return {code = code
            .. utils.constructor_format( llvmir.control.returnStatement, subst)}
end
function typesystem.return_void( node)
    local env = getCallableEnvironment()
    if env.returntype ~= 0 then
        utils.errorMessage( node.line, "Can't return without value from function")
    end
    local code = utils.constructor_format( llvmir.control.returnVoidStatement, {})
    return {code = code}
end

代码块及其他

最后,我们来到包含不适合放在其他地方的代码块的部分。

  • 顶层节点,即程序,它在通过子节点遍历进行进一步处理之前,对语言进行一些初始化。
  • 主程序节点在这里做的不多,因为这里没有实现程序参数和程序返回值。
  • 节点函数 typesystem.codeblock 将一系列语句合并成一个构造函数。
  • 节点函数 typesystem.free_expression 从一个表达式节点创建一个语句节点。
function typesystem.program( node, options)
    defineConstExprArithmetics()
    defineConstExprTypeConversions()
    initBuiltInTypes()
    initControlBooleanTypes()
    local context = {domain="global"}
    utils.traverse( typedb, node, context)
end
function typesystem.main_procdef( node)
    local env = defineCallableEnvironment( node, "main ", scalarIntegerType)
    local block = table.unpack( utils.traverse( typedb, node, {domain="local"}))
    local subst = {type="i32",this="0"}
    local body = block.code
            .. utils.constructor_format( llvmir.control.returnStatement, subst)
    print( "\n" .. utils.constructor_format(
                llvmir.control.mainDeclaration, {body=body}))
end
function typesystem.codeblock( node)
    local stmlist = utils.traverse( typedb, node)
    local code = ""
    for _,stm in ipairs(stmlist) do code = code .. stm.code end
    return {code=code}
end
function typesystem.free_expression( node)
    local expression = table.unpack( utils.traverse( typedb, node))
    if expression.type == controlTrueType
            or expression.type == controlFalseType then
        return {code = expression.constructor.code
                .. utils.constructor_format(
                        llvmir.control.label,
                        {inp=expression.constructor.out})}
    else
        return {code=expression.constructor.code}
    end
end

类型系统函数

本章将概述我们现在已经看到的 AST 节点函数中使用的函数。它们也被分成了涵盖不同方面的代码片段。

归约权重

这里定义了所有归约的权重。我们在教程中已经解释了为归约加权的必要性。

-- List of reduction weights:
rdw_conv = 1.0               -- Weight of conversion
rdw_constexpr_float = 0.375  -- Conversion of constexpr float -> int or int -> float
rdw_constexpr_bool = 0.50    -- Conversion of constexpr num -> bool or bool -> num
rdw_load = 0.25              -- Weight of loading a value
rdw_strip_r_1 = 0.25 / (1*1) -- Weight of fetch value type with 1 qualifier
rdw_strip_r_2 = 0.25 / (2*2) -- Weight of fetch value type with 2 qualifiers
rdw_strip_r_3 = 0.25 / (3*3) -- Weight of fetch value type with 3 qualifiers
rwd_control = 0.125 / (4*4)  -- Weight of control true/false type <-> bool

weightEpsilon = 1E-8         -- Epsilon used for comparing weights for equality

归约标签和标签掩码

本节定义了所有的归约标签和标签掩码。我们已经在教程中解释了标记的必要性。

-- Tags attached to reduction definitions
tag_typeDeclaration = 1   -- Type declaration relation (e.g. variable to data type)
tag_typeDeduction = 2     -- Type deduction (e.g. inheritance)
tag_typeConversion = 3    -- Type value conversion
tag_typeInstantiation = 4 -- Type value construction from const expression

-- Sets of tags used for resolving a type or deriving a type, depending on the case
tagmask_declaration = typedb.reduction_tagmask( tag_typeDeclaration)
tagmask_resolveType = typedb.reduction_tagmask(
                tag_typeDeduction, tag_typeDeclaration)
tagmask_matchParameter = typedb.reduction_tagmask(
                tag_typeDeduction, tag_typeDeclaration,
                tag_typeConversion, tag_typeInstantiation)
tagmask_typeConversion = typedb.reduction_tagmask( tag_typeConversion)

类型声明字符串

此函数提供类型的签名字符串,包括上下文类型和参数类型。该签名字符串用于在错误消息中引用类型。

-- Type string of a type declaration built from its parts for error messages
function typeDeclarationString( this, typnam, args)
    local rt = (this == 0 or not this)
            and typnam
            or (typedb:type_string(type(this) == "table"
                    and this.type
                    or this
                ) .. " " .. typnam)
    if (args) then
        rt = rt .. "(" .. utils.typeListString( typedb, args) .. ")"
    end
    return rt
end

调用和提升调用

这里是定义带参数和返回值的调用的函数。对于一等标量类型,我们通常还需要查看第二个参数来确定要调用的构造函数。大多数静态类型编程语言将整数与浮点数的乘法指定为浮点数的乘法。如果我们将运算符定义为依赖于第一个参数,我们就必须将 int * double 的调用定义为将第一个操作数转换为 double,然后进行 double * double 乘法。我在示例中称这些调用为“提升调用”(将第一个参数提升到第二个参数的类型)。整数参数被提升为 double,然后调用 double * double 的构造函数。

-- Constructor for a promote call (implementing int + double by promoting
--   the first argument int to double and do a double + double to get the result)
function promoteCallConstructor( call_constructor, promote)
    return function( this, arg)
        return call_constructor( promote( this), arg)
    end
end
-- Define an operation with involving the promotion of the left-hand argument to
--   another type and executing the operation as defined for the type promoted to.
function definePromoteCall( rType, thisType, proType, opr, argTypes, promote)
    local call_type = typedb:get_type( proType, opr, argTypes)
    local call_constructor = typedb:type_constructor( call_type)
    local constructor = promoteCallConstructor( call_constructor, promote)
    local callType = typedb:def_type( thisType, opr, constructor, argTypes)
    if callType == -1 then
        utils.errorMessage( node.line, "Duplicate definition '%s'",
                typeDeclarationString( thisType, opr, argTypes))
    end
    if rType then
        typedb:def_reduction( rType, callType, nil, tag_typeDeclaration)
    end
end
-- Define an operation generalized
function defineCall( rType, thisType, opr, argTypes, constructor)
    local callType = typedb:def_type( thisType, opr, constructor, argTypes)
    if callType == -1 then
        local declstr = typeDeclarationString( thisType, opr, argTypes)
        utils.errorMessage( 0, "Duplicate definition of call '%s'", declstr)
    end
    if rType then
        typedb:def_reduction( rType, callType, nil, tag_typeDeclaration)
    end
    return callType
end

应用构造函数

本节定义了构造函数的调用。我们在教程中(变量赋值)对它的工作方式有了一些印象。函数 tryApplyConstructorapplyConstructor 通过将构造函数应用于参数列表来构建一个构造函数。函数 tryApplyReductionListapplyReductionList 通过将一系列归约应用于一个基础构造函数来构建一个构造函数。带有前缀 try 的函数在出现错误时通过其返回值报告未命中或歧义,而不是使用 utils.errorMessage 退出。

-- Application of a constructor depending on its type and its argument type.
-- Return false as 2nd result on failure, true on success
function tryApplyConstructor( node, typeId, constructor, arg)
    if constructor then
        if (type(constructor) == "function") then
            local rt = constructor( arg)
            return rt, rt ~= nil
        elseif arg then
            utils.errorMessage( node.line,
                    "Reduction constructor overwriting constructor for '%s'",
                    typedb:type_string(typeId))
        else
            return constructor, true
        end
    else
        return arg, true
    end
end
-- Application of a constructor depending on its type and its argument type.
-- Throw error on failure.
function applyConstructor( node, typeId, constructor, arg)
    local result_constructor,success
        = tryApplyConstructor( node, typeId, constructor, arg)
    if not success then
        utils.errorMessage( node.line, "Failed to create type '%s'",
                    typedb:type_string(typeId))
    end
    return result_constructor
end
-- Try to apply a list of reductions on a constructor.
-- Return false as 2nd result on failure, true on success
function tryApplyReductionList( node, redulist, constructor)
    local success = true
    for _,redu in ipairs(redulist) do
        constructor,success
            = tryApplyConstructor( node, redu.type, redu.constructor, constructor)
        if not success then return nil end
    end
    return constructor, true
end
-- Apply a list of reductions on a constructor, throw error on failure
function applyReductionList( node, redulist, constructor)
    local success = true
    for _,redu in ipairs(redulist) do
        constructor,success
            = tryApplyConstructor(
                node, redu.type, redu.constructor, constructor)
        if not success then
            utils.errorMessage( node.line, "Reduction constructor failed for '%s'",
                        typedb:type_string(redu.type))
        end
    end
    return constructor
end

解析类型

这里是用于解析类型、映射类型和断言类型属性的函数。

  • 函数 selectNoArgumentType 过滤出一个没有参数的解析类型匹配,即一个变量或一个数据类型。
  • 函数 resolveTypeFromNamePath 从结构名称路径(classname1 "::" classname2 ...)解析一个类型。
  • 函数 tryGetWeightedParameterReductionList 获取用于参数匹配的归约及其权重。该权重可用于累积一个函数匹配的整体权重,以确定最佳匹配。在示例语言中,最大值被用作函数匹配权重的累积函数。
  • 函数 getRequiredTypeConstructor 从一种类型派生出另一种类型。我们在实现 return 或像 ifwhile 等条件语句时看到了该函数的应用。
  • 函数 expectValueType 断言参数类型具有构造函数。这意味着它是一个值类型。
  • 函数 expectDataType 断言参数类型没有构造函数。这意味着它是一个数据类型。
-- Get the handle of a type expected to have no arguments
function selectNoArgumentType(
        node, seekctx, typeName, tagmask, resolveContextType, reductions, items)
    if not resolveContextType or type(resolveContextType) == "table" then
        io.stderr:write( "TRACE typedb:resolve_type\n"
                .. utils.getResolveTypeTrace( typedb, seekctx, typeName, tagmask)
                .. "\n")
        utils.errorResolveType( typedb, node.line,
                      resolveContextType, seekctx, typeName)
    end
    for ii,item in ipairs(items) do
        if typedb:type_nof_parameters( item) == 0 then
            local constructor = applyReductionList( node, reductions, nil)
            local item_constr = typedb:type_constructor( item)
            constructor = applyConstructor( node, item, item_constr, constructor)
            return item,constructor
        end
    end
    utils.errorMessage( node.line, "Failed to resolve %s with no arguments",
                  utils.resolveTypeString( typedb, seekctx, typeName))
end
-- Get the type handle of a type defined as a path (elements of the
--    path are namespaces and parent structures followed by the type name resolved)
function resolveTypeFromNamePath( node, arg, argidx)
    if not argidx then argidx = #arg end
    local typeName = arg[ argidx]
    local seekContextTypes
    if argidx > 1 then
        seekContextTypes = resolveTypeFromNamePath( node, arg, argidx-1)
        expectDataType( node, seekContextTypes)
    else
        seekContextTypes = getSeekContextTypes()
    end
    local resolveContextType, reductions, items
            = typedb:resolve_type( seekContextTypes, typeName, tagmask_namespace)
    local typeId,constructor
            = selectNoArgumentType( node, seekContextTypes, typeName,
                    tagmask_namespace, resolveContextType, reductions, items)
    return {type=typeId, constructor=constructor}
end
-- Try to get the constructor and weight of a parameter passed with the
--    deduction tagmask optionally passed as an argument
function tryGetWeightedParameterReductionList(
        node, redutype, operand, tagmask_decl, tagmask_conv)
    if redutype ~= operand.type then
        local redulist,weight,altpath
            = typedb:derive_type( redutype, operand.type,
                                        tagmask_decl, tagmask_conv)
        if altpath then
            utils.errorMessage( node.line, "Ambiguous derivation for '%s': %s | %s",
                        typedb:type_string(operand.type),
                        utils.typeListString(typedb,altpath," =>"),
                        utils.typeListString(typedb,redulist," =>"))
        end
        return redulist,weight
    else
        return {},0.0
    end
end
-- Get the constructor of a type required. The deduction tagmasks are arguments
function getRequiredTypeConstructor(
        node, redutype, operand, tagmask_decl, tagmask_conv)
    if redutype ~= operand.type then
        local redulist,weight,altpath
            = typedb:derive_type( redutype, operand.type,
                                        tagmask_decl, tagmask_conv)
        if not redulist or altpath then
            io.stderr:write( "TRACE typedb:derive_type "
                    .. typedb:type_string(redutype)
                    .. " <- " .. typedb:type_string(operand.type) .. "\n"
                    .. utils.getDeriveTypeTrace( typedb, redutype, operand.type,
                                    tagmask_decl, tagmask_conv)
                    .. "\n")
        end
        if not redulist then
            utils.errorMessage( node.line, "Type mismatch, required type '%s'",
                        typedb:type_string(redutype))
        elseif altpath then
            utils.errorMessage( node.line, "Ambiguous derivation for '%s': %s | %s",
                        typedb:type_string(operand.type),
                        utils.typeListString(typedb,altpath," =>"),
                        utils.typeListString(typedb,redulist," =>"))
        end
        local rt = applyReductionList( node, redulist, operand.constructor)
        if not rt then
            utils.errorMessage( node.line, "Construction of '%s' <- '%s' failed",
                        typedb:type_string(redutype),
                        typedb:type_string(operand.type))
        end
        return rt
    else
        return operand.constructor
    end
end
-- Issue an error if the argument does not refer to a value type
function expectValueType( node, item)
    if not item.constructor then
        utils.errorMessage( node.line, "'%s' does not refer to a value",
                    typedb:type_string(item.type))
    end
end
-- Issue an error if the argument does not refer to a data type
function expectDataType( node, item)
    if item.constructor then
        utils.errorMessage( node.line, "'%s' does not refer to a data type",
                    typedb:type_string(item.type))
    end
    return item.type
end
注意

这里调用的 getSeekContextTypes() 函数在解析类型时很重要。它提供了用于解析类型的候选上下文类型/构造函数对的列表。例如,在结构体内部,允许通过名称访问该结构体的元素,而无需该元素的完整命名空间路径。但我们也可以访问全局类型。因此,我们需要维护一个当前上下文类型的列表,该列表在进入结构体时扩展,在退出时缩小。上下文类型的列表绑定到一个作用域。这用一行代码实现了 C++usingPascalWITH。您将参数添加到当前作用域的查找上下文类型列表中。

应用可调用对象

本节是整个示例编译器中最复杂的部分。它展示了如何找到带参数的可调用对象的最佳匹配的代码。候选对象从一个按权重排序的优先队列中获取。调用排名靠前的候选对象的构造函数,如果它们成功构建了对象,则返回该匹配。构造失败的候选对象将被丢弃。

函数 tryApplyCallableapplyCallable 用于解析所有带参数的类型(运算符、成员访问、函数调用等)。

-- For a callable item, create for each argument the lists of reductions needed to
--   pass the arguments to it, with accumulation of the reduction weights
function collectItemParameter( node, item, args, parameters)
    local rt = {redulist={},llvmtypes={},weight=0.0}
    for pi=1,#args do
        local redutype,redulist,weight
        if pi <= #parameters then
            redutype = parameters[ pi].type
            redulist,weight = tryGetWeightedParameterReductionList(
                        node, redutype, args[ pi],
                        tagmask_matchParameter, tagmask_typeConversion)
        else
            redutype = getVarargArgumentType( args[ pi].type)
            if redutype then
                redulist,weight = tryGetWeightedParameterReductionList(
                            node, redutype, args[ pi],
                            tagmask_pushVararg, tagmask_typeConversion)
            end
        end
        if not weight then return nil end
        if rt.weight < weight then -- use max(a,b) as weight accumulation function
            rt.weight = weight
        end
        table.insert( rt.redulist, redulist)
        local descr = typeDescriptionMap[ redutype]
        table.insert( rt.llvmtypes, descr.llvmtype)
    end
    return rt
end
-- Select the candidate items with the highest weight not exceeding maxweight
function selectCandidateItemsBestWeight( items, item_parameter_map, maxweight)
    local candidates,bestweight = {},nil
    for ii,item in ipairs(items) do
        if item_parameter_map[ ii] then -- we have a match
            local weight = item_parameter_map[ ii].weight
            if not maxweight or maxweight > weight + weightEpsilon then
                -- we have a candidate not looked at yet
                if not bestweight then
                    candidates = {ii}
                    bestweight = weight
                elseif weight < bestweight + weightEpsilon then
                    if weight >= bestweight - weightEpsilon then
                        -- they are equal
                        table.insert( candidates, ii)
                    else
                        -- the new candidate is the single best match
                        candidates = {ii}
                        bestweight = weight
                    end
                end
            end
        end
    end
    return candidates,bestweight
end
-- Get the best matching item from a list of items by weighting the matching of
--   the arguments to the item parameter types
function selectItemsMatchParameters( node, items, args, this_constructor)
    local item_parameter_map = {}
    local bestmatch = {}
    local candidates = {}
    local bestweight = nil
    local bestgroup = 0
    for ii,item in ipairs(items) do
        local nof_params = typedb:type_nof_parameters( item)
        if nof_params == #args then
            local param = typedb:type_parameters( item)
            item_parameter_map[ ii] = collectItemParameter( node,item,args,param)
        end
    end
    while next(item_parameter_map) ~= nil do -- until no candidate groups left
        candidates,bestweight
            = selectCandidateItemsBestWeight( items,item_parameter_map,bestweight)
        for _,ii in ipairs(candidates) do
            local item_constructor = typedb:type_constructor( items[ ii])
            local call_constructor
            if not item_constructor and #args == 0 then
                call_constructor = this_constructor
            else
                local ac,success = nil,true
                local arg_constructors = {}
                for ai=1,#args do
                    ac,success = tryApplyReductionList( node,
                            item_parameter_map[ ii].redulist[ ai],
                            args[ ai].constructor)
                    if not success then break end
                    table.insert( arg_constructors, ac)
                end
                if success then
                    call_constructor
                        = item_constructor( this_constructor, arg_constructors,
                                        item_parameter_map[ ii].llvmtypes)
                end
            end
            if call_constructor then
                local match = {type=items[ ii], constructor=call_constructor}
                table.insert( bestmatch, match)
            end
            item_parameter_map[ ii] = nil
        end
        if #bestmatch ~= 0 then return bestmatch,bestweight end
    end
end
-- Find a callable identified by name and its arguments (parameter matching)
--   in the context of a type (this)
function findCallable( node, this, callable, args)
    local resolveContextType,reductions,items
        = typedb:resolve_type( this.type, callable, tagmask_resolveType)
    if not resolveContextType then return nil end
    if type(resolveContextType) == "table" then
        io.stderr:write( "TRACE typedb:resolve_type\n"
                .. utils.getResolveTypeTrace( typedb, this.type, callable,
                        tagmask_resolveType) .. "\n")
        utils.errorResolveType( typedb, node.line,
                    resolveContextType, this.type, callable)
    end
    local this_constructor
        = applyReductionList( node, reductions, this.constructor)
    local bestmatch,bestweight
        = selectItemsMatchParameters( node, items, args or {}, this_constructor)
    return bestmatch,bestweight
end
-- Filter best result and report error on ambiguity
function getCallableBestMatch( node, bestmatch, bestweight, this, callable, args)
    if #bestmatch == 1 then
        return bestmatch[1]
    elseif #bestmatch == 0 then
        return nil
    else
        local altmatchstr = ""
        for ii,bm in ipairs(bestmatch) do
            if altmatchstr ~= "" then
                altmatchstr = altmatchstr .. ", "
            end
            altmatchstr = altmatchstr .. typedb:type_string(bm.type)
        end
        utils.errorMessage( node.line,
                    "Ambiguous matches resolving callable '%s', candidates: %s",
                    typeDeclarationString( this, callable, args), altmatchstr)
    end
end
-- Retrieve and apply a callable in a specified context
function applyCallable( node, this, callable, args)
    local bestmatch,bestweight = findCallable( node, this, callable, args)
    if not bestweight then
        io.stderr:write( "TRACE typedb:resolve_type\n"
                .. utils.getResolveTypeTrace( typedb, this.type, callable,
                        tagmask_resolveType) .. "\n")
        utils.errorMessage( node.line, "Failed to find callable '%s'",
                    typeDeclarationString( this, callable, args))
    end
    local rt = getCallableBestMatch( node, bestmatch, bestweight,
                        this, callable, args)
    if not rt then
        utils.errorMessage( node.line, "Failed to match params to callable '%s'",
                    typeDeclarationString( this, callable, args))
    end
    return rt
end
-- Try to retrieve a callable in a specified context, apply it and return its
--   type/constructor pair, return nil if not found
function tryApplyCallable( node, this, callable, args)
    local bestmatch,bestweight = findCallable( node, this, callable, args)
    if bestweight then
        return getCallableBestMatch( node, bestmatch, bestweight)
    end
end

构造函数

这里是定义构造函数的构建块。

  • 函数 constructorPartsconstructorStructtypeConstructorPairStruct 是用于构造函数或类型/构造函数对结构的辅助函数。
  • 函数 callConstructor 从格式字符串和其他属性构建一个构造函数。本示例编译器的大多数构造函数都是用 callConstructor 定义的。它使用辅助函数 buildCallArguments 来构建一些格式字符串变量,以映射构造函数调用的参数。
-- Get the two parts of a constructor as tuple
function constructorParts( constructor)
    if type(constructor) == "table" then
        return constructor.out,(constructor.code or "")
    else
        return tostring(constructor),""
    end
end
-- Get a constructor structure
function constructorStruct( out, code)
    return {out=out, code=code or ""}
end
-- Create a type/constructor pair as used by most functions constructing a type
function typeConstructorPairStruct( type, out, code)
    return {type=type, constructor=constructorStruct( out, code)}
end
-- Builds the argument string and the argument build-up code for a function call
--   or interface method call constructors
function buildCallArguments( subst, thisTypeId, thisConstructor, args, types)
    local this_inp,code = constructorParts(thisConstructor or "%UNDEFINED")
    local callargstr = ""
    if types and thisTypeId and thisTypeId ~= 0 then
        callargstr = typeDescriptionMap[ thisTypeId].llvmtype
                    .. " " .. this_inp .. ", "
    end
    if args then
        for ii=1,#args do
            local arg = args[ ii]
            local arg_inp,arg_code = constructorParts( arg)
            subst[ "arg" .. ii] = arg_inp
            if types then
                local llvmtype = types[ ii].llvmtype
                if not llvmtype then
                    utils.errorMessage( 0, "Parameter has no LLVM type specified")
                end
                callargstr = callargstr .. llvmtype
                        .. " " .. tostring(arg_inp) .. ", "
            end
            code = code .. arg_code
        end
    end
    if types or thisTypeId ~= 0 then callargstr = callargstr:sub(1, -3) end
    subst.callargstr = callargstr
    subst.this = this_inp
    return code
end
-- Constructor of a call with an self argument and some additional arguments
function callConstructor( fmt, thisTypeId, argTypeIds, vartable)
    return function( this_constructor, arg_constructors)
        env = getCallableEnvironment()
        local out = env.register()
        local subst = utils.deepCopyStruct( vartable) or {}
        subst.out = out
        local code = buildCallArguments(
                    subst, thisTypeId or 0,
                    this_constructor, arg_constructors, argTypeIds)
        code = code .. utils.constructor_format( fmt, subst, env.register)
        return {code=code,out=out}
    end
end

定义函数调用

本节介绍的函数 defineFunctionCall 用于将自由函数调用或方法调用定义为可调用对象上的带参数的 "()" 运算符。可调用对象是一种中间类型,创建它是为了将所有函数调用的情况统一为一种情况。例如,函数变量也可以用圆括号 "(" ")" 中的参数来调用。如果我们想统一这些情况,我们不能将函数定义为一个带参数的符号,因为符号绑定对于函数变量是不可用的。因此,我们将函数符号定义为 callable 类型,其 "()" 运算符带有函数参数作为参数。函数变量可以引用 callable 作为其类型,并且也可以有一个带函数参数作为参数的 "()" 运算符。

-- Get the parameter string of a function declaration
function getDeclarationLlvmTypeRegParameterString( descr, context)
    local rt = ""
    if context.domain == "member" then
        rt = rt .. typeDescriptionMap[ context.decltype].llvmtype .. "* %ths, "
    end
    for ai,arg in ipairs(descr.param or {}) do
        rt = rt .. arg.llvmtype .. " " .. arg.reg .. ", "
    end
    if rt ~= "" then rt = rt:sub(1, -3) end
    return rt
end

-- Get the parameter string of a function declaration
function getDeclarationLlvmTypedefParameterString( descr, context)
    local rt = ""
    if context.domain == "member" then
        rt = rt .. (descr.llvmthis or context.descr.llvmtype) .. "*, "
    end
    for ai,arg in ipairs(descr.param) do
        rt = rt .. arg.llvmtype .. ", "
    end
    if rt ~= "" then rt = rt:sub(1, -3) end
    return rt
end

-- Get (if already defined) or create the callable context type (function name)
--   on which the "()" operator implements the function call
function getOrCreateCallableContextTypeId( contextTypeId, name, descr)
    local rt = typedb:get_type( contextTypeId, name)
    if not rt then
        rt = typedb:def_type( contextTypeId, name)
        typeDescriptionMap[ rt] = descr
    end
    return rt
end

-- Define a direct function call: class method call, free function call
function defineFunctionCall( node, descr, context)
    descr.argstr = getDeclarationLlvmTypedefParameterString( descr, context)
    descr.llvmtype = utils.template_format( llvmir.control.functionCallType, descr)
    local contextType = getDeclarationContextTypeId(context)
    local thisType = (contextType ~= 0) and referenceTypeMap[ contextType] or 0
    local callablectx = getOrCreateCallableContextTypeId(
                 thisType, descr.name, llvmir.callableDescr)
    local fmt = descr.ret
            and llvmir.control.functionCall
            or llvmir.control.procedureCall
    local constructor = callConstructor( fmt, thisType, descr.param, descr)
    return defineCall( descr.ret, callablectx, "()", descr.param, constructor)
end

常量表达式类型

源代码中的常量会触发常量表达式类型的创建。常量表达式类型有其专有的运算符实现。常量表达式运算符的构造函数不生成代码,而是返回一个值作为结果。常量表达式的构造函数是该类型数据项的值。

-- Constant expression scalar types and string type represented as Lua values
constexprIntegerType = typedb:def_type( 0, "constexpr int")
constexprDoubleType = typedb:def_type( 0, "constexpr double")
constexprBooleanType = typedb:def_type( 0, "constexpr bool")
constexprStringType = typedb:def_type( 0, "constexpr string")

-- structure initializer list type for structure declarations in the source
constexprStructureType = typedb:def_type( 0, "constexpr struct")

-- Void type handled as no type:
voidType = typedb:def_type( 0, "void")
-- String constant, this example language knows strings only as constants:
stringType = defineDataType( {line=0}, 0, "string", llvmir.string)

scalarTypeMap.void = voidType
scalarTypeMap.string = stringType

typeDescriptionMap[ constexprIntegerType] = llvmir.constexprIntegerDescr
typeDescriptionMap[ constexprDoubleType] = llvmir.constexprDoubleDescr
typeDescriptionMap[ constexprBooleanType] = llvmir.constexprBooleanDescr
typeDescriptionMap[ constexprStructureType] = llvmir.constexprStructDescr

function isScalarConstExprValueType( initType)
    return initType >= constexprIntegerType and initType <= stringType
end

scalarTypeMap["constexpr int"] = constexprIntegerType
scalarTypeMap["constexpr double"] = constexprDoubleType
scalarTypeMap["constexpr bool"] = constexprBooleanType
scalarTypeMap["constexpr struct"] = constexprStructureType
scalarTypeMap["constexpr string"] = constexprStringType

-- Create a constexpr node from a lexem in the AST
function createConstexprValue( typeId, value)
    if typeId == constexprBooleanType then
        if value == "true" then return true else return false end
    elseif typeId == constexprIntegerType or typeId == constexprDoubleType then
        return tonumber(value)
    elseif typeId == constexprStringType then
        if not stringConstantMap[ value] then
            local encval,enclen = utils.encodeLexemLlvm(value)
            local name = utils.uniqueName( "string")
            stringConstantMap[ value] = {size=enclen+1,name=name}
            local subst = {name=name, size=enclen+1, value=encval}
            local fmt = llvmir.control.stringConstDeclaration
            print_section( "Constants",
                utils.constructor_format( fmt, subst) .. "\n")
        end
        return stringConstantMap[ value]
    end
end
-- List of value constructors from constexpr constructors
function constexprDoubleToDoubleConstructor( val)
    return constructorStruct( "0x" .. mewa.llvm_double_tohex( val))
end
function constexprDoubleToIntegerConstructor( val)
    return constructorStruct( tostring(tointeger(val)))
end
function constexprDoubleToBooleanConstructor( val)
    return constructorStruct( math.abs(val) < epsilon and "0" or "1")
end
function constexprIntegerToDoubleConstructor( val)
    return constructorStruct( "0x" .. mewa.llvm_double_tohex( val))
end
function constexprIntegerToIntegerConstructor( val)
    return constructorStruct( tostring(val))
end
function constexprIntegerToBooleanConstructor( val)
    return constructorStruct( val == "0" and "0" or "1")
end
function constexprBooleanToScalarConstructor( val)
    return constructorStruct( val == true and "1" or "0")
end

function defineBinopConstexpr( typ, opr, constructor)
    defineCall( typ, typ, opr, {typ}, constructor)
end
function defineBinopConstexprPromote( typ, promotetyp, opr, promote)
    definePromoteCall( promotetyp, typ, promotetyp, opr, {promotetyp}, promote)
end
-- Define arithmetics of constant expressions
function defineConstExprArithmetics()
    defineCall( constexprIntegerType, constexprIntegerType, "-", {},
                function(this) return -this end)
    defineCall( constexprDoubleType, constexprDoubleType, "-", {},
                function(this) return -this end)
    do
        local typ = constexprIntegerType
        defineBinopConstexpr( typ, "+", function(t,a) return t+a[1] end)
        defineBinopConstexpr( typ, "-", function(t,a) return t-a[1] end)
        defineBinopConstexpr( typ, "*", function(t,a) return tointeger(t*a[1]) end)
        defineBinopConstexpr( typ, "/", function(t,a) return tointeger(t/a[1]) end)
    end
    do
        local typ = constexprDoubleType
        defineBinopConstexpr( typ, "+", function(t,a) return t+a[1] end)
        defineBinopConstexpr( typ, "-", function(t,a) return t-a[1] end)
        defineBinopConstexpr( typ, "*", function(t,a) return t*a[1] end)
        defineBinopConstexpr( typ, "/", function(t,a) return t/a[1] end)
    end
    -- Define arithmetic operators of integers with a double as promotion of the
    --   left-hand integer to a double and an operator of a double with a double:
    for _,opr in ipairs({"+","-","*","/"}) do
        defineBinopConstexprPromote( constexprIntegerType, constexprDoubleType, opr,
                    function(this) return this end)
    end
end
-- Define the type conversions of const expressions to other const expression types
function defineConstExprTypeConversions()
    typedb:def_reduction( constexprBooleanType, constexprIntegerType,
            function( value) return math.abs(value) > epsilon end,
            tag_typeConversion, rdw_constexpr_bool)
    typedb:def_reduction( constexprBooleanType, constexprDoubleType,
            function( value) return math.abs(value) > epsilon end,
            tag_typeConversion, rdw_constexpr_bool)
    typedb:def_reduction( constexprDoubleType, constexprIntegerType,
            function( value) return value end,
            tag_typeConversion, rdw_constexpr_float)
    typedb:def_reduction( constexprIntegerType, constexprDoubleType,
            function( value) return math.floor(value) end,
            tag_typeConversion, rdw_constexpr_float)
    typedb:def_reduction( constexprIntegerType, constexprBooleanType,
            function( value) return value and 1 or 0 end,
            tag_typeConversion, rdw_constexpr_bool)
end
备注

函数 mewa.llvm_double_tohexmewa.llvm_float_tohexMewa 库提供的函数,用于将浮点数常量转换为 LLVM IR 所需的二进制表示形式,适用于那些没有精确二进制表示形式的十进制浮点数,例如 0.1。这种情况并不理想。Mewa 库不应该有任何与 LLVM 的 API 绑定。

一等标量类型

本节实现了根据模块 llvmir_scalar.lua 中的描述创建一等标量类型。字符串类型也在这里定义。与常量表达式类型一样,我们在这里也定义了提升调用,例如 integer + double 实现为在提升左操作数后进行 double + double 运算。

-- Define built-in promote calls for first class citizen scalar types
function defineBuiltInTypePromoteCalls( typnam, descr)
    local typeId = scalarTypeMap[ typnam]
    for i,promote_typnam in ipairs( descr.promote) do
        local promote_typeId = scalarTypeMap[ promote_typnam]
        local promote_descr = typeDescriptionMap[ promote_typeId]
        local promote_conv_fmt = promote_descr.conv[ typnam].fmt
        local promote_conv = promote_conv_fmt and callConstructor( promote_conv_fmt)
        if promote_descr.binop then
            for operator,operator_fmt in pairs( promote_descr.binop) do
                definePromoteCall(
                    promote_typeId, typeId, promote_typeId,
                    operator, {promote_typeId}, promote_conv)
            end
        end
        if promote_descr.cmpop then
            for operator,operator_fmt in pairs( promote_descr.cmpop) do
                definePromoteCall(
                    scalarBooleanType, typeId, promote_typeId,
                    operator, {promote_typeId}, promote_conv)
            end
        end
    end
end
-- Define the promote calls of a const expression for a binary scalar operator
function defineBinopConstexprPromoteCalls( resultType, operator, argtype, descr)
    if descr.llvmtype == "double" then
        definePromoteCall( resultType, constexprDoubleType, argtype,
                    operator, {argtype}, constexprDoubleToDoubleConstructor)
        definePromoteCall( resultType, constexprIntegerType, argtype,
                    operator, {argtype}, constexprIntegerToDoubleConstructor)
        definePromoteCall( resultType, constexprBooleanType, argtype,
                    operator, {argtype}, constexprBooleanToScalarConstructor)
    elseif descr.class == "bool" then
        definePromoteCall( resultType, constexprBooleanType, argtype,
                    operator, {argtype}, constexprBooleanToScalarConstructor)
    elseif descr.class == "signed" then
        definePromoteCall( resultType, constexprIntegerType, argtype,
                    operator, {argtype}, constexprIntegerToIntegerConstructor)
        definePromoteCall( resultType, constexprBooleanType, argtype,
                    operator, {argtype}, constexprBooleanToScalarConstructor)
    end
end

-- Helper functions to define binary operators of first class scalar types
function defineBuiltInBinaryOperators( typnam, descr, operators, resultTypeId)
    for opr,fmt in pairs(operators) do
        local typeId = scalarTypeMap[ typnam]
        local descr = typeDescriptionMap[ typeId]
        defineCall( resultTypeId, typeId, opr, {typeId}, callConstructor( fmt))
        defineBinopConstexprPromoteCalls( resultTypeId, opr, typeId, descr)
    end
end
-- Helper functions to define binary operators of first class scalar types
function defineBuiltInUnaryOperators( typnam, descr, operators, resultTypeId)
    for opr,fmt in ipairs(operators) do
        local typeId = scalarTypeMap[ typnam]
        local descr = typeDescriptionMap[ typeId]
        defineCall( resultTypeId, typeId, opr, {}, callConstructor( fmt))
        defineBinopConstexprPromoteCalls( resultTypeId, opr, typeId, descr)
    end
end
-- Constructor of a string pointer from a string definition
function stringPointerConstructor( stringdef)
    local env = getCallableEnvironment()
    local out = env.register()
    local subst = {size=stringdef.size,name=stringdef.name,out=out}
    local fmt = llvmir.control.stringConstConstructor
    local code = utils.template_format( fmt, subst)
    return constructorStruct( out, code)
end
-- Initialize all built-in types
function initBuiltInTypes()
    -- Define the first class scalar types
    for typnam, descr in pairs( llvmir.scalar) do
        local typeId = defineDataType( {line=0}, 0, typnam, descr)
        if typnam == "int" then
            scalarIntegerType = typeId
            typedb:def_reduction( typeId, constexprIntegerType,
                        constexprIntegerToIntegerConstructor, tag_typeInstantiation)
        elseif typnam == "double" then
            scalarDoubleType = typeId
            typedb:def_reduction( typeId, constexprDoubleType,
                        constexprDoubleToDoubleConstructor, tag_typeInstantiation)
        elseif typnam == "bool" then
            scalarBooleanType = typeId
            typedb:def_reduction( typeId, constexprBooleanType,
                        constexprBooleanToScalarConstructor, tag_typeInstantiation)
        end
        scalarTypeMap[ typnam] = typeId
    end
    -- Define the conversions between built-in types
    for typnam, descr in pairs( llvmir.scalar) do
        local typeId = scalarTypeMap[ typnam]
        for typnam_conv,conv in pairs(descr.conv) do
            local typeId_conv = scalarTypeMap[ typnam_conv]
            typedb:def_reduction( typeId, typeId_conv, callConstructor( conv.fmt),
                         tag_typeConversion, conv.weight)
        end
    end
    -- Define operators
    for typnam, descr in pairs( llvmir.scalar) do
        local typeId = scalarTypeMap[ typnam]
        defineBuiltInBinaryOperators( typnam, descr, descr.binop, typeId)
        defineBuiltInBinaryOperators( typnam, descr, descr.cmpop, scalarBooleanType)
        defineBuiltInUnaryOperators( typnam, descr, descr.unop, typeId)
        defineCall( voidType, referenceTypeMap[ typeId], "=", {typeId},
                callConstructor( descr.assign))
        defineCall( voidType, referenceTypeMap[ typeId], "=", {},
                    callConstructor( descr.assign, 0, nil,
                    {arg1=descr.default}))
    end
    -- Define operators with promoting of the left side argument
    for typnam, descr in pairs( llvmir.scalar) do
        defineBuiltInTypePromoteCalls( typnam, descr)
    end
    -- String type
    local string_descr = typeDescriptionMap[ stringType]
    local string_refType = referenceTypeMap[ stringType]
    typedb:def_reduction( stringType, string_refType,
                    callConstructor( string_descr.load), tag_typeInstantiation)
    typedb:def_reduction( stringType, constexprStringType,
                    stringPointerConstructor, tag_typeInstantiation)
    defineCall( voidType, string_refType, "=", {stringType},
                    callConstructor( string_descr.assign))
    defineCall( voidType, string_refType, "=", {},
                    callConstructor( string_descr.assign, 0, nil,
                    {arg1=string_descr.default}))
end

变量

这里是根据上下文声明各种变量的函数:局部变量、全局变量、成员变量、函数参数等。现在大部分代码都是自解释的了。

一个新事物是在定义局部变量时使用全局变量 localDefinitionContext 作为上下文类型。我们也可以使用 0 来代替该变量。在实现泛型的语言中,localDefinitionContext 的值可能会变化,因为泛型实例共享作用域。泛型实例必须为局部定义上下文定义一个人工类型,以将其局部定义与其他定义分开。

-- Define a member variable of a class or a structure
function defineMemberVariable(
            node, descr, context, typeId, refTypeId, name)
    local memberpos = context.structsize or 0
    local index = #context.members
    local subst = {index=index, type=descr.llvmtype}
    local load_ref = utils.template_format( context.descr.loadelemref, subst)
    local load_val = utils.template_format( context.descr.loadelem, subst)

    while math.fmod( memberpos, descr.align) ~= 0 do
        memberpos = memberpos + 1
    end
    context.structsize = memberpos + descr.size
    local member = { type=typeId, name=name, descr=descr, bytepos=memberpos }
    table.insert( context.members, member)
    if not context.llvmtype then
        context.llvmtype = descr.llvmtype
    else
        context.llvmtype = context.llvmtype  .. ", " .. descr.llvmtype
    end
    defineCall( refTypeId, referenceTypeMap[ context.decltype], name, {},
            callConstructor( load_ref))
    defineCall( typeId, context.decltype, name, {},
            callConstructor( load_val))
end
-- Define a free global variable
function defineGlobalVariable(
            node, descr, context, typeId, refTypeId, name, initVal)
    local out = "@" .. name
    local var = typedb:def_type( 0, name, out)
    if var == -1 then
        utils.errorMessage( node.line, "Duplicate definition of '%s'", name)
    end
    typedb:def_reduction( refTypeId, var, nil, tag_typeDeclaration)
    if initVal
    and descr.scalar == true
    and isScalarConstExprValueType( initVal.type) then
        local initScalarConst
            = constructorParts(
                getRequiredTypeConstructor(
                    node, typeId, initVal,
                    tagmask_matchParameter, tagmask_typeConversion))
        local subst = {out = out, val = initScalarConst}
        print( utils.constructor_format( descr.def_global_val, subst))
    else
        utils.errorMessage( node.line, "Only const scalars as init of globals")
    end
end
-- Define a free local variable
function defineLocalVariable(
            node, descr, context, typeId, refTypeId, name, initVal)
    local env = getCallableEnvironment()
    local out = env.register()
    local code = utils.constructor_format( descr.def_local, {out=out}, env.register)
    local var = typedb:def_type( localDefinitionContext,name,constructorStruct(out))
    if var == -1 then
        utils.errorMessage( node.line, "Duplicate definition of '%s'", name)
    end
    typedb:def_reduction( refTypeId, var, nil, tag_typeDeclaration)
    local decl = {type=var, constructor={code=code,out=out}}
    if type(initVal) == "function" then initVal = initVal() end
    return applyCallable( node, decl, "=", {initVal})
end
-- Define a variable (what kind is depending on the context)
function defineVariable( node, context, typeId, name, initVal)
    local descr = typeDescriptionMap[ typeId]
    local refTypeId = referenceTypeMap[ typeId]
    if not refTypeId then
        utils.errorMessage( node.line, "Reference in variable declarations,: %s",
                           typedb:type_string(typeId))
    end
    if context.domain == "local" then
        return defineLocalVariable( node, descr, context,
                        typeId, refTypeId, name, initVal)
    elseif context.domain == "global" then
        return defineGlobalVariable( node, descr, context,
                        typeId, refTypeId, name, initVal)
    elseif context.domain == "member" then
        if initVal then
            utils.errorMessage( node.line, "No init value for member allowed")
        end
        defineMemberVariable( node, descr, context, typeId, refTypeId, name)
    else
        utils.errorMessage( node.line, "Context domain undefined (%s)",
                            mewa.tostring(context))
    end
end
-- Declare a variable implicitly that does not appear as definition in source
function defineImplicitVariable( node, typeId, name, reg)
    local var = typedb:def_type( localDefinitionContext, name, reg)
    if var == -1 then
        local declstr = typeDeclarationString( typeId, name)
        utils.errorMessage( node.line, "Duplicate type '%s'", declstr)
    end
    typedb:def_reduction( typeId, var, nil, tag_typeDeclaration)
    return var
end
-- Make a function parameter addressable by name in the callable body
function defineParameter( node, typeId, name)
    local env = getCallableEnvironment()
    local paramreg = env.register()
    local var = typedb:def_type( localDefinitionContext, name, paramreg)
    if var == -1 then
        utils.errorMessage( node.line, "Duplicate definition of parameter '%s'",
                typeDeclarationString( localDefinitionContext, name))
    end
    local descr = typeDescriptionMap[ typeId]
    local ptype = (descr.scalar or descr.class == "pointer")
                and typeId
                or referenceTypeMap[ typeId]
    if not ptype then
        utils.errorMessage( node.line, "Cannot use type '%s' as parameter type",
                    typedb:type_string(typeId))
    end
    typedb:def_reduction( ptype, var, nil, tag_typeDeclaration)
    return {type=ptype, llvmtype=typeDescriptionMap[ ptype].llvmtype, reg=paramreg}
end
-- Resolve a variable by name and return its type/constructor structure
function getVariable( node, varname)
    local env = getCallableEnvironment()
    local seekctx = getSeekContextTypes()
    local resolveContextTypeId, reductions, items
            = typedb:resolve_type( seekctx, varname, tagmask_resolveType)
    local typeId,constructor = selectNoArgumentType( node, seekctx, varname,
                    tagmask_resolveType, resolveContextTypeId, reductions, items)
    local variableScope = typedb:type_scope( typeId)
    if variableScope[1] == 0 or env.scope[1] <= variableScope[1] then
        -- the local variable is not belonging to another function
        return {type=typeId, constructor=constructor}
    else
        utils.errorMessage( node.line, "Not allowed to access variable '%s' not "
                        .. "defined in current function or global scope",
                        typedb:type_string(typeId))
    end
end

定义数据类型

本节提供了声明示例语言数据类型的函数,包括结构体和数组。示例语言的数据类型相当简单,因为在失败时不需要进行清理。

类类型和数组类型的逐元素初始化器使用函数 applyCallable 进行成员初始化。这为子结构的赋值提供了递归。

getOrCreateArrayType 函数是一个手动设置当前作用域的罕见情况。数组类型定义的作用域被设置为元素类型的作用域。函数完成其工作后,当前作用域被恢复为原始值。

-- Define a data type with all its qualifiers
function defineDataType( node, contextTypeId, typnam, descr)
    local typeId = typedb:def_type( contextTypeId, typnam)
    local refTypeId = typedb:def_type( contextTypeId, typnam .. "&")
    if typeId <= 0 or refTypeId <= 0 then
        utils.errorMessage( node.line, "Duplicate definition of '%s'", typnam)
    end
    referenceTypeMap[ typeId] = refTypeId
    dereferenceTypeMap[ refTypeId] = typeId
    typeDescriptionMap[ typeId] = descr
    typeDescriptionMap[ refTypeId] = llvmir.pointerDescr(descr)
    typedb:def_reduction( typeId, refTypeId, callConstructor( descr.load),
                    tag_typeDeduction, rdw_load)
    return typeId
end
-- Structure type definition for a class
function defineStructureType( node, declContextTypeId, typnam, fmt)
    local descr = utils.template_format( fmt, {symbol=typnam})
    local typeId = defineDataType( node, declContextTypeId, typnam, descr)
    return typeId,descr
end
-- Define the assignment operator of a class as memberwise assignment
function defineClassStructureAssignmentOperator( node, typeId)
    local descr = typeDescriptionMap[ typeId]
    local function assignElementsConstructor( this, args)
        local env = getCallableEnvironment()
        if args and #args ~= 0 and #args ~= #descr.members then
            utils.errorMessage( node.line, "Nof elements %d != members %d in '%s'",
                        #args, #descr.members, typedb:type_string( typeId))
        end
        local this_inp,code = constructorParts( this)
        for mi,member in ipairs(descr.members) do
            local out = env.register()
            local loadref = descr.loadelemref
            local llvmtype = member.descr.llvmtype
            local member_reftype = referenceTypeMap[ member.type]
            local ths = {type=member_reftype,constructor=constructorStruct(out)}
            local member_element
                       = applyCallable( node, ths, "=", {args and args[mi]})
            local subst = {out=out,this=this_inp,index=mi-1, type=llvmtype}
            code = code
                .. utils.constructor_format(loadref,subst)
                .. member_element.constructor.code
        end
        return {code=code}
    end
    local function assignStructTypeConstructor( this, args)
        return assignElementsConstructor( this, args[1].list)
    end
    defineCall( nil, referenceTypeMap[ typeId], "=", {constexprStructureType},
                assignStructTypeConstructor)
    defineCall( nil, referenceTypeMap[ typeId], "=", {},
                assignElementsConstructor)
end
-- Define the index access operator for arrays
function defineArrayIndexOperator( elemTypeId, arTypeId, arDescr)
    defineCall( referenceTypeMap[elemTypeId], referenceTypeMap[arTypeId], "[]",
                {scalarIntegerType}, callConstructor( arDescr.index[ "int"]))
end
-- Constructor for a memberwise assignment of an array from an initializer-list
function memberwiseInitArrayConstructor(
            node, thisType, elementType, nofElements)
    return function( this, args)
        if #args > nofElements then
            utils.errorMessage( node.line, "Nof elements %d > %d for init '%s'",
                        #args, nofElements, typedb:type_string( thisType))
        end
        local descr = typeDescriptionMap[ thisType]
        local descr_element = typeDescriptionMap[ elementType]
        local elementRefTypeId = referenceTypeMap[ elementType] or elementType
        local env = getCallableEnvironment()
        local this_inp,code = constructorParts( this)
        for ai=1,nofElements do
            local arg = (ai <= nofElements) and args[ai] or nil
            local elemreg = env.register()
            local elem = typeConstructorPairStruct( elementRefTypeId, elemreg)
            local init = tryApplyCallable( node, elem, "=", {arg})
            if not init then
                utils.errorMessage( node.line, "Failed to find ctor '%s'",
                            typeDeclarationString( elem, "=", {arg}))
            end
            local subst = {index=ai-1,this=this_inp,out=elemreg}
            local fmt = descr.memberwise_index
            local memberwise_next
                    = utils.constructor_format( fmt, subst, env.register)
            code = code .. memberwise_next .. init.constructor.code
        end
        return {out=this_inp, code=code}
    end
end
-- Constructor for an assignment of a structure (initializer list) to an array
function arrayStructureAssignmentConstructor(
        node, thisType, elementType, nofElements)
    local initfunction
        = memberwiseInitArrayConstructor( node, thisType, elementType, nofElements)
    return function( this, args)
        return initfunction( this, args[1].list)
    end
end
-- Implicit on demand type definition for array
function getOrCreateArrayType( node, elementType, arraySize)
    local arrayKey = string.format( "%d[%d]", elementType, arraySize)
    if not arrayTypeMap[ arrayKey] then
        local scope_bk,step_bk = typedb:scope( typedb:type_scope( elementType))
        local typnam = string.format( "[%d]", arraySize)
        local elementDescr = typeDescriptionMap[ elementType]
        local arrayDescr = llvmir.arrayDescr( elementDescr, arraySize)
        local arrayType = defineDataType( node, elementType, typnam, arrayDescr)
        local arrayRefType = referenceTypeMap[ arrayType]
        arrayTypeMap[ arrayKey] = arrayType
        defineArrayIndexOperator( elementType, arrayType, arrayDescr)
        local constructor = arrayStructureAssignmentConstructor(
                        node, arrayType, elementType, arraySize)
        defineCall( voidType, arrayRefType, "=",
                        {constexprStructureType}, constructor)
        typedb:scope( scope_bk,step_bk)
    end
    return arrayTypeMap[ arrayKey]
end

上下文类型

用于解析类型的上下文类型列表已经解释过了。函数 getDeclarationContextTypeId 提供了根据上下文访问用于声明的上下文类型的途径。函数 getSeekContextTypes 提供了访问用于 typedb:resolve_type 的列表上下文类型的途径。

查找上下文类型列表也依赖于作用域。当向一个尚未绑定到当前作用域的列表添加元素时,会通过调用 thisInstanceTableClone( name, emptyInst) 克隆外围作用域的列表,并将元素添加到克隆的列表中。当将 self 类型添加到上下文类型列表时,这一点就会发挥作用。它只在方法体内可见。

-- Get the context type for type declarations
function getDeclarationContextTypeId( context)
    if context.domain == "local" then return localDefinitionContext
    elseif context.domain == "member" then return context.decltype
    elseif context.domain == "global" then return 0
    end
end
-- Get an object instance and clone it if it is not stored in the current scope,
--   making it possible to add elements to an inherited instance
function thisInstanceTableClone( name, emptyInst)
    local inst = typedb:this_instance( name)
    if not inst then
        inst = utils.deepCopyStruct( typedb:get_instance( name) or emptyInst)
        typedb:set_instance( name, inst)
    end
    return inst
end
-- Get the list of seek context types associated with the current scope
function getSeekContextTypes()
    return typedb:get_instance( seekContextKey) or {0}
end
-- Push an element to the current seek context type list
function pushSeekContextType( val)
    table.insert( thisInstanceTableClone( seekContextKey, {0}), val)
end
-- Remove the last element of the the list of seek context types
function popSeekContextType( val)
    local seekctx = typedb:this_instance( seekContextKey)
    if not seekctx or seekctx[ #seekctx] ~= val then
        utils.errorMessage( 0, "Corrupt definition context stack")
    end
    table.remove( seekctx, #seekctx)
end

可调用环境

绑定到函数体作用域的可调用环境也已经讨论过了。

需要提及的是函数 expandMethodEnvironment,它在方法体中声明了 self 变量(显式和隐式)。

-- Create a callable environent object
function createCallableEnvironment( node, name, rtype, rprefix, lprefix)
    if rtype then
        local descr = typeDescriptionMap[ rtype]
        if not descr.scalar and not descr.class == "pointer" then
            utils.errorMessage( node.line, "Only scalar types can be returned")
        end
    end
    return {
        name=name, line=node.line, scope=typedb:scope(),
        register=utils.register_allocator(rprefix),
        label=utils.label_allocator(lprefix), returntype=rtype
    }
end
-- Attach a newly created data structure for a callable to its scope
function defineCallableEnvironment( node, name, rtype)
    local env = createCallableEnvironment( node, name, rtype)
    if typedb:this_instance( callableEnvKey) then
        utils.errorMessage( node.line, "Callable env defined twice: %s %s",
                    name, mewa.tostring({(typedb:scope())}))
    end
    typedb:set_instance( callableEnvKey, env)
    return env
end
-- Get the active callable instance
function getCallableEnvironment()
    return typedb:get_instance( callableEnvKey)
end
-- Set some variables needed in a class method implementation body
function expandMethodEnvironment( node, context, descr, env)
    local selfTypeId = referenceTypeMap[ context.decltype]
    local classvar = defineImplicitVariable( node, selfTypeId, "self", "%ths")
    pushSeekContextType( {type=classvar, constructor={out="%ths"}})
end

控制布尔类型

这里我们有在教程中介绍的控制布尔类型的完整定义。函数定义是一对一地从示例 language1 中取过来的。我想在我们完成了这个例子的学习之后,代码本身就能解释自己了。

-- Initialize control boolean types used for implementing control structures
function initControlBooleanTypes()
    controlTrueType = typedb:def_type( 0, " controlTrueType")
    controlFalseType = typedb:def_type( 0, " controlFalseType")

    local function controlTrueTypeToBoolean( constructor)
        local env = getCallableEnvironment()
        local out = env.register()
        local subst = {falseExit=constructor.out,out=out}
        local fmt = llvmir.control.controlTrueTypeToBoolean
        return {code = constructor.code
                .. utils.constructor_format(fmt,subst,env.label),out=out}
    end
    local function controlFalseTypeToBoolean( constructor)
        local env = getCallableEnvironment()
        local out = env.register()
        local subst = {trueExit=constructor.out,out=out}
        local fmt = llvmir.control.controlFalseTypeToBoolean
        return {code = constructor.code
                .. utils.constructor_format( fmt,subst,env.label),out=out}
    end
    typedb:def_reduction( scalarBooleanType, controlTrueType,
                    controlTrueTypeToBoolean, tag_typeDeduction, rwd_control)
    typedb:def_reduction( scalarBooleanType, controlFalseType,
                    controlFalseTypeToBoolean, tag_typeDeduction, rwd_control)

    local function booleanToControlTrueType( constructor)
        local env = getCallableEnvironment()
        local out = env.label()
        local subst = {inp=constructor.out, out=out}
        local fmt = llvmir.control.booleanToControlTrueType
        return {code = constructor.code
                .. utils.constructor_format( fmt, subst, env.label),out=out}
    end
    local function booleanToControlFalseType( constructor)
        local env = getCallableEnvironment()
        local out = env.label()
        local subst = {inp=constructor.out, out=out}
        local fmt = llvmir.control.booleanToControlFalseType
        return {code = constructor.code
                .. utils.constructor_format( fmt, subst, env.label),out=out}
    end

    typedb:def_reduction( controlTrueType, scalarBooleanType,
                booleanToControlTrueType, tag_typeDeduction, rwd_control)
    typedb:def_reduction( controlFalseType, scalarBooleanType,
                booleanToControlFalseType, tag_typeDeduction, rwd_control)

    local function negateControlTrueType( this)
        return {type=controlFalseType, constructor=this.constructor}
    end
    local function negateControlFalseType( this)
        return {type=controlTrueType, constructor=this.constructor}
    end
    local function joinControlTrueTypeWithBool( this, arg)
        local out = this.out
        local subst = {inp=arg[1].out, out=out}
        local fmt = llvmir.control.booleanToControlTrueType
        local label_allocator = getCallableEnvironment().label
        local code2 = utils.constructor_format( fmt, subst, label_allocator)
        return {code=this.code .. arg[1].code .. code2, out=out}
    end
    local function joinControlFalseTypeWithBool( this, arg)
        local out = this.out
        local subst = {inp=arg[1].out, out=out}
        local fmt = llvmir.control.booleanToControlFalseType
        local label_allocator = getCallableEnvironment().label
        local code2 = utils.constructor_format( fmt, subst, label_allocator)
        return {code=this.code .. arg[1].code .. code2, out=out}
    end
    defineCall( controlTrueType, controlFalseType, "!", {}, nil)
    defineCall( controlFalseType, controlTrueType, "!", {}, nil)
    defineCall( controlTrueType, controlTrueType, "&&",
            {scalarBooleanType}, joinControlTrueTypeWithBool)
    defineCall( controlFalseType, controlFalseType, "||",
            {scalarBooleanType}, joinControlFalseTypeWithBool)

    local function joinControlFalseTypeWithConstexprBool( this, arg)
        if arg == false then
            return this
        else
            local env = getCallableEnvironment()
            local subst = {out=this.out}
            local fmt = llvmir.control.terminateTrueExit
            return {code = this.code
                    .. utils.constructor_format(fmt,subst,env.label), out=this.out}
        end
    end
    local function joinControlTrueTypeWithConstexprBool( this, arg)
        if arg == true then
            return this
        else
            local env = getCallableEnvironment()
            local subst = {out=this.out}
            local fmt = llvmir.control.terminateFalseExit
            return {code = this.code
                    .. utils.constructor_format(fmt,subst,env.label), out=this.out}
        end
    end
    defineCall( controlTrueType, controlTrueType, "&&",
            {constexprBooleanType}, joinControlTrueTypeWithConstexprBool)
    defineCall( controlFalseType, controlFalseType, "||",
            {constexprBooleanType}, joinControlFalseTypeWithConstexprBool)

    local function constexprBooleanToControlTrueType( value)
        local env = getCallableEnvironment()
        local out = label()
        local fmt = llvmir.control.terminateFalseExit
        local code = (value == true)
                and ""
                or utils.constructor_format( fmt, {out=out}, env.label)
        return {code=code, out=out}
    end
    local function constexprBooleanToControlFalseType( value)
        local env = getCallableEnvironment()
        local out = label()
        local fmt = llvmir.control.terminateFalseExit
        local code = (value == false)
                and ""
                or utils.constructor_format( fmt, {out=out}, env.label)
        return {code=code, out=out}
    end
    typedb:def_reduction( controlFalseType, constexprBooleanType,
                constexprBooleanToControlFalseType, tag_typeDeduction, rwd_control)
    typedb:def_reduction( controlTrueType, constexprBooleanType,
                constexprBooleanToControlTrueType, tag_typeDeduction, rwd_control)

    local function invertControlBooleanType( this)
        local out = getCallableEnvironment().label()
        local subst = {inp=this.out, out=out}
        local fmt = llvmir.control.invertedControlType
        return {code = this.code
                    .. utils.constructor_format( fmt, subst), out = out}
    end
    typedb:def_reduction( controlFalseType, controlTrueType,
                    invertControlBooleanType, tag_typeDeduction, rwd_control)
    typedb:def_reduction( controlTrueType, controlFalseType,
                    invertControlBooleanType, tag_typeDeduction, rwd_control)

    definePromoteCall( controlTrueType, constexprBooleanType, controlTrueType,
                    "&&", {scalarBooleanType}, constexprBooleanToControlTrueType)
    definePromoteCall( controlFalseType, constexprBooleanType, controlFalseType,
                    "||", {scalarBooleanType}, constexprBooleanToControlFalseType)
end

控制结构

最后一个代码片段实现了函数 conditionalIfElseBlock( node, condition, matchblk, elseblk, exitLabel),它为实现带或不带 elseifAST 节点函数完成了工作。如果我们有一个 else,那么我们必须跳转到末尾,并绑定控制真值类型的未绑定标签。这相当于在添加 else 块之前,将控制真值类型转换为控制假值类型。

-- Build a conditional if/elseif block
function conditionalIfElseBlock( node, condition, matchblk, elseblk, exitLabel)
    local cond_constructor
        = getRequiredTypeConstructor( node, controlTrueType, condition,
                        tagmask_matchParameter, tagmask_typeConversion)
    if not cond_constructor then
        local declstr = typedb:type_string(condition.type)
        utils.errorMessage( node.line, "Can't use type '%s' as condition", declstr)
    end
    local code = cond_constructor.code .. matchblk.code
    if exitLabel then
        local subst = {inp=cond_constructor.out, out=exitLabel}
        local fmt = llvmir.control.invertedControlType
        code = code .. utils.template_format( fmt, subst)
    else
        local subst = {inp=cond_constructor.out}
        local fmt = llvmir.control.label
        code = code .. utils.template_format( fmt, subst)
    end
    local out
    if elseblk then
        code = code .. elseblk.code
        out = elseblk.out
    else
        out = cond_constructor.out
    end
    return {code = code, out = out}
end

构建和运行编译器

最后,我们在开头展示的示例源代码上构建并运行我们的编译器。

翻译语法并构建编译器的 Lua 脚本
LUABIN=`which lua`
. examples/intro/luaenv.sh

mewa -b "$LUABIN" -g -o build/tutorial.compiler.lua examples/intro/grammar.g
chmod +x build/tutorial.compiler.lua
在测试程序上运行编译器前端
build/tutorial.compiler.lua -o build/program.llr examples/intro/program.prg
检查 LLVM IR 代码
less build/program.llr 
创建目标文件
llc -relocation-model=pic -O=3 -filetype=obj build/program.llr -o build/program.o
创建可执行文件
clang -lm -lstdc++ -o build/program build/program.o
运行可执行文件
build/program

输出

Salary sum: 280720.00

缺少什么

本文展示了一种缺少许多功能的原始语言的实现。以下是列表:

  • 实现延迟绑定的构造函数,例如用于实现复制省略。
  • 更多的类型限定符,如 const、private 等。
  • 指针
  • 动态内存分配
  • 异常
  • Generics
  • Lambda 表达式
  • 协程

还有更多。您可以访问 FAQ 来了解如何实现更丰富的语言。您可以深入研究主要的示例语言 language1,它实现了这里缺少的大部分功能作为示例。

改进

在通读这篇文章时,我发现我的做法违背了自己的建议。在两个示例语言中,context 都是作为参数在树遍历中传递的。context 最好被定义为绑定到其作用域。这是一个很好的例子,说明不要照搬这里展示的代码。

您不应该将这里展示的示例视为您编译器项目的代码库。不过,这些示例连同 FAQ 可以帮助您规划您的编译器项目。有了规划,您可以做得比我好得多。我当时是拿着一把砍刀在丛林中开路。而您现在有了全局的视野。

结论

我们已经看到如何使用 MewaLua 中编写一个非常原始的编译器。我们能够在 shell 中编译和运行一个简单的演示程序。

我希望我能为您提供一种编写编译器前端方法的概览。了解另一种方法可能很有趣,即使您走的是另一条路。

Mewa 基于一些非常古老的思想,并没有提供任何根本性的新东西。但它是一种务实的方法,使得单个人实现一个非凡编译器前端的原型成为可能。

链接

© . All rights reserved.