Symbols and Scopes¶

The scoping module for AST1 is a major service in SOLP that provides scope trees and tables to the AST2 Builder.

We’ll work through using this API by considering a service that takes LSP requests to find the definition of whatever you click on in the IDE (e.g., Visual Studio Code). This won’t be the full plug-in, just the SOLP code required to make it work. The code is adapted from an existing plug-in written with the pygls and lsprotocol libraries.

Line to Node¶

As we saw in the Working with Source Code tutorial, SOLP lets us map nodes to source code locations easily. Usually, IDEs make requests based on the line and column number and expect the language tool to figure out what is at that location.

Let’s make a function that does that: it should take a list of possible AST1 nodes and a source location and determine the exact node that is defined at that source location. The SourceLocationSpan.does_contain() method, called on the span that every node returns from get_source_span(), will work for this.

All we need to do is recurse until we find the deepest node whose span includes the location:

from typing import List, Optional

def get_containing_ast_node(src_loc: SourceLocation, nodes: List[Node]) -> Optional[Node]:
    for n in nodes:
        if not n:
            continue  # skip missing (None) entries
        n_span = n.get_source_span()
        if n_span.does_contain(src_loc):
            # descend into the children; a node without children is the match
            children = list(n.get_children())
            return get_containing_ast_node(src_loc, children) if children else n
    return None

When a node has no more children, it must be the deepest node in the tree (a leaf).

Note

Since each node has a get_children() function, we can do this generically, without writing a visitor that handles each node type separately.
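To try the recursion without a parsed project, here is a minimal, self-contained sketch. Loc, Span and FakeNode are hypothetical stand-ins invented for this example, not SOLP classes; only the method names get_source_span(), get_children() and does_contain() mirror the calls used above. The sketch also assumes that spans are inclusive at both ends, which may not match SOLP exactly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Loc:
    line: int
    column: int


@dataclass(frozen=True)
class Span:
    start: Loc
    end: Loc

    def does_contain(self, loc: Loc) -> bool:
        # Compare (line, column) pairs lexicographically (assumed inclusive).
        lo = (self.start.line, self.start.column)
        hi = (self.end.line, self.end.column)
        return lo <= (loc.line, loc.column) <= hi


@dataclass
class FakeNode:
    name: str
    span: Span
    children: List["FakeNode"] = field(default_factory=list)

    def get_source_span(self) -> Span:
        return self.span

    def get_children(self):
        return iter(self.children)


def get_containing_ast_node(src_loc: Loc, nodes: List[FakeNode]) -> Optional[FakeNode]:
    for n in nodes:
        if not n:
            continue
        if n.get_source_span().does_contain(src_loc):
            children = list(n.get_children())
            return get_containing_ast_node(src_loc, children) if children else n
    return None


# func (lines 1-3) > stmt (line 2) > ident (line 2, columns 5-8)
ident = FakeNode("ident", Span(Loc(2, 5), Loc(2, 8)))
stmt = FakeNode("stmt", Span(Loc(2, 1), Loc(2, 20)), [ident])
func = FakeNode("func", Span(Loc(1, 1), Loc(3, 1)), [stmt])

found = get_containing_ast_node(Loc(2, 6), [func])
print(found.name if found else None)
```

A location outside every top-level span, such as Loc(5, 1) here, yields None.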

Idents Only¶

If this node is an identifier, then we can do the reference search. If it’s anything else (a Solidity keyword, a punctuator, etc.), then we can’t get a definition.

if isinstance(ast1_node, Ident):
    return get_definitions_for_node(ast1_node)

Resolving the Reference¶

The reference could be qualified (e.g., x.y) or unqualified (y). The way in which y is accessed changes the scopes we need to search. The differences between the cases are the following:

Unqualified: Search for y in the node scope of ast1_node.
Qualified: Figure out the type of x, search for that type in ast1_node.scope to find a type scope, and search for y in that type scope.
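The difference between the two cases can be illustrated with a toy symbol table. Everything in this sketch is hypothetical: ToyScope, find_local and the expr_types dictionary are invented for illustration and are not SOLP APIs. The dictionary stands in for whatever mechanism works out the type of x.

```python
from typing import Dict, List, Optional


class ToyScope:
    """Hypothetical stand-in for a symbol-table scope."""

    def __init__(self, name: str, parent: Optional["ToyScope"] = None):
        self.name = name
        self.parent = parent
        self.members: Dict[str, "ToyScope"] = {}
        if parent is not None:
            parent.members[name] = self

    def find(self, name: str) -> List["ToyScope"]:
        # Unqualified search: try this scope, then each enclosing scope.
        scope = self
        while scope is not None:
            if name in scope.members:
                return [scope.members[name]]
            scope = scope.parent
        return []

    def find_local(self, name: str) -> List["ToyScope"]:
        # Member search: look in this scope only.
        return [self.members[name]] if name in self.members else []


file_scope = ToyScope("<file>")
token = ToyScope("Token", file_scope)
balance_of = ToyScope("balanceOf", token)
vault = ToyScope("Vault", file_scope)
deposit = ToyScope("deposit", vault)
helper = ToyScope("helper", vault)

# Stand-in for type resolution: variable `t` inside deposit() has type Token.
expr_types = {"t": "Token"}

# Unqualified: `helper` written inside deposit().
unqualified = deposit.find("helper")

# Qualified: `t.balanceOf` written inside deposit().
type_scopes = deposit.find(expr_types["t"])
qualified = [sym for ts in type_scopes for sym in ts.find_local("balanceOf")]

print([s.name for s in unqualified], [s.name for s in qualified])
```

The unqualified search walks outwards from the scope of the clicked node, while the qualified search first locates the type scope and then looks only inside it.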

Qualified lookups are modelled by the GetMember node in AST1. So far we know that y is an Ident; we need to determine what type of lookup it is.

if isinstance(ast1_node.parent, solnodes.GetMember):
    ...  # qualified
else:
    ...  # unqualified

Check the parent! Qualified lookups have a base x, and the member is y.

Unqualified¶

In the unqualified lookup case, search the node’s scope directly:

symbols = ast1_node.scope.find(ast1_node.text)
for s in symbols:
    for rs in s.res_syms():
        links.append(get_symbol_link(rs))

Note

The get_symbol_link function will be shown later.

What does res_syms do? Why not just return the symbols found in the scope?

In short, res_syms resolves symbolic links in the symbol table to their underlying symbols. This is because SOLP has different types of symbols: some are actual symbols based on elements in the real source code, and some are links created by inheritance, imports, or using statements. Since we want to locate source code elements, we need to get the underlying symbol(s).
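As a rough mental model, link resolution can be sketched as follows. The Symbol and LinkSymbol classes here are hypothetical and only imitate the idea; the actual symbol classes in SOLP and the way they resolve links may differ.

```python
from typing import List


class Symbol:
    """A symbol backed by a real source element (hypothetical stand-in)."""

    def __init__(self, name: str):
        self.name = name

    def res_syms(self) -> List["Symbol"]:
        # A real symbol resolves to itself.
        return [self]


class LinkSymbol(Symbol):
    """A symbol that only points at other symbols, e.g. one made by an import."""

    def __init__(self, name: str, targets: List[Symbol]):
        super().__init__(name)
        self.targets = targets

    def res_syms(self) -> List[Symbol]:
        # Follow every link until only real symbols remain.
        resolved: List[Symbol] = []
        for target in self.targets:
            resolved.extend(target.res_syms())
        return resolved


real = Symbol("Token")
imported = LinkSymbol("Token", [real])  # e.g. created for an import
aliased = LinkSymbol("T", [imported])   # a second hop, e.g. an alias

print([s.name for s in aliased.res_syms()])
```

Both links resolve to the single real Token symbol, which is the one that carries a source location.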

Qualified¶

To get the base type of x, we’re going to cheat a bit and use the TypeHelper that’s built into the AST2 builder.

type_helper = ast2builder.type_helper

base_obj: solnodes1.AST1Node = ast1_node.parent.obj_base
base_type: solnodes2.Types = type_helper.get_expr_type(base_obj)

This bit of code is tricky, so it’s best to use Python type hints here. The Type returned from the TypeHelper is an AST2 type.

This AST2 type is passed back to the type helper to find the scopes to search:

base_scopes = type_helper.scopes_for_type(base_obj, base_type)

Search these scopes in the same way as the previous case:

for scope in base_scopes:
    symbols = scope.find(ast1_node.text)
    for s in symbols:
        for rs in s.res_syms():
            links.append(get_symbol_link(rs))

Details of get_symbol_link¶

The exact details of get_symbol_link depend on what LSP framework you’re using. Usually, the following info is needed from the reference that’s found:

Whether it’s a built-in type/object
The file it’s defined in
The span of the node that defines the symbol and the span of the node’s descriptor/name

Scope vs Node¶

The AST1 node is found by the value attribute of the symbol. In general, you can think of the value as being the node that caused the symbol to be created in the symbol’s scope.

For Solidity built-in symbols, the value is usually None, but even if it has a value, it can’t be a real AST1 node. SOLP doesn’t parse the built-ins; they are created only in the symbol table.

Checking for Built-ins¶

This part is simple. Check if the symbol is any of the following types:

BuiltinFunction (self-explanatory, for example keccak256() or abi.encode())
BuiltinObject (the msg part of msg.value, i.e., the container object that holds other built-ins)
BuiltinValue (e.g., msg.value)

def is_builtin(sym):
    return isinstance(sym, (symtab.BuiltinFunction, symtab.BuiltinObject, symtab.BuiltinValue))

Mock Built-in File¶

When the user tries to find the definition for a built-in, let’s give them a file to view that contains pseudocode with documentation. For example, when they click on msg.sender, it opens a file called builtins.sol and goes to a struct member in a struct named Msg.

To do this, we need to take our built-in symbol-table object from above, parse the builtins.sol file, and find a corresponding AST1 node that we will use for the rest of get_symbol_link.

For the parsing step, let’s say we have another VFS and symbol-table builder set up with just the builtins.sol file loaded (to avoid any nasty mixing with the real Solidity code of the project open in the IDE).

builtin_symbol = ...
# getting this env(ironment) is an implementation detail
# it just contains the vfs and symtab builder for builtins.sol only
env = LSP_SERVER.builtin_env
builtins_fs = env.symtab_builder.process_or_find_from_base_dir('solsrc/builtins.sol')
symbol_path = compute_symbol_root_name(builtin_symbol)
real_builtins_symbol = builtins_fs.find_multi_part_symbol(symbol_path)

We compute a root path (i.e., a fully qualified path from the FileScope of the builtin_symbol to the symbol itself). For example, if we had the BuiltinValue representing msg.sender, the key we get is msg.sender.

Then find_multi_part_symbol does the qualified search using the key and finds the real symbol.
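Conceptually, a search by dotted key walks one scope level per part of the key. The sketch below shows that idea with a hypothetical ToyScope class and a hypothetical lookup_dotted_path helper; it is not how find_multi_part_symbol is implemented in SOLP, only an illustration of the lookup it performs.

```python
from typing import Dict, Optional


class ToyScope:
    """Hypothetical stand-in for a scope with named members."""

    def __init__(self, name: str):
        self.name = name
        self.members: Dict[str, "ToyScope"] = {}

    def add(self, child: "ToyScope") -> "ToyScope":
        self.members[child.name] = child
        return child


def lookup_dotted_path(root: ToyScope, path: str) -> Optional[ToyScope]:
    # Descend one level per dotted part; give up as soon as a part is missing.
    current = root
    for part in path.split("."):
        nxt = current.members.get(part)
        if nxt is None:
            return None
        current = nxt
    return current


builtins_file = ToyScope("builtins.sol")
msg = builtins_file.add(ToyScope("msg"))
sender = msg.add(ToyScope("sender"))

print(lookup_dotted_path(builtins_file, "msg.sender") is sender)
print(lookup_dotted_path(builtins_file, "msg.missing"))
```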

To actually compute the key, there are a few tricky details.

def compute_symbol_root_name(symbol) -> str:
    parts = []
    s = symbol
    while not isinstance(s, (symtab.FileScope, symtab.RootScope)):
        name = s.aliases[0]
        if name == '<type:address>':
            name = '_address'
        elif name == '<type:address payable>':
            name = '_address_payable'
        parts.append(name)
        s = s.parent_scope
    parts.reverse()

    if len(parts) > 1 and parts[0] == '_address' and parts[1] in ['transfer', 'send']:
        parts[0] = '_address_payable'

    return '.'.join(parts)

The general algorithm goes like this:

Take the current symbol and walk up its parent scopes until we get to the FileScope (or RootScope for built-ins).
Store the primary alias of the symbol as part of the key (most symbols only have one alias). This gives a reversed list of each of the parts of the key (e.g., ['sender', 'msg']).
Reverse the list and join the parts together with dots.

These are the tricky parts:

Lines 6–9. We can’t name a contract address or address payable in Solidity because both are language keywords. Instead, prefix these names with an underscore.
Lines 14–15. The transfer and send functions are stored under the address object in the symtab because old versions of Solidity allowed this, whereas now they are only supported for address payable. Remap these functions to address payable in builtins.sol.
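The key computation can be exercised without a real symbol table. In this sketch, Scope and RootScope are hypothetical stand-ins that only provide the aliases and parent_scope attributes the function relies on, and the function is repeated with a length guard so that single-part keys are safe.

```python
from typing import List, Optional


class Scope:
    """Hypothetical stand-in exposing only `aliases` and `parent_scope`."""

    def __init__(self, alias: str, parent_scope: Optional["Scope"] = None):
        self.aliases: List[str] = [alias]
        self.parent_scope = parent_scope


class RootScope(Scope):
    def __init__(self):
        super().__init__("<root>")


def compute_symbol_root_name(symbol: Scope) -> str:
    parts = []
    s = symbol
    while not isinstance(s, RootScope):
        name = s.aliases[0]
        if name == "<type:address>":
            name = "_address"
        elif name == "<type:address payable>":
            name = "_address_payable"
        parts.append(name)
        s = s.parent_scope
    parts.reverse()

    # transfer/send live under address in the symtab; remap to address payable.
    if len(parts) > 1 and parts[0] == "_address" and parts[1] in ("transfer", "send"):
        parts[0] = "_address_payable"

    return ".".join(parts)


root = RootScope()
msg = Scope("msg", root)
sender = Scope("sender", msg)
address = Scope("<type:address>", root)
transfer = Scope("transfer", address)
balance = Scope("balance", address)

print(compute_symbol_root_name(sender))    # msg.sender
print(compute_symbol_root_name(transfer))  # _address_payable.transfer
print(compute_symbol_root_name(balance))   # _address.balance
```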

Finding the File¶

The symbol table creates a FileScope when it parses each file from the VFS. It has the source unit name, which we use to find the file path from the VFS.

def get_symbol_file_uri(vfs, symbol):
    file_scope = symbol.find_first_ancestor_of(symtab.FileScope)
    sun = file_scope.source_unit_name
    file_path = vfs.sources[sun].origin

The LSP deals with URIs, not paths, so convert the resultant path and return it as the last step of get_symbol_file_uri:

from pygls import uris

return uris.from_fs_path(str(file_path))

Note

If we pass in the appropriate VFS and real symbol for the built-ins case, this same function works to give the URI of the builtins.sol!
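Putting the two fragments together, a complete version might look like the sketch below. FileScope, Symbol, Source and Vfs are hypothetical stand-ins that imitate only the attributes used above, and pathlib's as_uri() is swapped in for pygls' uris.from_fs_path so that the example needs nothing beyond the standard library.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, Optional


@dataclass
class FileScope:
    source_unit_name: str
    parent_scope: Optional[object] = None


@dataclass
class Symbol:
    name: str
    parent_scope: object

    def find_first_ancestor_of(self, scope_type):
        # Walk up the parents until one of the requested type is found.
        s = self.parent_scope
        while s is not None and not isinstance(s, scope_type):
            s = s.parent_scope
        return s


@dataclass
class Source:
    origin: PurePosixPath


@dataclass
class Vfs:
    sources: Dict[str, Source]


def get_symbol_file_uri(vfs: Vfs, symbol: Symbol) -> str:
    file_scope = symbol.find_first_ancestor_of(FileScope)
    sun = file_scope.source_unit_name
    file_path = vfs.sources[sun].origin
    # Standard-library substitute for uris.from_fs_path(str(file_path)).
    return file_path.as_uri()


vfs = Vfs({"contracts/Token.sol": Source(PurePosixPath("/project/contracts/Token.sol"))})
file_scope = FileScope("contracts/Token.sol")
contract = Symbol("Token", file_scope)
function = Symbol("transfer", contract)

print(get_symbol_file_uri(vfs, function))
```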

Node Spans¶

To recap, we can take a source location, find the AST node there, check if it’s a reference, resolve the reference, and find a corresponding AST node that the reference may be referring to. Now all we need to do is get the range of the name of this node and the range of the entire node to return to the LSP client.

def get_node_range(n: Node) -> lsp.Range:
    solp_start, solp_end = n.start_location, n.end_location
    start = lsp.Position(solp_start.line-1, solp_start.column-1)
    end = lsp.Position(solp_end.line-1, solp_end.column-1)
    return lsp.Range(start, end)

This function just copies the data from the node into the lsp.Range object. We show it because it highlights that SOLP source locations are 1-based whereas LSP/IDE locations for this use case are 0-based, hence the -1 on each position.
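The off-by-one conversion is easy to get wrong, so here it is in isolation. SolpLocation and to_lsp_position are hypothetical names for this sketch, and a plain tuple stands in for lsp.Position so that it runs without lsprotocol installed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SolpLocation:
    line: int    # 1-based
    column: int  # 1-based


def to_lsp_position(loc: SolpLocation) -> Tuple[int, int]:
    """Return a zero-based (line, character) pair for a one-based location."""
    if loc.line < 1 or loc.column < 1:
        raise ValueError(f"expected a 1-based location, got {loc}")
    return (loc.line - 1, loc.column - 1)


print(to_lsp_position(SolpLocation(line=1, column=1)))   # (0, 0)
print(to_lsp_position(SolpLocation(line=12, column=5)))  # (11, 4)
```

Rejecting values below 1 is a design choice of this sketch: it surfaces a location that was already converted instead of silently producing a negative position.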

Definition Name Span¶

This gets the range of the name of the target node only. For example, it would highlight just the name of the function or the name of the contract that has been referenced.

if hasattr(node, 'name'):
    return get_node_range(node.name)
else:
    return None

Definition Span¶

This gets the range of the entire target node, for example from the keyword function all the way to the closing curly brace of a function definition.

return get_node_range(node)

Closing Notes¶

While this tutorial can’t cover the entire plumbing required to make a language server for Solidity, the concepts introduced here will help you get there. In fact, most of the code in this guide is taken from our open-source demo implementation available on GitHub.
