Symbols and Scopes¶
The scoping module for AST1 is a major service in SOLP that provides scope trees and tables to the AST2 Builder.
We’ll work through using this API by considering a service that takes LSP requests to find the definition of whatever you click on in the IDE (e.g., Visual Studio Code). This won’t be the full plug-in, just the SOLP code required to make it work. The code is adapted from an existing plug-in written with the pygls and lsprotocol libraries.
Line to Node¶
As we saw in the Working with Source Code tutorial, SOLP lets us map nodes to source code locations easily. Usually, IDEs make requests based on the line and column number and expect the language tool to figure out what is at that location.
Let’s make a function that does that: it should take a list of possible AST1 nodes and a source location and determine the exact node that is defined at that source location. The SourceLocationSpan.does_contain() method, available for every node through get_source_span(), will work for this.
All we need to do is recurse until we find the deepest node whose span includes the location:
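from typing import List, Optional

def get_containing_ast_node(src_loc: SourceLocation, nodes: List[Node]) -> Optional[Node]:
    for n in nodes:
        # child lists may contain empty slots, skip them
        if not n:
            continue
        n_span = n.get_source_span()
        if n_span.does_contain(src_loc):
            # descend while there are children, otherwise n is the leaf we want
            children = list(n.get_children())
            return get_containing_ast_node(src_loc, children) if children else n
    return None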
When a node has no more children, it must be the deepest node in the tree (a leaf).
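To see the recursion in isolation, here is a minimal, self-contained sketch that uses stand-in classes instead of SOLP's. `Loc`, `ToyNode`, `contains`, and `deepest_containing` are hypothetical names invented for this illustration, and the inclusive span comparison is an assumption rather than a statement about how `does_contain()` behaves.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Loc:
    """Stand-in for a 1-based (line, column) source location."""
    line: int
    column: int


@dataclass
class ToyNode:
    """Stand-in for an AST1 node; not SOLP's real Node class."""
    name: str
    start: Loc
    end: Loc
    children: List["ToyNode"] = field(default_factory=list)

    def contains(self, loc: Loc) -> bool:
        # Assumption: spans are inclusive on both ends.
        here = (loc.line, loc.column)
        return (self.start.line, self.start.column) <= here <= (self.end.line, self.end.column)


def deepest_containing(loc: Loc, nodes: List[ToyNode]) -> Optional[ToyNode]:
    for n in nodes:
        if n.contains(loc):
            # Same shape as the tutorial function: descend if there are children.
            return deepest_containing(loc, n.children) if n.children else n
    return None


ident = ToyNode("Ident(x)", Loc(2, 12), Loc(2, 13))
stmt = ToyNode("Return", Loc(2, 5), Loc(2, 14), [ident])
func = ToyNode("FunctionDefinition", Loc(1, 1), Loc(3, 1), [stmt])

hit = deepest_containing(Loc(2, 12), [func])   # lands on the identifier
miss = deepest_containing(Loc(2, 6), [func])   # inside `stmt`, but on no leaf
```

In this toy tree, a location that sits inside a statement but on none of its leaves yields `None`, which matches the idea that only leaves such as identifiers are interesting for a definition lookup.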
Note
Since each node has a get_children() function, we can do this in a generic way without having to handle each node separately using a visitor.
Idents Only¶
If this node is an identifier, then we can do the reference search. If it’s anything else (a Solidity keyword, a punctuator, etc.), then we can’t get a definition.
if isinstance(ast1_node, Ident):
    return get_definitions_for_node(ast1_node)
Resolving the Reference¶
The reference could be qualified (e.g., x.y) or unqualified (y). The way in which y is accessed changes the scopes we need to search. The differences between the cases are the following:
Unqualified: Search for y in the node scope of ast1_node.
Qualified: Figure out the type of x, search for that type in ast1_node.scope to find a type scope, and search for y in that type scope.
Qualified lookups are modelled by the GetMember node in AST1. So far we know that y is an Ident; we need to determine what type of lookup it is.
if isinstance(ast1_node.parent, solnodes.GetMember):
    ...  # qualified
else:
    ...  # unqualified
Check the parent! Qualified lookups have a base x, and the member is y.
Unqualified¶
In the unqualified lookup case, search the node’s scope directly:
links = []
symbols = ast1_node.scope.find(ast1_node.text)
for s in symbols:
    for rs in s.res_syms():
        links.append(get_symbol_link(rs))
Note
The get_symbol_link function will be shown later.
What does res_syms do? Why not just return the symbols found in the scope?
In short, res_syms resolves symbolic links in the symbol table to their underlying symbols. SOLP has different types of symbols: some are actual symbols based on elements in the real source code, and some are links created by inheritance, imports, or using statements. Since we want to locate source code elements, we need to get the underlying symbol(s).
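The idea of resolving links down to real symbols can be modelled in a few lines. The sketch below is a toy model only: `ToySymbol` and `ToyLink` are hypothetical classes, and nothing here describes how SOLP implements res_syms internally.

```python
from typing import List


class ToySymbol:
    """Stand-in for a symbol backed by a real source element."""

    def __init__(self, name: str):
        self.name = name

    def res_syms(self) -> List["ToySymbol"]:
        # A real symbol resolves to itself.
        return [self]


class ToyLink(ToySymbol):
    """Stand-in for a symbol that only points at other symbols (e.g. made by an import)."""

    def __init__(self, name: str, targets: List[ToySymbol]):
        super().__init__(name)
        self.targets = targets

    def res_syms(self) -> List[ToySymbol]:
        # Follow every target until only real symbols remain.
        resolved: List[ToySymbol] = []
        for target in self.targets:
            resolved.extend(target.res_syms())
        return resolved


real = ToySymbol("Token")
imported = ToyLink("Token", [real])          # e.g. seen through an import
re_exported = ToyLink("Token", [imported])   # a link to a link

underlying = re_exported.res_syms()
```

However many links sit in between, the caller ends up with the symbols that correspond to actual source code, which is what a go-to-definition request needs.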
Qualified¶
To get the base type of x, we’re going to cheat a bit and use the TypeHelper that’s built into the AST2 builder.
type_helper = ast2builder.type_helper
base_obj: solnodes1.AST1Node = ast1_node.parent.obj_base
base_type: solnodes2.Types = type_helper.get_expr_type(base_obj)
This bit of code is tricky, so it’s best to use Python type hints here. The Type returned from the TypeHelper is an AST2 type.
This AST2 type is passed back to the type helper to find the scopes to search:
base_scopes = type_helper.scopes_for_type(base_obj, base_type)
Search these scopes in the same way as the previous case:
for scope in base_scopes:
    # look up the member name (y) in each scope of the base type
    symbols = scope.find(ast1_node.text)
    for s in symbols:
        for rs in s.res_syms():
            links.append(get_symbol_link(rs))
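The difference between the two lookups is easiest to see on a tiny symbol table. The following sketch uses a hypothetical `ToyScope` class, and it assumes that `find` checks the local scope first and then walks outwards through the parents; that lookup order is an assumption of the toy and not a description of SOLP's scope classes.

```python
from typing import Dict, List, Optional


class ToyScope:
    """Stand-in for a symbol-table scope; not SOLP's real scope class."""

    def __init__(self, name: str, parent: Optional["ToyScope"] = None):
        self.name = name
        self.parent = parent
        self.symbols: Dict[str, List[str]] = {}

    def add(self, symbol_name: str, definition: str) -> None:
        self.symbols.setdefault(symbol_name, []).append(definition)

    def find(self, symbol_name: str) -> List[str]:
        # Assumption: look locally first, then walk outwards through the parents.
        scope: Optional[ToyScope] = self
        while scope is not None:
            if symbol_name in scope.symbols:
                return list(scope.symbols[symbol_name])
            scope = scope.parent
        return []


file_scope = ToyScope("file")
point_scope = ToyScope("Point", file_scope)   # type scope for a struct Point
point_scope.add("y", "Point.y (struct member)")
func_scope = ToyScope("f", file_scope)        # node scope inside a function
func_scope.add("y", "local variable y")
func_scope.add("p", "local variable p of type Point")

# Unqualified `y`: search the scope of the node itself.
unqualified = func_scope.find("y")

# Qualified `p.y`: first work out the scopes for the type of `p`, then search those.
type_scopes = [point_scope]
qualified = [d for scope in type_scopes for d in scope.find("y")]
```

The same name resolves to two different definitions depending on which scope the search starts in, which is why the lookup kind has to be determined first.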
Details of get_symbol_link¶
The exact details of get_symbol_link depend on what LSP framework you’re using. Usually, the following info is needed from the reference that’s found:
Whether it’s a built-in type/object
The file it is defined in
The span of the node that defines the symbol and the span of the node’s descriptor/name
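One way to keep these pieces together is a small record type. The sketch below is purely illustrative: `SymbolLink`, its fields, and `selection_span` are hypothetical names, and the spans are plain tuples rather than the range objects of any particular LSP framework.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# ((line, column), (line, column)), 0-based as LSP clients expect
Span = Tuple[Tuple[int, int], Tuple[int, int]]


@dataclass(frozen=True)
class SymbolLink:
    """Hypothetical record of what a definition lookup needs to report."""
    is_builtin: bool
    file_uri: str
    definition_span: Span          # the whole defining node
    name_span: Optional[Span]      # just the name, if the node has one

    def selection_span(self) -> Span:
        # Fall back to the whole definition when the node has no name.
        return self.name_span if self.name_span is not None else self.definition_span


link = SymbolLink(False, "file:///project/Token.sol", ((9, 4), (14, 5)), ((9, 13), (9, 21)))
anonymous = SymbolLink(False, "file:///project/Token.sol", ((20, 4), (22, 5)), None)
```

A framework-specific get_symbol_link would then translate such a record into whatever location type the framework expects.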
Scope vs Node¶
The AST1 node is found by the value attribute of the symbol. In general, you can think of the value as being the node that caused the symbol to be created in the symbol’s scope.
For Solidity built-in symbols, the value is usually None, but even if it has a value, it can’t be a real AST1 node. SOLP doesn’t parse the built-ins; they are created only in the symbol table.
Checking for Built-ins¶
This part is simple. Check if the symbol is any of the following types:
BuiltinFunction (self-explanatory, for example keccak256() or abi.encode())
BuiltinObject (this is the msg part of msg.value, that is, the container object that has other built-ins)
BuiltinValue (e.g., msg.value)
def is_builtin(sym):
    return isinstance(sym, (symtab.BuiltinFunction, symtab.BuiltinObject, symtab.BuiltinValue))
Mock Built-in File¶
When the user tries to find the definition for a built-in, let’s give them a file to view that contains pseudocode with documentation. For example, when they click on msg.sender, it opens a file called builtins.sol and goes to a struct member in a struct named Msg.
To do this, we need to take our built-in symbol-table object from above, parse the builtins.sol file, and find a corresponding AST1 node that we will use for the rest of get_symbol_link.
Let’s say we have another VFS and symbol-table builder set up with just the builtins.sol file loaded (to avoid any nasty mixing with the real Solidity code of the project open in the IDE).
builtin_symbol = ...
# getting this env(ironment) is an implementation detail
# it just contains the vfs and symtab builder for builtins.sol only
env = LSP_SERVER.builtin_env
builtins_fs = env.symtab_builder.process_or_find_from_base_dir('solsrc/builtins.sol')
symbol_path = compute_symbol_root_name(builtin_symbol)
real_builtins_symbol = builtins_fs.find_multi_part_symbol(symbol_path)
We compute a root path (i.e., a fully qualified path from the FileScope of the builtin_symbol to the symbol itself).
For example, if we had the BuiltinValue representing msg.sender, the key we get is msg.sender.
Then find_multi_part_symbol does the qualified search using the key and finds the real symbol.
To actually compute the key, there are a few tricky details.
 1  def compute_symbol_root_name(symbol) -> str:
 2      parts = []
 3      s = symbol
 4      while not isinstance(s, (symtab.FileScope, symtab.RootScope)):
 5          name = s.aliases[0]
 6          if name == '<type:address>':
 7              name = '_address'
 8          elif name == '<type:address payable>':
 9              name = '_address_payable'
10          parts.append(name)
11          s = s.parent_scope
12      parts.reverse()
13
14      if len(parts) > 1 and parts[0] == '_address' and parts[1] in ['transfer', 'send']:
15          parts[0] = '_address_payable'
16
17      return '.'.join(parts)
The general algorithm goes like this:
Take the current symbol and walk up through its parent scopes until we get to the FileScope (or RootScope for built-ins).
Store the primary alias of each symbol as part of the key (most symbols only have one alias). This gives a reversed list of each of the parts of the key (e.g., ['sender', 'msg']).
Reverse the list and join the parts together with dots.
These are the tricky parts:
Lines 6–9. We can’t name a contract address or address payable in Solidity as these are language keywords. Instead, use the underscore-prefixed names _address and _address_payable.
Lines 14–15. The transfer and send functions are stored under the address object in the symtab because old versions of Solidity allowed this, whereas now they are only supported for address payable. Remap these functions to address payable in builtins.sol.
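The key computation can be tried out without SOLP by swapping the symbol table for a stand-in. In this sketch, `ToyScope`, its `is_root` flag, and `toy_root_name` are hypothetical; only the aliases, the parent link, and the renaming rules mirror the function shown in the tutorial. The `balance` entry is just an example member for the toy.

```python
class ToyScope:
    """Stand-in for a symtab scope/symbol: a primary alias plus a parent link."""

    def __init__(self, alias, parent=None, is_root=False):
        self.aliases = [alias]
        self.parent_scope = parent
        self.is_root = is_root   # plays the role of FileScope/RootScope


RENAMES = {
    "<type:address>": "_address",
    "<type:address payable>": "_address_payable",
}


def toy_root_name(symbol):
    parts = []
    s = symbol
    while not s.is_root:
        # Keyword-like names cannot be declared in builtins.sol, so rename them.
        parts.append(RENAMES.get(s.aliases[0], s.aliases[0]))
        s = s.parent_scope
    parts.reverse()
    # transfer/send live under address in the toy table, but under
    # address payable in the mock file.
    if len(parts) > 1 and parts[0] == "_address" and parts[1] in ("transfer", "send"):
        parts[0] = "_address_payable"
    return ".".join(parts)


root = ToyScope("<root>", is_root=True)
msg = ToyScope("msg", root)
sender = ToyScope("sender", msg)
address = ToyScope("<type:address>", root)
transfer = ToyScope("transfer", address)
balance = ToyScope("balance", address)

keys = [toy_root_name(sym) for sym in (sender, transfer, balance)]
```

Here `keys` holds one dotted path per symbol, with the remapping applied only to the transfer entry.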
Finding the File¶
The symbol table creates a FileScope when it parses each file from the VFS. It has the source unit name, which we use to find the file path from the VFS.
def get_symbol_file_uri(vfs, symbol):
    file_scope = symbol.find_first_ancestor_of(symtab.FileScope)
    sun = file_scope.source_unit_name
    file_path = vfs.sources[sun].origin
The LSP deals with URIs, not paths, so convert the resultant path:
from pygls import uris
uris.from_fs_path(str(file_path))
Note
If we pass in the appropriate VFS and real symbol for the built-ins case, this same function works to give the URI of the builtins.sol!
Node Spans¶
To recap, we can take a source location, find the AST node there, check if it’s a reference, resolve the reference, and find a corresponding AST node that the reference may be referring to. Now all we need to do is get the range of the name of this node and the range of the entire node to return to the LSP client.
def get_node_range(n: Node) -> lsp.Range:
    solp_start, solp_end = n.start_location, n.end_location
    start = lsp.Position(solp_start.line - 1, solp_start.column - 1)
    end = lsp.Position(solp_end.line - 1, solp_end.column - 1)
    return lsp.Range(start, end)
This function is very simple. It just copies the data from the node into the lsp.Range object. We’ve shown it as it highlights how SOLP source locations are 1-based whereas LSP/IDE locations for this use case are 0-based, hence the -1 on each position.
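The same offset applies in the opposite direction when a request arrives from the IDE. The sketch below shows both conversions with a plain named tuple. `Pos`, `solp_to_lsp`, and `lsp_to_solp` are hypothetical helpers, and the sketch assumes that both the line and the column are 1-based on the SOLP side, as the -1 on each field suggests.

```python
from typing import NamedTuple


class Pos(NamedTuple):
    """Plain (line, column) pair; stands in for both SOLP and LSP positions."""
    line: int
    column: int


def solp_to_lsp(p: Pos) -> Pos:
    # 1-based (SOLP) to 0-based (LSP), for results sent back to the client
    return Pos(p.line - 1, p.column - 1)


def lsp_to_solp(p: Pos) -> Pos:
    # 0-based (LSP) to 1-based (SOLP), for positions received in a request
    return Pos(p.line + 1, p.column + 1)


first_char = solp_to_lsp(Pos(1, 1))
round_trip = lsp_to_solp(solp_to_lsp(Pos(12, 5)))
```

Doing the conversion only at the boundary of the request handler keeps the rest of the code in SOLP coordinates.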
Definition Name Span¶
This gets the range of the name of the target node only. For example, it would highlight just the name of the function or the name of the contract that has been referenced.
if hasattr(node, 'name'):
    return get_node_range(node.name)
else:
    return None
Definition Span¶
This gets the range of the entire target node, for example from the keyword function all the way to the closing curly brace of a function definition.
return get_node_range(node)
Closing Notes¶
While this tutorial can’t cover the entire plumbing required to make a language server for Solidity, the concepts introduced here will help you get there. In fact, most of the code in this guide is taken from our open-source demo implementation available on GitHub.