Exploring Clang Tooling Part 2: Examining the Clang AST with clang-query

This post is part of a regular series of posts where the C++ product team and other guests answer questions we have received from customers. The questions can be about anything C++ related: MSVC toolset, the standard language and library, the C++ standards committee, isocpp.org, CppCon, etc.

Today’s post is by guest author Stephen Kelly, who is a developer at Havok, a contributor to Qt and CMake and a blogger. This post is part of a series where he is sharing his experience using Clang tooling in his current team.

In the last post, we created a new clang-tidy check following documented steps and encountered the first limitation in our own knowledge - how can we change both declarations and expressions such as function calls?

In order to create an effective refactoring tool, we need to understand the code generated by the create_new_check.py script and learn how to extend it.

Exploring C++ Code as C++ Code

When Clang processes C++, it creates an Abstract Syntax Tree representing the code. The AST needs to be able to represent all of the possible complexity that can appear in C++ code - variadic templates, lambdas, operator overloading, declarations of various kinds etc. If we can use the AST representation of the code in our tooling, we won't be discarding any of the meaning of the code in the process, as we would if we limit ourselves to processing only text.

Our goal is to harness the complexity of the AST so that we can describe patterns in it, and then replace those patterns with new text. The Clang AST Matcher API and FixIt API satisfy those requirements respectively.

The level of complexity in the AST means that detailed knowledge is required in order to comprehend it. Even for an experienced C++ developer, the number of classes and how they relate to each other can be daunting. Luckily, there is a rhythm to it all. We can identify patterns, use tools to discover what makes up the Clang model of the C++ code, and get to the point of having an instinct about how to create a clang-tidy check quickly.

Exploring a Clang AST

Let's dive in and create a simple piece of test code so we can examine the Clang AST for it:

 
int addTwo(int num) 
{ 
    return num + 2; 
} 

int main(int, char**) 
{ 
    return addTwo(3); 
} 

There are multiple ways to examine the Clang AST, but the most useful when creating AST Matcher based refactoring tools is clang-query. We need to build up our knowledge of AST matchers and the AST itself at the same time via clang-query.

So, let's return to MyFirstCheck.cpp which we created in the last post. The MyFirstCheckCheck::registerMatchers method contains the following line:

Finder->addMatcher(functionDecl().bind("x"), this); 

The first argument to addMatcher is an AST matcher, an Embedded Domain Specific Language of sorts. This is a predicate language which clang-tidy uses to traverse the AST and create a set of resulting 'bound nodes'. In the above case, a bound node with the name x is created for each function declaration in the AST. clang-tidy later calls MyFirstCheckCheck::check for each set of bound nodes in the result.
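The register-then-check flow described above can be modelled in a few lines of standalone C++. This is only a toy sketch and not Clang's implementation: ToyNode, ToyFinder, BoundNodes and collectFunctionNames are hypothetical names invented for illustration, and the "AST" is a flat list rather than a tree.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an AST node; real Clang nodes are far richer.
struct ToyNode {
    std::string kind;  // e.g. "FunctionDecl"
    std::string name;  // e.g. "addTwo"
};

// One match result: binding name -> node, like the "x" binding above.
using BoundNodes = std::map<std::string, const ToyNode*>;

// Hypothetical miniature of the addMatcher / check flow.
class ToyFinder {
public:
    using Predicate = std::function<bool(const ToyNode&)>;
    using Callback = std::function<void(const BoundNodes&)>;

    void addMatcher(Predicate pred, std::string bindName, Callback check) {
        entries_.push_back({std::move(pred), std::move(bindName), std::move(check)});
    }

    // Visit every node and invoke the callback once per match.
    void run(const std::vector<ToyNode>& ast) const {
        for (const ToyNode& node : ast) {
            for (const Entry& entry : entries_) {
                if (entry.pred(node)) {
                    BoundNodes bound;
                    bound[entry.bindName] = &node;
                    entry.check(bound);
                }
            }
        }
    }

private:
    struct Entry {
        Predicate pred;
        std::string bindName;
        Callback check;
    };
    std::vector<Entry> entries_;
};

// Collect the name of every node bound to "x" by a function-declaration predicate.
std::vector<std::string> collectFunctionNames(const std::vector<ToyNode>& ast) {
    std::vector<std::string> names;
    ToyFinder finder;
    finder.addMatcher(
        [](const ToyNode& n) { return n.kind == "FunctionDecl"; }, "x",
        [&names](const BoundNodes& bound) {
            auto it = bound.find("x");
            assert(it != bound.end());  // the binding name must be present
            names.push_back(it->second->name);
        });
    finder.run(ast);
    return names;
}
```

Calling collectFunctionNames on a list holding the addTwo and main functions plus the num parameter invokes the callback twice, once per function declaration, which mirrors how check is called once per set of bound nodes.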

Let's start clang-query, passing our test file as a parameter followed by two dashes. As with clang-tidy in Part 1, this allows us to specify compile options and avoids warnings about a missing compilation database.

This command drops us into an interactive interpreter which we can use to query the AST:

$ clang-query.exe testfile.cpp -- 

clang-query>

Type help for a full set of commands available in the interpreter. The first command we can examine is match, which we can abbreviate to m. Let's paste in the matcher from MyFirstCheck.cpp:

clang-query> match functionDecl().bind("x") 

Match #1: 
 
testfile.cpp:1:1: note: "root" binds here 
int addTwo(int num) 
^~~~~~~~~~~~~~~~~~~ 
testfile.cpp:1:1: note: "x" binds here 
int addTwo(int num) 
^~~~~~~~~~~~~~~~~~~ 
 
Match #2: 
 
testfile.cpp:6:1: note: "root" binds here 
int main(int, char**) 
^~~~~~~~~~~~~~~~~~~~~ 
testfile.cpp:6:1: note: "x" binds here 
int main(int, char**) 
^~~~~~~~~~~~~~~~~~~~~ 
2 matches. 

clang-query automatically creates a binding for the root element in a matcher. This gets noisy when trying to match something specific, so it makes sense to turn that off if defining custom binding names:

clang-query> set bind-root false 
clang-query> m functionDecl().bind("x") 

Match #1: 

testfile.cpp:1:1: note: "x" binds here 
int addTwo(int num) 
^~~~~~~~~~~~~~~~~~~ 

Match #2: 

testfile.cpp:6:1: note: "x" binds here 
int main(int, char**) 
^~~~~~~~~~~~~~~~~~~~~ 
2 matches. 

So, we can see that for each function declaration that appeared in the translation unit, we get a resulting match. clang-tidy will later use these matches one at a time in the check method in MyFirstCheck.cpp to complete the refactoring.

Use quit to exit the clang-query interpreter. The interpreter must be restarted each time C++ code is changed in order for the new content to be matched.

Nesting matchers

The AST Matchers form a 'predicate language' where each matcher in the vocabulary is itself a predicate, and those predicates can be nested. The matchers fit into three broad categories as documented in the AST Matchers Reference.
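To make the idea of nested predicates concrete, here is a minimal standalone sketch. It assumes a flat list of toy declarations and models each matcher as a std::function. The ToyDecl type, the Matcher alias, countMatches, and these simplified functionDecl and hasName functions are illustrative stand-ins, not the Clang API, which builds matchers from templates.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy declaration record used only for this sketch.
struct ToyDecl {
    std::string kind;  // e.g. "FunctionDecl"
    std::string name;  // e.g. "addTwo"
};

// A matcher is modelled as a predicate over a declaration.
using Matcher = std::function<bool(const ToyDecl&)>;

// Inner predicate: true when the declaration has the given name.
Matcher hasName(std::string expected) {
    return [expected = std::move(expected)](const ToyDecl& d) {
        return d.name == expected;
    };
}

// Outer predicate: true for function declarations that also satisfy
// every nested matcher.
Matcher functionDecl(std::vector<Matcher> inner = {}) {
    return [inner = std::move(inner)](const ToyDecl& d) {
        if (d.kind != "FunctionDecl") return false;
        for (const Matcher& m : inner) {
            if (!m(d)) return false;
        }
        return true;
    };
}

// Count how many declarations satisfy a matcher.
std::size_t countMatches(const Matcher& m, const std::vector<ToyDecl>& decls) {
    std::size_t count = 0;
    for (const ToyDecl& d : decls) {
        if (m(d)) ++count;
    }
    return count;
}
```

In this model, functionDecl() alone accepts every function declaration, while functionDecl({hasName("addTwo")}) accepts only the one whose name matches. The nesting narrows the result set without changing the kind of node being matched.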

functionDecl() is an AST Matcher which is invoked for each function declaration in the source code. In normal source code, there will be hundreds or thousands of results coming from external headers for such a simple matcher.

Let's match only functions with a particular name:

clang-query> m functionDecl(hasName("addTwo")) 

Match #1: 

testfile.cpp:1:1: note: "root" binds here 
int addTwo(int num) 
^~~~~~~~~~~~~~~~~~~ 
1 match. 

This matcher will only trigger on function declarations which have the name "addTwo". The middle column of the documentation indicates the name of each matcher, and the first column indicates the kind of matcher that it can be nested inside. The hasName documentation is not listed as being usable with Matcher<FunctionDecl>, but instead with Matcher<NamedDecl>.

Here, a developer without prior experience with the Clang AST needs to learn that the FunctionDecl AST class inherits from the NamedDecl AST class (as well as DeclaratorDecl, ValueDecl and Decl). Matchers documented as usable with each of those classes can also work with a functionDecl() matcher. That familiarity with the inheritance structure of Clang AST classes is essential to proficiency with AST Matchers. The names of classes in the Clang AST correspond to "node matcher" names by making the first letter lower-case. In the case of class names with an abbreviation prefix CXX such as CXXMemberCallExpr, the entire prefix is lowercased to produce the matcher name cxxMemberCallExpr.
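The naming convention just described can be written down as a small helper. This sketch encodes only the two rules stated above (lower-case the first letter, or lower-case a leading CXX prefix as a whole). The function name matcherNameFor is hypothetical, and the helper makes no claim to cover every AST class or any other prefix.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Derive a node matcher name from an AST class name using only the two
// rules stated in the text. Other abbreviation prefixes are not handled.
std::string matcherNameFor(std::string className) {
    assert(!className.empty());
    const std::string prefix = "CXX";
    auto lower = [](char c) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    };
    if (className.compare(0, prefix.size(), prefix) == 0) {
        // Lower-case the whole "CXX" prefix: CXXMemberCallExpr -> cxxMemberCallExpr.
        for (std::size_t i = 0; i < prefix.size(); ++i) {
            className[i] = lower(className[i]);
        }
    } else {
        // Lower-case only the first letter: FunctionDecl -> functionDecl.
        className[0] = lower(className[0]);
    }
    return className;
}
```

For example, matcherNameFor("NamedDecl") yields namedDecl and matcherNameFor("CXXMemberCallExpr") yields cxxMemberCallExpr, matching the matcher names used in this post.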

So, instead of matching function declarations, we can match on all named declarations in our source code. Ignoring some noise in the output, we get results for each function declaration and each parameter variable declaration:

clang-query> m namedDecl() 
... 
Match #8: 

testfile.cpp:1:1: note: "root" binds here 
int addTwo(int num) 
^~~~~~~~~~~~~~~~~~~ 

Match #9: 

testfile.cpp:1:12: note: "root" binds here 
int addTwo(int num) 
           ^~~~~~~ 

Match #10: 

testfile.cpp:6:1: note: "root" binds here 
int main(int, char**) 
^~~~~~~~~~~~~~~~~~~~~ 

Match #11: 

testfile.cpp:6:10: note: "root" binds here 
int main(int, char**) 
         ^~~ 

Match #12: 

testfile.cpp:6:15: note: "root" binds here 
int main(int, char**) 
              ^~~~~~

Parameter declarations are in the match results because they are represented by the ParmVarDecl class, which also inherits NamedDecl. We can match only parameter variable declarations by using the corresponding AST node matcher:

clang-query> m parmVarDecl() 

Match #1: 

testfile.cpp:1:12: note: "root" binds here 
int addTwo(int num) 
           ^~~~~~~ 

Match #2: 

testfile.cpp:6:10: note: "root" binds here 
int main(int, char**) 
         ^~~ 

Match #3: 

testfile.cpp:6:15: note: "root" binds here 
int main(int, char**) 
              ^~~~~~
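The relationship between these results follows directly from the class hierarchy, which can be illustrated with a toy model. The sketch below is an assumption-laden simplification: it lives in a hypothetical toy namespace, omits the intermediate ValueDecl and DeclaratorDecl classes, and uses dynamic_cast where Clang uses its own casting machinery. It only shows why a matcher for a base class also sees nodes of derived classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace toy {

// Simplified hierarchy: only the inheritance relevant to the text.
struct Decl {
    virtual ~Decl() = default;
};
struct NamedDecl : Decl {
    explicit NamedDecl(std::string n) : name(std::move(n)) {}
    std::string name;
};
struct FunctionDecl : NamedDecl {
    using NamedDecl::NamedDecl;
};
struct ParmVarDecl : NamedDecl {
    using NamedDecl::NamedDecl;
};

// Count nodes whose dynamic type is NodeT or a class derived from it.
template <typename NodeT>
std::size_t countNodesOfKind(const std::vector<std::unique_ptr<Decl>>& decls) {
    std::size_t count = 0;
    for (const auto& d : decls) {
        assert(d != nullptr);
        if (dynamic_cast<const NodeT*>(d.get()) != nullptr) ++count;
    }
    return count;
}

// The five declarations that the test file itself contributes.
std::vector<std::unique_ptr<Decl>> makeTestFileDecls() {
    std::vector<std::unique_ptr<Decl>> decls;
    decls.push_back(std::make_unique<FunctionDecl>("addTwo"));
    decls.push_back(std::make_unique<ParmVarDecl>("num"));
    decls.push_back(std::make_unique<FunctionDecl>("main"));
    decls.push_back(std::make_unique<ParmVarDecl>(""));  // unnamed int parameter
    decls.push_back(std::make_unique<ParmVarDecl>(""));  // unnamed char** parameter
    return decls;
}

}  // namespace toy
```

Counting toy::NamedDecl over these declarations gives five, toy::ParmVarDecl gives three and toy::FunctionDecl gives two, in line with the namedDecl(), parmVarDecl() and functionDecl() results shown for testfile.cpp once the noise from implicit declarations is ignored.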

clang-query has a code-completion feature, triggered by pressing TAB, which shows the matchers which can be used at any particular context. This feature is not enabled on Windows however.

Discovery Through Clang AST Dumps

clang-query becomes most useful as a discovery tool when we explore deeper into the AST and dump intermediate nodes.

Let's query our testfile.cpp again, this time with the output set to dump:

clang-query> set output dump 
clang-query> m functionDecl(hasName("addTwo")) 

Match #1: 

Binding for "root": 
FunctionDecl 0x17a193726b8 [remainder of AST dump not recoverable]
