Read JavaScript Source Code, Using an AST

Published on February 9, 2019

By Bart Ledoux

Introduction

Let’s say you have a big JavaScript file left over from the old days. It’s 70,000 lines long and you desperately need to split it up using webpack or a similar tool. To do that, you need to know which functions or constants it exposes to the global scope.

Let a computer read through your code and extract what you want from it.

It’s a job for Abstract Syntax Trees (AST).

The following example is small. Our mission, should you choose to accept it, will be to extract the names of all the functions exposed in the global scope.

test.js

// test the code
function decrementAndAdd(a, b) {
   function add(c, d) {
      return c + d;
   }
   a--;
   b = b - 1;
   return add(a,b)
}

// test the code
function incrementAndMultiply(a, b) {
    function multiply(c, d) {
      return c * d;
    }
    a++;
    b = b + 1;
    return multiply(a, b)
}

The result should be ["decrementAndAdd", "incrementAndMultiply"].

Parsing the Code

An AST is the result of parsing code. For JavaScript, an AST is a JavaScript object containing a tree representation of your source. Before we use it, we have to create it. Depending on the code we are parsing, we choose the appropriate parser.

Here, since the code is ES5-compatible, we can choose the acorn parser.

Here are some of the most well known Open Source ECMAScript parsers:

Parser	Supported Languages	GitHub
acorn	esnext & JSX (using acorn-jsx)	https://github.com/acornjs/acorn
esprima	esnext & older	https://github.com/jquery/esprima
cherow	esnext & older	https://github.com/cherow/cherow
espree	esnext & older	https://github.com/eslint/espree
shift	esnext & older	https://github.com/shapesecurity/shift-parser-js
babel	esnext, JSX & typescript	https://github.com/babel/babel
TypeScript	esnext & typescript	https://github.com/Microsoft/TypeScript

All of these parsers work the same way: give one some code and get an AST back.

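const { readFileSync } = require('fs')
const { Parser } = require('acorn')

// recent versions of acorn expect an explicit ecmaVersion option
const fileName = 'test.js'
const ast = Parser.parse(readFileSync(fileName, 'utf8'), { ecmaVersion: 2020 })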

The TypeScript parser’s API is a little different, but it is well documented.

This is the tree obtained by parsing the short test-2.js file, listed after the figure, with @babel/parser:

Tree graph from @babel/parser

test-2.js

// test the code
function decrementAndAdd(a, b) {
  return add(a, b)
}
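
Since the tree is a plain object, it can also be written out by hand. The sketch below is a trimmed, hand-written approximation of the ESTree-style nodes a parser produces for test-2.js. Position fields such as start, end, and loc are omitted, and @babel/parser additionally wraps the Program node in a File node. The printTree helper is a hypothetical name introduced for illustration; it is not part of any parser.

```javascript
// Hand-written approximation of the tree for test-2.js.
// Real parser output also carries position info (start, end, loc).
const tree = {
  type: 'Program',
  body: [
    {
      type: 'FunctionDeclaration',
      id: { type: 'Identifier', name: 'decrementAndAdd' },
      params: [
        { type: 'Identifier', name: 'a' },
        { type: 'Identifier', name: 'b' }
      ],
      body: {
        type: 'BlockStatement',
        body: [
          {
            type: 'ReturnStatement',
            argument: {
              type: 'CallExpression',
              callee: { type: 'Identifier', name: 'add' },
              arguments: [
                { type: 'Identifier', name: 'a' },
                { type: 'Identifier', name: 'b' }
              ]
            }
          }
        ]
      }
    }
  ]
};

// Hypothetical helper: collect one indented line per node.
function printTree(node, depth = 0, lines = []) {
  const label = node.name ? `${node.type} (${node.name})` : node.type;
  lines.push('  '.repeat(depth) + label);
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      // A child node is any object that has a string "type" field.
      if (child && typeof child.type === 'string') {
        printTree(child, depth + 1, lines);
      }
    }
  }
  return lines;
}

const treeLines = printTree(tree);
console.log(treeLines.join('\n'));
```

Even this four-line function yields eleven nodes, which is why the next section narrows the search before extracting anything.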

Traversing

In order to find what we are going to extract, it’s often better not to treat the whole AST at once. It’ll be a large object with thousands of nodes even for small code snippets. So, before we extract the information we need, we refine our search.

The best way to do that is to visit only the types of node one cares about.

Once again, many tools are available to do this traversing part. For our example, we are going to use recast. It’s very fast and has the advantage of keeping a version of your code untouched. This way, it can return the part of your code you want with its original formatting.

While traversing, we’ll find all the function declaration nodes. This is why we use the visitFunctionDeclaration method.

If we wanted to look at variable assignments we would use visitAssignmentExpression.

recast-acorn-example.js

const { readFileSync } = require('fs');
const recast = require('recast');
const { Parser } = require('acorn');

const fileName = 'test.js';
const ast = Parser.parse(readFileSync(fileName, 'utf8'), { ecmaVersion: 2020 });

recast.visit(
  ast,
  {
    visitFunctionDeclaration: (path) => {
      // the navigation code here...

      // return false to stop at this depth
      return false;
    }
  }
)

AST Node Types

Usually, the names of the node types are not obvious. One can use AST Explorer (astexplorer.net) to look up the types one is after. Just paste your code in the left panel, select the parser you are using, and “voilà!”. Browse the parsed code on the right and find the node type you’re looking for.

Shallow or Deep

We don’t always want to look at every level of the tree. Sometimes we want to do a deep search while other times we just want to look at the top layer. Depending on the framework, the syntax differs. Fortunately, it’s usually well documented.

With recast, to stop searching at the current depth, return false from the visitor method when you are done. This is what we did before. To descend into the child nodes (go deep), call this.traverse(path) instead, as you’ll see below.

With @babel/traverse, there is no need to tell Babel where to continue: it descends into child nodes by default. One only needs to say where to stop, by calling path.skip() inside the visitor.

recast-acorn-example.js

recast.visit(
  ast,
  {
    // a regular method, not an arrow function, so that
    // `this` refers to recast's visitor context
    visitFunctionDeclaration(path) {
      // deal with the path here...

      // run the visit on every child node
      this.traverse(path);
    }
  }
)
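
The difference between stopping and descending can be illustrated without any library. The sketch below is a minimal, dependency-free walker written for this article, not recast's implementation. The names walk and collectFunctionNames are hypothetical, and the ast object is a hand-written stand-in for a parsed tree. As with recast, a handler that returns false stops the descent at that node; unlike recast, this walker descends by default when the handler returns anything else.

```javascript
// Hand-written stand-in for: function outer() { function inner() {} }
const ast = {
  type: 'Program',
  body: [
    {
      type: 'FunctionDeclaration',
      id: { type: 'Identifier', name: 'outer' },
      params: [],
      body: {
        type: 'BlockStatement',
        body: [
          {
            type: 'FunctionDeclaration',
            id: { type: 'Identifier', name: 'inner' },
            params: [],
            body: { type: 'BlockStatement', body: [] }
          }
        ]
      }
    }
  ]
};

// Call visitor[node.type] if it exists; descend unless it returns false.
function walk(node, visitor) {
  const handler = visitor[node.type];
  if (handler && handler(node) === false) {
    return; // shallow: do not look at this node's children
  }
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      // A child node is any object that has a string "type" field.
      if (child && typeof child.type === 'string') {
        walk(child, visitor);
      }
    }
  }
}

function collectFunctionNames(root, deep) {
  const names = [];
  walk(root, {
    FunctionDeclaration(node) {
      names.push(node.id.name);
      return deep; // true keeps descending, false stops here
    }
  });
  return names;
}

const shallowNames = collectFunctionNames(ast, false);
const deepNames = collectFunctionNames(ast, true);
console.log(shallowNames); // [ 'outer' ]
console.log(deepNames);    // [ 'outer', 'inner' ]
```

The shallow call is the one that matches our mission: only functions declared at the top level are reported.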

We went from a very broad search to a smaller sample. We can now extract the data we need.

Navigating from Path, to Node, to Property

The path object passed to visitFunctionDeclaration is a NodePath. This object represents the connection between a parent and a child AST node: it wraps the child node and remembers where that node sits in the tree. The path on its own is of little use to us here, because what we want is the function declaration it points to.

Using AST Explorer, we find which properties of that node hold the content we are looking for.

The classic thing to do is to read path.node, which gets the child node in the parent-child relationship. If you chose to visit function declarations, path.node will be of type FunctionDeclaration:

recast-acorn-example.js

const functionNames = [];

recast.visit(
  ast,
  {
    visitFunctionDeclaration: (path) => {
      console.log(path.node.type); // will print "FunctionDeclaration"
      functionNames.push(path.node.id.name); // will add the name of the function to the array

      // return false to avoid looking inside of the functions body
      // we stop our search at this level
      return false;
    }
  }
)

Try nesting visits inside one another to look at subtrees. The code below collects every function that’s exactly on the second level down. It would not pick up a function in a function in a function:

recast-acorn-example.js

const functionNames = [];

recast.visit(
  ast,
  {
    visitFunctionDeclaration: (path) => {
      let newPath = path.get('body');

      // subtraversing
      recast.visit(
        newPath,
        {
          visitFunctionDeclaration: (path) => {
            functionNames.push(path.node.id.name);
            return false;
          }
        }
      )

      // return false to not look at other functions contained in this function
      // leave this role to the sub-traversing
      return false;
    }
  }
)

Mission Accomplished!!

We programmatically found all the function names. We could as easily find the names of the arguments or the exposed variables.
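
For instance, argument names live in the params array of a function node, and top-level variable names live in the declarations of a VariableDeclaration node. Here is a minimal sketch, assuming ESTree-shaped nodes and plain identifier parameters (no default values, destructuring, or rest elements). The describeGlobals helper is a hypothetical name, and the ast object is a hand-written stand-in for parser output.

```javascript
// Hand-written stand-in for:
//   var counter = 0;
//   function incrementAndMultiply(a, b) {}
const ast = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'var',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'counter' },
          init: { type: 'Literal', value: 0 }
        }
      ]
    },
    {
      type: 'FunctionDeclaration',
      id: { type: 'Identifier', name: 'incrementAndMultiply' },
      params: [
        { type: 'Identifier', name: 'a' },
        { type: 'Identifier', name: 'b' }
      ],
      body: { type: 'BlockStatement', body: [] }
    }
  ]
};

// Hypothetical helper: look only at the top-level statements of the program.
function describeGlobals(program) {
  const functions = {};
  const variables = [];
  for (const node of program.body) {
    if (node.type === 'FunctionDeclaration') {
      // Map each function name to the names of its simple parameters.
      functions[node.id.name] = node.params
        .filter((param) => param.type === 'Identifier')
        .map((param) => param.name);
    } else if (node.type === 'VariableDeclaration') {
      for (const declarator of node.declarations) {
        if (declarator.id.type === 'Identifier') {
          variables.push(declarator.id.name);
        }
      }
    }
  }
  return { functions, variables };
}

const globals = describeGlobals(ast);
console.log(globals);
```

Because only program.body is inspected, nothing declared inside a function body is reported, which mirrors the return false used in the recast visitor.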

Glossary

AST Node: one object in the tree. Examples: function declaration, variable assignment, object expression

NodePath: the link between a parent Node and a child Node in the tree

NodeProperty: a part of the definition of a node. Depending on the node, it can hold just a name or more information
