🔙 Loading a directory as a tree structure in Node

Hey you all 👋! This article is the first follow up on JSCity series. If you haven't read it yet, feel free to check it out in the post bellow:

In this post we will explore:

  • Loading directories using node APIs.
  • Building a tree structure that represents the directories loaded.
  • Test driven development to define the expectations around the output before implementing the code.

While in the MVP of JSCity all the processing happens in the browser (file upload, code parsing, visualization), for the second version I'm aiming to create modular packages, with the intention of increasing re-usability of these modules for future versions.

In this post, we'll be building the module that loads a local directory into a well-defined structure. The objective is to be able to export it as function of a package later.

I took this opportunity to use Typescript, but you can achieve the same goal with vanilla JavaScript.

Defining the structure

Directories in operating systems are displayed and represented in a hierarchical tree structure. The tree data structure is widely used to represent and traverse data efficiently.

The elements in a tree are called nodes and edges. A node contains some piece information, in our case information about the file or directory. In the following image, the arrows between the nodes are what we call edges.

tree structure

Nodes without children are often called leaf nodes and the highest node in a tree is called the root node.

There are multiple well known algorithms to traverse a tree. These will facilitate the process of building the city. So how can we make that directory tree in node?

The node file system API allows us to read any directory with fs.readdirSync, for example. It returns the array of strings representing the subfolders and files of that folder.

console.log(fs.readdirSync(initialPath));
// [ 'example.js', 'utils' ]

We can then leverage this to build our own tree structure!

I did find node-directory-tree package that does that for us. Despite that, I believe it is a good exercise to build the tree by ourselves.

To represent a node I decided to create the TreeNode class. The properties of a TreeNode are the path in the file system and an array of TreeNode (representing the sub-directories and files). When TreeNode is a file the children array will remain empty just like the leaf nodes we learned before.

class TreeNode {
  public path: string;
  public children: Array<TreeNode>;

  constructor(path: string) {
    this.path = path;
    this.children = [];
  }
}

That's a good enough first version of our tree nodes. Let's keep going.

Defining the root node

Now let's create some tests!

I will use a folder called fixtures as the input of our tests. That folder contains just some example files.

So given an initial path, we want it to return the root node representing that directory. We want to assert that the root contains the expected properties.

describe('buildTree', () => {
  const initialPath = path.join(__dirname, 'fixtures');

  it('should return root node', () => {
    const rootNode = buildTree(initialPath);
    expect(rootNode).not.toBeNull();
    expect(rootNode).toHaveProperty('path', initialPath);
    expect(rootNode).toHaveProperty('children');
  });
});

For now this test will fail, but that's expected. We still need to build the function mentioned in the code above.

The buildTree function receives a path as input and returns the tree structure for that directory.

function buildTree(rootPath: string) {
  return new TreeNode(rootPath);
}

That is enough to get our first test to pass ✅🎉

test pass printscreen

Reading the folder and its children

We can see that the buildTree function does not really build the full tree structure yet. That's our next step. The fixtures folder used by our test looks like the following.

fixtures
├── example.js
└── utils
   └── sum.js

The output of the function should represent the following tree.

folder structure

We can assert that the root, in our case fixtures, has two children: utils folder and example.js file.

it('should return root node with its exact 2 children', () => {
  const rootNode = buildTree(initialPath);
  expect(rootNode.children.length).toEqual(2);

  const childrenPath = rootNode.children.map(child => child.path);
  expect(childrenPath.includes(`${initialPath}/utils`)).toEqual(true);
  expect(childrenPath.includes(`${initialPath}/example.js`)).toEqual(true);
});

We can also assert that utils folder has the sum.js file inside of it.

it('should add utils node with its children inside root', () => {
  const rootNode = buildTree(initialPath);
  const utils = rootNode.children.find(
    child => child.path === `${initialPath}/utils`
  );

  expect(utils).not.toBeNull();
  expect(utils?.children.length).toEqual(1);
  expect(utils?.children[0]?.path).toEqual(`${initialPath}/utils/sum.js`);
});

And of course, they are going to fail at this point.

failed tests of buildTree function

Building the tree

We now need to extend buildTree so it builds the entire tree, not only the root node.

The Depth-first search aka DFS algorithm is a well-known technique to traverse a tree. In the iterative DFS algorithm we will need to use a Stack, which has the first-in-last-out (FILO) approach.

With DFS, our step by step looks like this:

  1. We first add the root to the stack.
  2. We loop while the stack is not empty (that means we still have nodes to visit).
  3. We pop an item from the stack to be our new currentNode.
  4. We use fs.readdirSync(currentNode.path) to get the node's sub-directories and files.
  5. For each one of them, we create a node and add it to the currentNode.children array. If it's a directory we also push it in the stack to visit it later.

animated gif of the build algorithm

In the end, we've visited all the directories, files and sub-directories and built our tree. The implementation looks like the this:

function buildTree(rootPath: string) {
  const root = new TreeNode(rootPath);

  const stack = [root];

  while (stack.length) {
    const currentNode = stack.pop();

    if (currentNode) {
      const children = fs.readdirSync(currentNode.path);

      for (let child of children) {
        const childPath = `${currentNode.path}/${child}`;
        const childNode = new TreeNode(childPath);
        currentNode.children.push(childNode);

        if (fs.statSync(childNode.path).isDirectory()) {
          stack.push(childNode);
        }
      }
    }
  }

  return root;
}

We used fs.readdirSync as before to discover the children of a folder. We also used fs.statSync to read the stats of the current path, it allows us to ask whether or not that child I'm looking at is a directory.

all tests passing

Green tests, yay 🙌, we have solved the problem of building the tree structure! When we log our root we are able to see its properties.

TreeNode {
  path: 'test/fixtures',
  children: [
    TreeNode {
      path: 'test/fixtures/example.js',
      children: []
    },
    TreeNode {
      path: 'test/fixtures/utils',
      children: [Array]
    }
  ]
}

What's next?

We got the desired output, but there is more we could do. For example, we can add a filter to exclude files of certain extension from our tree. I'll do that since I want to visualize .js files only.

There is also the possibility of adding properties like type, extension, size (...) to our TreeNode.

The next chapter will leverage this newly created structure to parse every JavaScript file in it and compute metrics about the code!

Was this post useful to you? I'm always keen to hear suggestions and comments. 👋

Let me know what you think about this post on twitter!

peaonunes © 2021, now built with Gatsby