🔙 Loading a directory as a tree structure in Node
6/8/2020
dev.to 🔗Hey you all 👋! This article is the first follow up on JSCity
series. If you haven't read it yet, feel free to check it out in the post bellow:
JSCity visualizing JavaScript code
Rafael Nunes ・ May 31 ・ 6 min read
In this post we will explore:
- Loading directories using
node
APIs. - Building a tree structure that represents the directories loaded.
- Test driven development to define the expectations around the output before implementing the code.
While in the MVP of JSCity
all the processing happens in the browser (file upload, code parsing, visualization), for the second version I'm aiming to create modular packages, with the intention of increasing re-usability of these modules for future versions.
In this post, we'll be building the module that loads a local directory into a well-defined structure. The objective is to be able to export it as function of a package later.
I took this opportunity to use Typescript, but you can achieve the same goal with vanilla JavaScript.
Defining the structure
Directories in operating systems are displayed and represented in a hierarchical tree structure. The tree data structure is widely used to represent and traverse data efficiently.
The elements in a tree are called nodes and edges. A node contains some piece information, in our case information about the file or directory. In the following image, the arrows between the nodes are what we call edges.
Nodes without children are often called leaf nodes and the highest node in a tree is called the root node.
There are multiple well known algorithms to traverse a tree. These will facilitate the process of building the city. So how can we make that directory tree in node
?
The node
file system API allows us to read any directory with fs.readdirSync
, for example. It returns the array of strings representing the subfolders and files of that folder.
console.log(fs.readdirSync(initialPath));
// [ 'example.js', 'utils' ]
We can then leverage this to build our own tree structure!
I did find node-directory-tree package that does that for us. Despite that, I believe it is a good exercise to build the tree by ourselves.
To represent a node I decided to create the TreeNode
class. The properties of a TreeNode
are the path in the file system and an array of TreeNode
(representing the sub-directories and files). When TreeNode
is a file the children
array will remain empty just like the leaf nodes we learned before.
class TreeNode {
public path: string;
public children: Array<TreeNode>;
constructor(path: string) {
this.path = path;
this.children = [];
}
}
That's a good enough first version of our tree nodes. Let's keep going.
Defining the root node
Now let's create some tests!
I will use a folder called fixtures
as the input of our tests. That folder contains just some example files.
So given an initial path, we want it to return the root
node representing that directory. We want to assert that the root
contains the expected properties.
describe('buildTree', () => {
const initialPath = path.join(__dirname, 'fixtures');
it('should return root node', () => {
const rootNode = buildTree(initialPath);
expect(rootNode).not.toBeNull();
expect(rootNode).toHaveProperty('path', initialPath);
expect(rootNode).toHaveProperty('children');
});
});
For now this test will fail, but that's expected. We still need to build the function mentioned in the code above.
The buildTree
function receives a path as input and returns the tree structure for that directory.
function buildTree(rootPath: string) {
return new TreeNode(rootPath);
}
That is enough to get our first test to pass ✅🎉
Reading the folder and its children
We can see that the buildTree
function does not really build the full tree structure yet. That's our next step. The fixtures
folder used by our test looks like the following.
fixtures
├── example.js
└── utils
└── sum.js
The output of the function should represent the following tree.
We can assert that the root
, in our case fixtures
, has two children: utils
folder and example.js
file.
it('should return root node with its exact 2 children', () => {
const rootNode = buildTree(initialPath);
expect(rootNode.children.length).toEqual(2);
const childrenPath = rootNode.children.map(child => child.path);
expect(childrenPath.includes(`${initialPath}/utils`)).toEqual(true);
expect(childrenPath.includes(`${initialPath}/example.js`)).toEqual(true);
});
We can also assert that utils
folder has the sum.js
file inside of it.
it('should add utils node with its children inside root', () => {
const rootNode = buildTree(initialPath);
const utils = rootNode.children.find(
child => child.path === `${initialPath}/utils`
);
expect(utils).not.toBeNull();
expect(utils?.children.length).toEqual(1);
expect(utils?.children[0]?.path).toEqual(`${initialPath}/utils/sum.js`);
});
And of course, they are going to fail at this point.
Building the tree
We now need to extend buildTree
so it builds the entire tree, not only the root node.
The Depth-first search aka DFS algorithm is a well-known technique to traverse a tree. In the iterative DFS algorithm we will need to use a Stack, which has the first-in-last-out (FILO) approach.
With DFS, our step by step looks like this:
- We first add the
root
to the stack. - We loop while the stack is not empty (that means we still have nodes to visit).
- We
pop
an item from the stack to be our newcurrentNode
. - We use
fs.readdirSync(currentNode.path)
to get the node's sub-directories and files. - For each one of them, we create a node and add it to the
currentNode.children
array. If it's a directory we alsopush
it in the stack to visit it later.
In the end, we've visited all the directories, files and sub-directories and built our tree. The implementation looks like the this:
function buildTree(rootPath: string) {
const root = new TreeNode(rootPath);
const stack = [root];
while (stack.length) {
const currentNode = stack.pop();
if (currentNode) {
const children = fs.readdirSync(currentNode.path);
for (let child of children) {
const childPath = `${currentNode.path}/${child}`;
const childNode = new TreeNode(childPath);
currentNode.children.push(childNode);
if (fs.statSync(childNode.path).isDirectory()) {
stack.push(childNode);
}
}
}
}
return root;
}
We used fs.readdirSync
as before to discover the children of a folder. We also used fs.statSync
to read the stats of the current path, it allows us to ask whether or not that child I'm looking at is a directory.
Green tests, yay 🙌, we have solved the problem of building the tree structure! When we log our root
we are able to see its properties.
TreeNode {
path: 'test/fixtures',
children: [
TreeNode {
path: 'test/fixtures/example.js',
children: []
},
TreeNode {
path: 'test/fixtures/utils',
children: [Array]
}
]
}
What's next?
We got the desired output, but there is more we could do. For example, we can add a filter to exclude files of certain extension from our tree. I'll do that since I want to visualize .js
files only.
There is also the possibility of adding properties like type
, extension
, size
(...) to our TreeNode
.
The next chapter will leverage this newly created structure to parse every JavaScript
file in it and compute metrics about the code!
Was this post useful to you? I'm always keen to hear suggestions and comments. 👋
Let me know what you think about this post on twitter!