- Introduction
- Dependencies
- Build/deployment Instructions
- Usage Scenarios
- Test-Driven Development (TDD)
- Design Principles
TreeFactor is a command-line tool for detecting refactoring operations in multi-language software projects. It uses tree-sitter for parsing and AST generation. The tool supports refactoring detection in Python, JavaScript, and C++ codebases, with a primary focus on identifying parameter-related refactorings.
- Multi-language Support: Analyzes code in Python, JavaScript, and C++
- Git Integration: Works directly with local Git repositories and GitHub repositories
- Comprehensive Python Support: Full detection capabilities for Python, including:
  - Parameter renaming
  - Parameter addition
  - Parameter type changes
  - Method renaming
- Parameter Renaming Detection: Supports parameter renaming detection in JavaScript and C++
- Java 11 or higher
  Download Link: Oracle JDK | OpenJDK
- Git
  Download Link: Git Official Website
- GCC Compiler
  Download Link: GCC on GNU Project
- Bash Shell Environment
  Native OS Shell or Download Link (Optional):
  - Git Bash for Windows (Windows users)
  - Install Bash on macOS/Linux
Operating System Support
- macOS
- Linux
- Windows
- Clone the project repository:
git clone https://github.com/CSCI5308/course-project-g03.git
cd course-project-g03
- Set up Tree-sitter dependencies:
# Make the setup script executable
chmod +x setup_treefactor.sh
# Run the setup script
./setup_treefactor.sh
- Build using Maven:
mvn clean package
The application can be run with different options:
- Analyze all commits in a repository:
./treefactor.sh -a /path/to/repo [branch-name]
- Analyze a specific commit:
./treefactor.sh -c /path/to/repo [commit-hash]
- Analyze a GitHub repository commit:
./treefactor.sh -gc https://github.com/username/repo [your-token] [commit-hash] [timeout]
- Display help for other commands:
./treefactor.sh -h
The tool supports four main command-line options:
- -a: Analyze all commits in a local repository branch
- -c: Analyze a specific commit in a local repository
- -gc: Analyze a specific commit from a GitHub repository
- -h: Display command formats for other commands
./treefactor.sh -a <path-to-local-repo> <branch-name>
This command detects refactorings for all commits in the specified branch of a local repository. The tool will:
- Scan through the entire commit history
- Detect refactorings in Python, JavaScript, and C++ files
- Output detected refactorings for each commit
If no branch is specified, it will analyze commits from all branches:
./treefactor.sh -a <path-to-local-repo>
./treefactor.sh -c <path-to-local-repo> <commit-hash>
This command detects refactorings at a specific commit of a local repository. It is useful when you want to:
- Check refactorings in a particular change
- Validate refactoring operations before merging
- Review historical changes
./treefactor.sh -gc <git-url> <token> <commit-hash> <timeout>
This command detects refactorings at the specified commit of a GitHub-hosted project, within the given timeout in seconds. It requires a GitHub authentication token.
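For scripted runs, the three documented invocations can be assembled as argument lists. This is a minimal sketch: the helper names are hypothetical and not part of TreeFactor, and it assumes only the flags and argument order shown above.

```python
from typing import List, Optional

SCRIPT = "./treefactor.sh"  # launcher script named in this README


def all_commits_cmd(repo_path: str, branch: Optional[str] = None) -> List[str]:
    """-a: all commits on one branch, or on all branches when branch is omitted."""
    cmd = [SCRIPT, "-a", repo_path]
    if branch:
        cmd.append(branch)
    return cmd


def single_commit_cmd(repo_path: str, commit_hash: str) -> List[str]:
    """-c: one commit in a local repository."""
    return [SCRIPT, "-c", repo_path, commit_hash]


def github_commit_cmd(git_url: str, token: str, commit_hash: str,
                      timeout_seconds: int) -> List[str]:
    """-gc: one commit in a GitHub repository, with a timeout in seconds."""
    return [SCRIPT, "-gc", git_url, token, commit_hash, str(timeout_seconds)]
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the project root. Passing arguments as a list avoids shell quoting problems with paths that contain spaces.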
The tool detects four types of refactoring in Python code:
# Before
def greet(msg):
    print(msg)
# After
def greet(message):
    print(message)
In this example, the parameter is renamed from msg to message.
# Before
def greet(name):
    print(f"Hello, {name}!")
# After
def greet(name, greeting="Hello"):
    print(f"{greeting}, {name}!")
In this example, a new parameter named greeting is added.
# Before
def process(data: list):
    pass
# After
from typing import List
def process(data: List[str]):
    pass
In this example, the type of the parameter is changed from list to List[str].
# Before
def calc_sum(numbers):
    return sum(numbers)
# After
def calculate_sum(numbers):
    return sum(numbers)
In this example, the method is renamed from calc_sum to calculate_sum.
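To make the first two detections concrete, here is a deliberately naive sketch that compares the parameter lists of same-named top-level functions. It is not TreeFactor's algorithm: TreeFactor parses with tree-sitter and compares UML models, whereas this sketch uses Python's standard `ast` module, and the function names and message wording are illustrative assumptions.

```python
import ast
from typing import Dict, List


def _functions(source: str) -> Dict[str, List[str]]:
    """Map each top-level function name to its positional parameter names."""
    tree = ast.parse(source)
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


def detect_parameter_changes(before: str, after: str) -> List[str]:
    """Report renamed and added parameters for functions present in both versions."""
    old, new = _functions(before), _functions(after)
    findings = []
    for name, old_params in old.items():
        if name not in new:
            continue  # renamed or deleted functions are out of scope here
        new_params = new[name]
        removed = [p for p in old_params if p not in new_params]
        added = [p for p in new_params if p not in old_params]
        if len(old_params) == len(new_params):
            # Same arity: treat a differing name at the same position as a rename.
            for old_name, new_name in zip(old_params, new_params):
                if old_name != new_name:
                    findings.append(
                        f"Rename Parameter: {old_name} renamed to {new_name} in {name}"
                    )
        elif not removed:
            # More parameters and none dropped: report the new ones as additions.
            for param in added:
                findings.append(f"Add Parameter: {param} added to {name}")
    return findings
```

The positional comparison misreports reordered parameters as renames. A real detector also has to compare how the parameter is used in the function body, which this sketch skips.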
For JavaScript, the tool currently supports parameter renaming detection.
// Before
function calculate(n) {
    return n * 2;
}
// After
function calculate(num) {
    return num * 2;
}
In this example, the parameter is renamed from n to num.
For C++, the tool currently supports parameter renaming detection.
#include <iostream>
// Before
void process(int x) {
    std::cout << x << std::endl;
}
// After
void process(int value) {
    std::cout << value << std::endl;
}
In this example, the parameter is renamed from x to value.
The tool reports detailed information about each refactoring, as in this sample output:
Commit ID: XXXXXX
Commit Message: refactor
Parent Commit ID: XXXXXX
Refactorings:
Rename Parameter: n renamed to name in greet
Add Parameter in Python: Parameter 'tax_rate' of type 'object' added to function 'calculate_total'
The output includes:
- Complete commit hash
- Commit message
- Parent commit ID for reference
- List of detected refactorings with specific details about each change
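If the report needs to be post-processed, the fields listed above can be pulled out of the console text. The parser below is a minimal sketch that assumes the output follows exactly the layout of the sample (one `Commit ID:` line per commit, then a `Refactorings:` line, then one refactoring per line). The function name and dictionary keys are hypothetical.

```python
from typing import Dict, List


def parse_report(text: str) -> List[Dict[str, object]]:
    """Split TreeFactor-style console output into one dict per commit."""
    commits: List[Dict[str, object]] = []
    current = None
    in_refactorings = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("Commit ID:"):
            # A commit header always starts a new record.
            current = {
                "commit_id": line.split(":", 1)[1].strip(),
                "message": "",
                "parent_id": "",
                "refactorings": [],
            }
            commits.append(current)
            in_refactorings = False
        elif current is None:
            continue  # ignore anything printed before the first commit
        elif line == "Refactorings:":
            in_refactorings = True
        elif in_refactorings:
            current["refactorings"].append(line)
        elif line.startswith("Commit Message:"):
            current["message"] = line.split(":", 1)[1].strip()
        elif line.startswith("Parent Commit ID:"):
            current["parent_id"] = line.split(":", 1)[1].strip()
    return commits
```

Multi-line commit messages are not handled: only the first line after `Commit Message:` is kept.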
This project follows Test-Driven Development with the following implementation order:
We first developed the core refactoring detection functionality with Python as the initial supported language.
- Initial Development:
- Keyword Parameters Feature:
  - Test Creation: 63ac58e - UMLModelTest
  - Implementation: 0c1230d - UMLModel and Refactoring Detection Implementation
After establishing the core refactoring detection rules with Python, we extended support to C++ and JavaScript.
- AST Visitor Development:
- Refactoring Detection Test:
  - Test: b8b9768 - Created CppRenameParameterTest
- AST Visitor Development:
- Refactoring Detection Test:
  - Test: 5603ae0 - Created JSRenameParameterTest
- We first developed and tested the core refactoring detection rules using Python as our initial language.
- Once the core rules were working with Python, we created specific tests for C++ and JavaScript (CppRenameParameterTest and JSRenameParameterTest) to verify that our refactoring detection rules could be applied to these languages.
- For each language, we followed TDD by:
  - First creating tests for language-specific AST visitors
  - Implementing the visitors
  - Validating that both the visitor tests and refactoring detection tests passed
Each class in the application has a well-defined single responsibility:
- Language-specific visitors (PythonASTVisitor, JSASTVisitor, CPPASTVisitor): Each handles AST traversal for a single language
- UMLModelReader: Responsible solely for reading source files and creating UML models
- UMLModelDiff: Focused solely on comparing two UML models to detect refactorings
The application is designed to be open for extension but closed for modification:
- Abstract ASTVisitor class allows adding new language support without modifying existing code
- New visitors can be added by extending the base visitor class
- RefactoringType enum can be extended with new refactoring types
Example from ASTVisitor.java:
public abstract class ASTVisitor {
    // Base class that can be extended for new languages
    protected abstract void processModule(ASTNode node);
    protected abstract void processClass(ASTNode node);
    protected abstract void processMethod(ASTNode node);
}
Language-specific visitors can be used interchangeably through the base ASTVisitor class:
ASTVisitor visitor;
if (extension.equals("py")) {
    visitor = new PythonASTVisitor(model, content, filePath);
} else if (extension.equals("js")) {
    visitor = new JSASTVisitor(model, content, filePath);
} else if (extension.equals("cpp")) {
    visitor = new CPPASTVisitor(model, content, filePath);
} else {
    // Unsupported extension: leave the visitor unset so callers can skip the file
    visitor = null;
}
Interfaces are kept focused and minimal:
- GitService interface defines only essential Git operations
- GitHistoryTreefactor interface specifies only refactoring detection methods
Example from GitService.java:
public interface GitService {
    Repository openRepository(String Folder) throws Exception;
    RevWalk createAllRevsWalk(Repository repository, String branch) throws Exception;
}
High-level modules depend on abstractions:
- UMLModelDiff depends on abstract UMLModel rather than concrete implementations
- Visitors depend on abstract ASTNode interface rather than concrete node implementations
LCOM (Lack of Cohesion of Methods) values:
- UMLModelReader: 0.0 (High cohesion)
- UMLModelDiff: 0.0 (High cohesion)
- Language-specific visitors: Average 0.1 (High cohesion)
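The README does not say which LCOM variant or measurement tool produced these values. As an illustration of how a normalized score of this kind can be computed, the sketch below implements the Henderson-Sellers form, LCOM = (m - mean(μ)) / (m - 1), where m is the number of methods and μ is the number of methods that access a given field. Here 0.0 means every method touches every field. The function name and the input shape are assumptions made for this example.

```python
from typing import Dict, Set


def lcom_henderson_sellers(method_fields: Dict[str, Set[str]]) -> float:
    """Compute LCOM from a mapping of method name -> set of fields it accesses."""
    method_count = len(method_fields)
    fields: Set[str] = set()
    for used in method_fields.values():
        fields |= used
    if method_count < 2 or not fields:
        # Degenerate classes are reported as fully cohesive by convention here.
        return 0.0
    # For each field, count how many methods access it, then average the counts.
    mean_access = sum(
        sum(1 for used in method_fields.values() if field in used)
        for field in fields
    ) / len(fields)
    return (method_count - mean_access) / (method_count - 1)
```

Fields that no method accesses are ignored in this sketch, whereas some tools count them, so absolute values may differ from tool to tool.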
Example of loose coupling:
UMLModelReader depends on abstract ASTVisitor, not concrete implementations:
ASTVisitor visitor = createVisitor(filePath, content);
if (visitor != null) {
    visitor.visit(astRoot);
}
Private fields and methods are used to encapsulate implementation details:
public class UMLModelDiff {
    private final UMLModel oldModel;
    private final UMLModel newModel;
    private final List<UMLOperationBodyMapper> operationBodyMappers;
    private void mapOperations() { ... }
    private void mapClasses() { ... }
}
Common functionality is extracted into reusable methods and classes:
public abstract class ASTVisitor {
    protected ASTNode findChildByType(ASTNode parent, String type) {
        // Reusable method for all visitors
        if (parent == null) return null;
        for (ASTNode child : parent.getChildren()) {
            if (child.getType().equals(type)) {
                return child;
            }
        }
        return null;
    }
}