15 changes: 12 additions & 3 deletions NOTICE
@@ -13,7 +13,16 @@ This product is a derivative work based on the following projects:
Licensed under the Apache License, Version 2.0
https://github.com/ben-sb/javascript-deobfuscator

3. webcrack (v2.14.1)
Copyright (c) 2023 j4k0xb
Licensed under the MIT License
https://github.com/j4k0xb/webcrack

This Python library re-implements the deobfuscation algorithms and transform
logic from the above Node.js/Babel-based tools in pure Python. No source code
was directly copied; the implementations were written from scratch following
the same algorithmic approaches.
logic from the above projects in pure Python. No source code was directly
copied; the implementations were written from scratch following the same
algorithmic approaches.

Test dataset: obfuscated JavaScript samples from the JSIMPLIFIER dataset
(https://zenodo.org/records/17531662) are included in tests/resources/ for
evaluation purposes only. They are not part of the distributed package.
28 changes: 13 additions & 15 deletions README.md
@@ -56,11 +56,12 @@ pyjsclear input.js --max-iterations 20

## What it does

PyJSClear applies ~40 transforms in a multi-pass loop until the code
stabilises (default limit: 50 iterations). A final one-shot pass renames
PyJSClear applies transforms in a multi-pass loop until the code
stabilizes (default limit: 50 iterations). A final one-shot pass renames
variables and converts var/let to const.

**Capabilities:**
- Whole-file encoding detection: JSFuck, JJEncode, AAEncode, eval-packing
- String array decoding (obfuscator.io basic/base64/RC4, XOR, class-based)
- Constant propagation & reassignment elimination
- Dead code / dead branch / unreachable code removal
@@ -73,28 +74,25 @@ variables and converts var/let to const.
Large files (>500 KB / >50 K AST nodes) automatically use a lite mode
that skips expensive transforms.
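
The size gating described above can be sketched roughly as follows. This is an illustration only, not the project's code: `select_transforms`, `LARGE_FILE_BYTES`, `NODE_COUNT_LIMIT`, and the transform names are hypothetical, and the threshold values are assumptions read off the README wording (500 KB is taken here as 500 × 1024 bytes).

```python
# Assumed thresholds, taken from the README wording rather than the source.
LARGE_FILE_BYTES = 500 * 1024
NODE_COUNT_LIMIT = 50_000


def select_transforms(all_transforms, expensive, code_size, node_count):
    """Return the transforms to run, dropping expensive ones for large inputs."""
    if code_size > LARGE_FILE_BYTES or node_count > NODE_COUNT_LIMIT:
        return [t for t in all_transforms if t not in expensive]
    return list(all_transforms)


# Placeholder transform names, used only to show the filtering.
transforms = ["ConstantProp", "StringDecoder", "ControlFlowRecoverer"]
expensive = {"ControlFlowRecoverer"}

small = select_transforms(transforms, expensive, code_size=10_000, node_count=800)
large = select_transforms(transforms, expensive, code_size=900_000, node_count=800)
```

Here `small` keeps all three placeholder transforms, while `large` drops the one marked expensive because the input exceeds the assumed size threshold.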

## Testing

```bash
pytest tests/ # all tests
pytest tests/test_regression.py # regression suite (62 tests across 25 samples)
pytest tests/ -n auto # parallel execution (requires pytest-xdist)
```

## Limitations

- **Optimised for obfuscator.io output.** Other obfuscation tools may only partially deobfuscate.
- **Best results on obfuscator.io output.** JSFuck, JJEncode, AAEncode, and eval-packed code are fully decoded; other obfuscation tools may only partially deobfuscate.
- **Large files get reduced treatment.** Files >500 KB or ASTs >50 K nodes skip expensive transforms; files >2 MB use a minimal lite mode.
- **No minification reversal.** Minified-but-not-obfuscated code won't be reformatted or beautified.
- **Recursive AST traversal** may hit Python's default recursion limit (~1 000 frames) on extremely deep nesting.
- **Recursive AST traversal** may hit Python's default recursion limit (~1 000 frames) on extremely deep nesting; the deobfuscator catches this and returns the best partial result.
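
The "best partial result" behaviour in the last bullet can be illustrated with a small sketch. It assumes a pipeline of string-to-string passes; `run_with_fallback` and `strip_parens` are hypothetical names, not part of PyJSClear.

```python
def strip_parens(text):
    """Toy recursive pass: strip one layer of wrapping parentheses per call."""
    if text.startswith("(") and text.endswith(")"):
        return strip_parens(text[1:-1])
    return text


def run_with_fallback(code, passes):
    """Apply each pass in order; on RecursionError keep the last good result."""
    best = code
    try:
        for apply_pass in passes:
            best = apply_pass(best)
    except RecursionError:
        # A pass recursed too deeply: return what was produced before it.
        pass
    return best


shallow = run_with_fallback("((a))", [strip_parens, str.upper])

# 5 000 nesting levels exceed the default recursion limit, so strip_parens
# fails and the output of the first pass is returned instead.
deep_input = "(" * 5000 + "x" + ")" * 5000
deep = run_with_fallback(deep_input, [lambda s: s.replace("x", "y"), strip_parens])
```

The design choice is to treat recursion depth as a recoverable condition: output from the passes that already completed is still useful, so it is returned rather than discarded.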

## License

Apache License 2.0 — see [LICENSE](LICENSE).

This project is a derivative work based on
[obfuscator-io-deobfuscator](https://github.com/ben-sb/obfuscator-io-deobfuscator)
(Apache 2.0) and
(Apache 2.0),
[javascript-deobfuscator](https://github.com/ben-sb/javascript-deobfuscator)
(Apache 2.0). See [THIRD_PARTY_LICENSES.md](THIRD_PARTY_LICENSES.md) and
(Apache 2.0), and
[webcrack](https://github.com/j4k0xb/webcrack) (MIT).
See [THIRD_PARTY_LICENSES.md](THIRD_PARTY_LICENSES.md) and
[NOTICE](NOTICE) for full attribution.

Test samples include obfuscated JavaScript from the
[JSIMPLIFIER dataset](https://zenodo.org/records/17531662) (GPL-3.0),
used solely for evaluation purposes.
27 changes: 27 additions & 0 deletions THIRD_PARTY_LICENSES.md
@@ -237,3 +237,30 @@ https://github.com/ben-sb/javascript-deobfuscator/blob/master/LICENSE).
**Features derived from this project:** hex escape decoding (`--he`),
static array unpacking (`--su`), property access transformation (`--tp`).

---

## webcrack

- **Version:** 2.14.1
- **Author:** j4k0xb
- **Repository:** https://github.com/j4k0xb/webcrack
- **License:** MIT

See [licenses/LICENSE-webcrack](licenses/LICENSE-webcrack) for the full license text.

**Features derived from this project:** general deobfuscation transform
patterns and architecture reference.

---

## Test dataset

- **JSIMPLIFIER dataset**
- **Author:** Dongchao Zhou
- **Source:** https://zenodo.org/records/17531662
- **License:** GPL-3.0

Obfuscated JavaScript samples from this dataset are included in
`tests/resources/` for evaluation purposes only. They are not part of the
distributed package and no code from JSIMPLIFIER is incorporated into
PyJSClear.
21 changes: 21 additions & 0 deletions licenses/LICENSE-webcrack
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 j4k0xb

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
2 changes: 1 addition & 1 deletion pyjsclear/__init__.py
@@ -8,7 +8,7 @@
from .deobfuscator import Deobfuscator


__version__ = '0.1.2'
__version__ = '0.1.3'


def deobfuscate(code, max_iterations=50):
109 changes: 84 additions & 25 deletions pyjsclear/deobfuscator.py
@@ -2,8 +2,6 @@

from .generator import generate
from .parser import parse
from .transforms.aa_decode import aa_decode
from .transforms.aa_decode import is_aa_encoded
from .transforms.anti_tamper import AntiTamperRemover
from .transforms.class_static_resolver import ClassStaticResolver
from .transforms.class_string_decoder import ClassStringDecoder
@@ -28,6 +26,12 @@
from .transforms.hex_escapes import HexEscapes
from .transforms.hex_escapes import decode_hex_escapes_source
from .transforms.hex_numerics import HexNumerics
from .transforms.aa_decode import aa_decode
from .transforms.aa_decode import is_aa_encoded
from .transforms.jj_decode import is_jj_encoded
from .transforms.jj_decode import jj_decode
from .transforms.jsfuck_decode import is_jsfuck
from .transforms.jsfuck_decode import jsfuck_decode
from .transforms.logical_to_if import LogicalToIf
from .transforms.member_chain_resolver import MemberChainResolver
from .transforms.noop_calls import NoopCallRemover
@@ -127,12 +131,24 @@ def _run_pre_passes(self, code):
Returns decoded code if an encoding/packing was detected and decoded,
or None to continue with the normal AST pipeline.
"""
# JSFuck check (must be first — these are whole-file encodings)
        if is_jsfuck(code):
            decoded = jsfuck_decode(code)
            if decoded:
                return decoded

        # AAEncode check
        if is_aa_encoded(code):
            decoded = aa_decode(code)
            if decoded:
                return decoded

        # JJEncode check
        if is_jj_encoded(code):
            decoded = jj_decode(code)
            if decoded:
                return decoded

        # Eval packer check
        if is_eval_packed(code):
            decoded = eval_unpack(code)
@@ -141,6 +157,9 @@ def _run_pre_passes(self, code):

        return None

    # Maximum number of outer re-parse cycles (generate → re-parse → re-transform)
    _MAX_OUTER_CYCLES = 5

    def execute(self):
        """Run all transforms and return cleaned source."""
        code = self.original_code
@@ -162,25 +181,79 @@ def execute(self):
return decoded
return self.original_code

# Determine optimization mode based on code size
code_size = len(code)
        # Outer loop: run AST transforms until generate→re-parse converges.
        # Post-passes (VariableRenamer, VarToConst, LetToConst) only run on
        # the final cycle to avoid interfering with subsequent transform rounds.
        previous_code = code
        last_changed_ast = None
        try:
            for _cycle in range(self._MAX_OUTER_CYCLES):
                changed = self._run_ast_transforms(
                    ast,
                    code_size=len(previous_code),
                )

                if not changed:
                    break

                last_changed_ast = ast

                try:
                    generated = generate(ast)
                except Exception:
                    break

                if generated == previous_code:
                    break

                previous_code = generated

                # Re-parse for the next cycle
                try:
                    ast = parse(generated)
                except SyntaxError:
                    break

# Run post-passes on the final AST (always — they're cheap and handle
# cosmetic transforms like var→const even when no main transforms fired)
            any_post_changed = False
            for post_transform in [VariableRenamer, VarToConst, LetToConst]:
                try:
                    if post_transform(ast).execute():
                        any_post_changed = True
                except Exception:
                    pass

            if last_changed_ast is None and not any_post_changed:
                return self.original_code

            try:
                return generate(ast)
            except Exception:
                return previous_code
        except RecursionError:
            # Safety net: esprima's parser is purely recursive with no depth
            # limit, so deeply nested JS hits Python's recursion limit during
            # parsing or re-parsing. Our AST walkers are cheaper per level
            # but also recursive. Return best result so far.
            return previous_code

    def _run_ast_transforms(self, ast, code_size=0):
        """Run all AST transform passes. Returns True if any transform changed the AST."""
        node_count = _count_nodes(ast) if code_size > _LARGE_FILE_SIZE else 0

        lite_mode = code_size > _MAX_CODE_SIZE
        max_iterations = self.max_iterations
        if code_size > _LARGE_FILE_SIZE:
            max_iterations = min(max_iterations, _LITE_MAX_ITERATIONS)

        # Check node count for expensive transform gating
        node_count = _count_nodes(ast) if code_size > _LARGE_FILE_SIZE else 0

        # For very large ASTs, further reduce iterations
        if node_count > 100_000:
            max_iterations = min(max_iterations, 3)

        # Build transform list based on mode
        transform_classes = TRANSFORM_CLASSES
        if lite_mode:
            transform_classes = [t for t in TRANSFORM_CLASSES if t not in _EXPENSIVE_TRANSFORMS]
        elif node_count > _NODE_COUNT_LIMIT:
        if lite_mode or node_count > _NODE_COUNT_LIMIT:
            transform_classes = [t for t in TRANSFORM_CLASSES if t not in _EXPENSIVE_TRANSFORMS]

# Track which transforms are no longer productive
@@ -210,18 +283,4 @@ def execute(self):
if not modified:
break

        # Post-passes: cosmetic transforms that run once after convergence
        for post_transform in [VariableRenamer, VarToConst, LetToConst]:
            try:
                if post_transform(ast).execute():
                    any_transform_changed = True
            except Exception:
                pass

        if not any_transform_changed:
            return self.original_code

        try:
            return generate(ast)
        except Exception:
            return self.original_code
        return any_transform_changed
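
The outer generate → re-parse cycle introduced in `execute()` is a bounded fixed-point loop. A minimal sketch of that pattern, using a string-to-string toy transform in place of the real parse/transform/generate steps, might look like the following. `run_until_stable` and `drop_empty_statement` are hypothetical names; only the cap of 5 is taken from `_MAX_OUTER_CYCLES` in the diff.

```python
MAX_OUTER_CYCLES = 5  # same cap as _MAX_OUTER_CYCLES in the diff


def run_until_stable(code, transform, max_cycles=MAX_OUTER_CYCLES):
    """Re-apply `transform` until the output stops changing or the cap is hit."""
    previous = code
    for _ in range(max_cycles):
        current = transform(previous)
        if current == previous:
            break  # converged: another cycle would produce the same text
        previous = current
    return previous


def drop_empty_statement(code):
    """Toy transform: collapse one `;;` into `;` per call."""
    return code.replace(";;", ";", 1)


converged = run_until_stable("a;;;;", drop_empty_statement)
capped = run_until_stable("a" + ";" * 8, drop_empty_statement)
```

In this sketch `converged` settles at `a;` after three productive cycles, while `capped` stops at the cycle limit with work still remaining, which is the trade-off a fixed cap makes to bound runtime.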