Douglas Gaskell - 2 months ago
Javascript Question

Methods for de-obfuscating javascript that uses string concatenation for property names

I am trying to puzzle out a way to de-obfuscate javascript that looks like this:

https://jsfiddle.net/douglasg14b/4951br9f/2/



var testString = 'Test | String';

var wf6 = {
    fq4: 'su',
    k8d: 'bs',
    l8z: 'tri',
    cy1: 'ng',
    t5j: 'te',
    ol: 'stS',
    x3q: 'tri',
    l9x: 'ng',
    gh: 'xO'
};


//Obfuscated
let test1 = testString[wf6.fq4 + wf6.k8d + wf6.l8z + wf6.cy1](4,11);

//Normal
let test2 = testString.substring(4,11);
let test3;

//More complex obfuscation
(function moreComplex(){
    let h = "i",
        w = "nde",
        T0 = "f",
        hj = '|',
        a = eval(wf6.t5j + wf6.ol + wf6.x3q + wf6.l9x).length;
    //Obfuscated
    test3 = testString[wf6.fq4 + wf6.k8d + wf6.l8z + wf6.cy1](testString[h + w + wf6.gh + T0](hj), a);

    //Normal
    let test4 = testString.substring(testString.indexOf('|'), testString.length);

})();

$('.span1').text(test1);
$('.span2').text(test3);

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<span class="span1"></span><br>
<span class="span2"></span>





This is a small example; the file I'm working with is ~60k lines long and is full of this kind of obfuscation. Everywhere a string can be used as a property name, this kind of obfuscation is used.

One way I can think of doing this is to evaluate all the string concatenations so they are turned into readable equivalents. However, I am not sure how to go about this while ignoring all the other working code that sits between the concatenations.

Thoughts?

Bonus question: Is there a commonly used name for this kind of obfuscation that might make searches a bit easier?

Edit: Added a more complex example.

Answer

You have the basic idea right: you have to partially evaluate the program and precompute all the constant computations. In your case, the constant computations of main interest are the concatenation steps over values that don't change.

To do this, you need a program transformation system (PTS). This is a tool that reads/parses source code for a specified language and builds an abstract syntax tree (AST), allows you to specify transformations and analyses over the AST, runs those, and then spits out the modified AST as source code again.
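To make the "spit the AST back out as source" half of that pipeline concrete, here is a minimal sketch of a printer for a handful of ESTree node types (ESTree is the AST format produced by JavaScript parsers such as Acorn and Esprima). It assumes some parser has already built the tree; the function name `print` is hypothetical, and a real PTS or code generator handles every node type plus precedence, comments and layout.

```javascript
// Minimal sketch: turn a small subset of ESTree nodes back into source text.
// Assumes the AST came from an ESTree-compatible parser; only the node types
// needed for the examples in this thread are handled.
function print(node) {
  switch (node.type) {
    case "Identifier":
      return node.name;
    case "Literal":
      // JSON.stringify produces a quoted, escaped string literal.
      return typeof node.value === "string" ? JSON.stringify(node.value) : String(node.value);
    case "MemberExpression":
      return node.computed
        ? `${print(node.object)}[${print(node.property)}]`
        : `${print(node.object)}.${print(node.property)}`;
    case "CallExpression":
      return `${print(node.callee)}(${node.arguments.map(print).join(", ")})`;
    case "BinaryExpression":
      // Always parenthesize so the printer never has to reason about precedence.
      return `(${print(node.left)} ${node.operator} ${print(node.right)})`;
    default:
      throw new Error(`print: unsupported node type ${node.type}`);
  }
}

// AST for: testString['sub' + 'string'](4, 11), built by hand instead of by a parser.
const ast = {
  type: "CallExpression",
  callee: {
    type: "MemberExpression",
    computed: true,
    object: { type: "Identifier", name: "testString" },
    property: {
      type: "BinaryExpression",
      operator: "+",
      left: { type: "Literal", value: "sub" },
      right: { type: "Literal", value: "string" },
    },
  },
  arguments: [
    { type: "Literal", value: 4 },
    { type: "Literal", value: 11 },
  ],
};

console.log(print(ast)); // testString[("sub" + "string")](4, 11)
```

The transformations discussed next only ever rewrite nodes in such a tree; printing is the last step.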

In your case, you obviously want a PTS that either knows JavaScript out of the box (rare) or accepts a description of JavaScript and can then read JavaScript (more typical), in which case you hope you can build or obtain a JavaScript description easily. [I build a PTS that has JavaScript descriptions available; see my bio.]

With that in hand, you need to:

  • Code an analyzer that inspects each variable found in an expression (e.g., "wf6") to see whether it is constant. To demonstrate it is constant, you will have to find the variable definition and check that all the values used in it are themselves constants. If there is more than one definition of the variable, you might have to check that all definitions produce the same value. You need to check for side effects on the variable (e.g., there are no function calls "foo(...,wf6,...)" that would allow the variable's value to be modified). You also need to worry about whether some eval command accomplishes such a side effect [ruling this out is virtually impossible, so you often have to just ignore evals and assume they do not do such things]. Many PTSes offer a way to build such analyzers; some make it easier than others.
  • For every constant-valued variable, substitute the value of that variable in the code.
  • For every constant-valued subexpression after such substitutions, "fold" (calculate) the result of that expression, substitute that value for the subexpression, and repeat until no more folding is possible. Obviously you want to do this for at least all "+" operators. [OP just modified his example; he'll want to do it for "eval" calls too when all their operands are constant.]
  • You may have to iterate this whole process, as folding an expression may reveal that a variable now has a constant value.
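The substitution and folding steps above can be sketched as a bottom-up rewrite over ESTree-shaped nodes. This is a minimal sketch, not any particular PTS's API: all function names are hypothetical, the AST is built by hand rather than by a parser, and it assumes the analyzer step has already proven that `wf6` is never modified and that `eval` is the built-in. It folds three things: `wf6.prop` lookups, `+` on two string literals, and `eval('identifier')`, which it replaces with the identifier itself.

```javascript
// Sketch: constant propagation + folding over ESTree-shaped nodes.
// ASSUMPTIONS: every object in `consts` (here only wf6) was proven unmodified,
// and `eval` is the built-in eval (not shadowed).

// Simplified identifier test: ASCII only, reserved words not excluded.
const IDENT = /^[A-Za-z_$][A-Za-z0-9_$]*$/;
const lit = (value) => ({ type: "Literal", value });
const isStr = (n) => Boolean(n) && n.type === "Literal" && typeof n.value === "string";

// { fq4: 'su', k8d: 'bs', ... }  ->  Map { 'fq4' => 'su', 'k8d' => 'bs', ... }
function buildConstTable(objectExpression) {
  const table = new Map();
  for (const p of objectExpression.properties) {
    if (!p.computed && p.key.type === "Identifier" && isStr(p.value)) {
      table.set(p.key.name, p.value.value);
    }
  }
  return table;
}

// Rewrite one node whose children have already been folded.
function foldNode(node, consts) {
  switch (node.type) {
    case "MemberExpression": {
      // wf6.fq4 or wf6['fq4']  ->  'su'
      if (node.object.type !== "Identifier" || !consts.has(node.object.name)) break;
      const key = node.computed
        ? (isStr(node.property) ? node.property.value : undefined)
        : node.property.name;
      const table = consts.get(node.object.name);
      if (key !== undefined && table.has(key)) return lit(table.get(key));
      break;
    }
    case "BinaryExpression":
      // 'su' + 'bs'  ->  'subs'
      if (node.operator === "+" && isStr(node.left) && isStr(node.right)) {
        return lit(node.left.value + node.right.value);
      }
      break;
    case "CallExpression": {
      // eval('testString')  ->  testString
      const [arg] = node.arguments;
      if (node.callee.type === "Identifier" && node.callee.name === "eval" &&
          node.arguments.length === 1 && isStr(arg) && IDENT.test(arg.value)) {
        return { type: "Identifier", name: arg.value };
      }
      break;
    }
  }
  return node;
}

// One bottom-up pass: fold children first, then the node itself.
function foldTree(node, consts) {
  if (Array.isArray(node)) return node.map((n) => foldTree(n, consts));
  if (node === null || typeof node !== "object") return node;
  const copy = {};
  for (const [k, v] of Object.entries(node)) copy[k] = foldTree(v, consts);
  return typeof copy.type === "string" ? foldNode(copy, consts) : copy;
}

// Repeat passes until the tree stops changing.
function foldToFixpoint(ast, consts) {
  let before;
  let after = ast;
  do {
    before = JSON.stringify(after);
    after = foldTree(after, consts);
  } while (JSON.stringify(after) !== before);
  return after;
}

// Usage on the question's code (AST fragments built by hand).
const wf6Table = buildConstTable({
  type: "ObjectExpression",
  properties: Object.entries({ fq4: "su", k8d: "bs", l8z: "tri", cy1: "ng", t5j: "te",
    ol: "stS", x3q: "tri", l9x: "ng", gh: "xO" })
    .map(([k, v]) => ({ type: "Property", computed: false, key: { type: "Identifier", name: k }, value: lit(v) })),
});
const consts = new Map([["wf6", wf6Table]]);
const m = (p) => ({ type: "MemberExpression", computed: false,
  object: { type: "Identifier", name: "wf6" }, property: { type: "Identifier", name: p } });
const plus = (...xs) => xs.reduce((left, right) => ({ type: "BinaryExpression", operator: "+", left, right }));

const propName = foldToFixpoint(plus(m("fq4"), m("k8d"), m("l8z"), m("cy1")), consts);
const evalCall = foldToFixpoint({ type: "CallExpression", callee: { type: "Identifier", name: "eval" },
  arguments: [plus(m("t5j"), m("ol"), m("x3q"), m("l9x"))] }, consts);
console.log(propName); // { type: 'Literal', value: 'substring' }
console.log(evalCall); // { type: 'Identifier', name: 'testString' }
```

For these purely local rules a single bottom-up pass already reaches the fixpoint; the outer loop starts to matter once a fold can reveal that another variable is constant, which this sketch does not attempt to detect.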

The above process is called "constant propagation" in the compiler literature and is a feature of many compilers.

In your case, you could restrict the constant folding to just string concatenations. However, once you have adequate machinery for constant propagation, folding all or most operators on constants isn't that hard. You may need this to undo other obfuscations involving constants, since that seems to be the obfuscation style used in the code you are working on.
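Generalizing the fold beyond `+` mostly means evaluating the real JavaScript operator on the two literal values, so the folded result has exactly the semantics the original code had. A minimal sketch on ESTree-shaped nodes (names hypothetical); it also folds two string tricks obfuscators like, `'st'[0]` and `'abc'.length`, and refuses to produce values that would not print back as a plain literal (NaN, Infinity, negatives).

```javascript
// Sketch: fold operators on literal operands by evaluating the real JS operator.
// Regex and BigInt literals are skipped; only listed operators are folded.
const BINARY = new Map([
  ["+", (a, b) => a + b],
  ["-", (a, b) => a - b],
  ["*", (a, b) => a * b],
  ["/", (a, b) => a / b],
  ["%", (a, b) => a % b],
  ["===", (a, b) => a === b],
  ["!==", (a, b) => a !== b],
  ["<", (a, b) => a < b],
  [">", (a, b) => a > b],
]);

const isLit = (n) => Boolean(n) && n.type === "Literal" && !n.regex && !n.bigint;

// Only emit results that print back as a plain literal.
function toLiteral(value) {
  const ok = typeof value === "string" || typeof value === "boolean" ||
    (typeof value === "number" && Number.isFinite(value) && value >= 0 && !Object.is(value, -0));
  return ok ? { type: "Literal", value } : null;
}

function foldNode(node) {
  if (node.type === "BinaryExpression" && BINARY.has(node.operator) &&
      isLit(node.left) && isLit(node.right)) {
    return toLiteral(BINARY.get(node.operator)(node.left.value, node.right.value)) || node;
  }
  if (node.type === "MemberExpression" && isLit(node.object) && typeof node.object.value === "string") {
    const s = node.object.value;
    const i = node.computed && isLit(node.property) ? node.property.value : undefined;
    // 'st'[0] -> 's' (integer index inside the string only)
    if (Number.isInteger(i) && i >= 0 && i < s.length) return { type: "Literal", value: s[i] };
    // 'abc'.length -> 3
    if (!node.computed && node.property.name === "length") return { type: "Literal", value: s.length };
  }
  return node;
}

// Bottom-up: fold children before the node itself.
function foldTree(node) {
  if (Array.isArray(node)) return node.map(foldTree);
  if (node === null || typeof node !== "object") return node;
  const copy = {};
  for (const [k, v] of Object.entries(node)) copy[k] = foldTree(v);
  return typeof copy.type === "string" ? foldNode(copy) : copy;
}

// Usage: 'st'[1 - 1] + 'vu'[2 - 1] + 'bz'[0]
const L = (value) => ({ type: "Literal", value });
const bin = (operator, left, right) => ({ type: "BinaryExpression", operator, left, right });
const idx = (str, index) => ({ type: "MemberExpression", computed: true, object: L(str), property: index });

const expr = bin("+", bin("+", idx("st", bin("-", L(1), L(1))), idx("vu", bin("-", L(2), L(1)))), idx("bz", L(0)));
console.log(foldTree(expr)); // { type: 'Literal', value: 'sub' }
```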

You'll need a special rule that transforms

expr['string'](args)

into

expr.string(args)

as a final step (this is only valid when 'string' is a legal identifier name).
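The final clean-up rule can be sketched as one more tree rewrite: any computed member access whose key is a string literal that is a valid identifier name becomes a dot access. The name `uncompute` is hypothetical, and the identifier check is a simplified ASCII-only regex; reserved words are left in because ES5 and later allow them after a dot (`x.default` is legal).

```javascript
// Sketch: x['substring'](4, 11)  ->  x.substring(4, 11) on ESTree-shaped nodes.
// Keys that are not identifier names (e.g. 'sub string') stay computed.
const IDENTIFIER_NAME = /^[A-Za-z_$][A-Za-z0-9_$]*$/; // simplified, ASCII only

function uncompute(node) {
  if (Array.isArray(node)) return node.map(uncompute);
  if (node === null || typeof node !== "object") return node;
  const copy = {};
  for (const [k, v] of Object.entries(node)) copy[k] = uncompute(v);
  if (copy.type === "MemberExpression" && copy.computed &&
      copy.property.type === "Literal" && typeof copy.property.value === "string" &&
      IDENTIFIER_NAME.test(copy.property.value)) {
    copy.computed = false;
    copy.property = { type: "Identifier", name: copy.property.value };
  }
  return copy;
}

// Usage: testString['substring'](4, 11)
const call = {
  type: "CallExpression",
  callee: {
    type: "MemberExpression",
    computed: true,
    object: { type: "Identifier", name: "testString" },
    property: { type: "Literal", value: "substring" },
  },
  arguments: [{ type: "Literal", value: 4 }, { type: "Literal", value: 11 }],
};
const cleaned = uncompute(call);
console.log(cleaned.callee.computed, cleaned.callee.property.name); // false substring
```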

You have another complication: knowing that you have all the JavaScript relevant to producing constant-valued variables. A single web page may include many chunks of JavaScript; you will need all of them to demonstrate there are no side effects on a variable. I assume in your case you are sure you have it all.
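Demonstrating "no side effects on a variable" across every chunk can be approximated conservatively: accept the variable only if it is declared exactly once and every other occurrence is a property read such as `wf6.fq4`. A minimal sketch over ESTree-shaped `Program` nodes, one per script chunk; `findUnsafeUses` is a hypothetical name. It deliberately over-reports (any alias or argument use counts as unsafe, and a same-named variable in an inner scope counts as a second declaration), it reports eval calls so you can decide whether to ignore them, and it does not handle destructuring targets, for-in/for-of heads or `with` statements.

```javascript
// Sketch: conservative "is this variable ever modified or leaked?" check.
function rootName(expr) {
  while (expr && expr.type === "MemberExpression") expr = expr.object;
  return expr && expr.type === "Identifier" ? expr.name : null;
}

function findUnsafeUses(programs, name) {
  const problems = [];
  let declarations = 0;

  function visit(node, parent, key) {
    if (Array.isArray(node)) { node.forEach((n) => visit(n, parent, key)); return; }
    if (node === null || typeof node !== "object" || typeof node.type !== "string") return;

    if (node.type === "Identifier" && node.name === name) {
      if (parent.type === "VariableDeclarator" && key === "id") declarations++;
      else if (parent.type === "MemberExpression" && key === "object") { /* name.prop read: fine */ }
      else if (parent.type === "MemberExpression" && key === "property" && !parent.computed) { /* foo.name: unrelated */ }
      else if (parent.type === "Property" && key === "key" && !parent.computed) { /* { name: ... } key: unrelated */ }
      else problems.push(`'${name}' used as a whole value in a ${parent.type} (alias, argument or reassignment)`);
    }
    if (node.type === "AssignmentExpression" && node.left.type === "MemberExpression" && rootName(node.left) === name)
      problems.push(`property of '${name}' assigned`);
    if (node.type === "UpdateExpression" && node.argument.type === "MemberExpression" && rootName(node.argument) === name)
      problems.push(`property of '${name}' incremented/decremented`);
    if (node.type === "UnaryExpression" && node.operator === "delete" &&
        node.argument.type === "MemberExpression" && rootName(node.argument) === name)
      problems.push(`property of '${name}' deleted`);
    if (node.type === "CallExpression" && node.callee.type === "Identifier" && node.callee.name === "eval")
      problems.push("eval call: side effects cannot be ruled out");

    for (const [k, v] of Object.entries(node)) {
      if (v && typeof v === "object") visit(v, node, k);
    }
  }

  for (const program of programs) visit(program, null, null);
  if (declarations !== 1) problems.push(`'${name}' declared ${declarations} times`);
  return problems;
}

// Usage with two hand-built chunks.
const id = (n) => ({ type: "Identifier", name: n });
const member = (o, p) => ({ type: "MemberExpression", computed: false, object: id(o), property: id(p) });
const chunkA = { type: "Program", body: [
  { type: "VariableDeclaration", kind: "var", declarations: [{ type: "VariableDeclarator", id: id("wf6"),
    init: { type: "ObjectExpression", properties: [
      { type: "Property", computed: false, key: id("fq4"), value: { type: "Literal", value: "su" } }] } }] },
  { type: "ExpressionStatement", expression: member("wf6", "fq4") }] };
const chunkB = { type: "Program", body: [
  { type: "ExpressionStatement", expression: { type: "AssignmentExpression", operator: "=",
    left: member("wf6", "fq4"), right: { type: "Literal", value: "zz" } } },
  { type: "ExpressionStatement", expression: { type: "CallExpression", callee: id("foo"), arguments: [id("wf6")] } }] };

console.log(findUnsafeUses([chunkA], "wf6")); // []
console.log(findUnsafeUses([chunkA, chunkB], "wf6").length); // 2
```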

With respect to producing known-constant values, you may have to worry about a tricky case: an expression that produces constant values from non-constant operands. Imagine the obfuscated expression was:

   x=Math.random(); // produce a value between 0 and 1
   one=x+(1-x); // not constant by constant propagation, but constant by algebraic relations
   testString['st'[one-1]+'vu'[one]+'bz'[one-1]+...](4,11)

You can see it always computes 'substring' as a property. You can add a transformation rule that understands the trick used to compute "one", e.g., a rule for each algebraic trick used to compute known constants. Unfortunately for you, there's an infinite number of algebra theorems one can use to manufacture constants; how many are really used in your example bit of code? [Welcome to the problem of reverse engineering with a smart adversary].
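A rule for the specific trick used to compute "one" might look like the sketch below; it rewrites `x + (1 - x)` to `1`. The rule is only safe under conditions your analysis must establish, exposed here through a hypothetical `inUnitInterval` callback: x must be a plain variable (so both reads see the same value) holding a number in [0, 1], for example because it was assigned `Math.random()`. If x could be a string, `+` would concatenate, and for large numbers rounding breaks the identity (in doubles, `1e20 + (1 - 1e20)` evaluates to 0). Each further trick needs its own rule.

```javascript
// Sketch of ONE algebraic rule on ESTree-shaped nodes: x + (1 - x)  ->  1.
// inUnitInterval(name) is a hypothetical hook into your value analysis; it must
// only return true for variables known to hold a number in [0, 1].
function foldOnePattern(node, inUnitInterval) {
  if (Array.isArray(node)) return node.map((n) => foldOnePattern(n, inUnitInterval));
  if (node === null || typeof node !== "object") return node;
  const copy = {};
  for (const [k, v] of Object.entries(node)) copy[k] = foldOnePattern(v, inUnitInterval);

  const { left, right } = copy;
  if (copy.type === "BinaryExpression" && copy.operator === "+" &&
      left.type === "Identifier" && inUnitInterval(left.name) &&
      right.type === "BinaryExpression" && right.operator === "-" &&
      right.left.type === "Literal" && right.left.value === 1 &&
      right.right.type === "Identifier" && right.right.name === left.name) {
    return { type: "Literal", value: 1 };
  }
  return copy;
}

// Usage: one = x + (1 - x), where analysis says x came from Math.random().
const x = { type: "Identifier", name: "x" };
const expr = { type: "BinaryExpression", operator: "+", left: x,
  right: { type: "BinaryExpression", operator: "-", left: { type: "Literal", value: 1 }, right: x } };
const knownUnit = new Set(["x"]);
console.log(foldOnePattern(expr, (name) => knownUnit.has(name))); // { type: 'Literal', value: 1 }
```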

Nope, none of this is "easy". Presumably that's why this obfuscation method was chosen.