OP 02 October, 2024 - 08:18 PM
Some readers probably find script obfuscation more interesting than assembler and other low-level sorcery. That is the subject of, for example, my recent article "JSFuck. Analyzing a unique method of obfuscating JS code" and of the review article about obfuscators on Habr mentioned in it. But what if the custom deobfuscators listed in that article do not help? In that case you will have to get through the obfuscation yourself, and now I will tell you how to do it.
source :
https://xakep.ru/2024/06/05/jsfuck/
https://habr.com/ru/companies/skillfacto...es/814801/
We use automatic deobfuscators
Let's take a JavaScript browser application as an example. It is about three megabytes in size, and roughly three-quarters of that is heavily obfuscated code that starts as shown in the following screenshot.
And this code ends like this.
If you have already read the article mentioned above, the characteristic identifier names (_0x58cd18, _0x2f8935, _0x321d33, _0x1e0595) should have suggested that the code was produced by the obfuscator.io obfuscator. However, an attempt to deobfuscate it with the standard online deobfuscator does not give a positive result at any settings: readable code simply does not appear in the right-hand window.
source :
https://habr.com/ru/companies/skillfacto...es/814801/
https://obf-io.deobfuscate.io/
Similarly, attempts at deobfuscation using other tools mentioned in the article do not yield results. For example, the universal de4js deobfuscator produces a completely uninformative result.
source : https://lelinhtinh.github.io/de4js/
It looks like there is little hope for automatic deobfuscators and we will have to learn to work manually.
INFO
Looking ahead, I will say that the tools listed above do not exhaust the means of automatic deobfuscation. For example, you can try to optimize the code with the Llama neural network. The webcrack project can deobfuscate such code automatically, but let's pretend that the automatic tools failed us: it's much more interesting that way!
source :
https://habr.com/ru/articles/722780/
https://github.com/j4k0xb/webcrack
Deobfuscate the code manually
First, let's run JS Beautifier on the raw code to make it readable. Skimming through the now structured code, we notice numerous function calls that take ten hexadecimal constants as parameters:
source : https://beautifier.io/
It is logical to assume that this is how constants are encrypted, so the first task is to translate them back into a normal readable form. As usual, let's start from the end. The code ends with the following fragment:
Logically, this must be console.log(_0x321d33), which means that _0x1e0595(0x4f3, 0x854, 0x1210, 0x19e1, 0x3ca, 0x992, 0x665, 0xf98, 0x185b, 0x1073) == "log". Let's try to unwind this: search the code for the string function _0x1e0595:
As you can see, this crap refers to another sub-crap named _0x340121. Let's look for it too:
This sub-crap, in turn, refers to a proto-crap named _0x3a86, which, fortunately, is the last (or rather, the first) in this chain:
So far, everything is simple. All that remains is to find the array of string constants returned by _0x5e2d:
So, it looks like we've isolated the minimal piece of code responsible for generating obfuscated strings via the _0x1e0595 function:
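Since the screenshot is not reproduced here, the shape of such a chain can be sketched in miniature. This is a hypothetical stand-in, not the code from the application: the names stringArray, decode, wrapperA and wrapperB, the offsets, and the three-argument signatures are all invented for illustration (the real wrappers take ten arguments and have names like _0x1e0595).

```javascript
// Hypothetical miniature of a string lookup chain of the kind traced above.
// All names, offsets and argument counts are invented for illustration.

// 1. A function that returns the array of string constants
//    (the role played by _0x5e2d in the article).
function stringArray() {
  const table = ["awal", "log", "warn", "length"];
  // The function overwrites itself so the array is built only once.
  stringArray = function () {
    return table;
  };
  return stringArray();
}

// 2. The base decoder (the role of _0x3a86): index minus a fixed offset.
function decode(index) {
  return stringArray()[index - 0x100];
}

// 3. A wrapper (the role of _0x340121): most arguments are decoys,
//    one of them is shifted by a constant and passed on.
function wrapperA(a, b, c) {
  return decode(b - 0x2f);
}

// 4. A wrapper around the wrapper (the role of _0x1e0595).
function wrapperB(a, b, c) {
  return wrapperA(c, a + 0x10, b);
}

console.log(wrapperB(0x120, 0x7, 0x3)); // prints "log"
```

Only the first argument matters in this toy call: 0x120 + 0x10 - 0x2f - 0x100 gives index 1. The other two arguments exist purely to confuse the reader, which is presumably why the real calls carry ten of them.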
Later we will have to automate the search for the code of every similar function (there are many of them, but they all follow the same pattern). For now, let's just make sure that we did everything correctly. We press F12 in the browser, paste the extracted code fragment into the console, and try to evaluate the expression _0x1e0595(0x4f3, 0x854, 0x1210, 0x19e1, 0x3ca, 0x992, 0x665, 0xf98, 0x185b, 0x1073).
At this point we discover that the returned string is not log at all but awal, although the string log is also present in the original array. This means that either we made a mistake somewhere in the calculations or the obfuscator authors cleverly deceived us, and happiness was so close...
Let's take a closer look at the code. The top code snippet, starting with the comment IT IS NOT SAFE TO MAKE CHANGES IN THE CODE BELOW, cleverly shuffles the array of string constants _0x552e21 after it is initialized.
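The effect of such a shuffle is easy to reproduce on a toy example. The sketch below is an assumption about the general mechanism (rotate the array until a checksum built from a few of its strings hits a hard-coded target). The table contents, the lookup function, and the checksum formula are invented and are not taken from the application.

```javascript
// Hypothetical sketch of a self-rotating string table.
// The strings, the checksum formula and the target value are invented.
const table = ["7pear", "awal", "3apple", "log", "warn"];

function lookup(i) {
  return table[i];
}

// What a lookup returns if the rotation code is left out.
const before = lookup(1);

(function rotate(target) {
  while (!![]) { // !![] is just an obscure spelling of true
    // The checksum depends on which strings currently sit at fixed indexes.
    const checksum = parseInt(lookup(0)) * parseInt(lookup(3));
    if (checksum === target) break;
    table.push(table.shift()); // wrong order: rotate by one and retry
  }
})(21);

// The same lookup once the table has settled into its final order.
const after = lookup(1);

console.log(before, after); // prints "awal log"
```

Leaving the rotation out of the extracted fragment produces exactly the symptom described above: the index is right, but it points into an array that has not been put into its final order yet.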
Interestingly, the while condition contains a construction familiar to us from JSFuck: !![], which evaluates to true. On closer examination, we notice that such constants, together with the inverse variant ![] (which evaluates to false), are generously scattered throughout the obfuscated code. We make a note to replace them later with a global search and replace, then insert the fragment shown in the previous screenshot at the beginning of our "core" code and run the test again. This time everything matches and the result is correct: _0x1e0595(0x4f3, 0x854, 0x1210, 0x19e1, 0x3ca, 0x992, 0x665, 0xf98, 0x185b, 0x1073) == "log".
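The global replacement of these boolean idioms can be sketched as follows. The function name replaceBooleanIdioms is made up for this example, and the approach is purely textual, so it would also rewrite the same character sequences inside string literals.

```javascript
// Sketch: replace the JSFuck-style boolean idioms with plain literals.
function replaceBooleanIdioms(source) {
  // Insert a space when the idiom directly follows a keyword or identifier,
  // so that `return!![]` does not turn into `returntrue`.
  const substitute = (literal) => (match, previous) =>
    (previous ? previous + " " : "") + literal;

  return source
    .replace(/(\w)?!!\[\]/g, substitute("true")) // first, since ![] is a suffix of !![]
    .replace(/(\w)?!\[\]/g, substitute("false"));
}

console.log(replaceBooleanIdioms("while (!![]) { return![]; }"));
// prints "while (true) { return false; }"
```

The order of the two replacements matters: running the ![] rule first would turn !![] into !false, which is still correct but less readable.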
source: https://xakep.ru/2024/06/05/jsfuck/
This is where the interesting and exciting research part ends and the coding begins. It is not complicated, but it is rather routine.
Writing a deobfuscator
Our deobfuscator construction plan will be as follows.
Following the same procedure we used to dissect the _0x1e0595 function, we assemble the complete "core" of functions that decode string constants. To do this, we look for all functions that match this template:
An approximate implementation of this action using regular expressions in JavaScript looks like this:
Here string is the source code. The output consists of functions, an array with the code of the functions that must be added to the "core" code and at the same time removed from the source code, and names, a list of the names of these functions.
We are looking for functions of the form
Here Name1 is the name of a function from the list of names obtained in the first step. The code looks like this:
We get a new list of functions1 and their names names1, which we also add to the core and remove from the source code.
We repeat the function search, each time using the names found in the previous pass as the list of names to search for, until a pass returns an empty list. The final code looks something like this:
At the output, functions are all the parasitic functions generated by the obfuscator, and names are their names.
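The search-and-repeat procedure described in these steps can be sketched as below. The exact template from the article is not reproduced here, so the pattern is an assumption: a wrapper is taken to be a function whose entire body is a single return that calls an already known name. The helper names findWrappers and collectCore and the bodies in the sample are invented. Only the _0x... identifiers are borrowed from the article.

```javascript
// Sketch only: the wrapper template (a function whose whole body is
// `return KNOWN_NAME(...)`) is an assumption, not the article's exact pattern.

function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One pass: find functions that merely forward to one of `callees`.
function findWrappers(source, callees) {
  const alternation = callees.map(escapeRegExp).join("|");
  const pattern = new RegExp(
    "function\\s+([\\w$]+)\\s*\\([^)]*\\)\\s*\\{\\s*return\\s+(?:" +
      alternation +
      ")\\s*\\([^{}]*\\)\\s*;?\\s*\\}",
    "g"
  );
  const functions = [];
  const names = [];
  const rest = source.replace(pattern, (code, name) => {
    functions.push(code);
    names.push(name);
    return ""; // cut the wrapper out of the remaining source
  });
  return { functions, names, rest };
}

// Repeat the pass, each time looking for wrappers around the names found
// in the previous pass, until a pass finds nothing new.
function collectCore(source, baseNames) {
  let functions = [];
  let names = [];
  let frontier = baseNames;
  let rest = source;
  while (frontier.length > 0) {
    const found = findWrappers(rest, frontier);
    functions = functions.concat(found.functions);
    names = names.concat(found.names);
    rest = found.rest;
    frontier = found.names;
  }
  return { functions, names, rest };
}

// Toy input: a base decoder, two layers of wrappers, and one call site.
const sample = [
  "function _0x3a86(i) { return table[i - 0x100]; }",
  "function _0x340121(a, b) { return _0x3a86(b - 0x2f); }",
  "function _0x1e0595(a, b) { return _0x340121(b, a + 0x10); }",
  "console[_0x1e0595(0x120, 0x7)]('hi');",
].join("\n");

const core = collectCore(sample, ["_0x3a86"]);
console.log(core.names.join(", ")); // prints "_0x340121, _0x1e0595"
```

In this sketch the base decoder and the string array (with its shuffling code) are not collected automatically. They would have to be placed at the top of the core by hand, as was done during the manual analysis.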
Now that the "core" is formed, we simply iterate over all the enumerable expressions of the form
Name1 is the name of the function from the list of names obtained in the previous steps. We calculate them using the following code:
The output is an array expressions, which contains expressions of the form _0x4a0111(0xa0c, 0xc0b, 0x13ef, 0x1e3e, 0x15e2, 0x1a29, 0x1b08, 0x94b, 0x968, 0x753) together with the constants they evaluate to. All that remains is to replace the former with the latter in the obfuscated code using an ordinary global replacement.
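One way to sketch this evaluate-and-replace step is shown below. It assumes that every decoder call takes only hexadecimal literals as arguments and that the collected names consist of word characters only. The function name inlineDecodedStrings and the toy core are invented. Note that the core is actually executed, so this should only be run on code you are prepared to execute.

```javascript
// Sketch only: evaluates decoder calls that have constant hexadecimal
// arguments and substitutes the resulting string literals.
// The core code is executed, so do not run this on untrusted input
// outside a sandbox.
function inlineDecodedStrings(source, coreCode, names) {
  if (names.length === 0) return source;
  const hex = "-?0x[0-9a-fA-F]+";
  const callPattern = new RegExp(
    "\\b(?:" + names.join("|") + ")\\(\\s*" + hex + "(?:\\s*,\\s*" + hex + ")*\\s*\\)",
    "g"
  );

  // Run the core once and keep a closure that can evaluate calls in its scope.
  const evaluate = new Function(
    coreCode + "\nreturn function (expression) { return eval(expression); };"
  )();

  // Map every distinct call expression to the constant it produces.
  const expressions = new Map();
  for (const [call] of source.matchAll(callPattern)) {
    if (!expressions.has(call)) expressions.set(call, evaluate(call));
  }

  // Replace each call with its string literal; leave non-string results alone.
  return source.replace(callPattern, (call) => {
    const value = expressions.get(call);
    return typeof value === "string" ? JSON.stringify(value) : call;
  });
}

// Toy core and call sites (invented for the example).
const coreCode = `
  function table() { return ["log", "warn", "hello"]; }
  function _0x3a86(i) { return table()[i - 0x100]; }
  function _0x1e0595(a, b) { return _0x3a86(b - 0x2f); }
`;
const obfuscated = "console[_0x1e0595(0x7, 0x12f)](_0x1e0595(0x1, 0x131));";

console.log(inlineDecodedStrings(obfuscated, coreCode, ["_0x1e0595"]));
// prints: console["log"]("hello");
```

Evaluating the core once and reusing the closure matters when the core contains self-modifying parts such as the array shuffle, which should run exactly one time.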
As a result, we get partially deobfuscated code, in which at least string constants and names of standard methods will be presented in an explicit form. This code can already be analyzed, edited, and can be fed to other deobfuscators in parts to make it fully readable.
Of course, this is far from a complete deobfuscation of the original application. In pursuit of perfection, expressions of the form Class["MethodName"] can be folded into Class.MethodName. Here is a slightly more advanced version of the code that performs this conversion to Object.MethodName:
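A minimal regex-based sketch of this folding is given below. It is not the author's version. The function name bracketToDot is made up, and the heuristic assumes beautified code in which a member access has no space before the opening bracket. It would mis-handle rare cases such as a bracket directly after a numeric literal, which a real AST-based tool would catch.

```javascript
// Sketch: fold obj["name"] into obj.name when "name" is a valid identifier.
function bracketToDot(source) {
  return source.replace(
    // The lookbehind requires the bracket to follow an identifier character,
    // ")" or "]", so standalone array literals such as ["x"] are left alone.
    /(?<=[\w$\])])\[\s*(["'])([A-Za-z_$][\w$]*)\1\s*\]/g,
    (match, quote, name) => "." + name
  );
}

console.log(
  bracketToDot('console["log"](window["document"]["title"], a["b-c"], ["x"]);')
);
// prints: console.log(window.document.title, a["b-c"], ["x"]);
```

A lookbehind is used instead of capturing the preceding character so that chained accesses like window["document"]["title"] are all folded in a single pass.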
Finally, you can perform several dictionary transformations of the following type:
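The original list of transformations is not reproduced here, so the dictionary below is only a hypothetical example of what such rules might look like: each entry pairs a pattern with its replacement. The two rules (void 0 to undefined, hexadecimal literals to decimal) and the name applyDictionary are assumptions chosen for illustration. Like any textual replacement, they also affect matching text inside string literals.

```javascript
// Hypothetical dictionary of cosmetic rewrites: [pattern, replacement] pairs.
const dictionary = [
  // `void 0` is a common obfuscated spelling of undefined.
  [/\bvoid 0\b/g, "undefined"],
  // Hexadecimal number literals become decimal; identifiers such as
  // _0x1e0595 are untouched because "_" prevents a word boundary before "0x".
  [/\b0x[0-9a-fA-F]+\b/g, (hex) => String(parseInt(hex, 16))],
];

// Apply every rule in order to the source text.
function applyDictionary(source, rules) {
  return rules.reduce(
    (text, [pattern, replacement]) => text.replace(pattern, replacement),
    source
  );
}

console.log(
  applyDictionary("var _0x1f = 0x10; if (x === void 0) y = 0xff;", dictionary)
);
// prints: var _0x1f = 16; if (x === undefined) y = 255;
```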
As a result, we will get code that is close to readable, in which only the original names of variables and functions will be irretrievably lost.
Conclusions
As is my habit, I was a little disingenuous and described only the simplest and fastest way to partially deobfuscate JavaScript code. Of course, a serious, complete deobfuscator needs more than search and replace with regular expressions. For that, you would have to write your own emulator of the JavaScript machine, as is done, for example, in the webcrack deobfuscator I mentioned. Perhaps I will tell you more about this someday, but for now, those who are interested and impatient can dig into the source code of that project themselves.
Leaving a like is much appreciated and helps me keep publishing threads.