This is a fork of the small and fast TCC. It adds scrambling for both generated code and data. You can obfuscate all calls and long jumps, the parameters passed to external (library) functions, parameters passed to local functions, the stack (size of local variables and their references), the functions prolog and epilog (returns are replaced with jumps). You can also encrypt your code data section by XORing it with a LFSR. TCCX and README.
This patch is the scrambling functionality for Intel 32bits processors.
Code and Data Scramble
With the 'x' switch the compiler heavily pollutes the generated code making it larger and slower. The purpose of this operation is to obfuscate the generated code and make reverse engineering harder.
The 'x' switch by itself will enable all the scramble options. You may select individual scrambling features by listing them after the '-x':
- 'c' obfuscate all calls
- 'j' obfuscate all long jumps
- 'f' obfuscate parameters passed to external (library) functions
- 'p' obfuscate parameters passed to local functions
- 's' obfuscate the stack (size of local variables and their references)
- 'b' obfuscate functions prolog
- 'e' obfuscate functions epilog (returns are replaced with jumps)
- 'l' xor all strings with a LFSR and un-xor each when accessed
- 'd' encrypt data segment with a LFSR
The LFSR initial value as well as the unscrambling code is different with every compile. TCC generates read only objects in the data section (rather than rodata) and the entire section can be encrypted (xor-ed).
If strings obfuscation is turned on then each string will have a 16 byte variable header and each string will be xor-ed with a random LFSR seed. This option is particularly useful because it can be turned on per compiled object. The strings will be decrypted only at run time. The code overhead for each reference to a string or to a struct/array of strings is 84 bytes. A string is decrypted in place when it is first referenced, any further references will use the already decrypted string.
Code obfuscation is mainly achieved by inserting random data between genuine operations. This tricks disassemblers because they will try to disassemble the random data. They will miss real opcodes due to variable size garbage instructions engulfing the former ones. All addressing is changed to offset addressing using a variable base (usually in ebx). This prevents disassemblers to generate any cross-references for both functions and data.
The scrambling functionality is a patch against a stripped down version of tcc 0.9.27 which handles exclusively only i386 code. Both the Linux version and the cross-compiled version which generates Windows code work. The current release passes tcctest with all scrambling switches enabled. If you want to compile this compiler for a Windows platform you will probably fail (mainly because I'm using /dev/urandom). I have no plans to make it work for Windows because I'm not interested in that platform. You can generate Windows code from Linux with the cross-compiler.
Generating static executables was broken on my Debian system (with stock tcc 0.9.27) so I've patched this version to use dietlibc. This has the advantage of making small executables which run on any kernel version (the bloated libc checks kernel versions and refuses to run even if you don't need any of the 2.6 functionality). The -run switch (used for C scripting) now creates (in memory) static versions of your C. This is faster and the program occupies less space. If you don't want statically linked scripts you'll have to use -rdynamic with the -run switch.
This was my hobby project over the last few days. I don't plan to support this fork, add scrambling for other processors or add any other features from future tcc releases. The scrambling generates only Intel 32bit code for Linux and Windows.