Doppler supports a multitude of binary formats -- PE and PE64 (EXE, DLL), ELF (Binary and Library), Obj, Lib, Mach-O. This support extends beyond merely digesting the file format to kick off disassembly. For instance, useful semantic information, such as exception handlers, imports, exports, names, and control-flow guard locations, is enumerated and incorporated into the analysis passes and presentation. We have taken great care to incorporate as much information as possible, and to resist tricks commonly employed by malware authors. However, if we've missed anything, binary format parsing is easily extended.
Architectures are a pluggable component in Doppler, and we've already built plugins for x86, x86-64 and ARM CPUs. The current set of CPU implementations leverages the excellent Capstone Disassembler project, which itself is derived from LLVM. Architecture-specific concepts, such as passing parameters on the program stack, may be incorporated into analysis passes. We encourage customers to extend Doppler's set of CPUs through extensions, and we plan to continue this expansion ourselves with CPU extensions for Java, C#, Dalvik, ActionScript, etc., to make Doppler a robust and well-rounded analysis platform.
Amazing binary disassembly has little value unless it is organized into a navigable set of connected components, and that is exactly what Doppler provides. In our humble opinion, no other tool comes close to Doppler's ability to simplify an analyst's workflow. After all, we built Doppler with streamlining as a primary motivation. Of course, you can also edit and annotate machine code. Our frontend extension API enables third parties to easily add more contextual annotations to code in an automated way, a capability that we will be leveraging to build out an extensive set of 'semantic helpers' to give analysts further context.
A single standard SQLite database file houses all analysis findings -- functions, variables, imports, exports, strings, basic blocks, etc. -- as well as analyst annotations. Share this file with your peers to collaborate. We plan to introduce a collaboration server for the Enterprise edition of Doppler that automates analysis sharing and can be run on an isolated network.
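Because the analysis store is a standard SQLite file, any SQLite client should be able to open it. The sketch below assumes nothing about Doppler's actual schema, which is not documented here. It only lists the tables present in a database file, using Python's built-in `sqlite3` module. The function name `list_tables` is a hypothetical helper.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    # Open read-only so a shared analysis file is never modified by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()
```

Calling `list_tables("analysis.db")` on a file a colleague shared (the file name is a placeholder) is a reasonable first step before writing queries against specific tables.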
These reconfigurable analysis blocks are at the heart of what makes Doppler so special. We were frustrated with black-box disassembly in other tools, as their properties and guarantees were unclear. Without an understanding of the disassembly algorithm, exploit mitigations built atop those binary analyses could offer only poor security and compatibility guarantees. We solved this problem by making disassembly in Doppler a transparent, configurable sequence of analysis passes, not unlike the sequence of optimization passes employed by compiler toolchains like LLVM. Use the default passes, create your own, or mix and match the two to add greater semantic understanding in the context of your particular problem.
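The idea of disassembly as an ordered, configurable list of passes can be illustrated with a small sketch. Everything here is hypothetical and is not Doppler's extension API: the `AnalysisState` fields, the two pass functions, and `run_pipeline` exist only to show how passes share state and why their order matters.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class AnalysisState:
    """Hypothetical shared state that each pass reads and extends."""
    entry_points: Set[int] = field(default_factory=set)
    functions: Set[int] = field(default_factory=set)
    names: Dict[int, str] = field(default_factory=dict)


# A pass is any callable that mutates the shared state in place.
Pass = Callable[[AnalysisState], None]


def seed_functions_from_entry_points(state: AnalysisState) -> None:
    # Treat every known entry point as the start of a function.
    state.functions |= state.entry_points


def name_unnamed_functions(state: AnalysisState) -> None:
    # Give each function without a name a placeholder based on its address.
    for addr in sorted(state.functions):
        state.names.setdefault(addr, f"sub_{addr:x}")


def run_pipeline(state: AnalysisState, passes: List[Pass]) -> AnalysisState:
    # Passes run strictly in list order; reordering changes the result.
    for analysis_pass in passes:
        analysis_pass(state)
    return state


DEFAULT_PASSES: List[Pass] = [
    seed_functions_from_entry_points,
    name_unnamed_functions,
]
```

In this sketch, swapping the two default passes leaves every function unnamed, because naming would run before any function has been discovered. That kind of ordering dependency is what a transparent pipeline makes visible.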
Doppler is divided into two logical components -- frontend and backend -- which run as separate processes. The backend hosts our API as a set of RESTful JSON-RPC functions. Any language that can communicate over this standard API can automate any aspect of Doppler, or programmatically access the analysis results, such as functions, basic blocks, variables, instructions, or even decompiled source code. Indeed, the frontend populates all of its views using this API. We are also planning to introduce a direct scripting API that makes this interaction simpler and more intuitive.
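As a rough illustration of driving such a backend from a script, the sketch below builds and sends a JSON-RPC call using only the Python standard library. It assumes the backend speaks JSON-RPC 2.0 over HTTP POST, which the text above does not confirm. The URL and the method name `list_functions` in the usage note are placeholders, not documented Doppler endpoints.

```python
import json
import urllib.request


def build_rpc_request(url, method, params, request_id=1):
    """Build (but do not send) an HTTP POST carrying a JSON-RPC 2.0 call."""
    payload = {
        "jsonrpc": "2.0",
        "method": method,      # hypothetical method name supplied by the caller
        "params": params,
        "id": request_id,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_rpc(url, method, params):
    """Send the call and return its result, raising if the server reports an error."""
    request = build_rpc_request(url, method, params)
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]
```

A call might then look like `call_rpc("http://127.0.0.1:8080/rpc", "list_functions", {})`, with the address and method replaced by whatever the backend actually exposes.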
The frontend has its own scripting API based on Microsoft's Visual Studio Code platform. This consists of an extensive set of TypeScript APIs that can control any aspect of the various Doppler views. On the backend, you can add new extensions consisting of binary format parsers, CPU support, disassembly algorithms or analysis passes specific to your individual problem domain. Ultimately, we envision a tunable system of extensions wherein you can configure the workflow of passes from the frontend. We're not quite there yet, but stay tuned for announcements.
Building a generalized decompiler is critical, as we envision decompilation as a core feature for all programming languages analyzed, whether C/C++ or other languages such as Java, C# or ActionScript. Translating code to IR provides a solid foundation for such a generic decompiler, as all architectures are unified under a single instruction representation. While several approaches are available to achieve lifting, we chose a unique approach of piecing together emulated instruction snippets in the form of C/C++ source code, then compiling those source snippets into LLVM bitcode. The details are involved, but suffice it to say this approach gives us a significant edge over other decompilers in correctly lifting to LLVM IR. Frankly, this component alone makes Doppler an essential tool in every analyst's kit, as LLVM bitcode is a prerequisite to a multitude of analyses.
With LLVM bitcode in hand, the possibilities are limitless. Let's jump into the weeds for a moment. For the purpose of decompilation, the primary goal is to simplify the code and propagate information such as types and constants. Hence, we run several standard LLVM passes on the lifted bitcode, such as control-flow and data-flow analyses, as well as dead code elimination. The end goal here is the polar opposite of a compiler's: we seek to un-optimize the code to bring it back as closely as possible to the original representation. That said, many existing LLVM analysis passes are entirely relevant and useful for decompilation, a fact we take full advantage of.
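To make one of these simplifications concrete, here is a minimal sketch of dead code elimination on a single straight-line block. It is a textbook backward liveness scan written in Python, not LLVM's implementation. It also assumes every instruction is side-effect free, so an instruction whose result is never used can simply be dropped. The `(destination, operands)` tuple format is a hypothetical stand-in for real IR.

```python
from typing import List, Set, Tuple

# Each instruction is (destination, operands); opcodes are omitted for brevity.
Instr = Tuple[str, Tuple[str, ...]]


def eliminate_dead_code(block: List[Instr], live_out: Set[str]) -> List[Instr]:
    """Drop instructions whose results are never used afterwards.

    live_out names the values still needed after the block ends.
    """
    live = set(live_out)
    kept: List[Instr] = []
    # Walk backwards so each instruction sees which values later code needs.
    for dest, operands in reversed(block):
        if dest in live:
            live.discard(dest)      # this definition satisfies the later uses
            live.update(operands)   # its inputs become needed in turn
            kept.append((dest, operands))
    kept.reverse()
    return kept
```

For a block that computes `a` from `x`, `b` from `y`, and `c` from `a`, with only `c` needed afterwards, the scan keeps the definitions of `a` and `c` and discards `b`.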
Doppler pulls LLVM bitcode into ASTs representing the target programming language. While the code has already been simplified via LLVM analysis passes, it is not necessarily human-friendly. For instance, the code is logically sound, but control and data flow may not be represented in the same order or style as originally written. Doppler performs passes on the ASTs to reorganize code statements to be more human-readable. As a final step, Doppler traverses the AST and prints the source code.
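The two stages described above, a readability pass over the AST followed by a printing traversal, can be sketched on a toy AST. The node types `Stmt` and `If`, and the single rewrite shown (turning `if (!c) A else B` into `if (c) B else A`), are illustrative assumptions and do not reflect Doppler's actual AST or passes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Stmt:
    text: str  # a simple statement such as "x = 1" or "return x"


@dataclass
class If:
    cond: str
    then: List["Node"]
    orelse: List["Node"]


Node = Union[Stmt, If]


def simplify_negated_ifs(nodes: List[Node]) -> List[Node]:
    """Rewrite `if (!c) A else B` as `if (c) B else A`, recursively."""
    out: List[Node] = []
    for node in nodes:
        if isinstance(node, If):
            then = simplify_negated_ifs(node.then)
            orelse = simplify_negated_ifs(node.orelse)
            cond = node.cond
            # Only flip a plain negated identifier that has an else branch.
            if cond.startswith("!") and cond[1:].isidentifier() and orelse:
                cond, then, orelse = cond[1:], orelse, then
            node = If(cond, then, orelse)
        out.append(node)
    return out


def print_source(nodes: List[Node], indent: int = 0) -> List[str]:
    """Traverse the AST and return C-like source lines."""
    pad = "    " * indent
    lines: List[str] = []
    for node in nodes:
        if isinstance(node, Stmt):
            lines.append(f"{pad}{node.text};")
        else:
            lines.append(f"{pad}if ({node.cond}) {{")
            lines.extend(print_source(node.then, indent + 1))
            if node.orelse:
                lines.append(f"{pad}}} else {{")
                lines.extend(print_source(node.orelse, indent + 1))
            lines.append(f"{pad}}}")
    return lines
```

Both forms of the `if` are logically equivalent; the pass only changes which one the reader sees, which is the point of a readability pass.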
The multitude of steps in the process of lifting to source code is abstracted away into a single action for the analyst -- 'View Source Code'! As with the other interactive views, you can navigate from function to function, completely avoiding assembly code if you like.
Even the prototypical 'Hello World' program can contain hundreds of additional functions from the compiler runtime that assist in bootstrapping program execution and error handling. This can make inspecting even small programs inefficient. Doppler helps you avoid inspecting uninteresting functions. We've indexed functions from multiple versions of compilers so you can immediately home in on what is important.
We've indexed functions from hundreds of popular C/C++ libraries. Doppler names these functions when they are identified, so you can immediately move past them if you have other priorities, or scrutinize the presence of potentially vulnerable embedded libraries.
One of the first tasks in malware analysis is to orient yourself by identifying how that particular sample fits into the larger picture. The question of whether this malware has been seen before, and in what family, must be answered. We've indexed functions from thousands of samples in hundreds of malware families. Doppler uses those databases to indicate that a function was previously observed in malware and, when malware symbols were available in our indexing, to name the malicious function.
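How Doppler matches functions against its indexes is not described here. As a deliberately simplified stand-in, the sketch below looks up a function by a hash of its exact bytes. A production matcher would typically need to tolerate differences such as relocated addresses, which exact hashing does not. All names in the sketch are hypothetical.

```python
import hashlib
from typing import Dict, Optional


def fingerprint(code: bytes) -> str:
    """Return a stable identifier for a function's raw bytes."""
    return hashlib.sha256(code).hexdigest()


def build_index(known: Dict[str, bytes]) -> Dict[str, str]:
    """Map fingerprint -> name for a collection of known functions."""
    return {fingerprint(code): name for name, code in known.items()}


def identify(index: Dict[str, str], code: bytes) -> Optional[str]:
    """Return the indexed name for these bytes, or None if they are unknown."""
    return index.get(fingerprint(code))
```

With an index built once from known runtime, library, or malware functions, each function in a new binary costs a single hash and dictionary lookup, which is why this style of matching scales to large indexes.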
We built Doppler to scale. In fact, we've used multiple concurrent instances of Doppler in 'headless' mode on a 64-core server as a component in a malware learning system, but that's a story for another time. The point is that Doppler's core analysis algorithms and structures are highly optimized, which not only enables scaling the number of concurrent analyses using separate instances of Doppler, but also enables scaling to operate on huge binaries.
We don't test Doppler on toy binaries; instead, we test on large, complex applications like web browsers. If you need to analyze real-world binaries, Doppler is the best option, no contest.