Our purpose is to present a cyber intelligence system created to analyze network communications in order to detect and identify botnet activity and the distribution of botnet-related malware, both over the internet and within targeted networks. The ever-increasing dependence of states and businesses on interconnected resources, together with the rapid evolution of the highly complex malware used by botnets, threatens to hold any enterprise hostage. Botnets create serious security and data issues and can be effective tools for both cyber-espionage and cyber-crime, as a bot typically runs hidden and is capable of communicating with its command and control server over covert or encrypted channels, operating like an intelligent cyber agent.
The technical solution we have designed uses a combination of behavioral analysis and artificial intelligence techniques to process live or recorded information coming from a variety of sources, and performs cross-cluster correlation and multivariate analysis to generate actionable intelligence. It currently supports analysis of flow records, spam records, DNS and whois answers, file metadata, binary files and log files, as well as inspection of fourteen different network protocols. Thanks to a multi-agent architecture, we are able to decode and analyze each connection at both the packet and the application level: messages sent and received by a bot can be inspected for malicious content or modified to mimic a successful and unhampered operation, while data streams can be cryptographically analyzed and decoded when possible. All imported data is deduplicated, both at the level of each imported file or partial string and at the binary level, where only unique sequences of bytes are stored.
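The binary-level deduplication described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the class name ChunkStore, the fixed-size chunking, the 4096-byte default and the use of SHA-256 digests as keys are all assumptions made for illustration only.

```python
import hashlib


class ChunkStore:
    """Hypothetical content-addressed store: each unique byte sequence is kept once."""

    def __init__(self, chunk_size: int = 4096):
        # Fixed-size chunking is an assumption; the source does not say how
        # byte sequences are delimited.
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes
        self.files = {}   # import name -> ordered list of chunk digests

    def add(self, name: str, data: bytes) -> int:
        """Import data under a name and return the number of newly stored bytes."""
        new_bytes = 0
        digests = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # First time this byte sequence is seen: store it.
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            digests.append(digest)
        self.files[name] = digests
        return new_bytes

    def get(self, name: str) -> bytes:
        """Rebuild the original data from its chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])
```

In this sketch, importing two samples that share a byte sequence stores the shared sequence only once, while each sample can still be reconstructed in full from its list of digests.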
The intelligence generated is encrypted and stored using a combination of graph databases and distributed hash tables to ensure that the stored information is protected and easily searchable, while results are made available to analysts only through encrypted channels. We will present two cases in which such a solution could be used effectively with 100 billion records, and discuss to what extent this approach can be effective. We have implemented our prototype system because we believe that a better and more automated way of correlating information can be vastly beneficial; initial tests show that the proposed approach can improve both the efficiency and the accuracy of the detection process.