Distributing malware by attaching tainted documents to emails is one of the oldest tricks in the book. It's not just a theoretical risk—real attackers use malicious documents to infect targets all the time. So on top of its anti-spam and anti-phishing efforts, Gmail expanded its malware detection capabilities at the end of last year to include more tailored document monitoring. Good news, it's working.
At the RSA security conference in San Francisco on Tuesday, Google's security and anti-abuse research lead Elie Bursztein will present findings on how the new deep-learning scanner for documents is faring against the 300 billion attachments it has to process each week. It's challenging to tell the difference between legitimate documents in all their infinite variations and those that have specifically been manipulated to conceal something dangerous. Google says that 63 percent of the malicious documents it blocks each day are different than the ones its systems flagged the day before. But this is exactly the type of pattern-recognition problem where deep learning can be helpful.
Currently 56 percent of malware threats against Gmail users come from Microsoft Office documents, and 2 percent come from PDFs. In the months that it's been active, the new scanner has increased its daily malicious Office document detection by 10 percent.
"Ten percent matters," Bursztein told WIRED. "We're trying to close the gap as much as possible. We want to keep adding machine learning everywhere we can, where it makes sense. Machine learning does amazing things sometimes, but sometimes it’s overhyped. We try to use it as an extra layer rather than the only layer. We think that works way better."