Feb 13, 2017
In 2016, approximately 185 million new Internet users came online, with the vast majority coming from countries like India. This represents a substantial expansion of the online market. However, as the Internet population continues to grow, so does the number of bots. The word “bot” covers a wide variety of automated programs: some crawl and index content for search engines, helping people match their queries with the most relevant websites, while others are not so helpful.
In the past year, bad bots accounted for 19.9 percent of all website traffic, a 6.98 percent increase over the same period in 2015. Bad bots interact with applications in the same way a legitimate user would, making them harder to detect and block. The results, however, are harmful: some bad bots scrape data from sites without permission, while others carry out criminal activities such as ad fraud and account theft.
Bots enable high-speed abuse, misuse, and attacks on websites and APIs. They enable attackers, unsavory competitors and fraudsters to perform a wide array of malicious activities, including web scraping, competitive data mining, personal and financial data harvesting, brute force login and man-in-the-middle attacks, digital ad fraud, spam, transaction fraud, and more.
The bad bot problem has become so rampant that it has earned its first piece of US federal legislation. In an attempt to make the use of ticket-scalping bots illegal, the US Congress passed the Better Online Ticket Sales (BOTS) Act. Governments in the UK and Canada are also looking at new laws to stop automated ticket purchasing by bots. While legislation is a welcome deterrent, it is difficult to legislate against those you can't identify. Bad bots continue to operate under the radar, and they are here to stay.
What does the data say?
Using our network, we analyzed hundreds of billions of bad bot requests, anonymized across thousands of domains, to identify trends in how bots are evolving. We focused on bad bot activity at the application layer, as these attacks differ from the simple volumetric Distributed Denial of Service (DDoS) attacks that typically grab the headlines. Here are some of our top findings:
1. Bigger site? Bigger target
Bad bots don’t sleep; they’re everywhere, at all times. But even though bad bots are active on all sites, larger sites were hit the hardest in 2016. Bad bots accounted for 21.83 percent of traffic on large websites, a 36.43 percent increase over the previous year.
Larger sites generally rank higher in search engine results, and because humans rarely look past the first few results, those sites capture the bulk of the traffic. Smaller sites don’t get the same SEO uplift, so large and medium sites are the more enticing targets for bad bots.
2. Bad bots lie
Bad bots must lie about who they are to avoid detection. They do this by reporting their user agent as a web browser or mobile device. In 2016, the majority of bad bots claimed to be one of the most popular browsers: Chrome, Safari, Internet Explorer, or Firefox, with Chrome claiming the top spot.
Alongside this, there was a 42.78 percent year-over-year increase in bad bots claiming to be mobile browsers. For the first time, mobile Safari made the top-five list of self-reported user agents, outranking desktop Safari by 17 percent.
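If you want to see how this looks in your own data, the quickest check is to tally the self-reported browser families in your access logs. Below is a minimal Python sketch of that idea; the substring rules are simplified assumptions (real user-agent parsing is messier), and none of these self-reported values can be trusted on their own.

```python
# Minimal sketch: tally self-reported browser families in an access log.
# The substring rules are simplified assumptions; real UA parsing is messier.
from collections import Counter

def browser_family(user_agent: str) -> str:
    ua = user_agent or ""
    if "Edge" in ua:
        return "Edge"
    if "Chrome" in ua:                      # Chrome UAs also contain "Safari"
        return "Chrome"
    if "Safari" in ua and "Mobile" in ua:   # mobile Safari reports "Mobile ... Safari"
        return "Mobile Safari"
    if "Safari" in ua:
        return "Safari"
    if "MSIE" in ua or "Trident" in ua:
        return "Internet Explorer"
    if "Firefox" in ua:
        return "Firefox"
    return "Other"

def tally(user_agents):
    return Counter(browser_family(ua) for ua in user_agents)

if __name__ == "__main__":
    sample = [
        "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0 Safari/537.36",
        "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2) AppleWebKit/602.4.6 (KHTML, like Gecko) Mobile/14D27 Safari/602.1",
    ]
    print(tally(sample))
```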
3. If you build it, bots will come
When it comes to the attractiveness of a website, bad bots have a type. There are four key website features bad bots look for:
- Proprietary content and/or pricing information
- A login section
- Web forms
- Payment processors
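One practical way to use this list is to audit your own site for these features and see which routes most need protection. Here is a rough Python sketch of that idea; the keyword-to-feature rules are made-up assumptions, and a real audit would run over your actual route inventory.

```python
# Sketch: tag your own routes against the four features above to see which
# areas most need bot protection. The keyword rules are rough assumptions.
FEATURE_KEYWORDS = {
    "login": "login section",
    "signin": "login section",
    "price": "pricing information",
    "product": "proprietary content",
    "contact": "web form",
    "register": "web form",
    "checkout": "payment processor",
    "cart": "payment processor",
}

def bot_attractive_features(path: str):
    p = path.lower()
    return sorted({label for key, label in FEATURE_KEYWORDS.items() if key in p})

for route in ["/login", "/products/pricing", "/checkout", "/blog/post-1"]:
    print(route, "->", bot_attractive_features(route) or ["no obvious target"])
```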
4. The weaponization of the data center
Data centers were the weapon of choice for bad bots in 2016, with 60.1 percent coming from the cloud. Amazon AWS was the top originating ISP for the third year in a row with 16.37 percent of all bad bot traffic — four times more than the next ISP.
But why use centralized data centers rather than the traditional “zombie” PCs of a botnet, which are more typically used for DDoS attacks? The answer is that it has never been easier to build bad bots with open source software, or cheaper to launch them from globally distributed networks in the cloud. These data centers can scale up faster and more efficiently for bot attacks on the application layer, and steps like masking IP addresses have become easy and essential within bot deployments. This centralized approach is also easier to manage when running fraud and account theft campaigns.
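A simple first line of defense that follows from this is to check whether a request's source IP falls inside known data-center address space. The sketch below shows the shape of such a check in Python; the CIDR blocks are placeholders, and in practice you would load published lists (for example, AWS publishes its address ranges at https://ip-ranges.amazonaws.com/ip-ranges.json).

```python
# Sketch: flag requests whose source IP falls inside known data-center ranges.
# The CIDR blocks below are placeholders; load real published lists in practice.
import ipaddress

DATA_CENTER_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),   # placeholder (TEST-NET-3)
    ipaddress.ip_network("198.51.100.0/24"),  # placeholder (TEST-NET-2)
]

def from_data_center(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATA_CENTER_RANGES)

print(from_data_center("203.0.113.42"))  # True
print(from_data_center("192.0.2.7"))     # False
```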
5. Out of date? Out of luck
Humans aren’t the only ones falling behind on software updates; it turns out bad bots have the same problem. One in every ten bad bots reported using a browser version released before 2013, and some reported versions released as far back as 1999.
But why are bad bots reporting out-of-date browsers? Perhaps some were written years ago and are still at work today. Some may target specific systems that only accept particular browser versions. Others may have been out-of-control programs, bouncing around the Internet in endless loops and still causing collateral damage.
6. The continuing rise of advanced persistent bots
In 2016, 75 percent of bad bots were Advanced Persistent Bots (APBs). Today’s APBs are more sophisticated: they can load JavaScript, hold onto cookies, and load external resources, all of which makes their attacks more effective. They can also use obfuscation techniques to randomize the IP addresses, headers, and user agents associated with their activity, helping them hide in the noise of everyday traffic.
APBs can carry out sophisticated multi-step attacks, such as account-based abuse and transaction fraud, which require deeper penetration into the web application. If you’re using a web application firewall (WAF) and filtering out known violator user agents and IP addresses, that’s a good start. However, bad bots rotate through IPs and cycle through user agents to evade these WAF filters. You’ll need a way to differentiate humans from bad bots that use headless browsers, browser automation tools, and man-in-the-browser malware.
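To make the gap concrete, here is a sketch of the kind of weak, request-level heuristics a basic filter relies on. The header names are standard, but the markers and scoring are illustrative assumptions, and the whole point of APBs is that they defeat checks this simple.

```python
# Sketch: a few weak request-level heuristics for automation. Advanced
# persistent bots defeat checks like these, so treat the score only as a
# first-pass signal, never a verdict.
AUTOMATION_MARKERS = ("HeadlessChrome", "PhantomJS", "Selenium", "python-requests")

def suspicion_score(headers: dict) -> int:
    ua = headers.get("User-Agent", "")
    score = 0
    if any(marker in ua for marker in AUTOMATION_MARKERS):
        score += 2
    if "Accept-Language" not in headers:   # real browsers almost always send this
        score += 1
    if "Cookie" not in headers:            # no session continuity on a repeat visit
        score += 1
    return score

print(suspicion_score({"User-Agent": "Mozilla/5.0 HeadlessChrome/60.0"}))  # 4
```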
7. Is the USA the bot superpower?
The US topped the list of bad bot originating countries for the third year in a row. In fact, the US generated a larger share of total bad bot traffic (55.4 percent) than all other countries combined. The Netherlands was the next closest country at 11.4 percent, while China reached the top three for the first time. South Korea made the biggest jump, up 14 spots from 2015.
But does over half of all cybercrime really come from US citizens? A spammer bot might originate from a US data center, but the perpetrator behind it could be located anywhere in the world. Thanks to virtual private data centers such as Amazon AWS, cyber crooks can route their attacks through US-based ISPs so they appear to originate inside America, avoiding location-based blocking techniques.
What can you do about bots?
However hard they try to hide their activity, bad bot attacks leave traces that traditional monitoring tools often cannot explain. For example, significant bad bot traffic is a likely culprit when unexpected spikes in traffic cause slowdowns without a corresponding increase in sales. Another sign is a site’s search rankings plummeting because content has been stolen and data scraped. Similarly, skewed analytics can lead to poor results from misguided ad spend.
Other pointers to bad bot activity include high numbers of failed login attempts and increased customer complaints about account lockouts. Bad bots also leave fake posts, malicious backlinks, and competitor ads in forums and customer review sections.
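Failed-login spikes in particular are easy to watch for programmatically. The sketch below flags hours whose failed-login counts sit far above the typical rate; the log shape and the five-times-median threshold are assumptions to tune for your own site.

```python
# Sketch: flag hours where failed-login counts are far above the typical rate.
# The input shape and the 5x-median threshold are assumptions; tune for your site.
from statistics import median

def spike_hours(failed_per_hour: dict, factor: float = 5.0):
    typical = median(failed_per_hour.values())
    return [hour for hour, n in failed_per_hour.items() if n > factor * typical]

hourly = {"09:00": 12, "10:00": 15, "11:00": 14, "12:00": 480, "13:00": 13}
print(spike_hours(hourly))  # ['12:00']
```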
To filter out bad bots, take the time to learn which areas of your website are the most attractive targets and confirm they are all properly secured. One way to choke off bad bots is to geo-fence your website, blocking users from countries where your company doesn’t do business.
Similarly, it is worth looking at the audience profile for your customers: is there a good reason why users would be on browsers that are several years and multiple updates past their release date? If not, a whitelist policy that imposes browser version age limits stops up to 10 percent of bad bots. Also consider whether automated programs that aren’t search engine crawlers or other pre-approved tools belong on your site at all; setting up filters to block all other bots can stop up to 25 percent of bad bots.
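Taken together, the last two suggestions amount to a layered request filter: geo-fence, reject implausibly old self-reported browsers, and allow only pre-approved crawlers. The sketch below shows one possible shape for that policy; the country list, version cutoff, and crawler allowlist are all example assumptions, and the country code would come from your CDN or a GeoIP lookup.

```python
# Sketch of a layered filtering policy: geo-fence, reject very old self-reported
# browsers, and allow only pre-approved crawlers. All the constants below are
# example assumptions, not recommendations.
import re

ALLOWED_COUNTRIES = {"US", "CA", "GB"}          # where the business operates
APPROVED_CRAWLERS = ("Googlebot", "Bingbot")    # example allowlist
MIN_CHROME_MAJOR = 40                           # illustrative age cutoff

def should_block(country: str, user_agent: str, is_automated: bool) -> bool:
    if country not in ALLOWED_COUNTRIES:
        return True
    if is_automated and not any(c in user_agent for c in APPROVED_CRAWLERS):
        return True
    m = re.search(r"Chrome/(\d+)", user_agent)
    if m and int(m.group(1)) < MIN_CHROME_MAJOR:
        return True
    return False

print(should_block("US", "Mozilla/5.0 ... Chrome/55.0 Safari/537.36", False))  # False
print(should_block("US", "Mozilla/5.0 ... Chrome/21.0 Safari/537.1", False))   # True
```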
The best way to deal with bots is to monitor and respond to all your web and mobile traffic in real time so that you see the next bad bot attack coming and stop it in its tracks. This approach relies on intelligence and automation to spot malicious activity: rather than depending on human review of analytics logs, security is maintained through better use of data and machine learning over time.
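As a toy illustration of that last point, the sketch below scores sessions with a model trained on labeled request features instead of relying on manual log review. The features, labels, and values are fabricated for illustration; a production system would need far richer signals and continuous retraining.

```python
# Toy sketch: score traffic with a model trained on labeled session features.
# The features and labels are fabricated for illustration only.
from sklearn.ensemble import RandomForestClassifier

# features per session: [requests_per_minute, pages_per_session, repeat_cookie (0/1)]
X = [[2, 5, 1], [3, 8, 1], [1, 4, 1],             # human-like sessions
     [120, 400, 0], [90, 260, 0], [200, 700, 0]]  # bot-like sessions
y = [0, 0, 0, 1, 1, 1]                            # 0 = human, 1 = bad bot

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(model.predict([[150, 500, 0], [2, 6, 1]]))  # likely [1 0]
```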