Tutorial: Fishing For Phishing Sites

Cowpipe

Level 16
Thread author
Verified
Well-known
Jun 16, 2014
781
Over the last few weeks I've noticed people posting links to phishing sites for malware analysis, so I figured I'd put up this little thread to help any budding malware hunters out there get started :D

So, on to the main subject of this short tutorial: a very quick and effective method of finding phishing pages, either for research or to report to WOT, vendors, blacklists etc. It's also a handy way to grab phishing pages automatically from the wild rather than from blacklists, and you can even build your own blacklist this way ;) Note that this is a fairly non-technical tutorial, so anybody can try it out for themselves, as long as you never enter any information on the pages you find, even if you are sure they are legitimate ;)

In the spirit of being recent, let's take a look at this page (which is a legitimate Malaysian online banking portal):

Code:
https://www.maybank2u.com.my

And their login portal:

Code:
https://www.maybank2u.com.my/mbb/m2u/common/M2ULogin.do?action=Login

The first task for us apprentice 'phishing hunters' is to identify some distinctive sentences of text (called strings) on the page. Ideally these should be unique to this particular bank, and anything containing the bank's name is usually the best bet. The example below is pretty much perfect:

Click here to notify us of any Maybank2u.com Internet Banking fraud

Try to avoid using sentences containing phone numbers or general statements about the bank, as these can lead to false positives. In some cases, however, these kinds of sentences can give good results, but they usually require some filtering techniques which I'll talk about later ;)

Here are a couple of examples of what to avoid:

Too general, this text will appear on nearly every banking page, scam or not:
Never reveal your PIN / Password to anyone

Too long. Try to use shorter sentences, or if you can't find a short sentence unique to that banking login, cut out a small, distinctive chunk of the text and use that instead. For example, the whole of the text below is too long, but a shorter part of it (shown in bold in the original post) can be cut out and used instead.
Please be reminded that your account will be inactive if you do not login
to M2U for 3 months, and will be automatically deactivated if the account
remains idle for 6 months.
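If you're checking a lot of candidate strings, the rules of thumb above can be turned into a quick filter. This is only a rough sketch in Python: the GENERIC_PHRASES list, the phone-number regex and the 12-word limit are my own assumptions rather than fixed rules, so tune them for the bank you're looking at.

```python
import re

# Assumed examples of phrases that appear on nearly every banking page.
GENERIC_PHRASES = ["never reveal your pin", "password to anyone"]

# Rough phone-number pattern (assumption): a digit, 6+ digits/separators, a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s\-().]{6,}\d")


def is_good_search_string(text: str, brand: str, max_words: int = 12) -> bool:
    """Heuristic check of a candidate search string, following the tips above."""
    lowered = text.lower()
    if brand.lower() not in lowered:
        return False  # the best strings mention the bank by name
    if PHONE_RE.search(text):
        return False  # phone numbers tend to cause false positives
    if any(phrase in lowered for phrase in GENERIC_PHRASES):
        return False  # too general, appears on scam and legit pages alike
    return len(text.split()) <= max_words  # long sentences rarely match whole


brand = "Maybank2u"
print(is_good_search_string("Click here to notify us of any Maybank2u.com Internet Banking fraud", brand))  # True
print(is_good_search_string("Never reveal your PIN / Password to anyone", brand))  # False
```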

Ok, so don't worry if you're having trouble deciding which piece of text to use. You can repeat the process with different pieces of text to harvest more phishing pages, as sometimes a page will show up for one piece of text but not another.

The next step is to prepare your search engine queries. By this point you may have figured out that we are simply searching for the phishing pages. This works because most phishers allow their web pages to be indexed (listed by search engines such as Google or Bing). When I say 'allow', I mean this is probably a mistake on their part, as having a standard phishing page listed in search engine results brings them nothing but disadvantages.

We will use a number of different search engines to harvest our phishing pages, because Google does not show the same pages as Yahoo, and Bing shows different results again. We'll also use a couple of 'badly behaved' search engines, Baidu for example, to search for pages which don't want to be indexed (for those 'slightly' more competent phishers). I will list a set of queries for each search engine and then explain how they work:

Google, Bing, Baidu:
Code:
intext:"Click here to notify us of any Maybank2u.com Internet Banking fraud"

Yahoo:
Code:
"Click here to notify us of any Maybank2u.com Internet Banking fraud"

All this does is search within the page text for the sentence. The quotation marks are important: they make the engine match only the whole sentence, instead of loose groups of words such as "click here" or "internet banking fraud".
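If you plan to run the same phrase through several engines, a tiny helper can build the per-engine queries listed above. A minimal Python sketch; the engine names used as dictionary keys are just labels I've picked, and nothing here actually talks to a search engine.

```python
from typing import Dict


def build_queries(phrase: str) -> Dict[str, str]:
    """Build the exact-phrase query for each search engine used in this tutorial."""
    # Drop any double quotes inside the phrase so they can't break the outer quotes.
    quoted = '"' + phrase.replace('"', "") + '"'
    return {
        "google": "intext:" + quoted,  # intext: restricts the match to page text
        "bing": "intext:" + quoted,
        "baidu": "intext:" + quoted,
        "yahoo": quoted,  # Yahoo: just the quoted phrase on its own
    }


for engine, query in build_queries(
    "Click here to notify us of any Maybank2u.com Internet Banking fraud"
).items():
    print(f"{engine}: {query}")
```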

Note that you should always pay attention to what is actually being matched (displayed in bold). On some search engines, the later result pages only contain matches for individual keywords, and these will be random pages rather than phishing pages. Google and Baidu don't appear to suffer from this problem as often as Bing and Yahoo.
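If you copy the results out (URL plus snippet), you can do this 'check what's in bold' step mechanically by keeping only results whose snippet contains the whole phrase. A rough Python sketch; the (url, snippet) pairs and example.com URLs are hypothetical, and snippets are often truncated, so treat a non-match as 'check by hand' rather than 'not a phishing page'.

```python
import re
from typing import List, Tuple


def normalise(text: str) -> str:
    """Lower-case and collapse runs of whitespace so line breaks don't cause misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def full_phrase_matches(results: List[Tuple[str, str]], phrase: str) -> List[str]:
    """Return URLs whose snippet contains the complete phrase, not just some keywords."""
    target = normalise(phrase)
    return [url for url, snippet in results if target in normalise(snippet)]


# Hypothetical results copied from a search page: (url, snippet)
results = [
    ("http://example.com/m2u/", "Click here to notify us of any Maybank2u.com\nInternet Banking fraud"),
    ("http://example.org/news", "How to report internet banking fraud to your bank"),
]
print(full_phrase_matches(results, "Click here to notify us of any Maybank2u.com Internet Banking fraud"))
# ['http://example.com/m2u/']
```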

Now, to save us going through pages and pages of irrelevant results, we can add a couple of 'negative operators' to our search query to filter out certain kinds of results (literally meaning 'remove X from my results'). The example query above, when searched on Google, returns a lot of 'chaff': listings for a site called PhishTank (ironically, a database of phishing pages).

I personally like to filter out Facebook results too, as these can sometimes be large in number but are rarely helpful, and in some cases blog pages as well. Be careful when filtering words like 'facebook', as this will also block login pages which include text such as "Like us on Facebook". To get around this I use the 'inurl:' operator, which only matches the word in the URL (simple, isn't it?) ;)

Only Google seems to be reliable when working with these advanced filters, if anybody figures out a way to get them working in other search engines reliably (without dramatically altering the existing results), please let me know! :p
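Since a filtered query is just the phrase plus a list of exclusions, it's easy to generate. A minimal Python sketch, assuming you keep your phrases and exclusions in lists; multiple phrases are joined with AND, as in the heavier filtering example later in this thread. The function name is my own.

```python
from typing import Iterable


def build_google_query(
    phrases: Iterable[str],
    exclude_terms: Iterable[str] = (),
    exclude_url_terms: Iterable[str] = (),
) -> str:
    """Combine exact-phrase intext: searches with negative filters."""
    # Every phrase must appear in the page text.
    parts = [" AND ".join(f'intext:"{p}"' for p in phrases)]
    # -"term" removes results that mention the term anywhere.
    parts += [f'-"{t}"' for t in exclude_terms]
    # -inurl:"term" only removes results with the term in their URL.
    parts += [f'-inurl:"{t}"' for t in exclude_url_terms]
    return " ".join(parts)


print(build_google_query(
    ["Click here to notify us of any Maybank2u.com Internet Banking fraud"],
    exclude_terms=["PhishTank"],
    exclude_url_terms=["facebook.com"],
))
# intext:"Click here to notify us of any Maybank2u.com Internet Banking fraud" -"PhishTank" -inurl:"facebook.com"
```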

Google:
Code:
intext:"Click here to notify us of any Maybank2u.com Internet Banking fraud" -"PhishTank" -inurl:"facebook.com"

You will of course get good results with the above query. For those more technically aware scammers who don't allow their webpages to be indexed (but still aren't advanced enough to use user-agent detection to return 404 errors to crawlers), here is how you can get a much fuller result set by searching in the URL instead of the page content.

First we have to get a unique part of the URL which the login page uses to actually log the user in. If you remember our example near the top of this post (the URL of the login page), we can simply grab the end part of it and use Google's inurl: filter to match only within the URL:

Google:
Code:
inurl:"M2ULogin.do?action=Login"

The above query doesn't yield very good results; only 2 phishing pages are listed. On close inspection you'll notice the question mark (?), which in a URL normally indicates that parameters are about to be passed. So in the above URL the variable "action" is set to the value "Login". Understanding this principle will give you a better understanding of how phishing pages commonly work, as we shall see in a moment. Anyway, back to the question mark: to understand why our query isn't working, let's take that symbol and put it in a query on its own. This example should in theory match any sentences containing a question mark:

Code:
intext:"?"

However, it actually gives us no results. The following query, on the other hand, gives us some interesting matches...

Code:
intext:"a?c"

These include A/C, AC, A(C, A-C etc.

This suggests that Google doesn't treat the question mark literally; it behaves more like an 'optional' character (Google generally ignores most punctuation in queries). This is why in the above example we sometimes get matches for just AC and sometimes for A <some stuff here> C: the bit in the middle can be anything, or nothing at all. Anyway, long story short, this special character needs to be removed from our query so that it doesn't break it. Our new query returns exactly what we want, loads of phishing pages!

Google:
Code:
inurl:"M2ULogin.doaction=Login"

Many of them were previously unknown to Web of Trust and other blacklists ;)
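If you're doing this for lots of login URLs, Python's standard urllib.parse module can split a URL into its path and query string for you. This sketch takes the last path segment plus the query and drops the question mark, giving the same inurl: query as above; it also shows how the part after the "?" breaks down into the "action" parameter. The function name is hypothetical, and URLs with several parameters may still need trimming by hand.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit


def inurl_query_from_login_url(url: str) -> str:
    """Build a Google inurl: query from the tail end of a login URL."""
    parts = urlsplit(url)
    tail = posixpath.basename(parts.path)  # last path segment, e.g. "M2ULogin.do"
    # urlsplit already removed the "?" between path and query; strip any others
    # too, since Google doesn't treat "?" literally.
    fragment = (tail + parts.query).replace("?", "")
    return f'inurl:"{fragment}"'


login_url = "https://www.maybank2u.com.my/mbb/m2u/common/M2ULogin.do?action=Login"
print(parse_qs(urlsplit(login_url).query))    # {'action': ['Login']}
print(inurl_query_from_login_url(login_url))  # inurl:"M2ULogin.doaction=Login"
```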

And of course note the descriptions:

A description for this result is not available because of this site's robots.txt – learn more.

This is why we couldn't find these using the first query style (intext): the site's robots.txt tells search engines not to crawl the page, so its text is never indexed and can't be searched. The URL itself can still end up in the index, though, which is exactly what our inurl: query matches. Lucky for us these scammers are still pretty stupid then ;)
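You can check this kind of robots.txt blocking yourself with Python's built-in urllib.robotparser module. Keep in mind that robots.txt only asks crawlers not to fetch the page; it doesn't stop the URL itself being listed when other pages link to it. The robots.txt content and the phish.example host below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay away from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def crawler_may_fetch(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt text allows `agent` to crawl `url`."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse the text directly instead of downloading it
    return rp.can_fetch(agent, url)


print(crawler_may_fetch(BLOCK_ALL, "http://phish.example/M2ULogin.do?action=Login"))  # False
```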

Anyway, I really hope you've enjoyed this little tutorial; it's been fun to write! :D

Thanks to @FreddyFreeloader ~ seeing your upload of phishing pages inspired me to write this article ;)
 

Cowpipe

Jun 16, 2014
Here's another example using the tutorial's method. It still requires some manual filtering, but I harvested quite a few phishing pages with these queries:

Code:
intext:"Log in securely to your PayPal" -inurl:"paypal.com" -"facebook.com" -"PhishTank" -inurl:"paypal-community"

Heavier filtering:

Code:
intext:"Log in securely to your PayPal" AND intext:"send money and accept payments." -inurl:"paypal.com" -"facebook.com" -"PhishTank" -inurl:"paypal-community"
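Once you've harvested a batch of URLs, some of that manual filtering can be done locally. This Python sketch drops anything on the genuine domains and de-duplicates the rest by hostname, giving you the start of your own blacklist. The legitimate-domain list and the example URLs are assumptions for illustration; always double-check the output by hand.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def build_blacklist(harvested_urls: Iterable[str], legit_domains: Iterable[str]) -> List[str]:
    """Return unique suspicious hostnames, skipping the genuine site(s)."""
    legit = [d.lower() for d in legit_domains]
    seen = set()
    blacklist = []
    for url in harvested_urls:
        host = urlsplit(url).hostname or ""  # .hostname is already lower-case
        if not host:
            continue  # not a usable URL
        # Skip the real domain and its subdomains (e.g. www.paypal.com).
        if any(host == d or host.endswith("." + d) for d in legit):
            continue
        if host not in seen:
            seen.add(host)
            blacklist.append(host)
    return blacklist


harvested = [
    "https://www.paypal.com/signin",
    "http://paypal.com.login-check.example/webscr",
    "http://PayPal.com.login-check.example/update",
    "https://www.paypal-community.com/t5/help",
    "not a url",
]
print(build_blacklist(harvested, ["paypal.com", "paypal-community.com"]))
# ['paypal.com.login-check.example']
```

Matching on the hostname suffix, rather than on a plain substring, keeps lookalike hosts such as paypal.com.login-check.example, which a -inurl:"paypal.com" filter would probably also remove.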
 

Kent

Nov 4, 2013
Thanks for the useful tutorial! :D
 
Last edited by a moderator:
