Spawn

Administrator
Staff member
Verified
"The "Yahoo News Feed dataset" incorporates anonymous browsing habits of 20 million users between February and May of 2015 across a variety of Yahoo properties, […].

All told, the data set is a whopping 13.5TB and covers 110 billion unique interaction "events." Yahoo calls it the "largest machine learning dataset" ever publicly released, and we're inclined to believe them -- there aren't very many companies who could accumulate this much browsing data.

It's a huge amount of data, but fortunately you don't need to worry about advertisers mining it to make more targeted ads. Yahoo is specifically releasing it only to the academic research community to help people build more effective recommendation algorithms."
 

CMLew

Level 23
Verified
"...fortunately you don't need to worry about advertisers mining it to make more targeted ads. Yahoo is specifically releasing it only to the academic research community to help people build more effective recommendation algorithms."
I would take this with a pinch of salt...
 
Reactions: DracusNarcrym

DracusNarcrym

Level 19
Verified
I wish I could afford a 13.5TB SSD.
Server infrastructure actually uses multiple RAID-configured storage devices, not single devices with impractically large capacities.

The phrase "for academic purposes" is as abused and overused as the term "fair use" in copyright law.

Just kidding, Yahoo, we believe you! :rolleyes:
 
Reactions: Spawn

Tinm

Level 3
What is the definition of academic research? It can be different for different people, so it's really scary. Also, where is the guarantee that this data will not fall into someone else's hands?
 
Reactions: DracusNarcrym