- Jan 8, 2011
- 22,361
"The "Yahoo News Feed dataset" incorporates anonymous browsing habits of 20 million users between February and May of 2015 across a variety of Yahoo properties, […].
All told, the data set is a whopping 13.5TB and covers 110 billion unique interaction "events." Yahoo calls it the "largest machine learning dataset" ever publicly released, and we're inclined to believe them -- there aren't very many companies who could accumulate this much browsing data.
It's a huge amount of data, but fortunately you don't need to worry about advertisers mining it to make more targeted ads. Yahoo is specifically releasing it only to the academic research community to help people build more effective recommendation algorithms. "
All told, the data set is a whopping 13.5TB and covers 110 billion unique interaction "events." Yahoo calls it the "largest machine learning dataset" ever publicly released, and we're inclined to believe them -- there aren't very many companies who could accumulate this much browsing data.
It's a huge amount of data, but fortunately you don't need to worry about advertisers mining it to make more targeted ads. Yahoo is specifically releasing it only to the academic research community to help people build more effective recommendation algorithms. "