If you only need malware hashes, you must use an API such as the MalwareBazaar or VirusTotal (VT) API.
API documentation
bazaar.abuse.ch
Code:
tip.kaspersky.com/Help/Doc_data/en-US/ThreatLookupAPI.htm
Some of these APIs require a paid subscription, because threat intelligence, especially when it is curated and properly checked for FPs, is not free.
Many AV vendors offer APIs as well.
The Bazaar API should be free but it is plagued with false positives.
You query it with an HTTP POST request like this:
Code:
wget --post-data "query=get_info&hash=7de2c1bf58bce09eecc70476747d88a26163c3d6bb1d85235c24a558d1f16754" https://mb-api.abuse.ch/api/v1/
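If you would rather call it from a script, here is a rough Python sketch of the same POST request using only the standard library. The endpoint and form fields are taken from the wget command above. The function names and the optional `headers` parameter are my own, not part of the API; if the service asks for an API key, pass it through `headers` using whatever header name the official documentation specifies.

```python
import json
import re
import urllib.parse
import urllib.request

# Endpoint taken from the wget example above.
API_URL = "https://mb-api.abuse.ch/api/v1/"


def build_request(sha256, headers=None):
    """Build the POST request for a hash lookup (hypothetical helper)."""
    # Reject anything that is not a 64-character hex SHA-256 string.
    if not re.fullmatch(r"[0-9a-fA-F]{64}", sha256):
        raise ValueError("expected a 64-character hex SHA-256 hash")
    # Same form fields as the wget --post-data string.
    body = urllib.parse.urlencode({"query": "get_info", "hash": sha256})
    return urllib.request.Request(
        API_URL,
        data=body.encode("ascii"),
        headers=headers or {},  # optional extra headers, e.g. an API key
        method="POST",
    )


def lookup_hash(sha256, headers=None, timeout=30):
    """Send the request and return the decoded JSON response."""
    req = build_request(sha256, headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `lookup_hash("7de2c1bf58bce09eecc70476747d88a26163c3d6bb1d85235c24a558d1f16754")` sends the same query as the wget line. Building the request separately from sending it lets you inspect what will go over the wire without touching the network.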
In addition, the Sophos SOREL-20M collection contains 20 million samples you can use to train ML models. Be advised that you will also need a large number of safe files to control false positives.
The Sophos AI team is excited to announce the release of SOREL-20M (Sophos-ReversingLabs – 20 million) – a production-scale dataset […]
ai.sophos.com
I also found these repositories, which list more APIs and are supposedly open source.
Repository with Sample threat hunting notebooks on Security Event Log Data Sources - ashwin-patil/threat-hunting-with-notebooks
github.com
A collective list of public APIs for use in security. Contributions welcome - jaegeral/security-apis
github.com
API for Cuckoo Malware Analysis Sandbox http://www.cuckoosandbox.org - keithjjones/cuckoo-api
github.com