What about the probability of 97 documents being benign?

The probability of exactly 97 of the next 100 documents being benign can be calculated using the binomial distribution formula:

P(X = 97) = (100 choose 97) * p^97 * (1-p)^(100-97)

where p is the probability that a single document is benign, assuming each document is independent of the others and the historical rate continues unchanged.

However, if you are asking for the probability of at least 97 of the next 100 documents being benign, you can calculate it as follows:

P(X ≥ 97) = P(X = 97) + P(X = 98) + P(X = 99) + P(X = 100)

This is because to have at least 97 benign documents out of 100, you could have exactly 97, or 98, or 99, or 100.

Assuming that the historical percentage of benign documents is p=0.99 (i.e., 99%), then the probability of exactly 97 of the next 100 documents being benign is:

P(X = 97) = (100 choose 97) * 0.99^97 * 0.01^3

= 161700 * 0.99^97 * 0.000001

≈ 0.0610

The probability of at least 97 of the next 100 documents being benign is:

P(X ≥ 97) = P(X = 97) + P(X = 98) + P(X = 99) + P(X = 100)

= (100 choose 97) * 0.99^97 * 0.01^3 + (100 choose 98) * 0.99^98 * 0.01^2 + (100 choose 99) * 0.99^99 * 0.01^1 + 0.99^100

≈ 0.0610 + 0.1849 + 0.3697 + 0.3660

≈ 0.9816

Therefore, in this scenario, the probability of at least 97 of the next 100 documents being benign is approximately 0.98 (about 98%). With a malicious rate of 0.01, only about one malicious document is expected per 100, so seeing four or more malicious documents (that is, fewer than 97 benign) is unlikely, with a probability of roughly 1.8%.
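
If you want to check these figures yourself, here is a minimal Python sketch of the same calculation. It assumes the binomial model described above with n = 100 and p = 0.99; the helper names binom_pmf and binom_tail are made up for this example and only the standard library is used.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(k_min: int, n: int, p: float) -> float:
    """P(X >= k_min), obtained by summing the individual terms."""
    return sum(binom_pmf(k, n, p) for k in range(k_min, n + 1))


# Assumed inputs: 100 documents, 99% historical benign rate.
n, p = 100, 0.99

exactly_97 = binom_pmf(97, n, p)
at_least_97 = binom_tail(97, n, p)

print(f"P(X = 97)  = {exactly_97:.4f}")
print(f"P(X >= 97) = {at_least_97:.4f}")
```

Changing p or the threshold 97 lets you see how sensitive the result is to the assumed benign rate.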