Hello, my 2 cents...
I think the idea is great, and that's exactly its problem. It's a great, big idea, one that would take a lot of coordinated work to get off the ground and flying properly. Of course, many users already spend their time posting individual test results here, but working in a coordinated way is an entirely different matter. For one, testers would have to stick to certain schedules instead of testing whenever they want or can, simply because there can't be too much time between tester #1 posting AV#1 results and tester #2 posting AV#2 results if we want to compare results on the same samples. And that's not to mention that these tests would, by their very nature, be much more complex and time-consuming than simple VM tests.
Then there is the issue that has been brought up about AV companies paying testers for better results, or otherwise interfering with the test results to make themselves look better. This idea, as I said, would require a lot of coordination, but it would neither require nor allow the same degree of coordination as a proper testing company. And this brings a serious problem: if you are worried that a testing company might be accepting money to fake results, which means the boss agreed and all the workers went along with it too, then how much more concerned should you be about individual testers, who do not work in the same office or even live in the same country, being approached by such companies? The lack of centralization could make such threats to the truthfulness of the tests even more severe, and much harder to control.
Maybe this could work without such fixed schedules, but then it wouldn't be any different from the freedom we already have to post test results, videos, etc.
If I can see this thing working at all, it's with a group of trusted members working more or less together (an "I'll test AV#1, you test AV#2" kind of thing), releasing results of tests done with proper procedures, at their leisure rather than on a fixed schedule. The community could vote for the products they want tested, with a rotation to make sure a broad spectrum of products is covered.
As an additional note, I think excluding clones is a bad idea, because clones are not the same as their originals. They may lack features or have extra ones, they can be more or less buggy than the originals, and sometimes even the database is not the same. So I see a lot of reasons to expect different results from the clones, which would make testing them useful.