The machine learning models that hedge funds use to predict the stock market are, of course, top secret, and so is the data used to build them. However, there is one fund that makes its data public: Numer.ai. The dataset is encrypted, and stock market prediction is transformed into a binary classification problem. Every 7 days (the length of one round) a new dataset is released, and anyone can download it, train a model and upload predictions. At the end of the round the best predictions are rewarded; there is no need to upload the model itself. As you can see from the leaderboard, mljar.com is doing really well in this competition.

I have decided to run some experiments with different algorithms on the Numer.ai dataset, without any preprocessing, and to make the results public. You can check the results in MLJAR Pantry (Project title: "Numerai - dataset from 09/22"). Basically, all algorithms perform very similarly on raw Numerai data. You can see the experiment definitions in the Compute view, and more details are available in Results. Click on a result to see its learning curves and algorithm hyperparameters. For some algorithms feature importance is also computed; all features are used roughly equally ...
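
If you want to try this kind of comparison locally, here is a minimal sketch. It is not the MLJAR setup: it assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for the Numerai file (the 21-feature shape and noise level are only illustrative), picks three algorithms arbitrarily, and scores them with log loss, which is my assumption about a sensible metric for probability predictions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the raw tournament data (assumption: numeric
# features and a binary target; the real file is not loaded here).
X, y = make_classification(
    n_samples=2000, n_features=21, n_informative=5,
    flip_y=0.3, random_state=0,
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical selection of algorithms, all trained on raw features.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(
        n_estimators=50, min_samples_leaf=20, random_state=0
    ),
    "extra_trees": ExtraTreesClassifier(
        n_estimators=50, min_samples_leaf=20, random_state=0
    ),
}

def compare_models(models, X_train, y_train, X_valid, y_valid):
    """Fit each model and return its validation log loss by name."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Probability of the positive class, as needed for log loss.
        proba = model.predict_proba(X_valid)[:, 1]
        scores[name] = log_loss(y_valid, proba)
    return scores

scores = compare_models(models, X_train, y_train, X_valid, y_valid)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: {score:.4f}")

# Feature importance is only available for some algorithms,
# here the tree ensembles.
importances = models["random_forest"].feature_importances_
print(importances.round(3))
```

On real data you would replace the `make_classification` call with loading the downloaded training file; the rest of the loop stays the same.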

That's it. I think that with these results we have hit the limit of performance on raw data, so feature hacking is required to move further ...

And a few words about our Pantry: it is a place that (I believe) will be full of jars with machine learning models and results. I think that sharing such results and models will move ML forward!