All thanks to Scala.
I used 1000 executors with 2 cores each. With that setup, I managed to score 80K items for each of the 100M users in 10 minutes, which works out to about 0.012 seconds per core to score all 80K items for a single user. Certainly this is partly due to the preprocessing that I did before this stage. All in all, the entire process took around an hour.
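For anyone who wants to check the 0.012 figure, here is a rough back-of-the-envelope sketch. It assumes the users are split evenly across all cores and that every core stays busy for the full 10 minutes. Neither assumption is stated above, so treat the result as an average rather than a measurement. The variable names are only for illustration.

```python
# Rough sanity check of the throughput numbers quoted in the post.
# Assumptions (not from the post): users are spread evenly across
# cores, and all cores are busy for the whole 10-minute window.
executors = 1000
cores_per_executor = 2
users = 100_000_000
items_per_user = 80_000
wall_clock_seconds = 10 * 60

total_cores = executors * cores_per_executor            # 2000 cores
users_per_core = users / total_cores                    # 50,000 users per core
seconds_per_user = wall_clock_seconds / users_per_core  # time to score one user's 80K items
total_scores = users * items_per_user                   # user-item pairs scored overall

print(f"{seconds_per_user:.3f} s per user per core")
print(f"{total_scores:.1e} user-item scores in total")
```

Under those assumptions, each core handles 50,000 users in 600 seconds, which is where the 0.012 seconds per user comes from.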