My First Attempts
March 13, 2009
I'll preface this by saying I have no hope or expectation of actually competing for this prize. I came into this to learn about recommender systems.
I started by finding some research papers on recommender systems and collaborative filtering, and assumed I could start by implementing a nearest neighbor algorithm. I pulled out my trusty old Java and wrote a quick app to import the training set into a MySQL database. Ten hours later, reality set in as to how massive this data was compared to anything I'd worked on before.
Then I started to implement my nearest neighbor algorithm. After working on that off and on for a week or so, I gave it a shot. Five hours later it was still calculating similarity scores for the first prediction. With another 1.5 million ratings left to predict, I realized this was pointless. After a trip to the Netflix forum, I realized that nobody in their right mind runs their algorithms from a database. After putzing around on my own for a while trying to figure it out, I found a Java framework I could start from to get it all to run in memory.
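To give an idea of what "run in memory" means here, below is a minimal sketch. It is not the framework mentioned above and not my actual code. It assumes ratings are kept in plain primitive arrays, grouped per user and sorted by movie ID, and it assumes Pearson correlation as the similarity score, which is one common choice for nearest neighbor. The class name `InMemoryRatings` and its layout are made up for illustration.

```java
/**
 * Hypothetical in-memory rating store (illustration only).
 * Ratings are grouped per user, with each user's movie IDs sorted ascending.
 */
final class InMemoryRatings {
    private final int[][] movieIds;  // movieIds[u] = sorted movie IDs rated by user u
    private final byte[][] ratings;  // ratings[u][i] = rating user u gave movieIds[u][i]

    InMemoryRatings(int[][] movieIds, byte[][] ratings) {
        this.movieIds = movieIds;
        this.ratings = ratings;
    }

    /**
     * Pearson correlation between two users over the movies both have rated.
     * Returns 0 when fewer than two movies overlap or either user's ratings
     * have no variance on the overlap (the correlation is undefined there).
     */
    double pearson(int userA, int userB) {
        int[] ma = movieIds[userA], mb = movieIds[userB];
        byte[] ra = ratings[userA], rb = ratings[userB];
        int i = 0, j = 0, n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;

        // Walk both sorted lists at once, like the merge step of merge sort,
        // so the overlap is found in a single pass with no lookups.
        while (i < ma.length && j < mb.length) {
            if (ma[i] < mb[j]) {
                i++;
            } else if (ma[i] > mb[j]) {
                j++;
            } else {
                double a = ra[i], b = rb[j];
                sumA += a;
                sumB += b;
                sumAA += a * a;
                sumBB += b * b;
                sumAB += a * b;
                n++;
                i++;
                j++;
            }
        }
        if (n < 2) {
            return 0.0;
        }
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        if (varA <= 0 || varB <= 0) {
            return 0.0;
        }
        return cov / Math.sqrt(varA * varB);
    }
}
```

The point of this layout is that comparing two users touches only two small arrays already sitting in RAM, instead of issuing a query per comparison. It does nothing about the number of comparisons, though: scoring one user against every other user is still a lot of work per prediction.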
Even after I modified my nearest neighbor algorithm to work with the memory backend, it was still taking way too long: 8,000 predictions out of 1.5 million in 12 hours. At that rate the full set would take about 2,250 hours, or roughly three months. Another trip to the forum revealed my faulty assumption that nearest neighbor was the best algorithm to start with. On to matrix factorization...