Things I've Learned About Java
March 13, 2009
Doing a project with data on this scale (100 million ratings by 480,189 users on 17,770 movies) shows a person just how much they don't know about programming yet.
Here are a few things I've learned so far:
Objects take up a crapton of memory.
You don't realize this when you're just making your little Person objects in class, but when you try to fit 100 million ratings in memory, objects are not your friend. It's all about arrays of primitive types.
Integers take up way too much space.
Even the relatively small, programmer-abused int type will make you cry when you try to do this. I had to use short for movie IDs and byte for ratings.
Doubles are stupid.
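I never knew this, but double arithmetic in Java is imprecise for decimal values. A double is a binary (IEEE 754) floating-point number, so most simple decimals like 0.1 can't be stored exactly, and adding them comes out really goofy. That isn't unique to Java, and I don't have much choice (speed- and memory-wise) except to use them anyway. Google "java double arithmetic" and see the madness.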
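Here's a minimal sketch pulling the three points above together: one rating per slot across parallel primitive arrays instead of one object per rating, with the double problem shown at the end. The class RatingStore and its methods are my own illustration, not code from the project, and I'm assuming user IDs stay as int, since 480,189 won't fit in a short (max 32,767).

```java
import java.math.BigDecimal;

/**
 * Hypothetical sketch: each rating is one index across three parallel
 * primitive arrays, instead of one object per rating.
 */
public class RatingStore {
    private final int[] userIds;    // 480,189 users won't fit in a short (max 32,767)
    private final short[] movieIds; // 17,770 movies do fit in a short
    private final byte[] ratings;   // ratings of 1-5 fit easily in a byte
    private int size = 0;

    public RatingStore(int capacity) {
        userIds = new int[capacity];
        movieIds = new short[capacity];
        ratings = new byte[capacity];
    }

    /** Appends one rating; throws ArrayIndexOutOfBoundsException when full. */
    public void add(int userId, int movieId, int rating) {
        userIds[size] = userId;
        movieIds[size] = (short) movieId; // caller must keep movieId <= 32,767
        ratings[size] = (byte) rating;
        size++;
    }

    public int size() { return size; }

    public int userAt(int i) { return userIds[i]; }

    /**
     * Average rating for one movie. The running sum stays in a long, which is
     * exact for whole numbers, and becomes a double only in the final division.
     */
    public double averageForMovie(int movieId) {
        long sum = 0;
        int count = 0;
        for (int i = 0; i < size; i++) {
            if (movieIds[i] == movieId) {
                sum += ratings[i];
                count++;
            }
        }
        return count == 0 ? Double.NaN : (double) sum / count;
    }

    public static void main(String[] args) {
        RatingStore store = new RatingStore(3);
        store.add(1, 42, 5);
        store.add(2, 42, 4);
        store.add(3, 7, 1);
        System.out.println(store.averageForMovie(42)); // 4.5

        // 0.1 and 0.2 have no exact binary representation, so the sum is off.
        System.out.println(0.1 + 0.2); // 0.30000000000000004
        // BigDecimal built from strings does exact decimal math, but costs speed and memory.
        System.out.println(new BigDecimal("0.1").add(new BigDecimal("0.2"))); // 0.3
    }
}
```

Each rating costs 7 bytes across the three arrays (4 + 2 + 1), so 100 million ratings come to roughly 700 MB. One object per rating would need the same fields plus an object header (typically 12 to 16 bytes on HotSpot) and a reference to each object wherever it's stored.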
Java does not pass by reference.
It passes references by value: when you pass an object, the method gets a copy of the reference, not the variable itself. (I had heard this before but never ran into a problem with it until now.) You may say, "What the heck does that mean?" Well, for example, take a Dog object called fido and pass it to a method that takes a Dog parameter. Let's say the parameter is called goofy in the method. Right now fido and goofy point to the same object. If you change something in the goofy object, it changes the same thing in the fido object, because they're the same object. But if you say goofy = new Dog(); or goofy = null; they no longer point to the same thing. fido still points where it did, but now goofy points to something else.
That doesn't seem like too much of a problem, but... I had an array of 48,000 PrintWriter objects as I was trying to reorganize the data. At the end of each run I used a for-each loop, for (PrintWriter print : outputArray), to set them all to null so they could be reused in the next run. But a for-each loop copies each element's reference into the loop variable print, so setting print = null didn't set outputArray[i] = null. That cost me an hour or two of running the sorting program.
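Here's a minimal sketch of both cases. Dog, fido, goofy, outputArray and print come from the post; the methods pet, clearWithForEach, clearWithIndex and makeWriters are hypothetical names for illustration, and StringWriter stands in for the real output files. Closing each writer in the fixed loop is my addition, not something the post describes.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class PassByValueDemo {
    static class Dog {
        String name;
        Dog(String name) { this.name = name; }
    }

    /** goofy starts as a copy of the caller's reference. */
    static void pet(Dog goofy) {
        goofy.name = "Goofy";   // fido sees this: same object
        goofy = new Dog("Rex"); // fido doesn't: only the local copy is rebound
    }

    /** The bug: print is a copy of each slot's reference. */
    static void clearWithForEach(PrintWriter[] outputArray) {
        for (PrintWriter print : outputArray) {
            print = null; // nulls the loop variable; outputArray is untouched
        }
    }

    /** The fix: write through the array index itself. */
    static void clearWithIndex(PrintWriter[] outputArray) {
        for (int i = 0; i < outputArray.length; i++) {
            if (outputArray[i] != null) {
                outputArray[i].close(); // flush and release before dropping it
                outputArray[i] = null;
            }
        }
    }

    /** Stand-in writers backed by strings instead of files. */
    static PrintWriter[] makeWriters(int n) {
        PrintWriter[] writers = new PrintWriter[n];
        for (int i = 0; i < n; i++) {
            writers[i] = new PrintWriter(new StringWriter());
        }
        return writers;
    }

    public static void main(String[] args) {
        Dog fido = new Dog("Fido");
        pet(fido);
        System.out.println(fido.name); // Goofy

        PrintWriter[] outputArray = makeWriters(3);
        clearWithForEach(outputArray);
        System.out.println(outputArray[0] == null); // false

        clearWithIndex(outputArray);
        System.out.println(outputArray[0] == null); // true
    }
}
```

java.util.Arrays.fill(outputArray, null) would also write into the array's own slots, though it doesn't close anything.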
That's all I have for now, but I'm sure I'll be posting about this again.