Things I've Learned About Java
March 13, 2009
Doing a project with this scale of data (100 million ratings, by 480189 users, on 17770 movies) shows a person just how much they don't know about programming yet.
Here's a few things I've learned so far:
Objects take up a crapton of memory.
You don't realize this when you're just making you're little
Person
objects in class, but when you try and fit 100 million ratings in memory, objects are not your friend. It's all about arrays of primitive types.Integers take up way too much space.
Even the relatively small, programmer abused
int
type will make you cry when you try to do this. Had to useshort
for movie ids andbyte
for ratings.Doubles are stupid.
I never knew this, but double math in Java is really imprecise. I don't have much choice (both speed and memory wise) but to use them, but adding simple decimals comes out really goofy. Google "java double arithmetic" and see the madness.
Java does not pass by reference.
It passes references by value. (I had heard this before but never ran into a problem with it until now) You may say, "What the heck does that mean?" Well, for example take a
Dog
object calledfido
and pass it to a method that takes aDog
parameter. Let's say it's calledgoofy
in the method. Right nowfido
andgoofy
point to the same thing. If you change something in thegoofy
object it will change the same thing in thefido
object, because they're the same object. But if you saygoofy = new Dog();
orgoofy = null
they no longer point to the same thing.fido
still points where it did but nowgoofy
points to something else.That doesn't seem like too much of a problem, but... I had an array of 48,000
PrintWriter
objects as I was trying to reorganize the data. At the end of each run I used a for each --for (PrintWriter print : outputArray)
-- to set them all to null, so they could be reused in the next run. But since each element in that array was set into a newly declaredprint
variable, settingprint = null
didn't setoutputArray[i] = null
. So a wasted hour or two of running the sorting program.
That's all I have for now, but I'm sure I'll be posting about this again.