Java 4 Ever
Pretty much the story of my life…
Four doctors went duck hunting together: a family practitioner, a gynecologist, a surgeon, and a pathologist. As a bird flew overhead, the family practitioner started to shoot but decided not to because he wasn’t absolutely sure it was a duck. The gynecologist also prepared to shoot, but lowered his gun when he realized that he didn’t know whether it was a male or female duck. The surgeon, meanwhile, blew the bird out of the sky, turned to the pathologist and said: “Go see if that was a duck.”
Even lawyers have ethics. If a client mistakenly gives a lawyer $400 to pay a $300 bill, the ethical question that naturally arises is whether the lawyer should tell his partner.
This is very appropriate given the times and given the fact that I too once worked for Lehman Brothers:
The ninety-year-old man went to the doctor and said, “Doctor, my eighteen-year-old wife is expecting a baby.” The doctor said, “Well, let me tell you a story. A man went hunting, but instead of a gun, he picked up an umbrella by mistake. When a bear suddenly charged at his wife who was accompanying him in the woods, the man aimed the umbrella, shot the bear, and killed it!” The man said, “Impossible! Someone else must have shot the bear!” The doctor said, “My point exactly.”
There is a town in which the only barber, a man, shaves all the townsmen who do not shave themselves. Does the barber shave himself? If he does, he doesn’t. If he doesn’t, he does. Makes sense?
Looking for efficient ways of loading files that run in the GB realm, in Java?
I recently played around with Java’s NIO API, which was introduced in JDK 1.4. As you may know, the traditional IO classes deal with streams, whereas the NIO API takes a block-oriented approach, which can make it considerably faster because the filling and draining of buffers is handled by the OS, not the JVM.
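To make the "filling and draining" idea concrete, here is a minimal sketch of a block-oriented read. It assumes Java 7 or later (for java.nio.file.Path and try-with-resources); the class name ChannelReadSketch, the method countLineFeeds and the 8 kB buffer size are made up for this illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelReadSketch {

    /** Counts line-feed bytes by repeatedly filling and draining one buffer. */
    public static long countLineFeeds(Path path) throws IOException {
        long count = 0L;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // Hypothetical block size; a direct buffer lets the OS fill it.
            ByteBuffer buffer = ByteBuffer.allocateDirect(8192);
            while (channel.read(buffer) != -1) { // fill
                buffer.flip();                   // switch to draining
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        count++;
                    }
                }
                buffer.clear();                  // ready for the next fill
            }
        }
        return count;
    }
}
```

The same buffer is reused for every block, so the loop never allocates per read.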
Anyway, what really caught my attention in the API was the java.nio.MappedByteBuffer class. Memory-mapped IO means that data in a file can be mapped directly into memory. Nothing new in the OS world, but this was previously not possible in Java. Just be aware that when writing, you are working directly on the file’s contents: there is no separation between modifying the data and saving it to disk (as there is with traditional IO).
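Here is a small sketch of what that means for writing, again assuming Java 7 or later. The class MappedWriteSketch and the method overwriteFirstByte are hypothetical names, and the file is assumed to hold at least one byte already.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedWriteSketch {

    /** Overwrites the first byte of an existing, non-empty file through a mapping. */
    public static void overwriteFirstByte(Path path, byte value) throws IOException {
        // A READ_WRITE mapping needs a channel opened for both reading and writing.
        try (FileChannel channel = FileChannel.open(
                path, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1);
            mapped.put(0, value); // modifies the mapped file region; there is no separate save step
            mapped.force();       // asks the OS to write the change to the storage device
        }
    }
}
```

There is no write() call anywhere: putting a byte into the buffer is the write.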
I’ve written a sample code snippet that uses MappedByteBuffer. The sample code maps the file in increments of 1 kB (the block size), just for demonstration purposes. If you are dealing with files smaller than ~1.6GB on Windows or up to 2GB on a Unix-based OS, you can comfortably make the block size equal to the file size, and the entire file gets mapped at once. (Read more about the limit-related issues, which will be addressed in JDK 7.) I was trying to process a 10GB file, so I had to map the file in “sliding” blocks.
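The arithmetic behind the “sliding” blocks can be sketched on its own, without touching a file. SlidingWindowSketch and windows are hypothetical names for this illustration. The upper bound reflects the fact that a single FileChannel.map call accepts at most Integer.MAX_VALUE bytes.

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowSketch {

    /** Largest region size a single FileChannel.map call accepts. */
    static final long MAX_MAP_SIZE = Integer.MAX_VALUE;

    /** Returns {offset, length} pairs that cover a file of the given size. */
    public static List<long[]> windows(long fileSize, long blockSize) {
        if (blockSize <= 0 || blockSize > MAX_MAP_SIZE) {
            throw new IllegalArgumentException("block size out of range: " + blockSize);
        }
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            // The last window is shortened so it never runs past the end of the file.
            long length = Math.min(blockSize, fileSize - offset);
            result.add(new long[] {offset, length});
        }
        return result;
    }
}
```

Each pair can be passed as the position and size arguments of FileChannel.map. A 10GB file with 1GB blocks, for instance, comes out as ten windows.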
Note, however, that mapping file channels directly into memory makes sense only when dealing with very large files. There is no significant performance improvement when dealing with small files. Since the release of JDK 1.4, the java.io classes are actually implemented by the java.nio classes.
Below is a sample method that maps 1 kB at a time (again, this does not make sense in practice; you can comfortably map 1 GB files at once on any platform). The while-loop executes as long as there are still areas of the file that need to be mapped. The map method is the key: it returns the MappedByteBuffer. To turn the contents into human-readable form, the buffer is decoded into a CharBuffer. Each line in the file gets stored in a class variable lines, which is of type List (see full source for details).
public void load() throws IOException {
    // fileName and lines are fields of the enclosing class (see full source).
    // Needs java.io.*, java.nio.*, java.nio.channels.FileChannel and java.nio.charset.*.
    FileInputStream fis = new FileInputStream(fileName);
    try {
        FileChannel fc = fis.getChannel();
        Charset cs = Charset.forName("ISO-8859-1");
        CharsetDecoder decoder = cs.newDecoder();
        StringBuilder sb = new StringBuilder();
        long bs = 1024L; // block size
        long fs = fc.size(); // file size
        long t = 0L; // total number of bytes mapped so far
        if (fs != 0) {
            while (t < fs) {
                // Shrink the last block so it does not run past the end of the file
                if (t + bs > fs) {
                    bs = fs - t;
                }
                MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, t, bs);
                CharBuffer cb = decoder.decode(mbb);
                while (cb.hasRemaining()) {
                    char c = cb.get();
                    if (c == '\n') {
                        lines.add(sb.toString());
                        sb = new StringBuilder();
                    } else if (c == '\r') { // Windows
                        continue;
                    } else {
                        sb.append(c);
                    }
                }
                t += bs;
            }
            // The last line may not have ended with \n
            if (sb.length() > 0) {
                lines.add(sb.toString());
            }
        }
    } finally {
        fis.close(); // closing the stream also closes its channel
    }
}
The complete source code of the sample class: MemoryMappingReader.java.
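If you want to try the decoding and line-splitting step without a file, here is a rough stand-alone sketch. DecodeLinesSketch and splitLines are hypothetical names, and it assumes Java 7 or later for StandardCharsets.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DecodeLinesSketch {

    /** Decodes ISO-8859-1 bytes and splits them into lines, dropping \r and \n. */
    public static List<String> splitLines(ByteBuffer bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.ISO_8859_1.newDecoder();
        CharBuffer chars = decoder.decode(bytes);
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        while (chars.hasRemaining()) {
            char c = chars.get();
            if (c == '\n') {
                lines.add(current.toString());
                current.setLength(0);
            } else if (c != '\r') { // skip the carriage return of Windows line endings
                current.append(c);
            }
        }
        // The last line may not have ended with \n
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

Decoding block by block is safe here because ISO-8859-1 uses exactly one byte per character. With a multi-byte encoding such as UTF-8, a character could straddle two blocks, and decoding each block separately would then not work as written.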
Two cows are standing in the pasture. After a while, one turns to the other and says, “Do you realize that although pi is usually abbreviated to four decimal places, it actually goes to infinity?” And the other cow replies, “Moo.”
When debating the beginning of the universe, many often hide behind the infamous infinite regress idea. In other words, explaining the existence of the world by positing a “maker” raises the question of how to explain the existence of the maker. If another maker is posited, the question becomes, “Who made the maker?” And so on, or “ad nauseam,” whichever comes first.
My response to the very first cause of the cosmos? Just pull Lennon’s creatio ex nihilo (creation out of nothing) argument:
“Before Elvis, there was nothing.” — John Lennon
The anecdote is motivated by Leibniz’s ideas–he was neither an optimist nor a pessimist, but merely a neutral rationalist.
Optimist: The glass is half full.
Pessimist: The glass is half empty.
Rationalist: The glass is twice as big as it needs to be.