“Ma’, look no streams!”

random — oliver on March 12, 2009 at 2:20 am

Looking for efficient ways of loading files that run in the GB realm, in Java?

I recently played around with Java’s NIO API, which was introduced in JDK 1.4. As you may know, the traditional IO classes deal with streams, whereas the NIO API deals with block-oriented approach, which makes it considerably faster because the filling and draining of buffers is handled by OS, not JVM.

Anyway, what really caught my attention in the API, was the java.nio.MappedByteBuffer class. Memory-mapped IO means that data in a file can directly mapped to physical memory. Nothing new in the OS world, but this was previously not possible in Java. Just be aware that when dealing with writing, you are dealing directly with the disk memory - there is no separation between modifying the data and saving it to the disk (as in the traditional IO).

I’ve written a sample code snippet that uses MappedByteBuffer. The sample code maps the file in increments of 1 kB (block size), just for the demonstrative purposes. If you are dealing with files less than ~1.6GB on Windows or up to 2GB files on the Unix-based OS, you can comfortably make the block size equal to the file size and the entire file would get mapped at once. (Read more about the limit-related issues which will be addressed in JDK 7). I was trying to process a 10GB file, so I had to map the file in “sliding” blocks.

Note, however, that mapping file channels directly into memory makes sense only when dealing with very large files. There is no significant performance improvement when dealing with small files. Since the release of JDK 1.4, the java.io classes are actually implemented by the java.nio classes.

Below is a sample method that maps 1 kB at a time (again, this does not make sense in practice; you can comfortably map 1 GB files at once on any platform). The while-loop executes as long as there is still areas of the file that need to be mapped. The map method is the key, which returns the MappedByteBuffer. In order to make the contents into human-readable format, the buffer gets wrapped by a CharBuffer. Each line in the file, get stored into a class variable lines, which is of type List (see full source for details).

public void load() throws IOException {
	FileInputStream fis = new FileInputStream(fileName);
	FileChannel fc = fis.getChannel();
	MappedByteBuffer mbb = null;
	Charset cs = Charset.forName("ISO-8859-1");
	CharsetDecoder decoder = cs.newDecoder();
	StringBuilder sb = new StringBuilder();

	long bs = 1024L; // block size
	long fs = fc.size(); // file size
	long t = 0L; // total size

	if (fs != 0) {
		while (t < fs) {
			if (t + bs > fs) {
				bs = fs - t;
			}
			mbb = fc.map(FileChannel.MapMode.READ_ONLY, t, bs);
			CharBuffer cb = decoder.decode(mbb);

			while (cb.hasRemaining()) {
				char c = cb.get();

				if (c == ‘n’) {
					lines.add(sb.toString());
					sb = new StringBuilder();
				} else if (c == ‘r’) { // Windows
					continue;
				} else {
					sb.append(c);
				}
			}
			t += bs;
		}
		// The last line may not have ended with n
		if (sb.length() > 0) {
			lines.add(sb.toString());
		}
	}
}

The complete source code of the sample class: MemoryMappingReader.java.

Infinite Humor

random — oliver on February 12, 2009 at 2:00 am

Two cows are standing in the pasture. After a while, one turns to the other and says, “Do you realize that although pi is usually abbreviated to four decimal places, it actually goes to infinity?” And the other cow replies, “Moo.”

Infinite Regress

random — oliver on January 24, 2009 at 1:41 am

When debating the beginning of the universe, many often hide behind the infamous infinite regress idea. In other words, in order to explain the existence of the world by positing a “maker” raises the question of how to explain the existence of the maker. If another maker is posited, the question becomes, “Who made the maker?” And so on, or “ad nauseam,” whichever comes first.

My response to the very first cause of the cosmos? Just pull Lennon’s creatio ex nihilo (creation out of nothing) argument:

“Before Elvis, there was nothing.” — John Lennon

Rational Humor

random — oliver on January 24, 2009 at 1:26 am

The anecdote is motivated by Leibniz’s ideas–he was neither an optimist nor a pessimist, but merely a neutral rationalist.

Optimist: The glass is half full.
Pessimist: The glass is half empty.
Rationalist: The glass is twice as big as it needs to be.

Metaphysics the German Way

random — oliver on January 9, 2009 at 1:11 am

Aristotle is said to have said that meaning of life consists of happiness. St. Augustine thought it was to love God. Martin Heidegger, the famous twentieth-century existentialist, however, thought the authentic meaning of life is to live in full consciousness of death. Now that’s the “bright” way of looking at things…

Origin of the Word “Metaphysics”

random — oliver on January 4, 2009 at 5:15 pm

I recently came across to an interesting fact, which most people, I would claim, are not aware of. Most dictionaries define the word “metaphysics” as a division of philosophy that is concerned with the fundamental nature of reality and being and that includes ontology, cosmology, and often epistemology. One can easily read into the definition and think that the etymology of metaphysics refers to something that concerns matters beyond the physical world. However, the word actually comes from Aristotle’s work which was titled “Metaphysica”–a title given by a first-century scholar. The name referred to the fact that the treatise came after (i.e. meta- or beyond) his other treatise called “Physica.” Thus, the catchy name has nothing to do with matters beyond the physical world.

Hedge Fund Coding Skill Screening: Part II

technology — oliver on February 13, 2008 at 2:24 am

This is the second assignment from the interview (see also my previous blog entry):

A collection of particles is contained in a linear chamber. They all have the same speed, but some are headed toward the right and others are headed toward the left. These particles can pass through each other without disturbing the motion of the particles, so all the particles will leave the chamber relatively quickly.

You will be given the initial conditions by a String init containing at each position an 'L' for a leftward moving particle, an 'R' for a rightward moving particle, or a '.' for an empty location. init shows all the positions in the chamber. Initially, no location in the chamber contains two particles passing through each other. We would like an animation of the process. At each unit of time, we want a string showing occupied locations with an 'X' and unoccupied locations with a '.'.

Create a class Animation that contains a method animate that is given an int speed and a String init giving the initial conditions. The speed is the number of positions each particle moves in one time unit. The method will return an array of strings in which each successive element shows the occupied locations at the next time unit. The first element of the return should show the occupied locations at the initial instant (at time = 0) in the 'X', '.' format. The last element in the return should show the empty chamber at the first time that it becomes empty. The method signature: public String[] animate(int speed, String init).

You may assume the following constraints: speed will be between 1 and 10 inclusive, init will contain between 1 and 50 characters inclusive, and each character in init will be '.' or 'L' or 'R'.

Here are some examples:

1) 2, "..R...."
Returns:
..X....
....X..
......X
.......

The single particle starts at the 3rd position, moves to the 5th, then 7th, and then out of the chamber.

2) 3, "RR..LRL"
Returns:
XX..XXX
.X.XX..
X.....X
.......

At time 1, there are actually 4 particles in the chamber, but two are passing through each other at the 4th positio.

3) 2, "LRLR.LRLR"
Returns:
XXXX.XXXX
X..X.X..X
.X.X.X.X.
.X.....X.
.........

At time 0 there are 8 particles. At time 1, there are still 6 particles, but only 4 positions are occupied since particles are passing through each other.

4) 10, "RLRLRLRLRL"
Returns:
XXXXXXXXXX
..........

These particles are moving so fast that they all exit the chamber by time 1.

5) 1, "..."
Returns:
...

6) 1, "LRRL.LR.LRR.R.LRRL."
Returns:
XXXX.XX.XXX.X.XXXX.
..XXX..X..XX.X..XX.
.X.XX.X.X..XX.XX.XX
X.X.XX...X.XXXXX..X
.X..XXX...X..XX.X..
X..X..XX.X.XX.XX.X.
..X....XX..XX..XX.X
.X.....XXXX..X..XX.
X.....X..XX...X..XX
.....X..X.XX...X..X
....X..X...XX...X..
...X..X.....XX...X.
..X..X.......XX...X
.X..X.........XX...
X..X...........XX..
..X.............XX.
.X...............XX
X.................X
...................

Here is my solution:

public String[] animate(int speed, String init)
{
	int curr[] = new int[init.length()];
	List<String> result = new ArrayList<String>();
	boolean empty = false;

	for(int i = 0; i<init.length(); i++)
		curr[i] = init.charAt(i)=='.'?0:init.charAt(i);

	while(!empty)
	{
		StringBuilder state = new StringBuilder();
		int next[] = new int[curr.length];
		empty = true;

		for(int i = 0; i<curr.length; i++)
			if(curr[i] != 0)
			{
				state.append('X');
				empty = false;
			}
			else
				state.append('.');

		for(int i=0; i<curr.length; i++)
		{
			if(curr[i]>='R' && i+speed<next.length)
				next[i+speed]+='R';
			if((curr[i]=='L' || curr[i]>'R') && i-speed>=0)
				next[i-speed]+='L';
		}

		result.add(state.toString());
		curr = next;
	}

	return result.toArray(new String[0]);
}

The complete source code, with the test cases: Animation.java.

Hedge Fund Coding Skill Screening: Part I

technology — oliver on February 12, 2008 at 11:06 pm

One of my friends recently interviewed with a hedge fund in New York. The fund conducted a brief phone interview and then sent an email with two interesting coding problems. You could choose any of the three major languages, i.e. C, C++, or Java. I like Java. They’re not all that challenging at the end of the day; however, with limited time, one may not find these assignments trivial. Since these are actual interview questions from yesterday, I’m not going to mention the fund’s name. In theory, although very unlikely, someone could find this while their taking the very same exam.

The first assignment:

Sentences such as “A quick brown fox jumps over the lazy dog”, contain every single letter in the alphabet. Such sentences are called pangrams. You are to write a method getMissingLetters, which takes a String, sentence, and returns all the letters it is missing (which prevent it from being a pangram). You should ignore the case of the letters in sentence, and your return should be all lower case letters, in alphabetical order. You should also ignore all non US-ASCII characters. The code you submit must contain a method that conforms to the expected method signature, as follows: public String getMissingLetters(String sentence).

Here are some examples:

1) "A quick brown fox jumps over the lazy dog"
Returns: ""

2) "A slow yellow fox crawls under the proactive dog"
Returns: "bjkmqz"

3) "Lions, and tigers, and bears, oh my!"
Returns: "cfjkpquvwxz"

4) ""
Returns: "abcdefghijklmnopqrstuvwxyz"

Here is my solution:

public String getMissingLetters(String sentence)
{
	boolean occurances[] = new boolean[26];

	for(int i = 0; i < sentence.length(); i++)
	{
		int c = (int)sentence.charAt(i);

		if(c >= 'A' && c <= 'Z')
			occurances[c-'A'] = true;
		else if(c >= ('A'+32) && c <= ('Z'+32))
			occurances[c-'A'-32] = true;
	}

	StringBuilder missing = new StringBuilder();

	for(int i = 0; i < occurances.length; i++)
		if(!occurances[i])
			missing.append((char)(i+'A'+32));

	return missing.toString();
}

The complete source code, with the test cases: MissingLetters.java.

What Volatility?

humor, finance — oliver on February 12, 2008 at 2:43 am

This may well be the reason for the latest, phenomenal market volatility. Anyway, got this great clip from someone. However, since I don’t remember who and I don’t know the name of the artist, I can’t really give credit to anyone at this time. Brilliant though?

Wild!

Wallstr**

humor, finance — oliver on February 2, 2008 at 3:34 am

One of the great products of the blog-vlog-pod culture: wallstr**.com.


Next Page »
By Oliver Kaljuvee. 2007.