Archive for May, 2008

phdfs.py, a ctypes wrapper of hadoop libhdfs for python

I use Python and the Hadoop Distributed File System (HDFS) to process large amounts of data at work. Instead of using the regular map-reduce mechanism provided by Hadoop, I have a home-made map-reduce engine written in Python using Pyro. It turns out to be quite efficient, and for some simple map-reduce work it is much faster than the corresponding streaming code. For this kind of work, I access files in HDFS by running “hadoop fs -cat” through a Unix pipe (popen) in Python. It seemed to me that it might be useful to bypass the somewhat ugly combination of a Unix pipe and “hadoop fs -cat”. There is already a SWIG-based Python wrapper for HDFS. However, I think it would be nice to have a ctypes wrapper so that no extra compiling is necessary for installation. I spent a few nights working on such a wrapper and hope it will be useful. The result is a single Python module that I call “phdfs”. It provides most of the API in libhdfs. It will be useful if one wants to read, write and manipulate the Hadoop filesystem with the flexible and powerful Python syntax.
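For reference, the pipe approach mentioned above looks roughly like the sketch below. This is not code from my engine, just a minimal illustration: the function name `hdfs_cat_lines` and its `command` parameter are made up for this example, and it assumes the `hadoop` executable is on the PATH.

```python
import subprocess


def hdfs_cat_lines(path, command=("hadoop", "fs", "-cat")):
    """Yield the lines of an HDFS file by reading the output of `hadoop fs -cat`.

    `command` is a hypothetical parameter so that the executable can be
    swapped out (for example when hadoop is not on the PATH).
    """
    proc = subprocess.Popen(
        list(command) + [path],
        stdout=subprocess.PIPE,
        text=True,  # decode the output and normalize newlines
    )
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        # Always close the pipe and reap the child, even if the caller
        # stops iterating early.
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError("command exited with status %d" % returncode)
```

A loop such as `for line in hdfs_cat_lines("/user/me/part-00000"): ...` (the path is only a placeholder) then streams the file one line at a time. It works, but every file access starts a new JVM, which is part of why the combination feels clumsy.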

You can download phdfs.py and try it out yourself. I have not tested all the methods, so YMMV.
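To give a feel for what wrapping libhdfs with ctypes involves, here is a minimal sketch that reads a whole file. It is not the code in phdfs.py and does not show its interface. The helper names `declare_signatures` and `read_hdfs_file` are invented for this example, the library file name `libhdfs.so` is an assumption about your installation, and the C signatures are written from hdfs.h as I understand it, so check them against your Hadoop version. libhdfs also needs the Hadoop jars on the CLASSPATH at run time.

```python
import ctypes

# Types from hdfs.h: the handles are opaque pointers,
# tSize is a 32-bit signed int and tPort a 16-bit unsigned int.
hdfsFS = ctypes.c_void_p
hdfsFile = ctypes.c_void_p
tSize = ctypes.c_int32
tPort = ctypes.c_uint16

O_RDONLY = 0  # same value as os.O_RDONLY


def declare_signatures(lib):
    """Attach argument and return types to the libhdfs functions used below."""
    lib.hdfsConnect.argtypes = [ctypes.c_char_p, tPort]
    lib.hdfsConnect.restype = hdfsFS
    lib.hdfsOpenFile.argtypes = [hdfsFS, ctypes.c_char_p, ctypes.c_int,
                                 ctypes.c_int, ctypes.c_short, tSize]
    lib.hdfsOpenFile.restype = hdfsFile
    lib.hdfsRead.argtypes = [hdfsFS, hdfsFile, ctypes.c_void_p, tSize]
    lib.hdfsRead.restype = tSize
    lib.hdfsCloseFile.argtypes = [hdfsFS, hdfsFile]
    lib.hdfsCloseFile.restype = ctypes.c_int
    lib.hdfsDisconnect.argtypes = [hdfsFS]
    lib.hdfsDisconnect.restype = ctypes.c_int
    return lib


def read_hdfs_file(path, lib_path="libhdfs.so", host="default", port=0,
                   chunk=65536):
    """Return the full contents of an HDFS file as bytes."""
    lib = declare_signatures(ctypes.CDLL(lib_path))
    # "default" with port 0 asks libhdfs for the configured filesystem.
    fs = lib.hdfsConnect(host.encode(), port)
    if not fs:
        raise IOError("hdfsConnect failed")
    try:
        # Zeros mean: default buffer size, replication and block size.
        handle = lib.hdfsOpenFile(fs, path.encode(), O_RDONLY, 0, 0, 0)
        if not handle:
            raise IOError("hdfsOpenFile failed for %s" % path)
        try:
            buf = ctypes.create_string_buffer(chunk)
            parts = []
            while True:
                n = lib.hdfsRead(fs, handle, buf, chunk)
                if n < 0:
                    raise IOError("hdfsRead failed for %s" % path)
                if n == 0:  # end of file
                    break
                parts.append(buf.raw[:n])
            return b"".join(parts)
        finally:
            lib.hdfsCloseFile(fs, handle)
    finally:
        lib.hdfsDisconnect(fs)
```

Declaring `argtypes` and `restype` matters here: without them ctypes treats return values as C ints, which can truncate the pointer handles on 64-bit systems.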


Postdocs, "Not Exactly Students, Not Exactly Employees, What are you?"

My neighbor showed me this article from the East Bay Express. The stories sound very familiar. My personal feeling is that this academic system should be fixed soon. The academic community should give more recognition to postdocs.

As a postdoc, you don’t get the benefits given to students, and you are not considered a formal employee. You don’t get any benefits, and you are paid poorly in the name of science. I still remember how absurd it felt when I was told I could not pay my monthly parking fee by automatic deduction from my paycheck, because I was a “temporary worker” at the school where I had been working for a few years.

Well, I cannot say that my career has not benefited from my postdoc research. But I also cannot say I totally enjoyed being treated by the school as a “temporary worker” for an indefinite amount of time. The real “workhorses” of the academic research industry should be treated a little better. Without these workhorses, there would be no “superstars” in the research community. Anyway, there is not much point in my complaining anymore. Industrial R&D can be fun too.
