Nothing really outstanding, but it is nice that you can do this in Python.
"".join([dict(zip('ACGTacgt','TGCAtgca'))[c] for c in DNASeq[::-1]])
In this form the “dict” is rebuilt for every character of the sequence. The two-line version below should be much faster.
m = dict(zip('ACGTacgt','TGCAtgca'))
rcDNASeq="".join([m[c] for c in DNASeq[::-1]])
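As a possible alternative (not from the original post, and assuming Python 3), the same mapping can be expressed with a translation table built once by `str.maketrans`. The function name `reverse_complement` is only an illustrative choice.

```python
# Translation table built once; maps each base to its complement (both cases).
COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement('AACGt'))  # aCGTT
```

One behavioral difference from the dict version: characters that are not in the table (for example `N`) pass through unchanged instead of raising a `KeyError`.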
Archive for category programming
Unlike most desktop browsers, Mobile Safari has no “download” option for saving an image, a PDF file, etc., so that one can read those files offline. Well, this is purely a restriction imposed by Apple for security or whatever other stupid reasons. Fortunately, with the most recent update of the operating system, which supports cut and paste, and a third-party WebDAV-compatible file viewer, there is a workaround.
I recently set up a system for this. When I find an interesting URL, I cut and paste it into a widget served from my own host, which fetches the resource at that URL and saves it to a directory exposed by the WebDAV service of my web hosting account. Then I set up the WebDAV-compatible file viewer (currently I am using Air Share Pro) to view the file in my WebDAV directory. In particular, you can also download the files from the WebDAV server to the phone for offline reading later. This is extremely useful. Once the file is local, you don’t have to worry about whether you have a wifi, EDGE, or 3G signal, and you might also save some battery power since the file does not have to be reloaded from the network each time you pick the document up again.
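The server-side step described above (fetch a URL and drop the result into the WebDAV directory) could look roughly like the sketch below. This is not the actual widget code: the function name `fetch_to_directory` and the idea of passing the WebDAV directory as `dest_dir` are assumptions made for illustration, and a real CGI script would also need input validation and error handling.

```python
import os
import shutil
import urllib.parse
import urllib.request


def fetch_to_directory(url, dest_dir):
    """Download the resource at `url` into `dest_dir` and return the saved path."""
    # Derive a file name from the URL path; fall back to a generic name.
    path = urllib.parse.unquote(urllib.parse.urlparse(url).path)
    name = os.path.basename(path) or 'download'
    dest = os.path.join(dest_dir, name)
    # Stream the response to disk so large files are not held in memory.
    with urllib.request.urlopen(url) as response, open(dest, 'wb') as out:
        shutil.copyfileobj(response, out)
    return dest
```

Here `dest_dir` would be the directory on the web host that the WebDAV service exposes, so anything saved there becomes visible to the file viewer on the phone.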
The main caveat is that you probably need your own web hosting service, or iDisk, that supports WebDAV. Besides that, you will need to do some simple CGI/AJAX programming and spend some money on a WebDAV-compatible file viewer like Air Share Pro. I won’t be surprised if one day an iPhone app implements a similar idea in a more polished way. Nevertheless, such an application is always at Apple’s mercy. Once Apple relaxes its policy and allows applications to share data and files as in a real operating system, all such workarounds will be unnecessary. Wait, actually, the way I set this up is not limited to submitting URLs from Mobile Safari. Any browser can push the resource at a URL to the WebDAV directory, regardless of whether the desktop OS supports mounting a WebDAV volume. Not bad as a quick method for getting a useful document from the Internet onto the iPhone from a desktop system.
Y-combinator in python
Aug 3
In my current work, the mainstream platform is .NET. I have not invested much time in learning to program in C# for .NET, but I am rather attracted by F#, a functional language for .NET that I would like to know more about.
Recently, I purchased a book to learn more about F#, and I learned about something called the “Y-combinator” in functional programming. The F# code for the “Y-combinator” looks like this:
let rec y f x = f (y f) x;;
You can apply the “Y-combinator” to a non-recursive function to make it recursive. For example,
let fac f = function | 0 -> 1 | n -> n * f (n-1);;
> y fac 5;;
val it : int = 120
I was curious about how to do something similar in Python, so I tried the following code snippet:
def Y(f):
    def g(x):
        return f(Y(f))(x)
    return g

def fac(f):
    def g(x):
        return 1 if x == 0 else x * f(x-1)
    return g
With these function definitions, “Y(fac)(10)” gives the correct result, 3628800. On the other hand, this code is not easy for me to understand, as most of my daily programming is in an imperative style with only the more straightforward functional features. Besides the mathematical way (described in the wiki page) of understanding how this works, one can also look at how the code unrolls under the Python interpreter:
Y(fac)(5) -> fac(Y(fac))(5) -> g(5) inside fac with f = Y(fac)
Then the “g(5)” inside fac returns 5 * Y(fac)(4). The Y(fac)(4) in turn returns 4 * Y(fac)(3), and so on, until x reaches 0 and g returns 1.
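One way to watch this unrolling happen is to record every argument that the inner function receives. The sketch below reuses the `Y` from the post; the `traced_fac` helper and the `calls` list are additions made purely for illustration.

```python
def Y(f):
    def g(x):
        return f(Y(f))(x)
    return g


def traced_fac(log):
    """Build a `fac` that appends each argument it sees to `log`."""
    def fac(f):
        def g(x):
            log.append(x)  # record the argument of this step
            return 1 if x == 0 else x * f(x - 1)
        return g
    return fac


calls = []
result = Y(traced_fac(calls))(5)
print(result)  # 120
print(calls)   # [5, 4, 3, 2, 1, 0]
```

Each entry in `calls` corresponds to one step of the chain 5 * Y(fac)(4), 4 * Y(fac)(3), and so on down to the base case.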
It seems there are a lot of interesting things in the pure functional world. I hope to learn more about them soon.
I use Python and the Hadoop Distributed File System (HDFS) to process large amounts of data at work. Instead of using the regular map-reduce mechanism provided by Hadoop, I have a home-made map-reduce engine in Python written using Pyro. It turns out to be quite efficient, and for some simple map-reduce work it is sometimes much faster than the corresponding streaming code. For this kind of work, I access files in HDFS by running “hadoop fs -cat” through a Unix pipe (popen) in Python. It seemed to me that it would be useful to bypass the somewhat ugly combination of a Unix pipe and “hadoop fs -cat”. There is already a SWIG-based Python wrapper for HDFS. However, I think it is nicer to have a ctypes wrapper, so that no extra compilation is needed for installation. I spent a few nights working on such a wrapper and hope it will be useful. The result is a single Python module that I call “phdfs”. It provides most of the API in libhdfs. It should be useful if you want to read, write, and manipulate the Hadoop filesystem with the flexible and powerful Python syntax.
You can download phdfs.py and try it out yourself. I have not tested all the methods, so YMMV.
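For comparison, the pipe-based approach that phdfs is meant to replace can be sketched as follows. This is a reconstruction under assumptions, not the code used at work: the helper names `hdfs_cat_command` and `stream_lines` are made up here, and the sketch uses `subprocess` rather than the older `popen` call.

```python
import subprocess


def hdfs_cat_command(path):
    """Command line that streams one HDFS file to stdout."""
    return ['hadoop', 'fs', '-cat', path]


def stream_lines(command):
    """Run `command` and yield its stdout line by line."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            yield line.rstrip('\n')
    finally:
        # Always close the pipe and reap the child process.
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, command)
```

With a Hadoop client on the PATH, `for line in stream_lines(hdfs_cat_command('/some/file')): ...` would iterate over the file without loading it all into memory; the path shown is only a placeholder.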
The iPhone is a fancy toy with a lot of power, but Apple deliberately locks away much of its potential. One thing I would like to be able to do on an iPhone is read CHM files. As a weekend project, I set up the tool chain for the iPhone following the instructions. Then I grabbed the source code of chmlib. With some minor modifications, I was able to compile chmlib as an iPhone binary library. That was very encouraging.
This provides a convenient way to turn the iPhone into a CHM reader. The chmlib source distribution includes an example program that runs as an HTTP server and serves the content of a CHM file as standard web pages. MobileSafari has no problem rendering the results, but the fonts are usually too small to read and the text is typically rendered so wide that a lot of annoying horizontal scrolling is needed.
I decided to combine some Python code with the chm_http server from the chmlib source code. I modified the source code of chm_http so that it can call Python code to modify the HTML in the CHM file, replacing the original CSS with new settings for reading on a small screen. Furthermore, I found it tedious to start chm_http from a terminal every time I wanted to read a different book. So I wrote another small Python script that scans a directory, finds all the CHM files in it, and outputs an index HTML page. In the end, I was able to point MobileSafari at the index page and select the book I want to read. The “chm_http” server then starts automatically and serves the selected book.
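The directory-scanning script mentioned above could be as small as the following sketch. It is not the script shipped in the archive: the function name `build_index` and the `/open?book=` link format are hypothetical and stand in for whatever the real CHMServer uses to launch chm_http.

```python
import html
import os
import urllib.parse


def build_index(directory):
    """Return an HTML page linking to every .chm file found in `directory`."""
    books = sorted(f for f in os.listdir(directory) if f.lower().endswith('.chm'))
    items = '\n'.join(
        '<li><a href="/open?book={}">{}</a></li>'.format(
            urllib.parse.quote(name),  # safe to embed in a URL
            html.escape(name),         # safe to show as link text
        )
        for name in books
    )
    return '<html><body><h1>CHM books</h1><ul>\n{}\n</ul></body></html>'.format(items)
```

Calling `build_index('/var/root/Media/CHM_Ebooks/')` on the phone would produce the page of links that MobileSafari displays.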
If you are interested in reading CHM files on your iPhone, get iphoneCHM.tgz (the file will be uploaded soon). Copy “chm_http2”, “rewriteHTML.py”, and “CHMServer” to “/usr/local/bin/” on your iPhone. Change the permissions of these files so that you can run all of them. Put some CHM files in /var/root/Media/CHM_Ebooks/. Open a terminal on the iPhone, or ssh into the iPhone, and run “CHMServer”. After that, open the URL http://127.0.0.1:8000 in Safari. You should see the links to the CHM files. You can now click on any of them and enjoy a nice reading time.
Writing a Chinese input method with Python + OV
Jul 18
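A while ago I saw lukhnos writing a module that lets OV filters be written in Ruby, and on a whim I wanted to see how to write an OV filter module in Python. I spent some time studying how to embed Python in C/C++. Although I have been using Python for research programming for quite a while, and have also used SWIG to control physics simulation programs written in C/C++, calling Python from C/C++ was a first for me. I spent some time writing a prototype and, with lukhnos’s help, finished a module that lets OV filters be written in Python (in the OV svn repository: Modules/OVOFPythonBased/).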
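An OV filter mainly calls a method named process. What is passed to process is just a string, so implementing an OV filter is not too hard. During development, most of the time went into reading the Python C API documentation and learning how to create Python objects in C/C++ and pass arguments to Python methods.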
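To give a feel for how little the Python side has to do, here is a hypothetical filter. The post only says that a function named process receives a string; the assumption that it also returns the filtered string, and the punctuation mapping itself, are illustrative guesses rather than the documented OVOFPythonBased interface.

```python
# Hypothetical filter: replace a few ASCII punctuation marks with full-width forms.
FULLWIDTH = str.maketrans({',': '，', '?': '？', '!': '！'})


def process(text):
    """Receive the composed string and return the filtered string."""
    return text.translate(FULLWIDTH)


print(process('你好, world!'))  # 你好， world！
```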
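Letting OV’s filter mechanism be implemented in a modern language such as Python or Ruby is only the first step. The convenience of a dynamic language already greatly simplifies the work of developing new filters. So the next step is to see whether OV input methods can be written in Python or Ruby. When experimenting in the future with more complex natural-language-processing input method modules, such as something like Chewing (酷音), being able to write the input method in Python or Ruby should help a lot.
When writing a Python-based OV filter, you only need to define the Python method/function that corresponds to process. The Python side is entirely passive: the Python code does not need to care about C/C++ classes and instances, it only needs to implement a function named process. When writing an OV input method module, however, several OV objects must be passed into Python, and Python also needs to be able to subclass the classes in OV to keep the OV API interface consistent. Basically, the following things need to be done: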
(1) Use SWIG to turn the OV classes into Python classes.
(2) Define OV C/C++ classes that correspond to the Python classes.
(3) The most important part of (2) is converting the instance pointers in OV C/C++ into objects that Python can recognize.
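Of these three tasks, the easiest is (1). Basically you only need to paste Framework/Headers/OpenVanilla.h into the SWIG interface file. The only thing to watch is that SWIG must be told to turn the C++ classes into Python classes. This requires the directors feature of SWIG. See the relevant SWIG documentation and OVIMPython.i in Modules/OVIMPython/.
Next, OV C/C++ has to be made aware of Python. This mainly means subclassing two OV C/C++ classes, OVInputMethodContext and OVInputMethod, so that the OV loader can call the corresponding Python objects. See the OVIMPythonBasedContext and OVIMPythonBased classes in Modules/OVIMPython/OVIMPythonBased.cpp. What these two wrapper classes do is simple: they implement C++ methods that call the corresponding Python instance methods. The obstacle, however, is that most parameters of the methods in OVInputMethodContext and OVInputMethod are pointers to C++ instances. How to turn those instances into Python objects and then pass them to Python is a harder problem. It also involves the details of how SWIG maps C++ objects to Python objects.
Perhaps SWIG has a solution for this, but I did not see an obvious way to solve the problem in the SWIG documentation. I eventually found a hint while studying the Python module file that SWIG generates. For every C++ class to be wrapped, SWIG generates two corresponding Python classes. For example, given the following declaration in C++: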
class OVKeyCode : public OVBase {
public:
    virtual int code()=0;
};
SWIG creates the following two Python classes:
class OVKeyCode(OVBase): ...
and
class OVKeyCodePtr(OVKeyCode): ...
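The constructor of OVKeyCodePtr ( __init__() in Python ) can build the corresponding Python object from the Python pointer object that SWIG uses to represent a C/C++ pointer. So the next task is to convert a C/C++ pointer into a pointer object in Python. SWIG’s approach is simply to turn the address and type of the C/C++ pointer into a specially formatted string. In the C/C++ wrapper file that SWIG generates there is a special function (char *SWIG_PackData(char *c, void *ptr, int sz) ) that converts a C/C++ pointer into a string. So we can use this function to convert a C/C++ pointer into the corresponding Python string, and then use the SWIG-generated aClassPtr to create an instance of aClass in Python. The implementation behind this Python instance is the corresponding C/C++ implementation.
This mapping is really a bit too complicated and unintuitive. I have not yet read the SWIG documentation thoroughly, so I do not know whether there is a more elegant way to do the same thing. Even so, people who want to write an input method in Python can completely ignore the wrapper itself and the complexity of mapping objects between the two languages, and focus on writing the input method in Python. The earlier post by lukhnos, “Writing an input method with Python + OpenVanilla” (用Python + OpenVanilla寫輸入法), has a minimum example of an OV input method in Python.
I think this is only a first step, a small exercise for myself in understanding how to embed Python. If I have time, I will look at how to really do something interesting in OV with Python.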
The following code creates a circular reference in Python:
>>> aRef = []
>>> aRef.append(aRef)
>>> print aRef
[[...]]
This creates a list object referred to by a variable named “aRef”. The first element of the list object is a reference to the list itself. In this case, “del aRef” removes the reference from aRef to the list object. However, the reference count of the list object does not drop to zero, so reference counting alone does not free it, since the list object still refers to itself. To handle this, the garbage collector in Python periodically checks whether such circular references exist, and the interpreter collects them. The following is an example of manually collecting the space used by objects with circular references.
>>> import gc
>>> gc.collect()
0
>>> del aRef
>>> gc.collect()
1
>>> gc.collect()
0
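The same behavior can be observed from a script by attaching a weak reference to an object that refers to itself. The sketch below assumes CPython (where reference counting plus a cycle detector is used) and Python 3; the `Node` class and the `probe` variable are illustrative additions, needed because a plain list cannot be the target of a weak reference.

```python
import gc
import weakref


class Node:
    """A tiny object that holds a reference to itself."""
    def __init__(self):
        self.me = self  # circular reference


gc.disable()               # keep the automatic collector out of the way
gc.collect()               # start from a clean state
node = Node()
probe = weakref.ref(node)  # lets us observe whether the object is still alive
del node                   # the name is gone, but the cycle keeps the object alive
alive_after_del = probe() is not None
found = gc.collect()       # the cycle detector reclaims the unreachable object
alive_after_collect = probe() is not None
gc.enable()

print(alive_after_del, alive_after_collect)
```

The exact number returned by `gc.collect()` can vary between interpreter versions, so the sketch only relies on it being at least one.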
Quickhull is an algorithm, similar to quicksort, that uses a divide-and-conquer strategy to find the convex hull of a set of scattered points. The green line in the plot is the initial base line and the gray lines are the intermediate base lines. The final convex hull is shown in red.
source code: qh.js
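For readers who prefer Python, the divide-and-conquer idea can be sketched as below. This is an independent illustration, not a port of qh.js; the function names are made up here, and points are assumed to be (x, y) tuples.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o-a-b; positive when b is left of o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_side(a, b, points):
    """Hull vertices strictly left of the base line a -> b, ordered from a to b."""
    left = [p for p in points if cross(a, b, p) > 0]
    if not left:
        return []
    # The point farthest from the base line must be on the hull.
    far = max(left, key=lambda p: cross(a, b, p))
    # Recurse on the two new base lines a -> far and far -> b.
    return _hull_side(a, far, left) + [far] + _hull_side(far, b, left)


def quickhull(points):
    """Return the convex hull vertices in clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    lo, hi = pts[0], pts[-1]  # leftmost and rightmost points: the initial base line
    return [lo] + _hull_side(lo, hi, pts) + [hi] + _hull_side(hi, lo, pts)


print(quickhull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0.5)]))
# [(0, 0), (0, 2), (2, 2), (2, 0)]
```

The initial base line joins the leftmost and rightmost points, and each recursive call introduces two intermediate base lines, which corresponds to the green and gray lines in the plot.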
Diggs in a box
Feb 12
I have been a fan of the “treemap” algorithm for visualizing data with a tree-like structure. Last Friday at noon I finally decided to implement one. After a few hours, I had a basic Treemap class in Python. When I was testing the class, I found that the algorithm can generate some interesting, artistic, painting-like effects.
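The post does not say which treemap variant the class uses. As a rough illustration of the general idea, here is a sketch of the simple “slice-and-dice” layout, which may differ from the actual Treemap class. The tree encoding (a `(label, size)` tuple for a leaf, a `(label, [children])` tuple for a branch) is an assumption made for this example, and sizes are assumed to be positive.

```python
def weight(node):
    """Total size of a node: its own size for a leaf, the sum of children otherwise."""
    label, payload = node
    if isinstance(payload, list):
        return sum(weight(child) for child in payload)
    return payload


def treemap(node, x, y, w, h, depth=0):
    """Return a list of (label, x, y, width, height) rectangles for all leaves."""
    label, payload = node
    if not isinstance(payload, list):
        return [(label, x, y, w, h)]
    total = weight(node)
    rects = []
    offset = 0.0
    for child in payload:
        share = weight(child) / total
        if depth % 2 == 0:
            # Even depth: place children side by side along the x axis.
            rects += treemap(child, x + offset * w, y, share * w, h, depth + 1)
        else:
            # Odd depth: stack children along the y axis.
            rects += treemap(child, x, y + offset * h, w, share * h, depth + 1)
        offset += share
    return rects


tree = ('root', [('a', 2), ('b', [('c', 1), ('d', 1)])])
for rect in treemap(tree, 0, 0, 4, 2):
    print(rect)
```

Each leaf gets a rectangle whose area is proportional to its size, and the split direction alternates with depth.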
The following is a composition using the Gene Ontology. I guess this is one of the many ways to visualize the so-called Systems Biology. You can also click on the image to see some other randomly generated compositions.
Inspired by Newsmap, I tried to see whether I could use the Python Treemap class and some JavaScript to map Diggs to a treemap. So I spent a few hours last weekend making this little toy, Diggs in A Box.
It is rather primitive now, and I can think of a long TODO list for improving it. Hopefully I will eventually get some time to improve its functionality and appearance. If you have any suggestions, please leave a message. I hope you find this presentation of data interesting and useful.
Review on Mitchell Model’s book, “Bioinformatics Programming Using Python”
Feb 15
Posted by Jason Chin in comment, programming
I am helping a local Python interest group by reviewing the book “Bioinformatics Programming Using Python” by Mitchell Model. Here is my review.
—
Compared to Perl, Python has lagged in adoption as the scripting language of choice in bioinformatics, although it has been gaining some momentum recently. If you read job descriptions for bioinformatics engineer or scientist positions a few years back, you barely saw Python mentioned, even as a “nice to have” optional skill. One of the reasons is probably the lack of good introductory bioinformatics books in Python, so in general fewer people think of Python as a good choice for bioinformatics. The book “Beginning Perl for Bioinformatics” from O’Reilly was published in 2001. Almost one decade later, we finally get the book “Bioinformatics Programming Using Python” from Mitchell Model to fill the gap.
When I first skimmed “Bioinformatics Programming Using Python”, I got the impression that the book was more like “learning Python using bioinformatics as examples” and felt a little disappointed, as I was hoping for more advanced content. However, once I went through the book, reading the preface and everything else chapter by chapter, I understood the target audience that the author had in mind, and I think the author did a great job of fulfilling the main purpose.
In modern biological research, scientists can easily generate amounts of data so large that the Excel spreadsheets most bench scientists use to process limited amounts of data are no longer an option. I personally believe that the new generation of biologists will have to learn how to process and manage large amounts of inhomogeneous data in order to make new discoveries from it. This requires general computational skills beyond just knowing how to use the special-purpose applications that software vendors provide. The book gives a good introduction to practical computational skills for processing bioinformatics data with Python. It is very well organized for a newbie who just wants to start processing raw data on their own and to become a Python programmer through learning by doing.
The book starts with an introduction to the primitive data types in Python and moves on to flow control and collection data types, with an emphasis on, not surprisingly, string processing and file parsing, two of the most common tasks in bioinformatics. Then the author introduces object-oriented programming in Python. I think a beginner will also like the code templates for different patterns of data processing tasks in Chapter 4. They summarize the usual flow structure for common tasks very well.
After covering the basic concepts of programming with Python, the author focuses on other utilities that are very useful in the day-to-day work of gathering, extracting, and processing data from different sources. For example, the author discusses how to explore and organize files with Python at the OS level, using regular expressions to extract data from complicated text files, XML processing, web programming for fetching online biological data and sharing data through a simple web server, and, of course, how to use Python to interact with a database. Each of these topics could deserve its own book if covered in depth. The author does a good job of covering them all concisely. This helps people see what can be done very easily with Python and, if they want, learn more about any of those topics from other resources. The final touch of the book is structured graphics. This is a very wise choice, since most bioinformatics data is likely to end up as graphs in presentations and publications. Again, there are many other Python packages that can help scientists generate nice graphs, but the author focuses on one or two of them to show readers how to generate some graphs, and readers may be able to learn the rest from there.
One thing I wish the author had also covered, at least at a beginner level, is the numerical and statistical side of bioinformatics computing with Python. For example, NumPy and SciPy are very useful for processing large amounts of data, generating statistics, and evaluating the significance of results, especially when the data is so large that native Python objects are no longer efficient enough. The numerical computation aspect of bioinformatics is basically missing from the book. The other thing that might be desirable for such a book is to show that Python is a great tool for prototyping algorithms in bioinformatics. This is probably my own personal bias, but I do think it would be nice to show some basic bioinformatics algorithm implementations in Python. This would help readers understand a little more about some of the common algorithms used in the field and get a taste of slightly more advanced programming.
Overall, I will not hesitate to recommend this book to anyone who would like to start processing biological data on their own with Python. Moreover, it can serve as a good introductory book on Python regardless of its focus on bioinformatics examples. The book covers most basic day-to-day bioinformatics tasks and shows that Python is a great tool for them. I think some slightly more advanced topics, especially basic numerical and statistical computation, would also help the target audience. Unfortunately, none of those topics is covered in the book. That said, even if you are an experienced Python programmer in bioinformatics, the book’s focus on Python 3 and its many useful templates might serve well as a quick reference when you are looking for something you have no direct experience with.