Archive for category hacking

Running the de Bruijn graph IPython Notebook Example

First, you will need a working Python environment to use the notebook. If you don’t have an IPython 0.13 environment installed, you need to set one up for whatever computer you are using. I use a Mac or a Linux VM for most of my personal work (and Linux VM/Linux/Windows for my day job). Here are my instructions on how I set it up on my own computer: pythonbrew_ipython.rst

Once you have the pythonbrew environment installed, start with a clean shell. Activate the environment with

$ source "$HOME/.pythonbrew/etc/bashrc"

Now go to a clean working directory. Let’s call it “workdir”.

$ cd workdir

You will need to install “networkx” for the de Bruijn graph example to work.

$ pip install networkx

Then you can clone my GitHub repository and start the IPython Notebook in the correct directory:

$ git clone https://github.com/cschin/ipython_d3_mashup.git
$ cd ipython_d3_mashup/ipython_13_vis_example/
$ ipython notebook --pylab=inline

If you have Mac OS X 10.8 and you don’t block any localhost port, your Safari browser should pop up.
Click the link pointing to the “De_Bruijn_VIS” notebook; it should open the notebook. Press Shift-Enter to execute each cell.
The notebook uses an IPython extension mechanism to download some extra code from the GitHub ipython_d3_mashup repository. Sometimes you might need to run the notebook twice to make sure all of the extensions are working.
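The notebook's own code is not reproduced here. As a rough idea of the kind of structure it visualizes, the following is a minimal sketch that builds a de Bruijn graph from the k-mers of a single sequence. It uses a plain dictionary rather than networkx so that it runs without extra packages; the function name `de_bruijn_graph` and the toy sequence are illustrative assumptions, not taken from the notebook.

```python
from collections import defaultdict


def de_bruijn_graph(sequence, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in a k-mer."""
    graph = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # each k-mer contributes one edge: its prefix -> its suffix
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# toy input, chosen only for illustration
graph = de_bruijn_graph("ACGTACGA", 3)
for node in sorted(graph):
    print(node, "->", graph[node])
```

Repeated k-mers show up as repeated edges, which is what lets such a graph represent overlaps between reads.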

I could put up video instructions later if I have time. And if you damage your computer by following these instructions, you are on your own.

Hacks About d3.js + IPython 0.13 official release

Thanks to the IPython team for their excellent work on the 0.13 release. The newly refactored JavaScript for the IPython 0.13 notebook makes writing a mashup using d3.js + IPython simpler. I put two examples in ipython_13_vis_example/ on GitHub.

The examples should work with my own IPython vis_0.13 branch. (More specifically, I tested them with this commit.)

In IPython 0.13, I do not need to patch IPython’s official CodeCell and Notebook JavaScript as in the previous hack. I add two extra files, IPython/frontend/html/notebook/static/vis/vis_extension.js and IPython/frontend/html/notebook/visutils.py, to the code tree to support executing JavaScript code from IPython and executing Python code from JavaScript.

The GDP_CO2_Example.ipynb only uses vis_extension.js. It shows how to make a movable chart with IPython + d3.js.

The Word_Ladder_network_vis.ipynb is an experiment in building an interactive widget; it shows that it is possible to use Python code as callbacks for some HTML elements.

Currently, I am not happy with the visutils.py code. It is quite ugly. If time permits, I will think of a better way to make the mapping between JavaScript objects and Python objects more transparent. As it stands, even a simple mistake in the code is quite tricky to debug.

Once I get some more time over the coming weekend, I can post some screenshots or videos.

Can Not Resist Hacking IPython + d3.js, Another Force Layout Demo for Word Ladder Game

I posted a video visualizing the neighbors of English words in the word ladder game (http://en.wikipedia.org/wiki/Word_ladder). It shows what can be done now with minimal changes to the IPython 0.13-dev source plus some simple monkey patches.

I will write down later where I think we can go from here. The IPython notebook can be downloaded from here. The monkey patches that make this work can be downloaded from my fork of the IPython source code on GitHub too.

Yet another ipython + d3.js example: motion chart

I gave a lightning talk at a recent Bay Area Python meet-up. I went over some of my recent hacks on combining the IPython notebook and d3.js. What I wanted to show was how to mix Python code and JavaScript code to create a dynamic programming/data analysis notebook. I created yet another example to demonstrate the great potential of combining these powerful tools.

If you are interested, you can try this IPython notebook. You will need to download the development branch of IPython v0.13 to see the notebook. The notebook itself includes some explanation of how to run it and how it is done. I did not spend much time polishing the code and the motion chart, but it has the basic ingredients. If you want a peek, here is a short screen recording showing what it looks like.

Experimenting with ipython notebook bi-directional communication

Thanks to Brian Granger for pointing out how to do bi-directional communication from JavaScript in an IPython notebook front-end to the back-end IPython kernel using the existing websocket/zmq channel architecture in IPython (see the thread). I have been hacking around to see how to do it. I needed to modify a few lines of the IPython notebook JavaScript to make it work (see my GitHub commit). I wrote an example to show how it works (the IPython notebook and a screenshot). Pretty cool that it works. It seems that one could develop a widget library to avoid hand-crafting both the JavaScript and the Python code for such communication. All right, one more small step toward building some cool interactive visualization/analysis tools with IPython.

iPython Notebook / d3.js mashup

While I have been using IPython for a long time, I never really used it for more than checking whether some code snippets worked as expected. (Well, I tried to play with IPython's parallel computing framework, but I never put it into production.) Just recently, I started to look into the IPython web-based notebook feature more carefully. It is great, and it makes me think IPython will make a Python programmer, or anyone who uses Python for data analysis, much more productive. (I used to envy “RStudio” in R-lang land; now we Python programmers finally have something competitive.)

The cool thing about using a web page as the front-end is that there is a lot of potential for using the web interface for some cool visualization. I played with protovis.js a while ago. Recently, I went to a visualization meet-up where d3.js was mentioned a number of times. Then the idea came to my mind: “is it possible to combine the best of two worlds, Python and d3.js?” After consulting some more experienced users on the ipython-dev mailing list to see what is possible, I decided to spend some of my weekend time hacking on it. In the meantime, I got the chance to play with tornado, zero-mq and websocket, all the fun stuff these days. In the end, I was able to pass some JavaScript code written within the IPython notebook to the browser, get the browser to execute it, and show some animation with d3.js. This makes it possible to create fancier visualizations interactively, all in a browser.

My weekend hacking results are hosted at GitHub. I think there is great potential to make things like this work better. (For example, can we have a pythonic backend for d3.js? :) ) It is definitely worth messing around with to see more uses like this.

How to implement the Needleman–Wunsch alignment algorithm without using a single loop in Python

I am still fascinated by the programming style using co-routines. Actually, it is possible to implement the Needleman–Wunsch alignment algorithm purely by message passing. The following code shows how to implement the algorithm using co-routines again. I modified the code from my previous post so that the alignment array itself is also generated dynamically, which lets us completely remove the set-up loops. This code is also annotated to show how it is done. If any reader is interested and has any comments, I would like to hear them.

# @author Jason Chin
#
# Copyright (C) 2011 by Jason Chin
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

"""

This is an example of implementing the Needleman-Wunsch sequence alignment algorithm
using Python's co-routines. One of the most interesting aspects of this implementation
is that there is no explicit loop: you cannot find either the "for" or the "while"
keyword in this code.  Each alignment cell is a co-routine, and the calculation of the
alignment score and the backtracking that generates the alignment string are done by
message passing.  The alignment cells are also generated dynamically.  A banded
alignment can be done by generating only the banded part of the alignment array
instead of the whole array.

Is it useful? I am not sure, but it is definitely fun to show it is possible.

--Jason Chin, Apr. 10, 2011

"""

### Set up the alignment score scheme
matchScore, mismatchScore, gapScore = 4, -3, -4

### Two test strings for alignment
seq1 = "TTAAGTGTAGCCTTGTGTGACATGTATTTTTAT"
seq2 = "TTTCTAGGTAGTTGTGGTGAGTTTAGTTGATAT"

### cellMap is a dictionary that maps integer pairs to the co-routines
cellMap = {}

### For tracking the global best alignment cell
globalBestCellScore = [None, -100000]



def getAnAlignCell(x, y, seq1, seq2):
    """
    This function returns a co-routine that represents an alignment cell at position
    x and y.  The alignment strings are passed explicitly for simplicity.
    """

    def alnCell():

        """
        This is the co-routine for an alignment cell. An alignment cell co-routine is
        executed in roughly two stages. In the first stage it collects the alignment scores
        from the cells at (x-1,y-1), (x-1,y), and (x, y-1) and calculates the best
        alignment score. Depending on the alignment path through the alignment cell, a new
        alignment score is generated and passed to the cells at (x+1, y+1),
        (x+1,y), and (x, y+1). If any of those cells has not been generated, it will
        generate the co-routine and register it with the cellMap dictionary. After
        this it waits for the backtracking calculation.  If a cell is on the best alignment
        path, it will pass the best alignment pair to the next cell on the best alignment
        path.
        """

        global globalBestCellScore
        global cellMap

        b1, b2 = seq1[x], seq2[y] 
        mx, my = len(seq1), len(seq2)

        cellData = []

        # if the cell is on the top or the left side of the alignment, they only have
        # to wait for one other cell to pass in the alignment score. Otherwise, they
        # need to collect three messages from those (x-1,y), (x,y-1), and (x-1, y-1)
        # before they can do any calculation.
        if x == 0 or y == 0:
            cellId, s = yield 
            cellData.append( (cellId, s) )
        else:
            cellId, s = yield 
            cellData.append( (cellId, s) )
            cellId, s = yield 
            cellData.append( (cellId, s) )
            cellId, s = yield 
            cellData.append( (cellId, s) )

        # find the best cell that gives the best alignment score
        cellData.sort( key=lambda x: -x[1] )
        bestCell, bestScore = cellData[0]

        if bestScore > globalBestCellScore[1]:
            globalBestCellScore = [ (x,y), bestScore ]

        # pass the new alignment score to (x+1, y+1)
        if x+1 < mx and y+1 < my:
            # generate the cell at (x+1, y+1) if necessary
            if (x+1, y+1) not in cellMap:
                cellMap[ (x+1, y+1) ] = getAnAlignCell( x+1, y+1, seq1, seq2 )()
                next(cellMap[ (x+1, y+1) ])
            if b1 == b2: # a match, seq1[x] == seq2[y], new_score = bestScore + matchScore
                cellMap[ (x+1, y+1) ].send( ((x,y), bestScore + matchScore) ) # pass the new score to cell (x+1, y+1)
            else: # a mismatch, seq1[x] != seq2[y], new_score = bestScore + mismatchScore
                cellMap[ (x+1, y+1) ].send( ((x,y), bestScore + mismatchScore) ) # pass the new score to cell (x+1, y+1)
        # pass the new alignment score to (x+1, y), namely, the base seq1[x] is aligned to a gap
        if x+1 < mx:
            # generate the cell at (x+1, y) if necessary
            if (x+1, y) not in cellMap:
                cellMap[ (x+1, y) ] = getAnAlignCell( x+1, y, seq1, seq2 )()
                next(cellMap[ (x+1, y) ])
            cellMap[ (x+1, y) ].send( ((x,y), bestScore + gapScore) )
        # pass the new alignment score to (x, y+1), namely, the base seq2[y] is aligned to a gap
        if y+1 < my:
            # generate the cell at (x, y+1) if necessary
            if (x, y+1) not in cellMap:
                cellMap[ (x, y+1) ] = getAnAlignCell( x, y+1, seq1, seq2 )()
                next(cellMap[ (x, y+1) ])
            cellMap[ (x, y+1) ].send( ((x,y), bestScore + gapScore) )
            
        path = yield # wait, if the cell is on the best path, the co-routine will resume 


        # generate the alignment pair according to the best aligned cells
        if bestCell[0] >= 0 and bestCell[1] >=0 :
            if path == None:
                path = []
            
            if bestCell[0] - x == 0:
                c1 = "-"
            else:
                c1 = seq1[x-1]
            if bestCell[1] - y == 0:
                c2 = "-"
            else:
                c2 = seq2[y-1]
            path.extend( [ (c1, c2) ] )
            
            # send the calculated partial path to the best-scoring predecessor of this cell
            cellMap[ bestCell ].send(  path   )
        
        # return the best path if bestCell[0] = -1 or bestCell[1] = -1
        yield path

    return alnCell


# initialize the cell at (0,0)
cellMap[ (0,0) ] = getAnAlignCell( 0, 0, seq1, seq2 )()
# prime it
next(cellMap[(0,0)])
# start the whole execution by sending in the initial score to cell at (0,0)
cellMap[(0,0)].send( ( (-1, -1), 0 ) )

# get the best global cell
bestCell = globalBestCellScore[0]

# continue to execute the best cell co-routine to get the alignment path
bestPath = next(cellMap[bestCell])
bestPath.reverse()

# some simple machinery to print out the alignment path
alnRes = list(zip(*bestPath))
print("".join(alnRes[0]))
print("".join(alnRes[1]))


The result:

$ python coAlign_v2.py   
-TT-AAGTGTAGCCTTGT-GTGACATGTA-TTTTTA
TTTCTAG-GTAG--TTGTGGTGA-GTTTAGTTGATA
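For comparison, here is a sketch of the conventional, loop-based textbook form of Needleman–Wunsch with the same scoring scheme (4, -3, -4). It is not a translation of the co-routine code above: the co-routine version ends the alignment at the globally best-scoring cell, while this sketch always traces back from the bottom-right corner, so the two need not print the same alignment. The function name `needleman_wunsch` and its parameter names are illustrative choices.

```python
def needleman_wunsch(a, b, match=4, mismatch=-3, gap=-4):
    """Textbook global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("TTAAGTGTAGCCTTGTGTGACATGTATTTTTAT",
                                      "TTTCTAGGTAGTTGTGGTGAGTTTAGTTGATAT")
print(total)
print(top)
print(bottom)
```

Each cell of the `score` table plays the role of one co-routine above: it waits for three neighbors, keeps the best value, and hands a new score on.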

Yet Another Python Coroutine Fun Stuff

It might be a totally useless Python hack. Yes, it is possible to implement dynamic programming using message-passing-style Python co-routines built on the enhanced Python generator. Here is the code. I will write up some details about how this piece of code works later. However, the main idea is simple (although you might need some background knowledge of sequence alignment algorithms). We create a co-routine for each alignment cell. The alignment score is generated by passing the best score around the neighboring cells. The backtracking is also implemented as message passing, going backward.

matchScore, mismatchScore, gapScore = 4, -5, -3
seq1 = "AGTGTAGTTGTGTGAATGTATTTTTAT"
seq2 = "AGGTAGTTGTGGTGATTTAGTTGATAT"

cellMap = {}
globalBestCellScore = [None, -100]

def getAnAlignCell(x, y, p):
    def f():
        global globalBestCellScore
        global cellMap
        b1, b2 = p
        cell1Id, s1 = yield 
        cell2Id, s2 = yield 
        cell3Id, s3 = yield 
        cellData = [ (cell1Id, s1), (cell2Id, s2), (cell3Id, s3) ]
        cellData.sort( key=lambda x: -x[1] )
        bestCell, bestScore = cellData[0]
        if bestScore > globalBestCellScore[1]:
            globalBestCellScore = [ (x,y), bestScore ]
        if x+1 < len(seq1) and y+1 < len(seq2):
            if b1 == b2:
                cellMap[ (x+1, y+1) ].send( ((x,y), bestScore + matchScore) )
            else:
                cellMap[ (x+1, y+1) ].send( ((x,y), bestScore + mismatchScore) )
        if x+1 < len(seq1):
            cellMap[ (x+1, y) ].send( ((x,y), bestScore + gapScore) )
        if y+1 < len(seq2):
            cellMap[ (x, y+1) ].send( ((x,y), bestScore + gapScore) )
            
        path = yield
        if bestCell[0] >= 0 and bestCell[1] >=0 :
            if path == None:
                path = []
            path.extend( [ (x,y) ] )

            cellMap[ bestCell ].send(  path   )
        yield path
    return f

for x in range(len(seq1)):
    for y in range(len(seq2)):
        cellMap[ (x,y) ] = getAnAlignCell( x, y, (seq1[x], seq2[y]) )()
        next(cellMap[ (x,y) ])

for x in range(len(seq1)):
    cellMap[ (x,0) ].send( ( (x, -1), 0 ) )
    cellMap[ (x,0) ].send( ( (x-1, -1), 0 ) )

for y in range(len(seq2)):
    if y != 0:
        cellMap[ (0,y) ].send( ( (-1, y), 0 ) )
        cellMap[ (0,y) ].send( ( (-1, y-1), 0 ) )

cellMap[(0,0)].send( ( (-1, -1), 0 ) )

bestCell = globalBestCellScore[0]
bestPath = next(cellMap[bestCell])
bestPath.reverse()

s1 = []
s2 = []
px, py = bestPath[0]
for x,y in bestPath[1:]:
    if x - px != 0:
        s1.append(seq1[px])
    else:
        s1.append("-")
    if y - py != 0:
        s2.append(seq2[py])
    else:
        s2.append("-")
    px, py = x, y
print("".join(s1))
print("".join(s2))

The result seems to be correct

$ python coAlign.py   
GTGTAGTTGTGTGAATGTATTT--TT-A
G-GTAGTTGTG-G--TG-ATTTAGTTGA
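The building block both alignment scripts rely on is a generator that is primed once and then fed messages with send(). The following stripped-down sketch shows just that pattern for a single cell collecting three (cellId, score) messages; the name `best_of_three` and the sample scores are made up for illustration.

```python
def best_of_three():
    """Receive three (cell_id, score) messages, then yield the best one."""
    messages = []
    messages.append((yield))   # each bare yield pauses until the next send()
    messages.append((yield))
    messages.append((yield))
    yield max(messages, key=lambda message: message[1])


cell = best_of_three()
next(cell)                         # prime: run up to the first yield
cell.send(((0, 0), 4))             # returns None, the cell is still waiting
cell.send(((0, 1), -3))            # returns None, one more message needed
best = cell.send(((1, 0), 7))      # third message: the cell answers
print(best)
```

The first two send() calls return None because the generator stops at its next bare yield; only the third call runs through to the final `yield` that carries a value.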

Python Generator Fun

The following Python code generates 100 by 100 = 10,000 generators and uses them to simulate a 100-step random walk 500 times. It is not a particularly useful thing, but it was fun to find out you can simulate a random walk differently. I will probably try to write some dynamic programming code using Python's enhanced generators (a co-routine-like construct) if I find some time to work on it.


import random

maxStep = 100
fmap = {}
def getFun(i,j):
    def f():
        path = [(i,j)]
        while 1:
            if i < maxStep - 1:
                path.extend( next(fmap[ (i+1, j+1) ]) if random.uniform(0,1) > 0.5 else next(fmap[ (i+1, j) ]) )
            yield path
            path = [(i,j)]
    return f

for i in range(maxStep):
    for j in range(maxStep):
        f = getFun(i,j)()
        fmap[ (i,j) ] = f

for i in range(500):
    print("%d %s" % (i, [ x[1] for x in next(fmap[ (0,0) ]) ]))
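For reference, the same walk can be written as one ordinary function: at every step the first coordinate advances by one and the second advances with probability about one half. This is only a sketch for comparison; `random_walk` and its `rng` parameter are illustrative names, and the seed is arbitrary.

```python
import random


def random_walk(max_step, rng=random):
    """Return the list of (i, j) positions visited, starting from (0, 0)."""
    j = 0
    path = [(0, 0)]
    for i in range(1, max_step):
        # same coin flip as the generator version: move "up" roughly half the time
        if rng.uniform(0, 1) > 0.5:
            j += 1
        path.append((i, j))
    return path


walk = random_walk(100, random.Random(2011))  # seeded only to make runs repeatable
print([position[1] for position in walk])
```

The generator version spreads this single loop over a grid of 10,000 objects, each of which only knows how to ask one of its two neighbors for the rest of the path.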

How to save PDF/Images from Mobile Safari to WebDAV compatible iPhone program?

Unlike most desktop browsers, Mobile Safari has no “download” option to save an image, a PDF file, etc., so that one can read those files offline. Well, this is purely a restriction imposed by Apple for security or whatever other stupid reasons. Fortunately, with the most recent update of the operating system, which supports cut and paste, and a third-party WebDAV-compatible file viewer, there is a workaround.

I have recently set up a system such that when I find an interesting URL, I can cut and paste the URL into a widget served from my own host; the widget fetches the resource from the URL and saves it to a location that corresponds to a WebDAV service in my web host account. Then I can set up the WebDAV-compatible file viewer (currently, I am using Air Share Pro) to view the file in my WebDAV directory. Notably, you can also download the files from the WebDAV server to the phone for future offline reading. This is extremely useful. Once the file is local, you don’t have to worry about whether you get a Wi-Fi, EDGE, or 3G signal, and you might also save some battery power since you don’t have to reload the files from the network when you read a document on and off.
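The server-side part of such a widget can be quite small. The sketch below is not the actual script behind the widget described here; it only illustrates the fetch-and-save step, assuming the WebDAV share is backed by an ordinary directory on the host. The names `filename_from_url` and `save_url`, the `fetch` parameter, and the fallback file name are all hypothetical, and the code is not hardened against hostile URLs.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_url(url, default="download.bin"):
    """Use the last path component of the URL as the file name."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def save_url(url, dest_dir, fetch=None):
    """Fetch `url` and write its bytes into `dest_dir`; return the file path."""
    if fetch is None:
        # default fetcher: a plain HTTP(S) download of the whole resource
        def fetch(address):
            with urlopen(address) as response:
                return response.read()
    data = fetch(url)
    target = os.path.join(dest_dir, filename_from_url(url))
    with open(target, "wb") as handle:
        handle.write(data)
    return target


# hypothetical call; the URL and the directory are placeholders
# save_url("http://example.com/docs/paper.pdf", "/path/to/webdav/dir")
```

Passing the fetcher in as a parameter keeps the file-handling part easy to try out without a network connection.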

The main caveat is that you probably need your own web hosting service or iDisk that supports WebDAV. Besides that, you will need to do some simple CGI/AJAX programming and spend some money on a WebDAV-compatible file viewer like Air Share Pro. I won’t be surprised if one day there is an iPhone app that implements a similar idea in a more polished way. Nevertheless, such an application is always at Apple’s mercy. Once Apple changes its policy to allow sharing data and files between applications like a real operating system, all these workarounds will be totally useless. Wait, actually, the way I set up getting files onto the iPhone is not limited to submitting URLs from Mobile Safari. Any browser can push the resource from a URL to the WebDAV directory, regardless of whether a desktop OS supports mounting a WebDAV volume. Not bad as a quick method to get some useful documents from the Internet onto the iPhone from a desktop system.
