thoughtwisps One commit at a time

Hello and welcome to thoughtwisps! This is a personal collection of notes and thoughts on software engineering, machine learning and the technology industry and community. For my professional website, please see race-conditions. Thank you for visiting!

Random Bezier Curves with Bokeh

Recently, I have been exploring Bokeh, a Python library for creating beautiful data visualizations in the vein of d3.js. Not only does the library provide a good API for creating anything from histograms to chloropleths, but it also provides support for interactive graphs with real-time data updates, which is especially fascinating for anyone working with streaming data such as stock prices.

One of my current side-projects involves creating a dynamically updating map of the London Underground and for that I need to create a map-like visualization of the relative positions of the stations. My initial experiments with using only circles to indicate the positions were less than satisfying (see the pseudomap below).

London Tube Map

To make the map more realistic, I need to connect the circles together using either lines or curves. Enter Bokeh’s Bezier curve functionality. I am still in the process of figuring out exactly how to create a beautiful tube map using Bezier curves and circles. My primary obstacle lies in understanding the mathematics behind the construction of the Bezier and how this translates to controlling the curvature. For anyone struggling with similar questions, here is a very good introduction to Bezier curves. For now, I have just been experimenting by creating random Bezier curves using Bokeh. Below is an example of Bezier curves with random start and endpoints.

Random Bezier Curve using Bokeh

The entire Python snippet used to create the image is given below.

#import the requires libraries
from bokeh.plotting import *
import numpy.random as np

#set the output HTML file, which will contain the graph
output_file('test-tube-map.html')

#generate random start, end and control points
start_point_x=list(np.random_sample(10))
start_point_y=list(np.random_sample(10))
end_point_x=list(np.random_sample(10))
end_point_y=list(np.random_sample(10))

first_control_point_x=[(a+b)/2 for a in start_point_x for b in end_point_x]
first_control_point_y=[(a+b)/2 for a in start_point_y for b in end_point_y]
second_control_point_x=[2*(a+b)/3 for a in start_point_x for b in start_point_x]
second_control_point_y=[2*(a+b)/3 for a in start_point_y for b in end_point_y]

#create the Bokeh plot object
bokeh_plot=figure(title="Some funky Bezier curves")

#add the Bezier curves to the plot object
bokeh_plot.bezier(start_point_x, start_point_y, end_point_x, end_point_y, first_control_point_x, first_control_point_y,
                  second_control_point_x, second_control_point_y)

Programming braindump : Week 09/02/2015

The Braindump

  • I went to a great event organized by Women in Data and Pyladies London. The evening was sponsored by Man group’s AHL fund and included talks by AHL quant developers and quant analysts. All in all, it was a great evening and I am now quite fascinated about the hedge fund world and in particular, the software development that goes into making pricing models.

  • At work, I am working on improving a source-to-source compiler, which transforms Java code into C# and JavaScript code. Needless to say, I am learning a whole lot about compilers, Abstract Syntax Trees and and about the JVM. There might be a blog post in the near future about compilers - they are truly fascinating!

  • I have been reading Cal Newport’s blog. I first came across Newport’s writings in college and was quite taken with his debunking of the “follow your passion” -thesis for career success. Although I read his advice on being successful at college, various issues in my personal life got in the way of implementing these strategies. I’d like to these strategies another try.

  • I started working on a data simulation project related to the London underground. Hopefully, I will be able to give an update post soon.

  • I finally started using Pycharm and I can say that my Python programming productivity has increased. On the other hand, people say that IDEs make a developer stupid. So I’ll have to keep close tabs on the situation and maybe bust out good old vim once in a while (vim is great, by the way, but for TDD I find an IDE to be easier)

On-going programming reading list

  • Learning Python by Mark Lutz: Still working through part IV: Functions and Generators. Hopefully, will finish it in the coming week

  • Bokeh tutorial (needed for my London Underground programming project)

  • graph_tool tutorial (needed for my London Underground programming project)

An Evening at AHL

London is probably one of the most exciting places to live if you are a software developer. True, the rent is absolutely ridiculous and the Tube gets rather crowded during rush hour, but the amount of activities, conferences and meetups geared toward technologists is absolutely staggering.

I am a member of Pyladies London and Women in Data, two meetup groups who do an excellent job organizing interesting events for women working in software engineering and data analysis. Yesterday, I attended a joint Pyladies and Women in Data meetup sponsored by Man Group’s AHL hedge fund.

At AHL, we heard three great talks, one by Carol Ward, AHL COO, one by Charlie Beeson, a quant developer and one by Giuliana Bordigoni, a quant researcher. Charlie Beeson explained the reasons behind the adoption of Python at AHL. While previously, the quant research and the quant platform part used different technologies (R and Matlab for the the quant research and Java and C++ for the quant platform ), now AHL uses Python to bridge the gap between the two.

In addition to developing in-house Python software, AHL developers also contribute to Numpy, Pandas and SciPy open source projects. The dev environment of choice at AHL is Eclipse IDE with the PyDev plugin. I was quite please to see that AHL had developed the F1 short cut for the PyDev plugin, which allows an RStudio like execution of Python code line by line.

Braindump - week 02/02/2015

The Weekly Programming Braindump

  • Sebastian Rashka has written an interesting post on parallel programming with Python

  • When you are designing a Python method (aka function) that involves file IO, is it “better” (in whatever way a programmer may choose to define better) to pass the file as a filename (string) to the method and let the method take care of opening the file or is better to pass it as a file object and do the whole opening conundrum in the client calling the method/function

  • I started learning about tuple spaces and tuple space programming

  • I learned something about blocking and non-blocking IO in Java, but I can’t say I could explain it to someone else. Must do more research.

  • I started using PyCharm and downloaded Spyder. The weekend should be called “Fun with IDEs”

  • I am planning to write a Spyder plugin for bioinformaticians. That is, as soon as I learn how to use Spyder

  • I started learning about limit order books in finance and started writing my own limit order book simulation based on the orderbook package in R

  • I just completed the data harvesting phase for my London Tube Simulation and I could not be more excited about it!

  • I was inspired by KDB+ and Q (I can’t say I have any experience of either really, I just happened to browse the Wikis for fun) to mangle together a pseudo-useful data structure for my next project.

  • I learned about ctags in Vim (but not, like, how you actually make them useful for your project)

  • I learned something about transcompilers and to be honest, they scare me a bit (Compilers scare me too, but in a good way!)

  • I started reading Ilmari Karonen’s introduction to Redcode again and this time it started making more sense.

  • I have been thinking about the Global Interpreter Lock in Python and how one may try to circumnavigate it and what would the consequences of that be (!!!!)

  • TDD has been starting to stick on me. Esp. after I started using Pycharm. It’s just so addicting to see that green “play” button, click on it and get the satisfying “Tests passed”. Ahhh.

  • I found a paper about computing confidence intervals of the AUC. Previously, Python completely zonked on me, when I tried to implement one of the formulas from the paper, but I think I may give it another try this weekend (armed with numpy and scipy).

  • I finally learned the difference between .bashrc and .bash_profile (yeah, I’m slow…but hey, better late than never!)

Experiment: Communication between two Python scripts

In the “Experiment” series, I write about weird “I’ve always wanted to do this, but I am not sure if this will work” things

For a while now, I have wanted to conduct a very simple experiment. Can two Python scripts communicate with each other while both of them are running? Since both of the scripts are running in their respective shells, there is no immediate way I can think of that would allow a piece of code in one script to pass a variable to a piece of code in another script. Except via a shared space on the disk, such as a file.

Communication between two Python scripts via a shared file

Experiment 1

My first experimental setup involved the following two script files:

  • publisher.py - a class that “publishes” some financial data by writing it to a file

  • subscriber.py - a class that “subscribes” to financial data by reading it from the shared file

#publisher.py
import random
import time

publication_buffer=open("publication.buf", 'w')
trade_number=0

print "Starting exchange"

while True:
    time.sleep(5)
    probability=random.uniform(0,1)
    if probability>=0.5:
        print "Making trade"
        trade_number=trade_number+1
        publication_buffer.write("Trade number"+str(trade_number)+"\n")
#subscriber.py
number_of_lines=0
print "Opening trade channel"
while True:
    trade_buffer=open("publication.buf", 'r')
    lines=trade_buffer.readlines()
    if len(lines)>number_of_lines:
        print "Update received"
        print lines[len(lines)-1]
        number_of_lines=number_of_lines+1

The publisher is started first to give it a chance to “warm up” (ie. open the file and start publishing trades). Then after a few seconds, the subscriber starts listening to the published trades. In theory, this seems like something that might work. This is what happened:

#shell 1

$ python publisher.py
Starting exchange
Making trade

#shell 2
$ python subscriber.py
Opening trade channel

For some reason, nothing is received by the other file. If subscriber would be able to read the values published by publisher, some values would be printed in the console. But nothing happens! Also, immediately after launching this operation, my computer starts to whir and hangs. Something is going on!

Let’s look at the output of top to see if something is consuming a lot of computational power and indeed python is the top process (eating up 80% of the CPU). Now I have to admit, my hardware could use a bit of upgrading (it dates back to the stone age, aka 2010), but 80% CPU for something as simple as the above two scripts seems

Investigation will continue. :)