thoughtwisps One commit at a time

Hello and welcome to thoughtwisps! This is a personal collection of notes and thoughts on software engineering, machine learning and the technology industry and community. For my professional website, please see race-conditions. Thank you for visiting!

Conversations about Python Dicts

What is the main conceptual difference between dictionaries and lists?

Lists in Python store objects by positional offset and fetch them by index, whereas dictionaries store objects under keys and fetch them by key.

#fetching entries in a list based on position
securities=['equities', 'bonds', 'options', 'futures']
print securities[0]

>>> equities

#fetching entries in a dictionary based on key
employees={'engineering':['Lisa', 'Ann', 'Bob'],
	   'marketing':['Charlie', 'Mike']}
print employees['engineering']

>>> ['Lisa', 'Ann', 'Bob']

What does it mean for a dictionary to be mutable?

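A dictionary is mutable, meaning it can be changed in place after it has been created. A variable stores a reference to a dictionary, not a copy, so every variable that refers to the same dictionary sees those changes. Not understanding this difference fully can lead to silly or rather serious bugs.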

icecream={'strawberry':4, 'blueberry':5, 'banana':6}
new_icecream=icecream

The variable new_icecream refers to the exact same dictionary as the variable icecream. We can verify this by adding another item to icecream and then using new_icecream to retrieve that item.

icecream['raspberry']=9
new_icecream['raspberry']
>>>9

Python did not throw a KeyError, which means that there is a key called raspberry in the dictionary referred to by the variable new_icecream, even though we used the icecream variable to add it to the dictionary.
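If an independent dictionary is what you want, one option is to make a copy. The sketch below is written for Python 3 (so print is a function) and reuses the ice cream example; the names alias and snapshot are made up for illustration. Note that copy() makes a shallow copy, so nested mutable values would still be shared.

```python
# Python 3 sketch: copy() returns a new dictionary holding the same items
icecream = {'strawberry': 4, 'blueberry': 5, 'banana': 6}
alias = icecream            # a second name for the same dictionary
snapshot = icecream.copy()  # a separate (shallow) copy

icecream['raspberry'] = 9

print('raspberry' in alias)     # True: alias and icecream are the same object
print('raspberry' in snapshot)  # False: the copy was made before the change
```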

What are some alternatives to literals when constructing dictionaries?

The vanilla way to construct a dictionary in Python is to use the literal expression.

vanilla_dictionary={'raspberry':4, 'vanilla':2}

A dictionary can also be constructed by calling dict().

another_dictionary=dict(raspberry=4, vanilla=2)

Alternatively, we can use a list of (key, value) tuples.

dictionary_from_tuples=dict([('raspberry',4),('vanilla',2)])

Sometimes, your functions will give you separate lists for the keys and for the values. In this case, it is useful to employ the zip() function to create key-value pairs and then pass the key-value pairs to the dict() function.

flavours=['linux_mint', 'ubuntu','debian','fedora','redhat','scientific_linux']
number_of_users=[40,30,90,100,80,10]
print zip(flavours,number_of_users)
>>>[('linux_mint', 40), ('ubuntu', 30), ('debian', 90), ('fedora', 100), ('redhat', 80), ('scientific_linux', 10)]

As we can see, the zip() function creates tuples. We can then pass the result of zip() to the dict() function to create a dictionary.

nix_users=dict(zip(flavours, number_of_users))
print nix_users
>>>{'scientific_linux': 10, 'fedora': 100, 'redhat': 80, 'linux_mint': 40, 'ubuntu': 30, 'debian': 90}
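A dictionary comprehension is one more way to build a dictionary from two lists, and it is handy when the pairs need to be filtered or transformed along the way. The sketch below is written for Python 3 and reuses the lists from above; the threshold of 50 users and the name popular are made up for illustration.

```python
flavours = ['linux_mint', 'ubuntu', 'debian', 'fedora', 'redhat', 'scientific_linux']
number_of_users = [40, 30, 90, 100, 80, 10]

# keep only the flavours with at least 50 users (hypothetical threshold)
popular = {flavour: users
           for flavour, users in zip(flavours, number_of_users)
           if users >= 50}
print(popular)
```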

How do I find out about the other methods available for dictionaries?

Execute dir(dict) or help(dict).

dir(dict)
>>>
['__class__',
 '__cmp__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'has_key',
 'items',
 'iteritems',
 'iterkeys',
 'itervalues',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values',
 'viewitems',
 'viewkeys',
 'viewvalues']

As you can see, the list is vast!
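Most of the names that start and end with double underscores are special methods that Python calls on your behalf. To see only the methods you would normally call yourself, you can filter those out. A minimal sketch, written for Python 3, where the result is shorter than the Python 2 listing above because has_key and the iter*/view* methods no longer exist:

```python
# keep only the names that do not start with a double underscore
public_methods = [name for name in dir(dict) if not name.startswith('__')]
print(public_methods)
```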

LinuxChix London

LinuxChix London is coming back!

I have an exciting announcement: I will be restarting LinuxChix meetups in the London area. The LinuxChix community has given me a lot of wonderful support and encouragement over the years and I would love to give something back to the community.

I will be posting details of mailing lists and Meetup pages soon!

Today I learned

Some things I learned today:

  • I still suck at working with Git on shared repositories. Merge conflicts, arrgh! In particular, I need to improve in the following scenarios: (1) knowing how to pull correctly when someone has pushed changes to a remote repo on the same branch that I am using, and (2) establishing an effective workflow for developing new features without messing up the main branch.

  • Setting up even a simple app with Flask is way beyond me.

  • Something about Python decorators

  • A Python Flask app process does not die when I execute the standard kill

  • The with statement in Python

10th PyData London Meetup recap

Yesterday, I attended the 10th PyData London Meetup at Lyst. The venue was packed by the time I arrived: I never realised there were so many data-obsessed Pythonistas in town!

The Talks

During the course of the night, the audience was treated to two great talks: Kim Nilsson talked about building great data analyst teams and Evgeniy Burovskiy talked about cool things you can do with Numpy. I’d like to take a moment to write down some very interesting pointers I took away from the talks.

Data Geeks Unite! Communication and Team-building 101 for Scientists, Analysts and Engineers.

I hate to say this, but I am usually very prejudiced against ‘soft skills’ talks at meetups. They inevitably remind me of some Agile retrospectives that I’ve attended, where people promise to do a lot of things to improve how the team works, but never actually get around to implementing those ideas in practice. I am happy to say Kim’s talk proved me 100% wrong. Here are some interesting things I took away from the talk:

  • if you are transitioning from an academic role into a business role, be prepared for culture shock

My own experience matches pretty well with this. I graduated last fall and immediately accepted a job at a software engineering house. All of my experiences for the first few months were isolating and foreign.

  • business deadlines are shorter than the deadlines in academia

In academia, deadlines are often well defined and far away in the future. For example, your thesis might be due in 12 months. Many tech businesses (at least those with agile software engineering teams) operate in sprints and usually release something every few weeks.

  • collaboration with your co-workers is more ‘intense’ than in academia

When I was completing my Master’s degree, I attended maybe 10 hours’ worth of lectures and tutorials and the rest of the time was devoted to research. An academic job allows you to hack away on your code pretty much without interference, save for the occasional meeting with a supervisor and perhaps some seminars. Coding at a business is different. There are daily stand-ups and sprint demos. Your co-workers will likely pair up with you and code review what you do. In other words, you will spend a lot more time with your co-workers.

  • spend some time researching how to build a great team

Kim talked about Belbin’s 9 different team roles and how an effective team can be structured by placing different personalities in different roles at key stages in constructing the product (or data model).

  • allow time for learning and playing

One of the most amazing things about writing code for a living is that one has the opportunity to learn something new every day. At my current job, there has not been a day that I have not picked up at least one new amazing thing about a programming language, a software engineering design principle or computer architecture. In fact, the flow of useful and important information was so high, I felt compelled to start this blog and a learning journal! I highly recommend it to other programmers and data scientists. Reflecting on what you learned solidifies the concepts in your mind. Even better if you can teach it to someone else later on!

“SciPy Roadmap discussion (with short intro to numpy/scipy)”

Evgeniy’s talk was a menagerie of Numpy and Scipy gotchas and a bunch of useful tips for Numpy newbies such as myself. Some of the most important points I took away from the talk:

  • Numpy runs on arrays. Know your arrays and use them.

import numpy as np

array=np.arange(10)
print array
#[0 1 2 3 4 5 6 7 8 9]

Applying an expression to the variable array applies it to every element in the array.

array1=array+1
print array1
#[ 1  2  3  4  5  6  7  8  9 10]

  • Broadcasting can introduce memory and performance issues

I have to admit, I am not a big expert on broadcasting, but it is certainly something that would be cool to explore further!
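One way to see where the memory cost can come from: broadcasting two small arrays against each other can produce a much larger result. This is only a small sketch of that effect (Python 3 with NumPy), not an example from the talk; the names column, row and table and the sizes are arbitrary choices for illustration.

```python
import numpy as np

column = np.arange(1000).reshape(1000, 1)  # shape (1000, 1)
row = np.arange(1000)                      # shape (1000,)

# broadcasting stretches both inputs to the common shape (1000, 1000)
table = column + row

print(column.size + row.size)  # 2000 elements across the two inputs
print(table.size)              # 1000000 elements in the result
```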

  • You can perform in-place operations on a Numpy array by writing the output back into the input

random_array=np.random.random_sample(10)
np.exp(random_array, out=random_array)

  • Practice your Numpy!

There is a great series of exercises compiled by Nicolas Rougier, which will take you all the way from Neophyte to Numpy master. I’m still working my way through the Neophyte level :)

  • You can write Conway’s Game of Life in only a few lines of Numpy.

This looks like a great weekend project!
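As a starting point, here is one possible sketch of a single Game of Life step in Python 3 with NumPy. It is not the version shown in the talk: the function name life_step is made up, and the choice to let the edges of the grid wrap around (via np.roll) is an assumption made to keep the code short.

```python
import numpy as np

def life_step(grid):
    """Advance a 2D array of 0s and 1s by one generation (edges wrap around)."""
    # add up the eight shifted copies of the grid to count live neighbours
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # a cell is alive next turn if it has exactly 3 live neighbours,
    # or if it is alive now and has exactly 2
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(int)

# a "blinker": three cells in a row that flip between horizontal and vertical
blinker = np.zeros((5, 5), dtype=int)
blinker[2, 1:4] = 1
print(life_step(blinker))
```

Calling life_step twice on the blinker should give back the starting grid, which makes it a handy sanity check.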

Generic Types In Java

Learning braindump entries are not meant to be fully fleshed out blog posts. Instead, they represent my attempt to form a consistent mental model of the issue I am trying to master. As such, they may not always be entirely clear or well-written. Apologies!

Generic Types in Java (the super basics)

Generic types are Java classes or interfaces that allow the programmer to pass types (i.e. other classes and interfaces) as parameters. This is useful because it allows the same class to be reused with multiple different types.

The Java tutorial track on Generics provides a very helpful example, the Box class. Suppose we want to have a class that can hold and manipulate a range of other Java classes.

To implement this idea without using generics, we may have to resort to something like the following:

public class Box{
	private Object object;

	public Box(Object object){
		this.object=object;
	}
}

With generics, we can pass the type as a parameter. For example, we could rewrite the Box class to use generic type parameters as follows:

public class Box<T>{
	private T object;

}

The type parameter can then be used to flesh out the implementation of the class:

public class Box<T>{
	private T object;

	public Box(T object){
		this.object=object;
	}
}

What types can the type variable take?

A type variable can be any non-primitive type: an interface, a class, an array type or another type variable. (At the moment, how another type variable can serve as the type of a type variable is still a bit unclear to me.)

How do I create an instance of a generic type?

Creating an instance of a generic type is called generic type invocation. During invocation you would ideally supply the type argument you want the generic type to use. Simply put, replace the T in Box<T> with another type.

For example,

//create a Box that holds Strings

Box<String> stringBox = new Box<String>("Hello, world!");

What is a raw type and how do I create one?

It is possible to create an instance of a generic type without supplying type arguments. In the Box example, this would look something like the code snippet below:

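//create a raw Box type (no type argument); the constructor argument is treated as an Object

Box randomBox = new Box("hello");

However, Eclipse will probably give you friendly nudges if you do this, since raw types bypass the compile-time type checks that generics provide.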