The Process of Building a Small Project

TUESDAY, MAY 01, 2012

The past week I've been stuck somewhere known as home, in between school and work, with most of my friends still stuck in one of the above. Naturally, this was a good opportunity to get some coding done and play around. I decided to make a web game, and play around with node, coffeescript and html5 at the same time. You can try it here.

It took about 8 days to go from no knowledge to a working game up and running online, and I would estimate around 50 hours in total. I'll try to go through the 8 days, because I feel like it mirrors a typical small scale learning project. As a side note, I didn't actually start working on my completed game until day 5 (the original project is still under construction), but the process from days 1-4 is still applicable in this case.

Day 1: What the hell is going on?

The first thing I usually do is open up some tutorials on the various topics and tackle them one by one. For node, I used the excellent node beginner book. Setting my machine up and completing the tutorial in coffeescript took me all of my day's allotted time. At this point, I'm feeling pretty good, having gone through the tutorial without any trouble, but the more concerning thing is that I'm not sure how everything meshes together yet.

Day 2: Oh cool, that was neat and super easy

Since I wasn't sure how exactly I was going to use node, I decided to play around more with coffeescript and build something simple to draw out objects. After looking around a bit, I also decided to use socket.io, because it provided the necessary infrastructure to build my real-time game. At the end of the day, I had a basic grasp of socket.io and I was feeling confident that frontend code wouldn't be a problem with the game in the short term.

Day 3: Hack away

This day isn't really noteworthy. I spent most of the day being productive and building basic infrastructure around the game. I did, however, get burned for the first time by coffeescript. None of my functions could call each other because they were all in separate files. I was under the assumption that coffeescript translated exactly into javascript, but I was unaware of the wrapper function that keeps anything from leaking into global space. The temporary solution was to force it into global space, and deal with it later.

window.Swordsman = Swordsman

Day 4: Run into an annoying issue

So far, I had been building up a common library, to be used by the client and the server. However, I had a large amount of trouble getting my common files to be used both by node and by the browser. Essentially, the browser knows what files to include based on the html file, while the node server knows what it needs using the require command. The tricky thing was that I was still moving things over to global scope. The node server has no reference to what "window" is, and the browser does not know what "require" is. Add in the fact that I was inheriting classes from different files, and it became a huge mess. Eventually, I managed to come up with this hack of a solution:

if typeof global == "undefined"
  Unit = window.Unit
else
  Unit = require('../models/unit.js').Unit

class Archer extends Unit
  # Class goes here

if typeof global == "undefined"
  window.Archer = Archer
else
  exports.Archer = Archer

I'm sure there's a better solution, but this managed to work on both server and client.
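
For what it's worth, one way to tidy this up is to do the environment check once and hang everything off whatever object comes back. This is only a sketch in plain javascript, not the code from the game: `pickExportTarget` is a hypothetical helper, and the `Unit` / `Archer` bodies are made up for illustration.

```javascript
// Hypothetical helper: prefer node's exports object, then the browser's
// window, and fall back to a plain object if neither exists.
function pickExportTarget(nodeExports, browserWindow) {
  return nodeExports || browserWindow || {};
}

// Made-up class bodies, just so there is something to export.
class Unit {
  constructor(hp) { this.hp = hp; }
}
class Archer extends Unit {
  constructor() { super(10); this.range = 5; }
}

// `typeof` is safe to use on names that were never declared, so this
// check runs on both the server and the client without throwing.
const target = pickExportTarget(
  typeof module !== "undefined" ? module.exports : undefined,
  typeof window !== "undefined" ? window : undefined
);

target.Unit = Unit;
target.Archer = Archer;
```

The check still exists, but it lives in one place instead of at the top and bottom of every file.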

Day 5: Combine all the things

Here's where I went off on a wild tangent, and built a radically different game. After seeing the 'zerg rush' google easter egg, I decided to build something similar to that game. Still, I think this is typically the point where the pieces start coming together. I wanted this project to go really quickly and smoothly, so I built everything to use server-side logic, and have the client essentially do nothing but draw and feed information to the server.

Day 6: Getting it almost working

At the end of this day, I had all the components pretty much done, so I tried to get it up and running on a server somewhere. Based on previous experience, I tried heroku first. A few hours later, it was more like herofuuuuuuuu. It turns out heroku doesn't play nicely with socket.io unless you use a series of configurations. In addition to that, socket.io had to listen to either an express app or the native node http server. I gave up for the night, and eventually I switched to ec2.

Day 7: Server setup, minor cleanup

EC2 micro instances are pretty slow. It took me over an hour to install node, npm, etc. Another thing that burned me here was trying to ssh as ec2-user instead of ubuntu, but that was probably my fault. While setting up I managed to clean up some of the code and fix some minor bugs. I got everything up and running, but the game still had 2 major bugs.

Day 8: Finish

The first of these bugs was that the game seemed to get progressively harder, without me implementing this functionality. I had put my server computation loop inside the socket connection code, causing it to be run an additional time for every player that joined, which happened whenever I refreshed my client page with changes.

io = require('socket.io').listen(8080)
game = new Game()

# Do this
setInterval(() ->
  game.compute_state()
  io.sockets.emit('game data', game.save())
, Game.UPDATE_INTERVAL)

# not this
io.sockets.on('connection', (socket) ->
  setInterval(() ->
    game.compute_state()
    socket.emit('game data', game.save())
  , Game.UPDATE_INTERVAL)
)
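
To see why the game sped up, here is a small simulation of the two wirings. It is a sketch, not the game code: `FakeClock` is a hypothetical stand-in for setInterval so the result is deterministic, and `makeGame` just counts calls to compute_state. It also assumes, as in the snippet above, that nothing ever clears an interval when a player disconnects.

```javascript
// Hypothetical stand-in for timers: advancing the clock by one interval
// fires every registered callback exactly once.
class FakeClock {
  constructor() { this.callbacks = []; }
  setInterval(fn) { this.callbacks.push(fn); }
  advanceOneInterval() { this.callbacks.forEach((fn) => fn()); }
}

// Minimal fake game that only counts state updates.
function makeGame() {
  return { ticks: 0, compute_state() { this.ticks += 1; } };
}

// Buggy wiring: every connection registers its own update loop.
function ticksPerIntervalBuggy(connections) {
  const clock = new FakeClock();
  const game = makeGame();
  for (let i = 0; i < connections; i++) {
    clock.setInterval(() => game.compute_state()); // inside the connection handler
  }
  clock.advanceOneInterval();
  return game.ticks;
}

// Fixed wiring: one loop registered at startup, whatever the player count.
function ticksPerIntervalFixed(connections) {
  const clock = new FakeClock();
  const game = makeGame();
  clock.setInterval(() => game.compute_state());
  // The connection handlers would run `connections` times here,
  // but they no longer register any loops.
  clock.advanceOneInterval();
  return game.ticks;
}

console.log(ticksPerIntervalBuggy(5)); // 5 updates per interval
console.log(ticksPerIntervalFixed(5)); // 1 update per interval
```

Every page refresh counts as a new connection, so in the buggy version the game state advanced one extra time per interval after each refresh.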

The second bug caused the game to occasionally freeze permanently, but I could never reproduce it. I used a workaround that reset the game if there were no players beforehand.

The fun part of the day was playtesting and tuning. I got a few friends and my brother to try it out. You can try it here. I feel the game is a little too easy with a small number of players, and too hard with a large number of players, but I haven't managed to get too many people to play at the same time.



Minted, Data Visualization and d3

SATURDAY, DECEMBER 10, 2011

As you may or may not know, I'm an intern at Minted. One question I often get asked by people is "What do you do at Minted?" I've tried to answer this question in a variety of ways, but it usually ends up with me trying to direct them to the website, or giving a very vague answer like "I do a little bit of everything" or "web development." This post describes one of my side projects while working there.

A few days ago, after a meeting, we ended the discussion on the separate topic of using data, and visual presentation on the site. Getting back to my desk, I decided to play around with a few things, in the spirit of the meeting. Finally, I remembered a library called d3.js, introduced to me by a friend (let's call him Chris). d3 is really neat in that you can feed it data to create a visualization, generating a bunch of DOM elements (or SVG). The elements can be dynamically classed, so they display differently (due to differences in styling). d3 also provides functionality to transform and animate the visualization.

The Idea

Looking at the d3 examples, the first thing that struck me was the heatmap visualization, showing unemployment rates across the US, by county. I knew we kept track of zip codes for shipping, and thus, an idea was born. Querying for zipcode and order data was pretty simple, but since I was piggybacking onto this d3 example, I wanted it in json format. Some hacking and string replacing later, I had my dataset.

First attempt

"That was simple, putting this together only took 20 minutes!" I excitedly emailed this out to the team.

The Snag

Being a Canadian, my knowledge of US geography is fairly limited, so I didn't think it was strange at all that the highest concentration of orders would be in Alaska and Montana. Eventually I realized that something was wrong, but why? It turns out counties also have 5 digit number codes. County codes and zip codes are both 5 digits! Counties are identified using something called a FIPS county code, and of course, with there being ~50000 zip codes and ~3100 counties, there was bound to be some overlap. The next step was to translate from zip codes to county codes, or vice versa.

Method 1

A quick google search resulted in nothing promising, but I found this site. It allowed me to enter in a FIPS code and would give a table with zip codes in that county. I decided that I would try to data mine the site. The code went something like this:

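import re
import urllib2

# fips_list holds the FIPS county codes to look up, as strings
zip_map = {}
for code in fips_list:
  url = "http://www.melissadata.com/lookups/CountyZip.asp?fips=" + code
  r = urllib2.urlopen(url)
  html = r.read()
  # Pull every zip code out of the results table
  zips = re.findall('something complicated', html)
  zip_map[code] = zips
print zip_map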

After a couple hundred requests, my IP was blacklisted for the day, so I needed some other plan.

Method 2

After searching for a long while, I came across this site, run by the Centers for Disease Control and Prevention, a government organization. This page had the data I needed, but it came in at about 1.3GB (unzipped), split across 10 text files of about 130MB each. The data was formatted like this:

11789000231000125432543NY103SUFFOLK
11789000231000225442544NY103SUFFOLK
11789000231000325452545NY103SUFFOLK

The zip code is the first 5 digits. The county FIPS code is the 2 digit state FIPS code, followed by the 3 digits that come right after the state's letter code in the line. Of course, that means there's another map going from state letter codes to state FIPS codes. The code to process all of this text is below:

import sys

# Not all states shown here
state_map = {
  "AK":"02",
  "AL":"01",
  "WY":"56"
}

text = sys.stdin.read()
lines = text.split("\n")
result = {}

for line in lines:
  line = line.strip()
  if len(line) == 0:
    continue

  # Extract data from the line
  zipcode = line[0:5]
  state = line[23:25]
  code = line[25:28]

  key = ''
  # Ignore bad state codes
  if state in state_map:
    key = str(state_map[state]) + str(code)
  else:
    continue

  if key in result:
    result[key].add(zipcode)
  else:
    result[key] = set()
    result[key].add(zipcode)

# For printing in a JS friendly way
for key, value in result.iteritems():
  result[key] = list(value)
print result
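
As a quick sanity check of the slicing, here is the first sample line worked through by hand. This is a javascript sketch rather than the script I actually ran: `parseLine` is a hypothetical helper that mirrors the same offsets, and the state table only contains New York (whose state FIPS code is 36).

```javascript
// Only New York is listed here; the real table needs every state.
const STATE_FIPS = { NY: "36" };

// Hypothetical helper using the same character offsets as the script above.
function parseLine(line) {
  const zipcode = line.slice(0, 5);   // first 5 characters
  const state = line.slice(23, 25);   // two-letter state code
  const county = line.slice(25, 28);  // 3 digit county code
  if (!(state in STATE_FIPS)) {
    return null; // ignore unknown state codes
  }
  return { zipcode: zipcode, fips: STATE_FIPS[state] + county };
}

const parsed = parseLine("11789000231000125432543NY103SUFFOLK");
console.log(parsed.zipcode); // 11789
console.log(parsed.fips);    // 36103
```

So zip code 11789 ends up filed under county 36103, which is Suffolk County, New York.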

The resulting map ends up being about 500 KB of text, which I will try to put up somewhere, in case someone else needs it. EDIT: here

The Visualization

Adding in the county code to zip code map allowed me to finally get an accurate heat map. Accurate in that all of the data was being correctly used, but the formula for calculating the level of "heat" was not tuned. This took away a lot of the depth that the map could have shown. I was still using the default function, with a slight modification for my county map, which had this logic:

function quantize(d) {
  var orders = data[countyMap[d.id]];
  return "q" + Math.min(8, ~~(orders * 9 / 12)) + "-9";
}

The heatmap was styled so that there are 9 levels, classed from "q0-9" to "q8-9" with 0 being the lightest and 8 being the darkest. Since this was originally done with unemployment rate percentages, it was not suitable for my needs. With this basic formula, any county with 11 or more orders would be shown with the highest level. Another problem was that it did not differentiate between 0 and 1 orders. Showing any county with at least 1 sale felt very important. Some of the options I tried are shown below, listed in order.

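// Attempt 1: one level per 50 orders.
// ~~ truncates to an integer so the class name stays valid.
function quantize(d) {
  var orders = data[countyMap[d.id]];
  return "q" + Math.min(8, ~~(orders / 50)) + "-9";
}

// Attempt 2: bump every county that has orders up by one level
function quantize(d) {
  var orders = data[countyMap[d.id]];
  return "q" + Math.min(8, ~~(orders / 50 + 1)) + "-9";
}

// Attempt 3: same idea, one level per 30 orders
function quantize(d) {
  var orders = data[countyMap[d.id]];
  return "q" + Math.min(8, ~~(orders / 30 + 1)) + "-9";
}

// Final version: explicit thresholds, checked from the darkest level down
function quantize(d) {
  var orders = data[countyMap[d.id]];
  var heatLevels = [...]; // one threshold per level, in ascending order
  for (var i = heatLevels.length - 1; i >= 0; i--) {
    if (orders > heatLevels[i]) {
      return "q" + i + "-9";
    }
  }
}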

This was an iterative process to get something that I felt displayed a sufficient amount of data. My first solution resulted in a more balanced map, but still did not differentiate between 0 and 1 orders. This was solved by the 2nd solution. The third solution was more tuning. The final solution, where the thresholds are entered into a list, allowed for the most control, letting me tune it as I wanted instead of relying on a formula.
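
To make the list-based approach concrete, here is a standalone sketch of it. The thresholds in `HEAT_LEVELS` are made up for illustration (they are not the numbers used on the real map), and `heatClass` is a hypothetical name. It takes the order count directly instead of a d3 datum, so it can be run on its own.

```javascript
// Hypothetical thresholds: HEAT_LEVELS[i] is the minimum number of orders
// needed to reach level i. Made-up numbers, ascending order.
const HEAT_LEVELS = [1, 5, 10, 25, 50, 100, 200, 400, 800];

function heatClass(orders) {
  // Walk from the darkest level down and return the first one reached.
  for (let i = HEAT_LEVELS.length - 1; i >= 0; i--) {
    if (orders >= HEAT_LEVELS[i]) {
      return "q" + i + "-9";
    }
  }
  return null; // no orders (or no data): leave the county unshaded
}

console.log(heatClass(0));    // null
console.log(heatClass(1));    // q0-9
console.log(heatClass(30));   // q3-9
console.log(heatClass(5000)); // q8-9
```

Tuning the map then only means editing the numbers in the list, and a county with a single order always gets a different class than a county with none.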

The Result

Final Order Heatmap Visualization

Random Thoughts and Remarks

After putting this all together, it took a couple more hours to hook this up into a usable web page on the site, instead of a random page on my local machine being force fed a bunch of data files. SVG is surprisingly powerful; I've now used it for several web applications. Finally, this post got kind of long and took a surprisingly long time, but hopefully this offers some insight into what I'm doing. Minted is hiring.

EDIT: Due to popular demand (of one person), I've posted a FIPS to ZIP code map here.



Mav vs Stanford - Week 4

MONDAY, NOVEMBER 07, 2011

Stanford is offering 3 courses online: Introduction to Databases, Introduction to Artificial Intelligence, and Machine Learning. I plan on completing all the advanced tasks every week, and posting my thoughts / reflections every Sunday. This is week 4, Sunday November 6, 2011.

Introduction to Databases

  • 1 unit, 4 quizzes, 0 assignments
  • I should just leave this as a permanent message: I still haven't watched any of the videos, and I'm not sure how much longer this can keep up.

    Quiz 1: SQL Movie-Rating Query Exercises

    There are 4 tables, involving movies, reviewers and ratings. This quiz is similar to the quiz last week, except SQL is used this time. This took a bit of adjustment, but I am far more familiar with SQL than the relational algebra syntax, so this quiz was relatively easy.

    1 Attempt: 9/9

    Note: There were 4 quizzes, and they all seemed more or less the same, so I only did the first one.

    Machine Learning

  • 1 unit, 1 quiz, 1 assignment
  • Moving into meatier content, with non-linear regressions and smoothing parameters (regularization).

    Quiz 8: Neural Networks: Representation

    Neural networks are an interesting concept. By having a web of inputs, it appears that a faster response can be generated to respond to a problem. Alternately, if there are a large number of inputs, some of them can be filtered out through a series of operations. This unit didn't talk too much about how to apply this, and instead talked mostly about truth tables. The next unit is supposedly about "learning" though.

    2 Attempts: 4.5/5, 5/5
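
For anyone wondering what truth tables have to do with neural networks, here is a sketch of the general idea: a single neuron with fixed weights can reproduce a logic gate. This is my own javascript illustration, not material from the course. The weights -30, 20 and 20 are illustrative values that happen to produce AND, and `andNeuron` is a hypothetical name.

```javascript
// Squashes any real number into the range (0, 1).
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// One neuron: a bias weight plus one weight per input.
// The weights are illustrative values chosen so the output behaves like AND.
function andNeuron(x1, x2) {
  return sigmoid(-30 + 20 * x1 + 20 * x2);
}

// Print the truth table, rounding the output to 0 or 1.
for (const [x1, x2] of [[0, 0], [0, 1], [1, 0], [1, 1]]) {
  console.log(x1, x2, Math.round(andNeuron(x1, x2)));
}
```

Only the (1, 1) row pushes the weighted sum above zero, so it is the only row that rounds to 1.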

    Assignment 3: Multi-class classification and neural networks

    The assignments are getting progressively harder, but my time management has failed me again. I only managed some weak attempts at most of the questions.

    Final: 20/100

    Introduction to Artificial Intelligence

    Previous week's homework: 93%

  • 2 units, ?? video questions, 1 homework assignment
  • Unit 7: Representation with Logic

    I'm blanking out, but this unit was about logic: stating which states are possible given another state, and whether statements are valid or invalid. Overall, it seemed relatively basic.

    Video Lecture Score: 68%

    Unit 8: Planning

    I didn't watch the videos, but the general concept is that events can be specified in advance, and they are defined within the plan.

    Video Lecture Score: 0%

    Homework 4

    More of the same, various logic problems with radio buttons and checkboxes

    What I've Learned

  • Start ML Earlier
  • The lecture material for machine learning isn't too bad, but the programming assignments seem to take over an hour. Definitely wasn't smart to start at 11 on Sunday



    Mav vs Stanford - Week 3

    MONDAY, OCTOBER 31, 2011

    Stanford is offering 3 courses online: Introduction to Databases, Introduction to Artificial Intelligence, and Machine Learning. I plan on completing all the advanced tasks every week, and posting my thoughts / reflections every Sunday. This is week 3, Sunday October 30, 2011.

    This week was sort of a mess, as I was pretty busy at work, and I was away at Yosemite for the weekend (more on that later). Anyways, this is the week where I finally did things in advance, and still ran out of time.

    Introduction to Databases

  • 1 unit, 1 quiz, 0 assignments
  • I still haven't watched any of the videos, and I'm not sure how much longer this can keep up.

    Quiz 1: Relational Algebra Exercises

    You were given 4 tables and asked to select random things like favourite pizzas. The exercise is actually good in theory, but since the syntax of the language was so painful, it was hard not to hate it. Coming in with no formal DB education and only minor MySQL usage, having to type in Relational Algebra was frustrating at best. The other flaw with this quiz is that you can simply select the required values, since the query isn't tested with other test cases after you submit. I suppose anyone who puts in that much effort wouldn't bother cheating here though...

    1 Attempt: 9/9

    Machine Learning

  • 2 units, 2 quizzes, 1 assignment
  • Moving into meatier content, with non-linear regressions and smoothing parameters (regularization).

    Quiz 6: Logistic Regression

    The unit was overall very interesting, showing an alternative method to train a classifier and have more accurate decision boundaries. I'm afraid that the details of the math are starting to get lost on me, but at least the main concepts are sticking.

    2 Attempts: 4.5/5, 5/5

    Quiz 7: Regularization

    This quiz was pretty easy, but perhaps that's a function of this unit being a sort of extension of the previous unit.

    1 Attempt: 5/5

    Assignment 2: Logistic Regression

    This assignment wasn't nearly as easy as the first one, and there were no bonus assignments (oh noes!) Quirks with Octave aside, solutions can still be done in under 10 lines each, usually less. I know the professor said you will get a better idea of the material when you implement it, but personally, I don't feel that that is the case.

    This assignment would have taken much less time if it wasn't for a few things. First, I forgot about the sigmoid function completely, and wasted 5-10 minutes there. Next, when implementing it, I searched for the exponential function in documentation, and found the wrong one. I gave up and returned after an hour or so. Lastly, Octave indices start at 1, not 0.

    Final: 100/100
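
For reference, the sigmoid function I forgot about is g(z) = 1 / (1 + e^(-z)), and the cost being minimized is J = -(1/m) * sum(y * log(h) + (1 - y) * log(1 - h)). Below is a sketch of both in javascript, not my Octave submission. The data is made up, the function names are hypothetical, and unlike Octave the indices here start at 0.

```javascript
// g(z) = 1 / (1 + e^(-z))
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Hypothesis h(x) = sigmoid(theta . x), where x[0] is the constant 1.
function hypothesis(theta, x) {
  const z = theta.reduce((sum, t, j) => sum + t * x[j], 0);
  return sigmoid(z);
}

// Unregularized logistic regression cost over all m training examples.
function cost(theta, X, y) {
  let total = 0;
  for (let i = 0; i < X.length; i++) {
    const h = hypothesis(theta, X[i]);
    total += y[i] * Math.log(h) + (1 - y[i]) * Math.log(1 - h);
  }
  return -total / X.length;
}

// Made-up training set: a column of ones plus a single feature.
const X = [[1, 0.5], [1, 2.0], [1, 3.5]];
const y = [0, 0, 1];

// With theta all zeros every prediction is 0.5, so the cost is ln(2).
console.log(cost([0, 0], X, y));
```

Starting from all zeros and getting ln(2), roughly 0.693, is a handy check that the cost function is wired up correctly before running any optimization.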

    Introduction to Artificial Intelligence

    Previous week's homework: 91%

  • 1 unit, ?? video questions, 1 homework assignment
  • As I mentioned earlier, this week was a mess, and I elected to do this course last. By Thursday night I hadn't done any of this course, and I figured I'd just put nothing here this week. However, this course is strangely out of sync with the other 2 courses, with a Monday deadline. I still wouldn't have finished if not for the extra 1 day extension this week.

    Unit 5: Machine Learning

    Considering I'm taking another course called Machine Learning, I was expecting this to be an easy unit. It wasn't, and the content didn't match the other course at all. More things to learn, but I didn't have enough time to get through all the videos, skipping most of the question ones and trying to get through all the content. This course seems to focus a lot on probability, whereas the actual machine learning course is about regression.

    Video Lecture Score: 62%

    Homework 3

    Not much to say here, same as usual.

    What I've Learned

  • Start Blogging Earlier
  • As a side effect of starting so early, I'm forgetting details of my experience taking this course. This week's post felt pretty vague because of it, but hopefully I will blog as I go through the content this week.



    Mav vs Stanford - Week 2

    SUNDAY, OCTOBER 23, 2011

    Stanford is offering 3 courses online: Introduction to Databases, Introduction to Artificial Intelligence, Machine Learning. I plan on completing all the advanced tasks every week, and posting my thoughts / reflections every Sunday. This is week 2, Sunday October 23, 2011

    Introduction to Databases

  • 4 units, 3 quizzes, 0 assignments
  • Of the 3 courses, this is somewhat of a black sheep. Databases doesn't seem quite as interesting, and I guess you could say the quizzes were also pretty tedious. Of course, I still haven't seen any of the videos for this course.

    Quiz 1: XML Quiz

    I honestly felt stupid taking this quiz as many times as I did. This material didn't really click until I did the next quiz.

    6 Attempts: 4/6, 4/6, 5/6, 5/6, 5/6, 6/6

    Quiz 2: DTD Exercises

    This quiz probably should have been before quiz 1, at least for me. It's actually pretty impossible to not get perfect on this quiz, since the popup validates your DTD. Pretty time consuming exercise, but easy.

    1 Attempt: 3/3

    Quiz 3: Relational Algebra Quiz

    This was pretty easy, since I know a bit of SQL. Some terms were there I didn't know, but luckily my friend (let's call him Jon) was there to fill me in on definitions.

    1 Attempt: 10/10

    Machine Learning

  • 2 units, 2 quizzes, 1 assignment
  • There was less material than last week, partially because 1 unit was dedicated to Octave, which I think is a free, open-source MATLAB. The content was similar to the previous week, with just a little bit of an extension, going from regression with 1 variable to multiple variables.

    Quiz 4: Linear Regression with Multiple Variables

    I tried the quiz part way through the videos. Most of the questions seemed pretty easy, and were covered in the videos, but there was 1 question I had no idea about (when to use gradient descent vs normal equations). On the 2nd attempt, I had learned my lesson, but I didn't read anything, so I got another question wrong.

    3 Attempts: 4/5, 4/5, 5/5
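
Since that gradient descent vs normal equations question tripped me up, here is a sketch of both approaches on the same tiny problem. This is my own javascript illustration with made-up data, not course code. The function names, the learning rate of 0.1 and the 2000 iterations are all arbitrary choices for this example.

```javascript
// Made-up data lying exactly on y = 2x + 1.
const xs = [1, 2, 3, 4];
const ys = [3, 5, 7, 9];
const m = xs.length;

// Normal equation for one feature, written out by hand:
// solve (X^T X) theta = X^T y, where X has a column of ones and a column of x.
function normalEquation() {
  const sx = xs.reduce((a, b) => a + b, 0);
  const sy = ys.reduce((a, b) => a + b, 0);
  const sxx = xs.reduce((a, x) => a + x * x, 0);
  const sxy = xs.reduce((a, x, i) => a + x * ys[i], 0);
  const det = m * sxx - sx * sx;
  return [(sxx * sy - sx * sxy) / det, (m * sxy - sx * sy) / det];
}

// Batch gradient descent on the same squared-error cost.
function gradientDescent(alpha, iterations) {
  let t0 = 0;
  let t1 = 0;
  for (let k = 0; k < iterations; k++) {
    let g0 = 0;
    let g1 = 0;
    for (let i = 0; i < m; i++) {
      const err = t0 + t1 * xs[i] - ys[i];
      g0 += err;
      g1 += err * xs[i];
    }
    // Update both parameters together, using gradients from the old values.
    t0 -= (alpha / m) * g0;
    t1 -= (alpha / m) * g1;
  }
  return [t0, t1];
}

console.log(normalEquation());           // intercept 1, slope 2
console.log(gradientDescent(0.1, 2000)); // approximately the same values
```

The normal equation gets the answer in one shot with no learning rate to pick, but it means solving a linear system that gets expensive as the number of features grows. Gradient descent needs tuning and many iterations, but each step stays cheap.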

    Quiz 5: Octave Tutorial

    I have some familiarity with MATLAB, having to use it for school for courses here and there, so this quiz was pretty easy. I realized after you can just install Octave and run all the commands from the quiz.

    2 Attempts: 4.75/5, 5/5

    Assignment 1: Linear Regression

    Disclaimer: I'm on OS X, perhaps the experience on Windows and Linux isn't as bad. Installing Octave was smooth, but I feel like using it was a pain. First of all, when you run the program, it just opens another terminal window, but you can't run Octave from the command line (easily). Put this in .bashrc or .profile or whatever:

    alias octave='exec /Applications/Octave.app/Contents/Resources/bin/octave'
    

    My second gripe with Octave is that the command line editor isn't very good, so I just used the Octave terminal to run the scripts, and did all of my editing in vim. I recommend you do the same.

    As for the assignment itself, there are 7 parts, with only the first 3 being mandatory. It seems overwhelming at first, but the first is a warm up exercise with the answer given to you. The submittal process is pretty smooth and results are known right away. Each part of the assignment is anywhere from 1-5 lines (at least for my answers). If coded properly, the main assignment answers can be re-used for the bonus questions as well. The whole assignment took maybe an hour.

    Final: 150/100

    Introduction to Artificial Intelligence

    Previous week's homework: 92%

  • 1 unit, ?? video questions, 1 homework assignment
  • I didn't really go through the videos last week, but this course seems to be more intense than the other 2 courses. Having 37 parts for the unit is pretty intimidating, until you realize most of them are for questions.

    Unit 3: Probability in AI

    Probability is fun, and I enjoyed learning a bit more about it, along with some other things, such as Bayes Networks. The video questions got pretty tough after a while, and I also feel like there's some sort of bug with the score calculation. It marked me as incorrect once when I was clearly correct, and rewatching the video seems to count as a wrong answer too. More experimentation must be done. Speaking of rewatching videos, sometimes a question pops up at the end of the video lecture, and there's no way to rewatch the video without refreshing the page.

    Video Lecture Score: 75%

    Homework 2

    This takes on the same format as last week, as well as the lectures. Short video explaining the question, then a textbox or radio button to choose answers. The homework felt like it was easier than the video lecture questions, but I will see my mark before I say too much.

    What I've Learned

  • Basic background knowledge
  • These courses require a lot of time
  • I guess I shouldn't be surprised that taking 3 courses takes a large amount of time. Doing all 3 courses took most of my Saturday, so I'm probably going to start earlier, and spend my weekend doing something else. On the plus side, I feel like I'm starting to learn the foundation for these courses, which I suppose is the end goal.




    Maverick Lee

