jBoxer

I change the directions of small pieces of metal for a living.

How to reset/revert a single file with Git

I’m making this post more for my own reference than to help anyone else, but feel free to comment if you have questions or anything.

If you’ve made a commit with git (let’s call it “commit A”), and you want to make another commit (let’s call it “commit B”) that, among other things, reverts the changes made to a single file (let’s call it file1.rb) in commit A, use the following command:

git reset commit-a^ -- path/to/file1.rb

Note the ^: to undo commit A, the file has to be reset to its state in commit A’s parent, i.e. the way it looked just before commit A. (This assumes file1.rb hasn’t been changed since commit A.) That’ll create two sets of changes: a copy of the changes you made in commit A, and the inverse of the changes you made in commit A. The inverse is staged, while the copy is unstaged. Nine times out of ten, you’ll want to commit the staged changes (which, as they’re the inverse of the changes in commit A, will result in a revert of file1.rb) and discard the unstaged ones.

My Doorbell Is Better Than Your “Please Rob Me”

“Hey man, I know you really like Foursquare, but you should check out this new site called Please Rob Me. Foursquare is actually really dangerous, cuz now it’s easy for robbers to know when you’re not at your house!”

I’m a Foursquare user, and I’ve heard variations of this from over a dozen of my friends in the past 24 hours. For the record, I think “Please Rob Me” is a really funny and creative idea, and I love that they made it. However, anyone who is legitimately giving privacy/security advice over this is being ridiculous.

First of all, most people are out of the house from 9am to 5pm. Robbers know this, and Foursquare checkins don’t change it.

Second, there’s absolutely zero indication of whether or not anyone else is home, or how soon you’ll be getting back. These are much more crucial pieces of knowledge than “a single person was not in his house at this time”.

Last, and most important, a superior technology has existed for decades: the doorbell. Run around a neighborhood ringing doorbells of houses that look like they might be vacated, and break into the ones that no one answers.

“Oh but people don’t answer their doorbell every time, so a robber might break into an occupied house!”

Right. Just like people don’t have to actually be at a place to check into it on Foursquare, and like how checking in on Foursquare doesn’t mean no one else is home. There are potential false positives with both methods (and if there weren’t, people would be getting robbed a lot more). The point is, breaking in based on doorbell-ringing is much less dangerous than doing it based on Foursquare checkins, as it’s reasonably likely that no one is home if no one answers the doorbell.

In reality, privacy via anonymity has always been a pretty shaky concept. Yes, checking into Foursquare does give robbers one more tool to make their job easier, but in the face of much better tools (like the doorbell), it’s negligible. If you chose to be a Foursquare user in the first place, Please Rob Me should have no impact on your decision.

The importance of attr_accessible in Ruby on Rails

I’m sure this has been written about ad nauseam, but I spent some time yesterday explaining it to someone who didn’t understand, and now I feel like writing it up a bit more formally.

What is attr_accessible?

In Ruby on Rails, attr_accessible allows you to specify which attributes of a model can be altered via mass-assignment (most notably by update_attributes(attrs) and new(attrs)). Any attribute names you pass as parameters will be alterable via mass-assignment, and all others won’t be.

How does mass-assignment work normally?

By default, mass-assignment methods accept a hash of attribute values, each keyed by their associated attribute’s name. If I ran the following code:

User.new({ :name => 'Harry Potter', :email => 'hp@hogwarts.com' })

A new instance of the User model would be created, and the name and email attributes would be set accordingly. It can also be used to alter related models. For example:

User.new({
  :name => 'Albus Dumbledore',
  :is_teacher => true,
  :course_ids => [1, 2, 3] })

In addition to creating a user with the appropriate attributes, this will update the specified courses to be owned by this user (assuming a user has_many courses in our app).

How can this be abused?

Very easily. What if someone did this:

User.new({ :name => 'Draco Malfoy', :is_teacher => true })

This Draco Malfoy fellow may not actually be a teacher, but the system is none the wiser. Of course, the developer would never code this; in a real Rails app, the code is going to look like this:

User.new(params[:user])

The elements in params[:user] are taken from the POST/GET/PUT data passed along when the action was run. They’re thrown blindly into the mass-assignment, and any attributes whose names match the keys will be set.

“So what’s the big deal? Just don’t include an ‘is_teacher’ field in the web form, and the param won’t be there.” This is true for innocent users, but the malicious ones (and Draco Malfoy is definitely a malicious one) have an easy way around this. A web form is just a way to make it easy for users to pass data to your app. There are other ways. For example, if I wanted to register for the app via the command line instead of a browser, I could do it like this:

curl -d "user[name]=Harry Potter&user[email]=hp@hp.com" \
    http://myapp.com/users/

This sends a request to http://myapp.com/users/ and passes data in the exact format it would’ve appeared if I’d filled out a web form that asked for a name and email address. However, I could also do this:

curl -d \
    "user[name]=Draco Malfoy&user[email]=m@hp.com&user[is_teacher]=1" \
    http://myapp.com/users/

Since is_teacher is an attribute name in my User model, and mass-assignment methods blindly accept whatever attributes they see, Draco Malfoy has just made himself a teacher.

Even worse, I could use this to grab courses that may not be mine.

curl -d \
    "user[name]=Draco Malfoy&user[course_ids][]=1&user[course_ids][]=2" \
    http://myapp.com/users/

Draco Malfoy has now taken courses 1 and 2 away from whoever they originally belonged to (Dumbledore, if my memory serves me) and given them to himself.

How can we prevent this?

There are a few obvious but clumsy ways. We could skip mass assignment, setting each individual attribute in our controller, but this will introduce a lot of duplicate and unnecessary code. We could explicitly pull unwanted parameters out:

params[:user].delete(:is_teacher)
params[:user].delete(:course_ids)

This also introduces a lot of duplicate code. If we ever add new columns that we want to restrict, or decide we want to unrestrict a column, we’re going to have to go through the create and update actions, and any others that perform mass assignment.

We could factor these out into some sort of sanitize_params method on each model. This is a better solution, but you still have to call it in every action that alters the data. It’s definitely not as good as the built-in one: attr_accessible. We can add this to the top of the User model:

attr_accessible :name, :email

This white-lists name and email; these two attributes will be accepted from a mass-assignment method, while all others will be ignored. This is by far the safest way to do it; only attributes you’ve explicitly allowed (which hopefully means you’ve thought carefully about them) can be set by mass-assignment. This way, if some intern comes along and adds a bunch of dangerous columns or relations (payment_accepted or horcruxes, for example), no one has to think about updating the sanitize methods.
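
To make the white-list idea concrete, here’s a rough sketch in plain Ruby (no Rails involved) of what filtering a params hash against a list of allowed names looks like. The ACCESSIBLE constant and the whitelist method are made up for this illustration; this is not how Rails implements attr_accessible internally, just the same idea in miniature.

```ruby
# Hypothetical stand-in for a white-list; these names are illustrative
# and are not part of Rails.
ACCESSIBLE = [:name, :email]

# Keep only the attributes whose names appear in the white-list.
def whitelist(attrs, allowed = ACCESSIBLE)
  attrs.select { |key, _value| allowed.include?(key.to_sym) }
end

# What Draco might submit via curl:
submitted = {
  :name       => 'Draco Malfoy',
  :email      => 'm@hp.com',
  :is_teacher => '1',
  :course_ids => ['1', '2']
}

# Only :name and :email make it through the filter.
p whitelist(submitted)
```

The point of doing it this way around (listing what’s allowed rather than what’s forbidden) is that a newly added attribute is rejected until someone deliberately adds it to the list.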

What does this not do?

I saw one person say “Why would I put anything in attr_accessible? Why would I want any of my attributes to be hackable?”

Make no mistake: attr_accessible is no substitute for proper access control. If all users have write access to all other users, attr_accessible will let one user change another’s name attribute if it’s specified. Regular authentication and access control must be used to prevent users from writing to model instances that they shouldn’t be able to write to. Once this is done correctly, attr_accessible can be used to prevent a malicious user from altering data of her own that she shouldn’t be able to alter.

To be more clear, it could be considered “hacking” if a user were able to change everyone’s name to “Voldemort”. attr_accessible can’t prevent this; you need to do proper authentication with something like Authlogic. Once you’ve set your controllers up to prevent a user from even attempting to change another user’s data, you’ve prevented this “hack”.

If the user tries to change his own name to “Voldemort”, that’s totally fine. We don’t care if he does it via the web app, curl, or anything else; users are allowed to change their own name. Including :name in attr_accessible isn’t making it “hackable”, because it’s an attribute that users should be able to change.

If the user tries to change his is_teacher attribute from false to true, that’s also considered “hacking”. We don’t want to let users do this, so we exclude :is_teacher from attr_accessible to prevent it.

Are attributes excluded from attr_accessible immutable?

No. They can still be altered, just not via mass-assignment. If I exclude is_teacher from attr_accessible, and I go:

hagrid = User.first(:conditions => { :name => 'Rubeus Hagrid' })
hagrid.is_teacher = true
hagrid.save

That will work just fine. The difference is, it forces you to set the attribute explicitly, so there’s no potential of accidentally setting an attribute unexpectedly passed to mass-assignment. This way, I can allow my non-dangerous attributes to be set via mass-assignment with attr_accessible, then explicitly provide or deny control over dangerous attributes in other actions.

Prototype analog to jQuery’s $(document).ready

I have a lot of experience with jQuery, but less with Prototype. Recently, I needed to add some event handlers to some elements in a Ruby on Rails app I’m building. I searched for how to do the equivalent of jQuery’s $(document).ready() function in Prototype so that I could add the handlers after the document loaded, but most of the guides I found were out of date (I’m running Prototype 1.6.0.3, and I don’t know which version these guides were for, but they all made my JavaScript console angry).

Eventually, I was able to piece it together after digging through the depths of the Prototype API documentation. It’s actually very simple:

document.observe('dom:loaded', function(){
  // do yo thang...
});

Wrap whatever you’re doing with that, and it won’t be run until the document is loaded.

Installing Ruby on Rails 2.3+ plugins From Github

I’ve been banging my head against this wall for quite a while now, and I just finally figured out the answer. Like I’ve done in other posts, I’ll just post what worked for me, and hopefully it’ll help other people.

I’m running Ruby 1.9 and Ruby on Rails 2.3.3 on Snow Leopard. I’ve been trying to install plugins (specifically, Authlogic and Carmen) for a couple days now using the following two commands (as taken from the main github pages):

script/plugin install git://github.com/binarylogic/authlogic.git
script/plugin install git://github.com/jim/carmen.git

In return, I received the following errors:

Plugin not found: ["git://github.com/binarylogic/authlogic.git"]
Plugin not found: ["git://github.com/jim/carmen.git"]

After a lot of poking around, it turns out you need to make two changes in order for this to work on Rails 2.3 or higher: change the git:// at the beginning of each URL to http://, and add a trailing slash to the end of each URL. So instead, run these:

script/plugin install http://github.com/binarylogic/authlogic.git/
script/plugin install http://github.com/jim/carmen.git/

They both worked perfectly for me, so hopefully they’ll work for you. If not, leave a comment and I’ll try to help.

Installing Erlang on Snow Leopard

Here’s another in my series of “Installing X on Snow Leopard”. These aren’t official, well-tested guides; they’re just documentations of my attempts to compile and install various things on my personal computer. My last one (Installing MySQL on Snow Leopard) is my most popular post to date (aside from a couple that have been on Reddit). Erlang is less popular than MySQL, but hopefully this will still help a few people.

Downloading and unpacking

Go to http://erlang.org/download.html and download the Source for the newest version (when I was writing this, that was R13B03). After downloading, extract it to somewhere that’s convenient to get to with the Terminal.

Configure

Open the Terminal and cd into the directory you extracted Erlang to (mine was /Users/jake/src/otp_src_R13B03). Then run the following command:

./configure \
    --prefix=/usr/local/ \
    --enable-smp-support \
    --enable-threads \
    --enable-darwin-64bit

Note: You will probably get three errors. Read about them in the Configuration Errors section coming up.

The first three configure options are the defaults according to the README. However, I’ve had experiences where supposed defaults aren’t really the defaults when compiled on OS X, so I don’t like to take chances. --enable-darwin-64bit enables experimental support for the 64bit x86 Darwin binaries. This may not be necessary, but in general, 64-bit stuff has fewer problems on Snow Leopard, so I figured this was a good idea.

Configuration Errors

I got the following configuration errors:

jinterface    : No Java compiler found
wx            : Can not combine 64bits erlang with wxWidgets on
                MacOSX, wx will not be useable
documentation : fop is missing. The documentation can not be built.

These aren’t a problem. If you get any errors besides these, you’re in trouble. Leave a comment, and I’ll see if I can help.

Making and installing

These two commands shouldn’t give you any trouble:

make

And then, after make is done:

sudo make install

If you get any errors at either of these stages, leave a comment and I’ll try to help.

Making sure it works

Note: This canonical test is gratefully borrowed from erlang.org.

Put the following into a text file:

-module(test).
-export([fac/1]).

fac(0) -> 1;
fac(N) -> N * fac(N-1).

Save it as test.erl in a directory that’s easy to get to with the Terminal. Then, from the Terminal, cd into that directory and type erl (which, if everything worked right, should start the Erlang command-line interpreter). From the interpreter, run the following commands:

1> c(test).
{ok,test}
2> test:fac(20).
2432902008176640000
3> test:fac(40).
815915283247897734345611269596115894272000000000

Note: Lines starting with N> (where N is a number) are lines you should type (but just type the stuff coming after N>). The other lines represent output.

c(test). compiles test.erl (assuming it’s in the directory you cd’ed into). test:fac(20). and test:fac(40). run your factorial function.

So, that’s what worked for me. If anyone has any problems along the way, leave a comment and I’ll try to help.

Snide remarks against women in technology

It’s this culture of attacking women that has especially got to stop. Whenever I post a video of a female technologist there invariably are snide remarks about body parts and other things that simply wouldn’t happen if the interviewee were a man.

Blogger Robert Scoble, responding to threats against tech author Kathy Sierra

The Knuth-Morris-Pratt Algorithm in my own words

For the past few days, I’ve been reading various explanations of the Knuth-Morris-Pratt string searching algorithm. For some reason, none of the explanations were doing it for me. I kept banging my head against a brick wall once I started reading “the prefix of the suffix of the prefix of the…”.

Finally, after reading the same paragraph of CLRS over and over for about 30 minutes, I decided to sit down, do a bunch of examples, and diagram them out. I now understand the algorithm, and can explain it. For those who think like me, here it is in my own words. As a side note, I’m not going to explain why it’s more efficient than naïve string matching; that’s explained perfectly well in a multitude of places. I’m going to explain exactly how it works, as my brain understands it.

The Partial Match Table

The key to KMP, of course, is the partial match table. The main obstacle between me and understanding KMP was the fact that I didn’t quite fully grasp what the values in the partial match table really meant. I will now try to explain them in the simplest words possible.

Here’s the partial match table for the pattern “abababca”:

char:  | a | b | a | b | a | b | c | a |
index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 
value: | 0 | 0 | 1 | 2 | 3 | 4 | 0 | 1 |

If I have an eight-character pattern (let’s say “abababca” for the duration of this example), my partial match table will have eight cells. If I’m looking at the eighth and last cell in the table, I’m interested in the entire pattern (“abababca”). If I’m looking at the seventh cell in the table, I’m only interested in the first seven characters in the pattern (“abababc”); the eighth one (“a”) is irrelevant, and can go fall off a building or something. If I’m looking at the sixth cell in the table… you get the idea. Notice that I haven’t talked about what each cell means yet, but just what it’s referring to.

Now, in order to talk about the meaning, we need to know about proper prefixes and proper suffixes.

Proper prefix: All the characters in a string, with one or more cut off the end. “S”, “Sn”, “Sna”, and “Snap” are all the proper prefixes of “Snape”.

Proper suffix: All the characters in a string, with one or more cut off the beginning. “agrid”, “grid”, “rid”, “id”, and “d” are all proper suffixes of “Hagrid”.

With this in mind, I can now give the one-sentence meaning of the values in the partial match table:

The length of the longest proper prefix in the (sub)pattern that matches a proper suffix in the same (sub)pattern.

Let’s examine what I mean by that. Say we’re looking in the third cell. As you’ll remember from above, this means we’re only interested in the first three characters (“aba”). In “aba”, there are two proper prefixes (“a” and “ab”) and two proper suffixes (“a” and “ba”). The proper prefix “ab” does not match either of the two proper suffixes. However, the proper prefix “a” matches the proper suffix “a”. Thus, the length of the longest proper prefix that matches a proper suffix, in this case, is 1.

Let’s try it for cell four. Here, we’re interested in the first four characters (“abab”). We have three proper prefixes (“a”, “ab”, and “aba”) and three proper suffixes (“b”, “ab”, and “bab”). This time, “ab” is in both, and is two characters long, so cell four gets value 2.

Just because it’s an interesting example, let’s also try it for cell five, which concerns “ababa”. We have four proper prefixes (“a”, “ab”, “aba”, and “abab”) and four proper suffixes (“a”, “ba”, “aba”, and “baba”). Now, we have two matches: “a” and “aba” are both proper prefixes and proper suffixes. Since “aba” is longer than “a”, it wins, and cell five gets value 3.

Let’s skip ahead to cell seven (the second-to-last cell), which is concerned with the pattern “abababc”. Even without enumerating all the proper prefixes and suffixes, it should be obvious that there aren’t going to be any matches; all the suffixes will end with the letter “c”, and none of the prefixes will. Since there are no matches, cell seven gets 0.

Finally, let’s look at cell eight, which is concerned with the entire pattern (“abababca”). Since the pattern both starts and ends with “a”, we know the value will be at least 1. However, that’s where it ends; at lengths two and up, all the suffixes contain a c, while only the last prefix (“abababc”) does. This seven-character prefix does not match the seven-character suffix (“bababca”), so cell eight gets 1.
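
If it helps to see that definition as code, here’s a rough Ruby sketch that fills in each cell exactly the way I just did it by hand: take the first N characters, then look for the longest proper prefix that’s also a proper suffix. The method name is my own, and this is the slow, obvious way to build the table rather than the clever linear-time construction you’ll find in textbooks.

```ruby
# Builds the partial match table straight from the one-sentence
# definition. Intentionally naive; the method name is illustrative.
def partial_match_table(pattern)
  (1..pattern.length).map do |cell|
    sub = pattern[0, cell]  # the first `cell` characters of the pattern

    # Try candidate lengths from longest to shortest. "Proper" means
    # strictly shorter than the (sub)pattern itself.
    (cell - 1).downto(1).find { |len| sub[0, len] == sub[-len, len] } || 0
  end
end

p partial_match_table("abababca")
```

For “abababca” this produces the same values as the table above: 0, 0, 1, 2, 3, 4, 0, 1.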

How to use the Partial Match Table

We can use the values in the partial match table to skip ahead (rather than redoing unnecessary old comparisons) when we find partial matches. The formula works like this:

If a partial match of length partial_match_length is found (with partial_match_length > 0), we may skip ahead partial_match_length - table[partial_match_length - 1] characters. When that difference is only 1, we gain nothing over the ordinary one-character step.

Let’s say we’re matching the pattern “abababca” against the text “bacbababaabcbab”. Here’s our partial match table again for easy reference:

char:  | a | b | a | b | a | b | c | a |
index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 
value: | 0 | 0 | 1 | 2 | 3 | 4 | 0 | 1 |

The first time we get a partial match is here:

bacbababaabcbab
 |
 abababca

This is a partial_match_length of 1. The value at table[partial_match_length - 1] (or table[0]) is 0, so we don’t get to skip ahead any. The next partial match we get is here:

bacbababaabcbab
    |||||
    abababca

This is a partial_match_length of 5. The value at table[partial_match_length - 1] (or table[4]) is 3. That means we get to skip ahead partial_match_length - table[partial_match_length - 1] (or 5 - table[4] or 5 - 3 or 2) characters:

// x denotes a skip

bacbababaabcbab
    xx|||
      abababca

This is a partial_match_length of 3. The value at table[partial_match_length - 1] (or table[2]) is 1. That means we get to skip ahead partial_match_length - table[partial_match_length - 1] (or 3 - table[2] or 3 - 1 or 2) characters:

// x denotes a skip

bacbababaabcbab
      xx|
        abababca

At this point, our pattern is longer than the remaining characters in the text, so we know there’s no match.
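
Here’s the walk-through above as a rough Ruby sketch. It lines the pattern up against the text, extends the partial match as far as it will go, and on a mismatch slides the pattern forward by partial_match_length - table[partial_match_length - 1]. The method names (kmp_search, partial_match_table) are mine, and the table is built the slow, by-definition way to keep the example short; treat it as an illustration rather than a tuned implementation.

```ruby
# Naive table builder: longest proper prefix that is also a proper
# suffix, for each leading (sub)pattern. Names are illustrative.
def partial_match_table(pattern)
  (1..pattern.length).map do |cell|
    sub = pattern[0, cell]
    (cell - 1).downto(1).find { |len| sub[0, len] == sub[-len, len] } || 0
  end
end

# Returns the index of the first match of pattern in text, or nil.
def kmp_search(text, pattern)
  return 0 if pattern.empty?

  table   = partial_match_table(pattern)
  start   = 0  # where the pattern is currently lined up in the text
  matched = 0  # partial_match_length at this alignment

  while start + pattern.length <= text.length
    # Extend the current partial match as far as it will go.
    matched += 1 while matched < pattern.length &&
                       text[start + matched] == pattern[matched]
    return start if matched == pattern.length

    if matched.zero?
      start += 1                           # no partial match: plain step
    else
      start  += matched - table[matched - 1] # the skip from the formula
      matched = table[matched - 1]           # these characters still line up
    end
  end

  nil  # the pattern no longer fits in what's left of the text
end

p kmp_search("bacbababaabcbab", "abababca")
```

On the example text this visits the same alignments as the diagrams (partial matches of length 1, 5, and 3) and then gives up because the pattern no longer fits, so it returns nil.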

Conclusion

So there you have it. Like I promised before, it’s no exhaustive explanation or formal proof of KMP; it’s a walk through my brain, with the parts I found confusing spelled out in extreme detail. If you have any questions or notice something I messed up, please leave a comment; maybe we’ll all learn something.

A method with no unit tests is a broken method

If you write software, you need to write unit tests. If you’ve written a method/function, and you haven’t written a unit test for it, it’s safe to assume that it’s broken (even if it compiles and your other tests pass).

I’m not necessarily advocating full-fledged test-driven development. I’m just saying, if you release code into “the wild,” and there are methods you haven’t unit tested, your customers will almost certainly run into multiple bugs in each one of them.

That’s an atomic point. Separate from that, I’d like to mention that this isn’t always a bad thing. For a startup that wants to iterate as quickly as possible (and is writing non-life-critical software), writing the code with no unit tests, releasing it, and reproducing each customer-discovered bug with a unit test before fixing it is a totally reasonable model. These startups just shouldn’t operate under the illusion that their software “works”. In the hours after they make one of these releases, they should feel blessed if a single customer is able to register or log in.

Convert an int array to a comma-separated string with C

I’m much less experienced in C than I am in higher-level languages. At Cisco, we use C, and I sometimes run into something that would be easy to do in Java or Python, but very difficult to do in C. Now is one of those times.

I have a dynamically-allocated array of unsigned integers which I need to convert to a comma-separated string for logging. While the integers are not likely to be very large, they could conceptually be anywhere from 0 to 4,294,967,295. In Python, that’s one short line.

my_str = ','.join([str(num) for num in my_list])

How elegantly can people do this in C? I came up with a way, but it’s gross. If anyone knows a nice way to do it, please enlighten me.