Die Semicolon Die!
Wednesday, May 25th, 2011
I’ve always filed JavaScript’s automatic semicolon insertion (ASI for short) under the bad parts of the language. That is based on Douglas Crockford’s explanation of how the feature is tricky and easily leads to mistakes, with the canonical example being:
// good, returns the object
return { ... }
// wrong! returns undefined
return
{
...
}
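For a version you can actually run, here is a minimal sketch of the same trap. The function names and the `answer` property are made up for illustration. In the second function the braces end up being parsed as a block containing a labelled statement, not as an object literal.

```javascript
// Hypothetical names, for illustration only.
function returnsObject() {
  return { answer: 42 }
}

function returnsUndefined() {
  return        // ASI ends the statement right here
  {
    answer: 42  // parsed as a block: label `answer:` followed by the expression 42
  }
}

console.log(returnsObject())     // { answer: 42 }
console.log(returnsUndefined())  // undefined
```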
Fair enough. Lately I’ve been doing more and more Ruby. Ruby is a language universally praised for its elegant, easy-to-read syntax. One of the strong points of the syntax is its terseness: you can omit a lot of punctuation, semicolons included. Wait a moment…
def test
return
{
...
}
end
test # returns nil !!
Same thing! Having the meaning of a program change because of an end-of-line is not a good thing in Ruby either, but it’s widely accepted because of the benefits. The same must hold for JavaScript, so here is the first point:
“Removing semicolons and other punctuation clutter is not just a liability. It actually makes your code look better.”
So both languages have to decide when a statement implicitly terminates. But is Ruby’s implementation really the same as JavaScript’s? It turns out it’s not: Ruby takes a considerably safer approach. A statement in Ruby ends at an end-of-line if it’s syntactically valid by itself, and spans multiple lines if it’s not:
# this works, the trailing dot means the statement is not finished
object.
method1.
method2.
method3
# syntax error, first line is a valid statement by itself, second line calls method1 on nothing
object
.method1
.method2
.method3
It’s safe because how a line is parsed depends on the line itself, not on other lines that could be written “by others”. The bad part is that it makes method chaining across multiple lines look ugly. This is why Ruby 1.9 introduced the exception “the statement continues if the first character of the next line is a dot”.
JavaScript goes a step further to solve this bad part. A controversial step. A statement ends at an end-of-line only if the first token of the next line cannot be parsed as a continuation of the current line. Otherwise, the statement goes on. This removes the clutter and gives nice chaining:
// just works
object
.method1()
.method2()
.method3()
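As a concrete, runnable take on the chaining above, here is a small sketch using standard array methods. The `words` data and the `shouted` variable are invented for the example.

```javascript
// Made-up data, just to have something to chain on.
var words = ["semicolon", "asi", "newline", "dot"]

// Each line starts with a dot, so the parser keeps extending the statement.
var shouted = words
  .filter(function (w) { return w.length > 3 })
  .map(function (w) { return w.toUpperCase() })
  .join(" ")

console.log(shouted) // SEMICOLON NEWLINE
```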
Unfortunately, you now have a nasty problem. Two lines that are meant to be two separate statements, where the start of the second line is a valid continuation of the first, will be treated as one statement, with surprising results. In practice this only happens when a line starts with one of ( [ + - /
// function call instead of grouping
var a = b + c
(d + e).print()
// is really
var a = b + c(d + e).print()

// array index instead of array literal
var a = ["a", "b", "c"]
[0, 1].forEach( … )
// is really
var a = ["a", "b", "c"][0, 1].forEach( … )

// binary math operator instead of unary
var a = b + c
-1 == string.indexOf(query) || die()
// is really
var a = b + c - 1 == string.indexOf(query) || die()

// division instead of regular expression
var i=0
/[a-z]/g.exec(s)
// is really
var i=0 /[a-z]/g.exec(s)
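To see two of these pitfalls misbehave for real, here is a minimal runnable sketch. The helper names `arrayPitfall` and `minusPitfall`, the sample values and the `fallbackCalls` counter (standing in for `die()`) are all assumptions made for the demo.

```javascript
// Hypothetical helpers; each one reproduces a single pitfall.

// The array literal on the second line becomes an index into the first array:
// ["a", "b", "c"][0, 1] is ["a", "b", "c"][1], i.e. the string "b",
// and strings have no forEach method.
function arrayPitfall() {
  try {
    var a = ["a", "b", "c"]
    [0, 1].forEach(function (n) { /* never called */ })
    return "no error"
  } catch (e) {
    return e.name
  }
}

// The unary minus on the second line becomes a binary minus:
// var a = b + c - 1 == "abc".indexOf("z") || fallbackCalls++
function minusPitfall() {
  var b = 1, c = 2, fallbackCalls = 0
  var a = b + c
  -1 == "abc".indexOf("z") || fallbackCalls++
  return { a: a, fallbackCalls: fallbackCalls }
}

console.log(arrayPitfall()) // TypeError
console.log(minusPitfall()) // { a: 0, fallbackCalls: 1 }
```

Read as two statements, `a` would be 3 and the fallback would never run, because `-1 == "abc".indexOf("z")` is already true.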
Well, this sucks, so what should you do? I can say that I remember being caught by this problem just once in many years of JavaScript. The return problem, or starting a line the nasty way, is extremely rare. But even if you don't want to take the risk, why avoid ASI without even knowing how it works? Without even considering a reasonable fix, given the nicer syntax? And this leads me to the second point:
"To write semicolon-free code and avoid getting bitten, you just need to remember 2 rules
1) Don't put an end-of-line between return, break, continue, throw, postfix ++, postfix -- and their operand
2) Avoid starting a line with ( [ + - / but if you have to, prepend it with a semicolon"
// everything's fine
return { ... }
continue label
break label
throw error
counter++
counter--
var a = b + c
;(d + e).print()
var a = ["a", "b", "c"]
;[0, 1].forEach( ... )
var a = b + c
;-1 == string.indexOf(query) || die()
var i=0
;/[a-z]/g.exec(s)
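Here is a runnable sketch of rule 2 in action, assuming made-up helper names (`arrayFixed`, `minusFixed`) and a `fallbackCalls` counter standing in for `die()`. The leading semicolon forces the second line to start a new statement.

```javascript
// Hypothetical helpers showing the defensive leading semicolon.

function arrayFixed() {
  var seen = []
  var a = ["a", "b", "c"]
  ;[0, 1].forEach(function (i) { seen.push(a[i]) })
  return seen
}

function minusFixed() {
  var b = 1, c = 2, fallbackCalls = 0
  var a = b + c
  ;-1 == "abc".indexOf("z") || fallbackCalls++
  return { a: a, fallbackCalls: fallbackCalls }
}

console.log(arrayFixed()) // [ 'a', 'b' ]
console.log(minusFixed()) // { a: 3, fallbackCalls: 0 }
```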
Is it that taxing to remember? Automatic semicolon insertion is of course controversial, but using it is not a complete failure. It's a matter of taste, a trade-off between cleaner, nicer code and a few tough, albeit avoidable, pitfalls.
While I'm at it, let's debunk some well-known myths that always show up:
- "I might know ASI, but others don't, and they will mess things up"
Well, this may be true. It depends on where you work, the skill of your peers, etc. To me, a JavaScript programmer is simply supposed to know this stuff, just as they know about prototypes and first-class functions. If they don't, and assuming they have opposable thumbs, then just as they can be told to put semicolons everywhere, they can be told to remember the 2 simple rules above.
- "It's not gonna work the same way on every browser"
It's been in the specs for more than a decade. I think browser bugs are a thing of the past, and even proponents of this theory seem unable to find an example newer than 5 years old.
- "It breaks the tools. You cannot minify code anymore, etc..."
Let's be clear about this. It's officially part of the language. A tool unable to cope with ASI is a broken tool, period. Anyway, I have never had a problem with Google Closure Compiler.
- "JSLint doesn't work with it"
JSLint enforces Douglas's vision and it's pretty strict about it. That's fair, but for those with a different vision there is nothing wrong with using JSHint, which has an option to accept ASI.
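As a sketch of the JSHint route: the option in question is called `asi`, and one way to switch it on is an inline directive at the top of the file. The `greet` function below is a made-up example, and the exact warnings you get will depend on your JSHint version and the rest of your configuration.

```javascript
/* jshint asi: true */
// With the directive above, JSHint should stop warning about the
// missing semicolons below. The function itself is hypothetical.
function greet(name) {
  var greeting = "hello " + name
  return greeting.toUpperCase()
}

console.log(greet("asi")) // HELLO ASI
```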
Let's close with two very nice articles that explain the details, and of course you can always read the ECMAScript spec:
The most well-written comprehensive article