Semantically, legal codes are smooth, shapeless balls of text. They’re programmatically inaccessible, useless to software — and most people. There’s simply nothing on which to get a purchase. As qualitative data, they’re inaccessible to quantitative analysis.


This is the problem that the State Decoded project seeks to solve.

The State Decoded’s job is to turn legal codes inside out, bringing their substructures to the surface so that they are easier to understand. By reducing laws to their smallest possible units, indexing them via every possible metric, and exposing all of those internal structures, it’s possible to give people and software alike something to grab hold of.

Place names, people’s names, organization names, bill numbers, dates, glossary terms, cross references, and statistically unlikely phrases are all lurking just below the surface, waiting to be gathered and cataloged. Innumerable external sources of data can be used to infer more about laws, including citations in court opinions, citations in scholarly publications, citations in legislation, citations in blog entries, website traffic patterns, legislative tags, legislator voting histories, lobbying records, campaign finance data, and a great deal more.
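To make that gathering a little more concrete, here is a minimal sketch of pulling two of those features, cross references and dates, out of a passage of law. The regular expressions are assumptions for illustration: they match Virginia-style section numbers such as "§ 18.2-32" and four-digit years, and other states' codes would need patterns of their own. The function name and sample text are hypothetical, not part of the State Decoded itself.

```python
import re

# Assumed patterns, for illustration only; real codes vary by state.
# Matches Virginia-style section numbers such as "§ 18.2-32".
SECTION_REF = re.compile(r"§\s*(\d+(?:\.\d+)*-\d+(?:\.\d+)*)")
# Matches four-digit years from 1800 through 2099.
YEAR = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def extract_features(text):
    """Return the section cross references and years found in a passage."""
    return {
        "cross_references": SECTION_REF.findall(text),
        "years": YEAR.findall(text),
    }


# Hypothetical sample text, written for this example.
sample = (
    "Any person who violates § 18.2-32 or § 46.2-852 after July 1, 2012, "
    "is subject to the penalties set out in § 18.2-10."
)
features = extract_features(sample)
print(features)
```

Names of places, people, and organizations are harder to catch with patterns like these and would more likely call for a named-entity recognizer, which this sketch does not attempt.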

None of this has much to do with making state laws prettier. But it’s the part where the State Decoded starts to get fun.


The project’s motto might be “state codes, for humans,” but it would be more honest to call it “state codes, for robots.” It’s the API that’s going to make this project valuable, because it’s the API where all of this fascinating data will be shared in its entirety (and also in bulk downloads because — let’s face it — sometimes APIs are more trouble than they’re worth).

What will people do with this? I have no idea. That’s the beauty of it. There are people much smarter than I who will grasp the fascinating applications and analyses that can be created with these data. Perhaps they’ll find that legislators in different political parties tend to pass bills that affect distinctly different titles of the code. Or that the SMOG readability grade of amendments to the code has gradually been increasing.

Maybe that legislation amending a law tends to follow a spike in scholarly citations of that law. Who knows? By providing all of these data points in one place, it will be possible for people to crunch the numbers themselves and find out what secrets lie within them.
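As one example of the kind of number crunching this makes possible, here is a rough sketch of the SMOG readability grade mentioned above. It uses McLaughlin's published formula, but the syllable counter is a crude assumption (it counts runs of vowels rather than consulting a dictionary), and the sentence splitter will trip over section numbers like "18.2-32". Treat the results as approximate; the function names are mine, not the project's.

```python
import math
import re


def count_syllables(word):
    """Rough heuristic: count runs of vowels. Not a dictionary lookup."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def smog_grade(text):
    """Estimate the SMOG grade of a passage of text.

    SMOG was designed for samples of 30 sentences; the formula scales
    the polysyllable count to that sample size.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z]+", text)
    # Polysyllables are words of three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


plain = smog_grade("The cat sat. The dog ran.")
legalese = smog_grade(
    "Notwithstanding any contrary provision, the commissioner "
    "shall promulgate regulations."
)
print(round(plain, 2), round(legalese, 2))
```

Run over each amendment to a section, with the date of each amendment attached, a function like this would yield the time series needed to test whether the code really is getting harder to read.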

The API for Virginia is in alpha testing now. If you’re interested in putting it to work, send an e-mail saying so to join the alpha test. This is where things get good.

Image courtesy of Flickr user jimmywayne.