The State Decoded, Now Solr-Powered

    by Waldo Jaquith
    August 22, 2012

    The State Decoded project is putting U.S. state laws online, making them easy to search, understand and navigate. Our laws are organized badly, but The State Decoded is reorganizing them automatically, connecting people with the legal information they need with the ease of a Google search.

    In implementing many of the features necessary to provide this experience, it would be easy to try to reinvent the wheel. While it’s novel to provide this sort of functionality on a legal website, the functionality itself is hardly new. Recommendations of similar laws are really no different from Amazon’s ability to recommend similar products. Making legal text easier to understand is really just an application of natural language processing. And a simple, elegant search interface is nothing Google hasn’t figured out.

    At its core, all of this is about the same thing: analyzing a series of texts and determining how they relate to one another. That’s a solved design pattern.



    Many of these design patterns already exist in one piece of software: Solr. The Solr document indexing software can be thought of as a search engine meant to be installed on a single website, although it’s really much more capable than that. It is one of the most widely used open source search platforms, valued for its power and flexibility.

    Solr is a natural fit for The State Decoded. It provides some features that would otherwise need to be built from scratch, along with a framework that will make some exciting analysis and collaboration possible.


    The use of Solr has been tested out on Virginia Decoded, which is one of the state-level implementations of the State Decoded software. That work was donated by Open Source Connections, a Solr consultancy shop with an interest in good governance. Three interns there — David Dodge, Joseph Featherston, and Kasey McKenna — spent a chunk of their summer analyzing legal code structures and figuring out how to best index and represent them within Solr.

    Where Solr Shines

    One feature that Solr provides out of the box is a concept of document relatedness. When presented with a single law, it can provide a listing of other laws that are linguistically similar to that law. This is an important first step in reorganizing laws to reflect topic-based logical groupings. For instance, Virginia’s § 18.2-30 (“Murder and manslaughter declared felonies”) displays five other laws that Solr has judged to be related to it, including § 19.2-8.1 (“Prosecution for murder or manslaughter; passage of time not a limitation”). Somebody interested in the law prohibiting murder is likely to be interested in the law that says it doesn’t matter how long it takes somebody to die, it’s still murder. Normally there is nothing that would connect those two sections, which are in entirely different titles of the Code of Virginia. Thanks to Solr, those connections are now obvious.
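    As a rough illustration of how a site might ask Solr for related laws, the sketch below builds a query URL using Solr's MoreLikeThis parameters (`mlt`, `mlt.fl`, `mlt.count`). The host, the core name `laws`, and the field names `section_number` and `text` are assumptions made for this example; they are not taken from the Virginia Decoded schema.

```python
from urllib.parse import urlencode


def more_like_this_url(base_url, section_id, count=5,
                       id_field="section_number", text_field="text"):
    """Build a Solr select URL that requests laws similar to one section.

    The field names are hypothetical; a real index would use whatever
    names its schema defines.
    """
    params = {
        # Find the one law we want recommendations for.
        "q": f'{id_field}:"{section_id}"',
        # Turn on the MoreLikeThis component for this request.
        "mlt": "true",
        # Field whose terms are compared to judge similarity.
        "mlt.fl": text_field,
        # Number of similar documents to return for each match.
        "mlt.count": count,
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local Solr instance with a core named "laws".
url = more_like_this_url("http://localhost:8983/solr/laws", "18.2-30")
print(url)
```

    Fetching that URL from a running Solr instance would return the matched law along with a list of similar documents, which a site could then render as "related laws."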

    Another feature that Solr provides is the ability to respond to remote search queries. That is, every state site that uses Solr can allow its laws to be searched by other sites, which would make it possible to search multiple states’ laws in one fell swoop, or for one state site to highlight the existence of similar laws in other states.
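    One way a multi-state search could work is for a client to query each state's Solr instance separately and then combine the JSON responses. The sketch below shows only the combining step, under the assumption that each response was requested with `fl=*,score` so every document carries a relevance score. The function name, field names, and sample data are all hypothetical. Scores from separate indexes are not strictly comparable, so the ordering here is only a rough approximation.

```python
def merge_state_results(responses, limit=10):
    """Combine parsed Solr JSON responses from several state sites.

    `responses` maps a state name to the parsed JSON body returned by
    that state's Solr instance. Each document is tagged with its state,
    and the combined list is sorted by score, highest first.
    """
    merged = []
    for state, body in responses.items():
        for doc in body.get("response", {}).get("docs", []):
            merged.append({**doc, "state": state})
    merged.sort(key=lambda d: d.get("score", 0.0), reverse=True)
    return merged[:limit]


# Hypothetical sample responses; the scores and the second state's
# section number are invented for illustration.
sample = {
    "Virginia": {"response": {"numFound": 2, "docs": [
        {"section_number": "18.2-30", "score": 2.1},
        {"section_number": "19.2-8.1", "score": 1.4},
    ]}},
    "Other State": {"response": {"numFound": 1, "docs": [
        {"section_number": "X-1", "score": 1.8},
    ]}},
}

results = merge_state_results(sample)
for doc in results:
    print(doc["state"], doc["section_number"], doc["score"])
```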

    More than anything else, Solr provides a framework to enable innovative analysis of legal codes. For example, Apache Mahout — a machine learning library — can be plugged into this Solr setup, which could automatically and completely reorganize an entire legal code into topical clusters, find un-tagged laws and apply topical descriptions to them to make them easier to identify, or analyze the laws that a site visitor has looked at and recommend others that might be of interest to them.

    Although requiring the use of Solr for The State Decoded does make the software somewhat more complicated to implement, the benefits are too large to be ignored. Version 0.6 of The State Decoded, due out November 1, will be the first release of the software that is Solr-based.

    Waldo Jaquith has been a website developer for 18 years and an open-government technology activist for 16 years. He holds a degree in political science from Virginia Tech and was a 2005 fellow at the Sorensen Institute for Political Leadership. He and his wife live on a small farm near Charlottesville, Va., where he works for the Miller Center at the University of Virginia.

    Tagged: laws, legal codes, search, solr, the state decoded, virginia decoded

