The State Decoded Debuts Version 0.4

    by Waldo Jaquith
    September 18, 2012

    Version 0.4 of The State Decoded was just tagged on GitHub and bundled up for download, the result of six weeks of work. The State Decoded is a platform that displays state codes, court decisions, and information from legislative tracking services to make it all more understandable to normal humans.


    This release is dedicated (almost) exclusively to enhancements to the dictionary system. Eighteen issues make up the changes in this release, sixteen of which pertain to the built-in custom dictionary system, which automatically finds defined terms within legal codes, stores them in a dictionary, and uses that data to embed contextual definitions that are relevant to each law.


    what’s new

    There are a few big changes:

    • The State Decoded now comes with a built-in dictionary of general legal terms. Using several different non-copyrighted, government-created legal dictionaries, a collection of nearly 500 terms has been put together, which will help people to understand common legal terms that are rarely defined within legal codes, such as “mutatis mutandis,” “tort,” “pro tem,” and “cause of action.”
    • Dictionary terms are now identified more aggressively, which means that for many states, the size and scope of the custom dictionary is going to expand substantially. In the case of Virginia there was a 49% increase (a leap from 7,681 to 11,504 definitions), a striking difference that is immediately apparent when browsing the site.
    • The problem of nested/overlapping definitions has been solved. Previously, when one term was nested within another (e.g., if we have definitions for both “robbery” and “armed robbery”), mousing over “robbery” would yield a pair of pop-up definitions, one obscuring the other. Now only the definition for the longest term is displayed under those circumstances.
    • Internal terminology has been standardized. In various places the dictionary and its components were called different things (glossary, definitions, dictionary, terms, etc.). Now the collection of words is called a “dictionary,” each defined word is a “term,” and the description of what that term means is a “definition.”
    • The retrieval and display of definitions is substantially faster — they take about half the time that they used to. This is a result of optimizing and simplifying the structure of the database table in which definitions are stored.
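
    To make the "longest term wins" rule concrete, here is a minimal sketch in Python. It is not the project's own code: the function name `find_terms`, the sample dictionary, and the regular-expression approach are all assumptions made for illustration. It uses the standardized vocabulary from the list above (a dictionary maps each term to its definition) and relies on the fact that a regex alternation tries its alternatives in order, so listing longer terms first lets “armed robbery” win over “robbery” when both start at the same spot.

```python
import re


def find_terms(text, dictionary):
    """Return (start, end, term) for each defined term found in text.

    `dictionary` maps lowercase terms to definitions (hypothetical structure).
    When terms overlap, the leftmost match wins; among terms starting at the
    same position, the longest wins.
    """
    # Longest terms first, so the alternation prefers "armed robbery"
    # over "robbery" when both could match at the same position.
    terms = sorted(dictionary, key=len, reverse=True)
    if not terms:
        return []
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    # finditer() yields non-overlapping matches, so text already claimed by
    # a longer term is never matched again by a shorter one nested inside it.
    return [(m.start(), m.end(), m.group(0).lower()) for m in pattern.finditer(text)]


if __name__ == "__main__":
    # Placeholder definitions, not taken from any real legal code.
    sample = {
        "robbery": "placeholder definition of robbery",
        "armed robbery": "placeholder definition of armed robbery",
    }
    for start, end, term in find_terms("Armed robbery is a form of robbery.", sample):
        print(start, end, term, "->", sample[term])
```

    In this sketch the first occurrence is reported once, as “armed robbery,” and only the standalone “robbery” later in the sentence gets the shorter term's definition, so no two pop-ups compete for the same words.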

    A list of all closed issues is available for those who want specifics. And for those who are suckers for details, this is the first release for which a Git commit log is available, with relatively detailed comments for all 68 commits that make up this release.

    This release is two weeks late, almost entirely because of time spent on a pernicious and difficult parsing bug. It only occurred to me today that the bug shouldn’t have blocked this release: while it is an important problem, it has absolutely nothing to do with definitions. (The problem I’m wrestling with is how to handle subsections of laws that span paragraphs. Easy to describe, difficult to solve, at least for those state codes that pretend that a paragraph and a section are one and the same. I’m looking at you, Virginia.) That issue has been moved back to v0.5, and I’ll go right back to wrestling with it on Monday.


    more versions to come

    Next up, version 0.5 will be another general-enhancements release. Version 0.6 will be the Solr release — the version in which the popular search software becomes integrated deeply into the project. Version 0.7 will be the API release, where the nascent API gets built out to full functionality and documented properly.

    Version 0.8 will be the user interface release, in which the design will be overhauled, a responsive design will be implemented, serious work will go into the typography, an intercode navigation system will be implemented, contextual help and explanations will be embedded throughout, and the results of some light UI testing will be incorporated. Version 0.9 will be dedicated to optimizations — making everything go faster and be more fault-tolerant, both through improving the code base and supporting the APC and Varnish caching systems.

    And, finally, version 1.0 will be the first release in which State Decoded becomes a platform that facilitates the sort of analysis and data exchange that makes this project so full of possibility — things like flexible content export, visualizations, user portfolios of interesting laws, and surely lots of other things.

    Waldo Jaquith has been a website developer for 18 years and an open-government technology activist for 16 years. He holds a degree in political science from Virginia Tech and was a 2005 fellow at the Sorensen Institute for Political Leadership. He and his wife live on a small farm near Charlottesville, Va., where he works for the Miller Center at the University of Virginia. He received a “Champion of Change” award at the White House earlier this month.

    Image courtesy of Flickr user jimmywayne.

    This post originally appeared on The State Decoded blog.
