UPDATED 16:20 EDT / JANUARY 20 2014

NEWS

Treasury.IO: The $11 trillion check book daily data feed automation tool about the US government

Interested in what the US Treasury spends on a daily basis? You’re in luck: there’s a data-driven report for that. Unfortunately, the information released has always been arcane and difficult to digest. Treasury.IO, built by a team of developers, data scientists, and journalists, stepped up to fix this problem and open the data up to easier use.

The US Treasury publishes a daily report on the Federal Government’s cash and debt operations, based on reporting from Federal Reserve Banks, Treasury Regional Financial Centers, Internal Revenue Service Centers, the Bureau of the Public Debt and various electronic systems. The operating cash is maintained in an account at the Federal Reserve Bank of New York and is invested in Tax and Loan Note Accounts at financial institutions.

The Daily Treasury Statement lists the government’s actual cash spending, rounded to millions of dollars, on everything it spent and funded each day. The department publishes data tables summarizing the cash spending, deposits, and borrowing of the Federal government. These files contain a catalog of all the money taken in that day from taxes, the money spent on programs, and how much debt the government took out. The problem is that the Treasury releases these files only as text or PDF, which makes the data very difficult to read and analyze.

Moreover, the data is spread across several tables: one covers cash withdrawals from the Treasury, another breaks down deposits, yet another summarizes debt issuances, a fourth covers tax income, and so on. The files’ messy formatting has limited analysis and proved a challenge for the Treasury.IO team.

The birth of Treasury.IO

The team is a mix of journalists and developers in New York who share the idea that government data, and the data that journalists publish, should be free and open. The group works on side projects that range from official grant-funded efforts like Treasury.IO to things as small as scraping movie data to analyze trends.

Developer Michael Keller told SiliconAngle associate editor Kyt Dotson that Cezary Podkul, a data journalist at Reuters, helped give life to the project one week in 2012 when he talked about the Treasury Department dataset, focusing on the useful data it contains as well as the obstacles to its use by journalists and the public in general. The problem was the data format: it was published in a way that made it practically impossible to do any type of trend or aggregate analysis. For instance, you could see how much cash was paid on government payrolls today or yesterday by pulling up the individual report for that day. But there was no easy way to benchmark it or compare it to other days or months.

After refining the project through a weekend event co-hosted by Columbia and Stanford datafest, a $10,000 Knight-Mozilla OpenNews grant, and some talks with the Treasury, they’ve come up with a tool to solve Podkul’s original problem.

As a result, the group started building Treasury.IO as a way to liberate the dataset so journalists, or anyone else, could more easily analyze the government’s cash spending and borrowing, an issue that has been a focal point of political debates.

Treasury.IO contains all of the government’s daily ledger sheets dating back to June 2005. The system parses the fixed-width files into a standard schema and stores the result in a SQLite database, which can be downloaded or queried directly via a URL endpoint.

The members who contributed to this project are Brian Abelson, Jake Bialer, Burton DeWilde, Michael Keller, Thomas Levine, and Cezary Podkul, with assistance from Ashley Williams.

The challenges

To understand what challenges the team faced, SiliconAngle went to the developers.

Keller said the data format was extremely challenging to work with.

The Treasury publishes these files in a fixed-width format, which, if you think of a normal data file as an Excel file with columns, is a data file where each character is a column. So to pull out a number you’re interested in, you need to know that it occurs X number of characters from the left. That wouldn’t be so bad if the number of characters was the same, but that’s not the case.
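To make the idea of "each character is a column" concrete, here is a minimal Python sketch of fixed-width parsing. The column names, widths, and sample values are assumptions made up for illustration; they are not the real Daily Treasury Statement layout.

```python
# Assumed column widths, in characters. The real files use different layouts.
WIDTHS = {"item": 30, "today": 9, "month_to_date": 12, "year_to_date": 12}

# Build a sample line with exactly those widths (values are invented).
SAMPLE_LINE = "{:<30}{:>9}{:>12}{:>12}".format(
    "Federal Salaries (EFT)", "1,234", "45,678", "567,890"
)

def column_offsets(widths):
    """Turn column widths into (start, end) character offsets from the left."""
    offsets, start = {}, 0
    for name, width in widths.items():
        offsets[name] = (start, start + width)
        start += width
    return offsets

def parse_fixed_width(line, widths):
    """Slice a line at fixed character offsets and strip the padding."""
    return {
        name: line[start:end].strip()
        for name, (start, end) in column_offsets(widths).items()
    }

def to_number(text):
    """Convert a figure such as '45,678' to an integer."""
    return int(text.replace(",", ""))

row = parse_fixed_width(SAMPLE_LINE, WIDTHS)
print(row["item"], to_number(row["today"]))
```

This approach only works as long as every line uses the same offsets, which is exactly the assumption the Treasury files break.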

To make matters worse, the files aren’t consistent in how they structure this data, so you can’t easily write a program to turn these files into Excel-like spreadsheets.
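One common way to cope with shifting column positions is to split on runs of whitespace instead of fixed offsets. The sketch below shows that general technique in Python; it is not necessarily the approach the Treasury.IO team took, and the sample lines are invented.

```python
import re

def split_columns(line):
    """Split a line wherever two or more whitespace characters occur in a row.

    Single spaces inside a program name are left intact.
    """
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]

# Two invented lines carrying the same data at different character offsets.
line_a = "Federal Salaries (EFT)      1,234    45,678"
line_b = "Federal Salaries (EFT)  1,234        45,678"

print(split_columns(line_a))
print(split_columns(line_b))
```

Both lines yield the same three cells even though the numbers start at different positions. The trade-off is that this heuristic fails when a column is blank or when a name itself contains a double space.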

Also, line items will sometimes be added or removed from the files, and the specific name of the same program sometimes changes over time. The developers’ challenge was to write a parser that could parse the data in a consistent format so they didn’t have to copy all of the data by hand.
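The renaming problem can be illustrated with a small alias table that maps each variant spelling to one canonical name. This is a hedged Python sketch; the alias entries are hypothetical and not taken from the project.

```python
import re

# Hypothetical alias table: variant spellings mapped to one canonical name.
ALIASES = {
    "fed. salaries (eft)": "Federal Salaries (EFT)",
    "federal salaries (eft)": "Federal Salaries (EFT)",
}

def canonical_name(raw):
    """Collapse whitespace, ignore case, and look the name up in ALIASES.

    Names without an alias are returned with surrounding whitespace removed.
    """
    key = re.sub(r"\s+", " ", raw).strip().lower()
    return ALIASES.get(key, raw.strip())

print(canonical_name("  Fed.  Salaries (EFT) "))
print(canonical_name("Unknown Program "))
```

Keeping the mapping in data rather than in code means a newly renamed line item needs only a new table entry.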

The other challenge was writing a front-end website and documentation that could make this data understandable for people interested in their government.

Treasury.IO developers researched what these different programs do, wrote extensive notes, and added links so that people could not only query the data, which was impossible before, but also learn what is actually in the data (i.e. what these programs are responsible for) and thus sharpen their knowledge of how the government runs its checkbook.

The Data

Treasury.IO offers a user-friendly query builder that lets users select, graph, and download data from each table. Users can query the database and get the results back as CSV, JSON, or a data.frame from Python, R, JavaScript, Node.js, Ruby, or Google Docs. The source code is on GitHub and can be cloned to set up a stand-alone instance of the API. In addition, anyone can download the SQLite database, which is updated daily.
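As a rough illustration of the kind of aggregate query this makes possible, the Python sketch below runs SQL against a small in-memory SQLite database. The table name `withdrawals`, its columns, and the figures are all assumptions for the example; the real Treasury.IO schema and endpoint are described in the project’s own documentation.

```python
import json
import sqlite3

# In-memory stand-in for the downloadable database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE withdrawals (date TEXT, item TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO withdrawals VALUES (?, ?, ?)",
    [
        ("2013-10-01", "Federal Salaries (EFT)", 100),
        ("2013-10-02", "Federal Salaries (EFT)", 250),
        ("2013-10-02", "Medicare", 900),
    ],
)

def monthly_totals(conn, item):
    """Sum one line item per calendar month (dates stored as YYYY-MM-DD)."""
    rows = conn.execute(
        "SELECT substr(date, 1, 7) AS month, SUM(amount) AS total "
        "FROM withdrawals WHERE item = ? GROUP BY month ORDER BY month",
        (item,),
    ).fetchall()
    return [{"month": month, "total": total} for month, total in rows]

result = monthly_totals(conn, "Federal Salaries (EFT)")
print(json.dumps(result))
```

A month-by-month total like this is the sort of comparison that was impractical when each day lived in its own text file.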

Treasury.IO lets users query government expenditures agency by agency in specific detail–down to the day, week, month, and year.

How it works

Each day at 4pm, the Treasury updates its website with the latest information about daily spending, revenues, and borrowing. Shortly after, Treasury.IO downloads the file to a cloud server run by ScraperWiki, a data hosting service.

The parsing program then runs through the directory to check whether any new file has been added. When the program finds a new file, it parses the individual line items, standardizes names, and inserts the rows into the appropriate tables in the database. Each table is first stored as a CSV file for the day; these files are later concatenated and stored in a master SQLite3 database. In between, Treasury.IO uses Amazon CloudFront as a proxy to prevent the occasional bottleneck. The free project is also augmented by a Treasury.IO Twitter bot, which tweets out analyses of the data each day.
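The daily update loop described above can be sketched in a few lines of Python. This is a simplified illustration, not the project’s code: the `item|amount` file format, the table names, and the helper functions are all hypothetical, and the intermediate per-day CSV step is skipped for brevity.

```python
import sqlite3
import tempfile
from pathlib import Path

def find_new_files(directory, seen):
    """Return files in the directory that have not been processed yet."""
    return sorted(p for p in Path(directory).glob("*.txt") if p.name not in seen)

def parse_file(path):
    """Placeholder parser: one 'item|amount' pair per line (not the real format)."""
    rows = []
    for line in path.read_text().splitlines():
        item, amount = line.split("|")
        rows.append((path.stem, item.strip(), int(amount)))
    return rows

def load_new_files(directory, conn):
    """Parse every unseen file into the database; return how many were loaded."""
    conn.execute("CREATE TABLE IF NOT EXISTS processed (name TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS line_items (date TEXT, item TEXT, amount INTEGER)"
    )
    seen = {name for (name,) in conn.execute("SELECT name FROM processed")}
    loaded = 0
    for path in find_new_files(directory, seen):
        conn.executemany("INSERT INTO line_items VALUES (?, ?, ?)", parse_file(path))
        conn.execute("INSERT INTO processed VALUES (?)", (path.name,))
        loaded += 1
    conn.commit()
    return loaded

# Demo with a temporary directory holding one invented daily file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "2013-10-01.txt").write_text("Medicare|900\nFederal Salaries (EFT)|100\n")
    conn = sqlite3.connect(":memory:")
    first_run = load_new_files(tmp, conn)
    second_run = load_new_files(tmp, conn)
    row_count = conn.execute("SELECT COUNT(*) FROM line_items").fetchone()[0]

print(first_run, second_run, row_count)
```

Recording processed file names in the database makes the job safe to rerun: a second pass over the same directory loads nothing new.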

Using the tool, you could track the salaries of government employees, compare how much citizens pay in taxes with how much the debt grows, or compare Medicare spending to premiums over time.

Treasury.IO is currently being used by Reuters to analyze federal payroll data for government workers who have been out of work for nearly two weeks during the partial shutdown of the U.S. government. Al Jazeera America used Sunlight Foundation’s Capitol Words API and Treasury.IO to make a chart showing the rise in the debt ceiling since 2005, alongside the number of times the words “debt ceiling” have been spoken on the floor of Congress.

Treasury.IO was first soft-launched last summer in July, and revised around October to add more documentation. The group is officially launching it in January 2014.

