Arun Raghavan, an open source software enthusiast, and four friends worked through the night of January 19th on an unusual problem: scraping electoral data from the ceokarnataka website. They wanted to create a user-friendly frontend where citizens could search for their names and polling booth information. They did this as part of a hacknight held on January 19th and 20th to commemorate the life and works of Aaron Swartz. The hacknight was organised by HasGeek, an event organiser for geeks.
Aaron Swartz was a hacktivist, who died in early January. He had helped create RSS 1.0; contributed to Creative Commons; was an early builder of Reddit, where he’s often acknowledged as a co-founder; and more recently, became a data liberator, which got him into trouble with the law.
The idea behind the hacknight was that Aaron Swartz is gone, but his work on making the world a better place should not die with him. Participants set out to understand his work and issues such as IT laws, copyright rules and access to information, and to contribute to keeping Swartz’s memory and projects alive.
Swartz had initiated several coding projects during his lifetime. Anand Chitipothu, a Bangalore-based developer who collaborated with Swartz at the Internet Archive and maintains his web.py framework, suggested that the hacknight could also be an opportunity for people to get familiar with Aaron’s coding projects and work on some of them.
Around 40 people participated. Some participants proposed projects to liberate different kinds of public data, such as electoral data, weather data, train timetables and data crawled from government and NIC websites. Developers worked on these projects to make the data searchable and usable.
Discussions during the hacknight: The hacknight started at 3 PM with a discussion about the life of Aaron Swartz and the political and legal implications of his coding projects and activism. This discussion was led by Chitipothu and Kiran Jonnalagadda of HasGeek.
Swartz had started freeing data funded by public money which constitutionally belonged in the public domain. He published data from the catalogue of the Library of Congress and the US case law archives on the Internet Archive. Later, Aaron downloaded articles from JSTOR to release academic papers whose research was funded with public money. Before he could sift through the downloads, Aaron was caught by the police. He returned the hard disk containing the downloads. JSTOR and MIT did not pursue cases against him, but the United States government charged Aaron with breaking into the MIT campus and faking his identity by changing the MAC address of his computer.
At the end of Jonnalagadda’s presentation, participants asked several questions about activism, what constitutes offensive speech, the framework of IT laws in India, and the process of law-making.
Sunil Abraham of the Centre for Internet and Society (CIS) also joined the hacknight. He made a presentation about copyright laws, the Indian IT Act and Swartz’s work. After Sunil’s presentation, there was a half-hour discussion about the scope of copyright laws in India, copyright exemptions and what constitutes copyright infringement. Participants agreed that the trouble lies with the broad interpretations of copyright and IT laws, which enable the state and private parties to target and harass a person, often on frivolous grounds.
At 6 PM, participants with project ideas and those who wanted to join projects formed groups.
A complete list of the projects that participants worked on during the hacknight is available on the hacknight website. We talked with some of the teams and individual participants to understand their projects, the process they followed for solving the problems, and the outcomes at the end of the hacknight.
Liberating electoral data: Arun Raghavan, an open source enthusiast, and four other participants (Arun K, Praveen, Mikul and Sumant) worked on scraping electoral data from ceokarnataka.kar.nic.in. They planned to build a frontend that would make it easy for users to search for their names and polling booth information. Currently, the electoral roll is published as a PDF document for each polling station, along with a search form (which is unreliable and fails often) for individuals to find their names on the roll and the location of their polling station.
It was difficult to parse the data because the PDFs were not designed for machine readability. Hence, the team had to spend time understanding how to extract the text. The other problem was that the person’s name was written above the father’s name, but if the person’s name was very long, it overlapped the father’s name. This made it difficult to determine where the person’s name ended and where the father’s name began. The team managed to come up with a heuristic to distinguish between the person’s name and father’s name based on slight differences in the way the text was printed on each sheet.
Other data liberation projects:
Indexing government websites by category of information: Elvis D’souza worked on crawling government websites and indexing them by category, for example, education, import-export trade, and science and technology. According to him, government websites contain lots of information, including documents and spreadsheets. At the hacknight, Elvis completed the indexing process and ran some statistics about the information contained in these websites. He eventually wants to build a portal where people can access this index and the documents.
Railway timetable data: Anand scraped data from the IRCTC website. Supreeth Srinivasmurthy worked with this data to plot a map. Bibhas Debnath also worked on the timetable data to build an API. A demo of this API is yet to be released.
Parsing weather data: Asok Padda converted weather data from HTML format to Excel sheets. Hourly weather data for all weather stations in India during 2012 was parsed and uploaded to the Internet Archive (archive.org).
Other projects: Kashyap Kondamundi started building an app to help people calculate the current value of their mutual funds. He built 70% of this app at the hacknight.