Chirag Shah's WebHome - Tools


I have developed several tools to support and promote research in various domains of information seeking and use. All of the tools are open source and available for free under Creative Commons licenses. They have helped a number of individuals and organizations around the world in their projects.

Coagmento

Coagmento (Latin for "working together") is a tool that allows a group of people to seek and use information collaboratively. At present, the tool is being tested in the lab. More details will follow soon!


ContextMiner

ContextMiner is a framework to collect, analyze, and present contextual information along with data. It is based on the idea that when describing or archiving an object, contextual information helps one make sense of that object or preserve it better. The ContextMiner website (http://www.contextminer.org/) provides tools to collect data, metadata, and contextual information from the Web through automated crawls. At present, ContextMiner supports automated crawls of blogs, YouTube, Flickr, Twitter, and the open Web. It also collects inlink information for YouTube videos from the Web. Additional sources will continue to be added.

Say you are interested in what people are posting and saying about the recent outbreak of the H1N1 virus. With ContextMiner, you can set up a campaign, say "H1N1 outbreak". Within this campaign, you add queries that you want ContextMiner to keep running on sources such as blogs, Twitter, YouTube, and Flickr; here, the queries could be 'swine flu', 'H1N1 virus', and so on. ContextMiner runs these queries periodically (at a frequency you choose) on the selected sources, and extracts and stores the results for you in a structured format. You may also want ContextMiner to monitor specific webpages. Once you provide these queries and URLs, ContextMiner keeps monitoring them without any intervention on your part. Later you can come back to ContextMiner to see what your campaign has collected. You can filter or search your collection, and even export it for further analysis with other tools of your choice.
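One way to picture a campaign is as a list of queries, a list of sources, and a re-run interval. The following Python sketch is only an illustration under that assumption: the `Campaign` class, its fields, and the `due_jobs`/`mark_run` methods are hypothetical and are not ContextMiner's actual data model or code. It shows how a scheduler could decide which (source, query) pairs are due to be run again.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Campaign:
    """Hypothetical campaign model; not ContextMiner's actual schema."""
    name: str
    queries: list[str]
    sources: list[str]
    interval: timedelta  # how often each (source, query) pair is re-run
    urls: list[str] = field(default_factory=list)  # specific pages to monitor
    last_run: dict[tuple[str, str], datetime] = field(default_factory=dict)

    def due_jobs(self, now: datetime) -> list[tuple[str, str]]:
        """Return (source, query) pairs that have never run or whose interval has elapsed."""
        jobs = []
        for source in self.sources:
            for query in self.queries:
                last = self.last_run.get((source, query))
                if last is None or now - last >= self.interval:
                    jobs.append((source, query))
        return jobs

    def mark_run(self, source: str, query: str, when: datetime) -> None:
        """Record that a query was just run against a source."""
        self.last_run[(source, query)] = when


# Usage: the H1N1 example, re-running every query on every source every 6 hours.
campaign = Campaign(
    name="H1N1 outbreak",
    queries=["swine flu", "H1N1 virus"],
    sources=["blogs", "Twitter", "YouTube", "Flickr"],
    interval=timedelta(hours=6),
)
start = datetime(2010, 6, 1, 12, 0)
print(len(campaign.due_jobs(start)))  # 8: nothing has run yet
campaign.mark_run("Twitter", "swine flu", start)
print(len(campaign.due_jobs(start + timedelta(hours=1))))  # 7
```

Keeping a last-run time per (source, query) pair, rather than one per campaign, would let a slow or failed source be retried without re-running every query on every source.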

ContextMiner is free to use and requires no installation. It is an open-source project with a Creative Commons license.


TubeKit

TubeKit is a toolkit for creating YouTube crawlers. It allows one to build a custom crawler that crawls YouTube based on a set of seed queries and collects up to 17 different attributes. TubeKit assists in every phase of this process, from creating the database to giving access to the collected data through browsing and searching interfaces.
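A seed-query crawl can be sketched as a loop over queries that keeps selected attributes and skips videos it has already collected. The Python below is a hypothetical illustration, not TubeKit's code: `crawl` and `fake_search` are made-up names, and `fake_search` returns canned data in place of a real YouTube query so that the example runs offline.

```python
from typing import Callable, Iterable

# A video is represented as a plain dict of attribute name -> value.
Video = dict[str, object]


def crawl(seed_queries: Iterable[str],
          search: Callable[[str], list[Video]],
          attributes: list[str]) -> dict[str, dict]:
    """Run each seed query, keep only the chosen attributes, and de-duplicate by video id.

    `search` is a stand-in for whatever call actually queries YouTube.
    """
    collected: dict[str, dict] = {}
    for query in seed_queries:
        for video in search(query):
            vid = video["id"]
            if vid in collected:
                # Already seen via an earlier query: only note the extra query.
                collected[vid]["queries"].append(query)
                continue
            record = {name: video.get(name) for name in attributes}
            record["queries"] = [query]
            collected[vid] = record
    return collected


def fake_search(query: str) -> list[Video]:
    """Canned results so the sketch runs offline; real results would come from YouTube."""
    results = {
        "swine flu": [{"id": "a1", "title": "Flu news", "view_count": 10}],
        "H1N1 virus": [
            {"id": "a1", "title": "Flu news", "view_count": 10},
            {"id": "b2", "title": "Virus explained", "view_count": 5},
        ],
    }
    return results.get(query, [])


videos = crawl(["swine flu", "H1N1 virus"], fake_search, ["title", "view_count"])
print(sorted(videos))           # ['a1', 'b2']
print(videos["a1"]["queries"])  # ['swine flu', 'H1N1 virus']
```

De-duplicating by video id matters because related seed queries, such as 'swine flu' and 'H1N1 virus', tend to return overlapping videos.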

In addition to a suite of components for query-based YouTube crawling, TubeKit includes various tools that let one grab different forms of data from YouTube without running queries. This data includes YouTube videos, video attributes, and user profiles.

TubeKit is free to use. It is an open-source project distributed under a Creative Commons license.


InfoExtractor

InfoExtractor is a framework to extract relevant information from various sources such as blogs, YouTube, and Twitter.

As a web service, InfoExtractor helps one extract structured information from a supplied URL. For example, one can enter the URL of a YouTube video, and InfoExtractor will extract a number of associated attributes (title, tags, view count, comments, etc.) in a format that can be easily exported, analyzed, or plugged into other applications.
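The first step for a YouTube URL is recognizing which video it points to. The sketch below assumes the common `watch?v=`, `youtu.be/`, and `/embed/` URL forms. It pulls out the video id with Python's standard `urllib.parse` and packages already-fetched attributes as JSON. `youtube_video_id` and `export_record` are hypothetical helpers, not InfoExtractor's API, and fetching the attributes themselves is not shown.

```python
import json
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video id from common YouTube URL forms, or None if unrecognized."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host == "youtu.be":                       # short links: youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":              # youtube.com/watch?v=<id>
            return parse_qs(parsed.query).get("v", [None])[0]
        if parsed.path.startswith("/embed/"):    # youtube.com/embed/<id>
            return parsed.path[len("/embed/"):] or None
    return None


def export_record(url: str, attributes: dict) -> str:
    """Bundle the URL, its video id, and already-fetched attributes as a JSON string."""
    record = {"url": url, "video_id": youtube_video_id(url), **attributes}
    return json.dumps(record, sort_keys=True)


print(youtube_video_id("http://www.youtube.com/watch?v=abc123&feature=related"))  # abc123
print(export_record("https://youtu.be/abc123", {"title": "Flu news"}))
```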


DiscoverInfo

DiscoverInfo is a tool for exploring a collection of documents (currently, The North Carolina Election of 1898 collection from the UNC Library).

With the DiscoverInfo interface, one can run full-text searches over the collection. DiscoverInfo indexes text, HTML, XML, and PDF documents. The system builds term clouds based on term occurrences in the whole collection as well as across individual documents. These clouds give a good overview of the underlying collection, and one can browse through the clickable term clouds to discover documents. Beyond retrieving relevant information from the indexed collection, the system can also evaluate how novel a piece of information (here, a document) is with respect to other documents. This helps one discover information that is not only relevant but also novel.
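To make the term-cloud and novelty ideas concrete, here is a small Python sketch. It assumes clouds sized by log term frequency over the whole collection, and it measures novelty as one minus the highest cosine similarity to any other document. Both choices, the stop-word list, and all function names are illustrative assumptions; the page does not specify how DiscoverInfo computes either.

```python
import math
import re
from collections import Counter

# A tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "of", "and", "a", "an", "to", "in", "on", "for", "is"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def term_cloud(docs: list[str], top_n: int = 20,
               min_size: int = 10, max_size: int = 36) -> dict[str, int]:
    """Map the top_n most frequent terms to font sizes, scaled by log frequency."""
    counts = Counter(t for doc in docs for t in tokenize(doc))
    top = counts.most_common(top_n)
    if not top:
        return {}
    hi, lo = math.log(top[0][1]), math.log(top[-1][1])
    span = (hi - lo) or 1.0  # avoid dividing by zero when all counts are equal
    return {term: round(min_size + (math.log(c) - lo) / span * (max_size - min_size))
            for term, c in top}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def novelty(doc: str, others: list[str]) -> float:
    """1 minus the highest similarity to any other document; 1.0 means nothing similar exists."""
    vec = Counter(tokenize(doc))
    return 1.0 - max((cosine(vec, Counter(tokenize(o))) for o in others), default=0.0)


docs = ["vote vote vote election", "vote election riot"]
print(term_cloud(docs, top_n=3))  # {'vote': 36, 'election': 23, 'riot': 10}
print(novelty("riot in the city", docs))
```

Taking the maximum similarity, rather than the average, means a document counts as novel only if no single existing document already covers it.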


DIToolkit

DIToolkit is a tool to visualize a collection. It grabs a website, indexes it, and prepares it for browsing, which includes standard IR search, term clouds, and novelty visualization.
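The "typical IR search" part of such a pipeline usually rests on an inverted index. This Python sketch assumes the website has already been fetched into a hypothetical `pages` dict mapping URLs to text, builds a term-to-postings index, and ranks pages with a simple tf-idf score. It illustrates the general technique, not DIToolkit's implementation.

```python
import math
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, dict[str, int]]:
    """Inverted index: term -> {page URL: number of occurrences on that page}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] = index[term].get(url, 0) + 1
    return index


def search(index: dict[str, dict[str, int]], query: str, n_pages: int) -> list[str]:
    """Rank pages by summed tf-idf of the query terms; ties broken by URL."""
    scores: dict[str, float] = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term)
        if not postings:
            continue
        # Smoothed idf: the +1 keeps terms that appear on every page from scoring zero.
        idf = math.log(n_pages / len(postings)) + 1.0
        for url, tf in postings.items():
            scores[url] += tf * idf
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical pages standing in for a fetched website.
pages = {
    "/p1": "election riot riot",
    "/p2": "election vote",
    "/p3": "weather",
}
index = build_index(pages)
print(search(index, "riot", len(pages)))      # ['/p1']
print(search(index, "election", len(pages)))  # ['/p1', '/p2']
```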


| © 2010 Chirag Shah | Site last updated: June 18, 2010 |