I have completed the SILS Master's program and have started the PhD program at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. Please visit my new home page for more information.
Text Mining Toolkit: a Java-based toolkit for parsing and tokenizing HTML documents, clustering them with unsupervised learning techniques, and analyzing the resulting clusters. I use this toolkit in my current topic detection work (see below). See the user's manual for a more detailed explanation of the toolkit's capabilities. A new version with enhanced clustering and document processing capabilities is in the works.