Tutorial 1 : 10 Data Mining Mistakes -- and How to Avoid Them |
John F. Elder IV (Chief Scientist, Elder Research, Inc.) |
Abstract |
This tutorial will reveal the top mistakes data analysts can make, from the simple to the subtle, using real-world (often humorous) stories. The topics will be presented from case studies of real projects and the (often overlooked) symptoms that suggested something might be amiss. |
Biography |
Dr. John Elder heads a data mining consulting team in Charlottesville, Virginia, and Washington, DC (www.datamininglab.com), founded in 1995. Elder Research, Inc. focuses on investment and commercial applications of pattern discovery, including stock selection, image recognition, biometrics, cross-selling, drug efficacy, credit scoring, market timing, and fraud detection. John obtained a BS and MEE in Electrical Engineering from Rice University, and a PhD in Systems & Information Engineering from the University of Virginia, where he's an adjunct professor teaching Optimization. Prior to ERI, John spent 5 years in high-tech defense consulting, 4 heading research at an investment management firm, and 2 in Rice's Computational & Applied Mathematics department. |
Tutorial 2 : Data Mining In Time Series Databases |
Eamonn Keogh (University of California, Riverside) |
Abstract |
In this tutorial we will review the state of the art in time series data mining. In addition to the ubiquitous classification and similarity search problems, we will also consider clustering, anomaly detection, visualization, motif discovery and other exciting tasks. The ideas presented will be motivated by case studies in domains as diverse as video surveillance, cardiology, text mining, space telemetry monitoring, handwriting indexing, query by humming and motion capture/animation. Rather than simply reviewing previous work, we have taken the time to reimplement and compare most of the work in the literature. For example: we have |
Biography |
Dr. Keogh is an assistant professor of Computer Science at the University of California, Riverside. His research interests include Data Mining, Machine Learning and Information Retrieval. He has published papers on time series in all the top data mining conferences and journals, including VLDB, SIGKDD, SIGIR, SIGMOD, SIGGRAPH, EDBT, PKDD, PAKDD, IEEE ICDM, SIAM SDM, TODS, DMKD and KAIS. Several of his papers have won "best paper" awards. He recently won a 5-year NSF Career Award for "Efficient Discovery of Previously Unknown Patterns and Relationships in Massive Time Series Databases". His papers on time series data mining have been referenced well over 1,000 times (see http://www.cs.ucr.edu/~eamonn/selected_publications.php). |
Tutorial 3 : Algorithmic Excursions in Data Streams |
Sudipto Guha (University of Pennsylvania) |
Abstract |
For many recent applications, the concept of a data stream is more appropriate than a data set. By nature, a stored data set is an appropriate model when significant portions of the data are queried repeatedly, and updates are small and/or relatively infrequent. In contrast, a data stream is a more appropriate model in scenarios where large volumes of data or updates arrive continuously and it is either unnecessary or impractical to store the data in some form of memory. Many applications naturally generate data streams as opposed to simple data sets. |
Biography |
Sudipto Guha has been an assistant professor in the Department of Computer and Information Sciences at the University of Pennsylvania since Fall 2001. He completed his PhD in 2000 at Stanford University, working on approximation algorithms, and spent a year as a senior member of technical staff in the Network Optimizations and Analysis Research department at AT&T Shannon Labs Research. He is an Alfred P. Sloan Research Fellow. |
Tutorial 4 : Data Grid Management Systems (DGMS) |
Arun swaran Jagatheesan (University of California at San Diego) |
Abstract |
A data grid infrastructure facilitates a logical view of heterogeneous distributed resources that are shared between autonomous administrative domains. Data grids are being built around the world as the next-generation data-handling infrastructures for coordinated sharing of data and storage resources. A data grid infrastructure provides a location-independent logical namespace, consisting of persistent global identifiers for data resources, storage resources and users in an inter/intra organizational enterprise. Data Grid Management Systems (DGMS) provide services on the data grid infrastructure for inter/intra organizational information storage management. |
Biography |
Arun swaran Jagatheesan ("Arun") is an Adjunct Assistant Researcher (OPS faculty member) at the University of Florida, and a Visiting Scholar at the San Diego Supercomputer Center (SDSC) at the University of California, San Diego. His research interests include Data Grid Management, Peer-to-peer Computing, and Workflow Management Systems. He is the founder and technical lead of the SDSC Matrix Project on Gridflow Management Systems. He is a co-chair of the Grid File System Working Group at the Global Grid Forum, and is involved in research and development of multiple data grid projects at the San Diego Supercomputer Center. |