Developing a Curriculum for Statistical Analysis of Spatiotemporal Data Using Cyberinfrastructure

Investigator:

Mary Kathryn Cowles
Department of Speech Pathology and Audiology
Statistics and Actuarial Science; Biostatistics

Co-Investigators:

  • Marc Armstrong, Professor and DEO, Dept of Geography (CLAS); CLAS Interim Associate Dean for Research
  • Brian Smith, Assistant Professor, Dept of Biostatistics (CPH)
  • Shaowen Wang, Research Scientist, ITS-Academic Technologies; Adjunct Assistant Professor, Dept of Geography (CLAS)
  • Jun Yan, Assistant Professor, Dept of Statistics and Actuarial Science (CLAS)

Awarded: $26,000

What do you intend to do?

Storage and statistical analysis of massive geographic datasets exceed the capabilities of any single computer, and often even those of a cluster of many computers at a single facility. This poses a significant challenge in teaching methods for analyzing such data.

Recently, cyberinfrastructure (CI) technologies have emerged to coordinate the collective use of heterogeneous computational resources from many facilities to address computationally intensive problems. This approach to supercomputing is well suited to computationally intensive methods in spatial statistics. The purpose of our project is to develop course modules for teaching spatial statistics using the National Science Foundation (NSF) TeraGrid, a major element of national CI that is based on Grid technologies.

The TeraGrid offers a collection of Science Gateways intended to provide easy access to TeraGrid resources. A TeraGrid Science Gateway is often implemented using Grid portal technologies tailored to the needs of a specific scientific community. Grid portal technologies provide support for Grid services on top of Web services.

The PI and co-PIs on this proposal have collaborated since 2004 to produce a TeraGrid GIScience Gateway for geographic information science, called GISolve. GISolve provides a web interface through which a user may upload a dataset, select an appropriate analysis method, specify any needed parameters, and access TeraGrid computing resources to carry out the analysis and deliver the results.

In this project, we will use GISolve to develop several portlets that encapsulate spatial statistics methods. We will incorporate these portlets into the following courses:

  • 044:005 - Foundations of Geographic Information Systems (undergraduate)
  • 044:113 - Principles of Geographic Information Systems (undergraduate)
  • 22S:138 - Bayesian Statistics (undergraduate and graduate)
  • 22S:166 - Computing in Statistics (primarily graduate)

How will it improve student learning?

Students will be able to perform large-scale hands-on statistical analyses of real-world data using TeraGrid resources made available through the TeraGrid GIScience Gateway. They will learn through experience how to deal with the challenges of analyzing massive datasets.

The portlets will be designed as online modules and as such will be widely accessible to other instructors and students. The portlets will allow for different emphases in courses across disciplines. For example, in geography, geology, or sociology courses, the focus may be on obtaining answers to research questions, while in statistics and computer science courses, more attention may be paid to the underlying statistical methods and computing algorithms and strategies.

What do you need?

We need support for a 50%-time graduate research assistant for 12 months. This RA will assist the research team in adding more flexible spatial data analysis tools to GISolve and in the technical aspects of producing course module materials. Although the RA will work under the day-to-day supervision of Shaowen Wang in ITS-Academic Technologies, he or she will be integrated into the project team and will be indirectly supervised by the PI.

What is your rough estimate of costs?

Our expected cost is $26,000. This is the projected average stipend for a 12-month 50% RA in ITS-Academic Technologies in 2007-2008.