GRITTS Seminar (Speaker: Victor Pankratius, MIT Haystack)

Wednesday, October 1, 2014, 4:00 pm
LIGO Lab, NW22 Conference Room*

Computer-Aided Discovery: Scalable Machine Assistance for Big Data Science

Next-generation science needs to handle rapidly growing data volumes from ground-based and space-based instrument networks. A data tsunami is beginning to emerge in astronomy, geoscience, physics, and other disciplines. In radio astronomy, for instance, the current generation of antenna arrays produces data at terabits per second, and forthcoming instruments will push these rates much higher; various physics experiments could potentially collect data at petabytes per second. Human scientists are thus becoming increasingly overwhelmed when attempting to opportunistically explore Big Data and understand its cross-disciplinary implications. As real-world phenomena are digitized and mapped to data, the scientific discovery process essentially becomes a search process in multidimensional data sets. Extracting meaningful discoveries from this sea of data therefore requires highly efficient and scalable machine assistance to enhance human contextual understanding. Computer-Aided Discovery uses automation in a new way to match models and observations and support scientists in their search. The NSF-supported computational infrastructure currently being developed at MIT Haystack opens up new possibilities to answer questions such as: What inferences can be drawn from an identified feature? What does a finding mean, and how does it fit into the big theoretical picture? Does it contradict or confirm previously established models and findings? How can hypotheses and ideas be tested effectively on a large scale? To achieve this, scientists can programmatically express hypothesized scenarios, constraints, and model variations. The approach then delegates the exploration of the combinatorial search space of possible explanations to programmable crawlers in a cloud computing environment, which run automatically and in parallel on a variety of data sets.
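
For readers who want a concrete picture of the idea in the abstract's closing sentences, the following is a minimal illustrative sketch. It is not the MIT Haystack infrastructure or its API. It assumes that hypothesized model variations can be written as plain Python functions, that constraints are simple predicates, and that a local thread pool stands in for cloud workers. All names (OBSERVATIONS, MODELS, satisfies_constraints, misfit, explore) and the synthetic data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
import math

# Synthetic observations (illustrative only): a decaying signal at t = 0..9.
OBSERVATIONS = [(t, 2.0 * math.exp(-0.5 * t)) for t in range(10)]

# Hypothesized model families, each a function of time and two parameters.
MODELS = {
    "exponential": lambda t, a, b: a * math.exp(-b * t),
    "linear": lambda t, a, b: a - b * t,
}

# Model variations: candidate values for each parameter.
AMPLITUDES = [1.0, 2.0, 3.0]
RATES = [0.1, 0.5, 1.0]


def satisfies_constraints(name, a, b):
    """Hypothetical constraint: the model stays non-negative at every observed time."""
    model = MODELS[name]
    return all(model(t, a, b) >= 0.0 for t, _ in OBSERVATIONS)


def misfit(candidate):
    """Sum of squared residuals between one model variation and the observations."""
    name, a, b = candidate
    model = MODELS[name]
    return sum((model(t, a, b) - y) ** 2 for t, y in OBSERVATIONS)


def explore(max_workers=4):
    """Score every admissible combination in parallel and rank by misfit."""
    # The combinatorial search space: model family x amplitude x rate.
    candidates = [
        c for c in product(MODELS, AMPLITUDES, RATES) if satisfies_constraints(*c)
    ]
    # Threads stand in for the parallel workers described in the abstract.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(misfit, candidates))
    return sorted(zip(scores, candidates))


if __name__ == "__main__":
    for score, (name, a, b) in explore()[:3]:
        print(f"{name:12s} a={a} b={b} misfit={score:.4f}")
```

Here the search space is the Cartesian product of model families and parameter values, the constraint prunes inadmissible combinations before any scoring, and the ranking surfaces the variation that best matches the observations. A real system would replace the thread pool with distributed workers and the toy misfit with a domain-specific comparison.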

