As part of its Data-Driven Discovery Investigator Competition, the Gordon and Betty Moore Foundation asks for
five references to the most influential work in data science, in the applicant’s view. This list is distinct from the bio-sketch references and will not be a factor in the Foundation’s decision-making. The information will help the Foundation better understand the influential ideas related to data-driven discovery and data science.
After talking to others in the lab, I put together the list below, sorted by number of citations according to Google Scholar. I'd love to hear comments on these and/or suggestions of others I missed.
-
1. Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., … & Grafham, D. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921. (16,000 citations)
The Human Genome Project turned the secret of life into digital information. On January 14, 2014, Illumina announced a new sequencing machine that can do the wet lab processing of a genome for $1000. This price is widely believed to be a tipping point, and soon millions will have their genomes sequenced. At 25 to 250 gigabytes per genome, genetics is now Big Data.
A simple, easy-to-use programming model for processing Big Data. It led to the NoSQL movement, Hadoop, many startup companies, and awards for its authors.
At a time when there was confusion as to what cloud computing was, it defined cloud computing, explained why it occurred now, and listed its challenges and opportunities.