In recent years, research labs at Harvard – particularly in the sciences – have devoted increasing resources to research computing. However, they have also found that the pace of change, the cost of upkeep, and turnover among local system administrators make it difficult to sustain a long-term, stable computing environment. Nor can small local computing facilities meet the demands of grand-challenge science. With this in mind, the Research Computing group was established in 2007.
As part of the Conte Center, we are developing both internally- and externally-available analysis platforms for the enormous data sets emerging from Conte labs’ studies on the developmental basis of mental illness. We will begin with a focus on data from the genomic imprinting research, which we have experience analyzing through past collaborations. In subsequent years, after streamlining the genomic analysis pipeline, we will focus on the connectome project—which, in addition to Brainbow-based fluorescence and electron microscopy approaches, features super-resolution imaging of synaptic connections. We will then work to integrate bioinformatics streams emerging from the genomics and connectomics studies with each other and with the studies of critical period plasticity.
In addition to developing analysis platforms for these projects, we will build novel algorithms, code, and automated systems wherever applicable. We also hope to design classes on advanced computational techniques complementing the scientific approaches used by Conte labs.
About Dr. Cuff
James Cuff was appointed Director of Research Computing for the Faculty of Arts and Sciences in 2006, having previously directed Research Computing for the Life Sciences Division. He moved from the UK in 2003 to join the Broad Institute of Harvard and MIT, where, as Group Leader for Applied Production Systems, he managed high-performance technical computing alongside large-scale storage and relational database systems. Before that, he was Group Leader for the Informatics Systems Group at the Wellcome Trust Sanger Institute, where he built the large-scale high-performance computing infrastructure supporting the Ensembl genome analysis project.
Prior to the Sanger Institute, James worked at Inpharmatica in London and at the European Bioinformatics Institute in Cambridge, where he focused on using high-performance computing to study genome sequences and designed ab initio algorithms for protein secondary structure prediction. James holds a D.Phil. in Molecular Biophysics from Oxford University and a B.Sc. (Hons) in Chemistry with Industrial Experience from Manchester University.
Selected Publications
A high-resolution map of human evolutionary constraint using 29 mammals. (2011) Lindblad-Toh K et al. Nature. 478(7370):476-82.
Distinguishing protein-coding and noncoding genes in the human genome. (2007) Clamp M, Fry B, Kamal M, Xie X, Cuff J, Lin MF, Kellis M, Lindblad-Toh K, Lander ES. Proc Natl Acad Sci U S A. 104(49):19428-33.
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. (2007) ENCODE Project Consortium et al. Nature. 447(7146):799-816.
Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. (2007) Mikkelsen TS et al. Nature. 447(7141):167-77.
Genome sequence, comparative analysis and haplotype structure of the domestic dog. (2005) Lindblad-Toh K et al. Nature. 438(7069):803-19.
The Ensembl computing architecture. (2004) Cuff JA, Coates GM, Cutts TJ, Rae M. Genome Res. 14(5):971-5.
The Jalview Java alignment editor. (2004) Clamp M, Cuff J, Searle SM, Barton GJ. Bioinformatics. 20(3):426-7.
Ensembl 2004. (2004) Birney E et al. Nucleic Acids Res. 32(Database issue):D468-70.
Ensembl 2002: accommodating comparative genomics. (2003) Clamp M et al. Nucleic Acids Res. 31(1):38-42.
The Ensembl genome database project. (2002) Hubbard T et al. Nucleic Acids Res. 30(1):38-41.
ProtEST: protein multiple sequence alignments from expressed sequence tags. (2000) Cuff JA, Birney E, Clamp ME, Barton GJ. Bioinformatics. 16(2):111-6.
Q&A with Dr. Cuff (2012)
Why do you find bioinformatics so interesting?
“Data, data, data.” Our ability to understand key issues in human health through in silico analysis is a data-driven exercise. We are now at a turning point where our ability to generate data is outstripping our ability to understand it. With advances in modern algorithms and scalable computing systems, we are finally able to start making sense of the deluge of information coming from our laboratories.
What are three of your team’s most exciting accomplishments so far?
Since starting at Harvard in 2003, we have increased our computational footprint from 200 to over 17,800 processors. This has enabled great economies of scale, allowing us to launch into grand-challenge sciences such as those we are building within the Conte Center. Connecting neuronal pathway analysis with genomic imprinting would never have been considered practical without significant infrastructure investments and capabilities.
What are your big goals for Research Computing in the next 5-10 years?
Wow, 5-10 years… that’s a long time! If I had predicted, when I started computing, everything we would see in just this short period, no one would have believed me. I still remember 8-bit computers; now the machines we are starting to see operating at the exascale will result in desktop infrastructure having to support the petascale. My goal is to keep up with the extreme velocity that modern sequencing technology and computing demand on a daily basis!