The Research IT Blog has moved into a dedicated Research IT category at my HPC Info blog. Visit my HPC Info blog for information and discussion about High Performance Computing (HPC). Other IT topics are also covered there, especially as they relate to supporting scientific research and integrating HPC technologies. Thanks for your visits to this site. I hope you will enjoy HPC Info as well. You can subscribe to the feed below:

Subscribe to HPC Info in a reader

Scientific research often benefits from open innovation.  While there are many examples, I am particularly excited to see what happens in the area of cancer genomics. The Genome Center at Washington University published the results of sequencing the first cancer genome back in November 2008.  Internally, there was collaboration between departments in the School of Medicine resulting in innovative analyses and leading to more discoveries.  Since then I’ve read and heard about a number of similar or follow-up projects at various institutions.  As data is shared amongst researchers across the world, new collaborations will be formed.  The innovations resulting from these collaborations will hopefully result in better treatments for cancer.

We at The Genome Center at Washington University were happy to get official word that we will be adding 21 more Illumina Genome Analyzers to our portfolio of sequencing technology.  That will enable us to sequence enough DNA to be equivalent to an entire human genome per day (at 25x coverage).  There is a lot of excitement about the potential such capacity brings.  The Genome Center’s director had this to say:

“Our intention to substantially scale-up with this technology reflects our commitment to large-scale sequencing projects that aim to uncover the underlying genetic basis of various human diseases. With the rapid decline in the cost of whole-genome sequencing, we believe now is the time to embark on initiatives which were previously not possible,” said Richard K. Wilson, Ph.D., Professor of Genetics and Director of the Genome Center at Washington University. “We are confident that we can further reduce the cost and accelerate the rate of human genome sequencing.”
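To put the 25x coverage figure in perspective, here is a back-of-the-envelope sketch. It assumes a haploid human genome of roughly 3.1 billion base pairs, a commonly cited approximation that does not come from the announcement, and the function name is my own.

```python
# Approximate haploid human genome size in base pairs.
# This is an assumption for illustration, not a figure from the announcement.
HUMAN_GENOME_BP = 3.1e9


def bases_per_day(coverage, genome_bp=HUMAN_GENOME_BP):
    """Raw bases needed to sequence one genome per day at the given coverage depth."""
    return coverage * genome_bp


if __name__ == "__main__":
    # One genome per day at 25x coverage, expressed in billions of bases.
    print(f"{bases_per_day(25) / 1e9:.1f} Gbases per day")
```

Under that assumption, one genome per day at 25x works out to roughly 77.5 billion bases of raw sequence daily.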

A scale-up of sequencing capacity brings a scale-up in IT capacity.  We’ll be watching our internal network, disk and HPC resources and scaling as appropriate.  It is likely that these sequencers alone will generate upwards of 20 TB of data per day, which will need further processing on The Genome Center’s computational resources.  I’m excited about the possibilities that this scale-up will bring!
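For a rough sense of what 20 TB per day means for the network and disk, the daily volume can be converted to a sustained rate. This is only a sketch: it assumes decimal terabytes (10^12 bytes) and a perfectly even flow of data around the clock, which real sequencing runs will not produce, and the function name is my own.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate(tb_per_day):
    """Convert a daily data volume in decimal TB to (MB/s, Gbit/s).

    Assumes the data arrives evenly over 24 hours, so the result is an
    average and says nothing about bursts.
    """
    bytes_per_second = tb_per_day * 1e12 / SECONDS_PER_DAY
    megabytes_per_second = bytes_per_second / 1e6
    gigabits_per_second = bytes_per_second * 8 / 1e9
    return megabytes_per_second, gigabits_per_second


if __name__ == "__main__":
    mb_s, gbit_s = sustained_rate(20)
    print(f"{mb_s:.1f} MB/s, {gbit_s:.2f} Gbit/s")
```

Averaged over a day, 20 TB comes to about 231 MB/s, or a little under 2 Gbit/s, before any copies made during downstream processing.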

From Blue Data Center Will Be Powered by the Tides (found via @tkunau/@ecogeek):

At first, tidal power will only cover one-fifth of the data center’s needs, but Atlantis hopes that if the first phase is successful, they can expand the tidal array to make up the remaining wattage.

Sun’s Colorado Consolidation Saves Millions describes how Sun used Liebert’s XD rack cooling, clear vinyl cold aisle curtains and flywheels to increase the density of its data center while also reducing energy consumption.  They consolidated 165,000 square feet of data center space into 700 square feet while reducing their monthly power usage by one million kilowatt-hours.
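To translate the monthly savings into a continuous load, the kilowatt-hours can be divided by the hours in a month. The sketch below assumes an average month of 730 hours (8,760 hours per year divided by 12); the function name is my own and the result is only an average.

```python
# Average hours in a month: 8,760 hours per year / 12 (an assumption for this estimate).
HOURS_PER_MONTH = 730


def average_kw(kwh_per_month, hours=HOURS_PER_MONTH):
    """Average continuous power in kW implied by a monthly energy figure in kWh."""
    return kwh_per_month / hours


if __name__ == "__main__":
    saved_kw = average_kw(1_000_000)
    print(f"{saved_kw:.0f} kW, or about {saved_kw / 1000:.2f} MW on average")
```

By that estimate, one million kilowatt-hours per month corresponds to an average reduction of roughly 1.4 MW.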

When we considered the XD cooling units, there were two options: chilled water or refrigerant.  In the case of chilled water, there was the question of potential water leaks in these rack-attached units.  With the refrigerant option, there was the question of how many more condensers would be required, where they would be placed and how much more maintenance would be needed.  With either option, there is also an increase in the need for maintenance inside the server room amongst the servers, storage, switches, etc.  The obvious benefit of the XD units is that they can provide enough cooling for up to 30 kW in a single rack, although, if I recall correctly, there is a limit to the total number of racks with the refrigerant-based version due to limits on the maximum pressure or capacity of the refrigerant in a single system.

As for the vinyl curtains, the objection is usually more about their aesthetics.  Personally, I would like to see them installed to help keep the cold air completely contained in the cold aisle, where it is intended, especially in raised floor environments with high velocity air flow, where the cold air might be pushed outside the confines of the cold aisle without such containment.

One question about Sun’s use of the flywheel: How large are your flywheels?  Flywheels generally supply on the order of ten seconds or so of power, which is usually enough time for generators to kick on but cuts it very close.  What type of services run out of Sun’s Colorado facility?

Out of necessity, we have some file systems that are 25 TB.  As of 2009, we consider 25 TB a large file system and we are concerned about the potential downtime that may result if an fsck is needed.

Some storage vendors advertise that they can have single file systems that are hundreds of terabytes or even a petabyte.  Often, however, there is no mention of when or if fsck or similar operations would be needed and how long they take.

ZFS claims to eliminate the need for fsck and Chunkfs (ext2 enhancements from around 2006) claims to reduce fsck times by splitting the repair domain.  Further, “journaling file systems only speed fsck time in the case of a system crash, disconnected disk, or other interruptions in the middle of file system updates. They do not speed recovery in the case ‘real’ metadata corruption” (see third paragraph here).

1.) What do you consider a large file system? (What file system do you use for them?)
2.) Are you concerned about fsck times?  (Why or why not?)
3.) Can you predict fsck times based on some parameters (e.g., inodes used, disk size, etc.)?
4.) Any special cases related to fsck or similar operations for clustered file systems?

Here’s a notable quote from Not All Apps Are Fit for the Cloud | The Intelligent Enterprise Blog:

With cloud computing the trick is not to follow the hype and the crowd, but to understand your own issues and applications first. From there you can make an educated call as to what applications make sense to outsource to a good cloud computing platform, and what applications to keep local. Keep in mind that this should be an evolving process, and you can always relocate applications as the cloud computing resources improve, and clearly they will.

Which applications have you put in the cloud?
