Data Centers: Don’t Throw that DIMM Away!

According to Facebook, DDR memory is the #2 source of failures in the data center. A Carnegie Mellon paper that studied DDR memory failures in Facebook’s data center reported that FB swapped DIMMs out of ~1% of its servers on a monthly basis. Given the number of servers that Facebook has, this suggests they are swapping DIMMs every hour of every week of every month, all year long! So I asked around: what do they do with the DIMMs they swap out? The answer came back in a LinkedIn group response from someone who collects the ‘bad’ DIMMs. His response? They destroy them.

With DRAM prices skyrocketing, and larger-capacity, more expensive DIMMs becoming the norm, when will data centers get a clue? IT MIGHT NOT BE THE DIMM. Even Google’s 2009 study concluded that memory failures sometimes ‘followed the system’.

So who is to blame? Well, I don’t want to point any fingers, but we have seen some pretty cheap motherboards (missing ground planes, bad connectors, routing issues), BIOS updates that program the memory controller incorrectly, and BIOS bugs that misinterpret the DIMM (or SODIMM) SPD (Serial Presence Detect, a small EEPROM that holds the module’s timing and characteristics). See my article on ROUNDING. Oh, and occasionally a memory controller bug or two. Not to mention poor overall ground/power design that radiates noise from one memory channel to the other.

So what’s in your data center? Today’s data centers are driving server complexity up, but the market is driving server prices down. Suppliers are squeezed on margins, and test and design validation get neglected. Memory errors don’t scale well, so if you don’t want to be throwing good DIMMs away...
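To put the "~1% of servers per month" figure in perspective, here is a rough back-of-the-envelope sketch. The fleet sizes are hypothetical (the post does not give Facebook's server count), and the 30-day month and the function name are assumptions made only for illustration.

```python
HOURS_PER_MONTH = 30 * 24  # assumes a 30-day month


def servers_serviced_per_hour(fleet_size, monthly_swap_rate=0.01):
    """Average number of servers needing a DIMM swap each hour.

    monthly_swap_rate is the fraction of servers serviced per month
    (0.01 corresponds to the ~1% figure quoted above).
    """
    return fleet_size * monthly_swap_rate / HOURS_PER_MONTH


# Hypothetical fleet sizes, not figures from the cited paper.
for fleet in (100_000, 500_000, 1_000_000):
    rate = servers_serviced_per_hour(fleet)
    print(f"{fleet:>9,} servers: about {rate:.1f} servers serviced per hour")
```

Under these assumptions, any fleet larger than about 72,000 servers averages more than one server serviced every hour, around the clock.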

DDR Memory in Medical Devices: A disaster waiting to happen?

Many medical device manufacturers are experts on the medical portion of their product, but what about the compute engine? This is the part that does not touch the patient but makes the decision based on the data. Many medical device manufacturers build off-the-shelf DDR3 or DDR4 embedded motherboards into their overall system. But what do they know about them? In most cases they just ASSume that proper validation was done and that there is nothing to fear! Well, they are not exactly correct. Take a look at what we found.

A medical device manufacturer complained of intermittent memory failures that caused their medical device to hang or crash. The failures were troublesome but random, and not easy to isolate. The team looked for months but could not narrow the failures down enough to find the root cause. FuturePlus Systems to the rescue! Within 3 days we found the problem: the BIOS was not setting ODT (on-die termination) correctly in the DRAM’s mode registers. It turns out the BIOS had recently been updated, and the update introduced a bug that left the DIMMs set up incorrectly. Here is a picture of what a few of the DDR3 memory data signals looked like.

Figure 1: Eye Scan using the Keysight U4154B and a FuturePlus Systems interposer.

Figure 2: ODT at the proper settings. Eye Scan using the Keysight U4154B and a FuturePlus Systems interposer.

Vive la différence!

So what’s in your medical device system? Don’t be liable for not doing ‘due diligence’ on every part of your product. FuturePlus Systems offers a Memory Subsystem Audit to ensure quality and proper operation. ...
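For readers wondering what "setting ODT in the mode registers" looks like in practice, here is a minimal sketch that decodes the nominal termination value (Rtt_Nom) from a DDR3 MR1 value. The bit positions (A9, A6, A2) and the RZQ divisors follow my understanding of the JEDEC DDR3 specification and should be checked against JESD79-3 before use. The function name is hypothetical, and the post does not say which mode register field the BIOS actually got wrong.

```python
RZQ_OHMS = 240  # DDR3 reference resistor value

# Rtt_Nom encoding keyed by MR1 bits (A9, A6, A2); the value is the RZQ divisor.
RTT_NOM_DIVISOR = {
    0b000: None,  # ODT disabled
    0b001: 4,     # RZQ/4
    0b010: 2,     # RZQ/2
    0b011: 6,     # RZQ/6
    0b100: 12,    # RZQ/12
    0b101: 8,     # RZQ/8
}


def decode_rtt_nom(mr1):
    """Return Rtt_Nom in ohms for a DDR3 MR1 value, or None if ODT is disabled."""
    # Gather the three scattered bits into a 3-bit code: A9 A6 A2.
    code = (((mr1 >> 9) & 1) << 2) | (((mr1 >> 6) & 1) << 1) | ((mr1 >> 2) & 1)
    if code not in RTT_NOM_DIVISOR:
        raise ValueError(f"reserved Rtt_Nom encoding: {code:03b}")
    divisor = RTT_NOM_DIVISOR[code]
    return None if divisor is None else RZQ_OHMS // divisor


# Example: a hypothetical MR1 value with only A2 set selects RZQ/4.
print(decode_rtt_nom(0b0000000100))
```

Comparing the decoded value of what the BIOS actually wrote against what the board design calls for is one quick way to catch this class of bug after a BIOS update.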
Request More Information/Quote or Call: (603) 472-5905