DDR4 3DS DIMMs: The next big thing in the Data Center

To give DDR4 a mid-life kicker, memory vendors are upping their game and producing 3DS DDR4 DIMMs. What is 3DS, you ask? It's 3 Dimensional Stacking of die in a single package. Not to be confused with ‘twin die’, which is just 2 die next to each other and not stacked. 3DS uses TSVs (through-silicon vias) to make the connection between the dies. 3DS is a game changer when it comes to density. DIMMs of 128GB, 256GB and possibly 512GB are enabled by this technology. RDIMMs or LRDIMMs can implement 3DS and have up to 4 ranks. The 3DS protocol works by introducing the concept of logical ranks in addition to physical ranks. The screen shot below from the DDR Detective shows what the traffic on a 3DS DDR4 memory bus looks like. [Figure: Waveform showing interleaved traffic between the different physical and logical ranks on a single DDR4 3DS DIMM.] The 3DS protocol is also different, as timing parameters between the physical ranks and the logical ranks have to be controlled. FuturePlus Systems, who took the lead role in JEP175 DDR4 Protocol Checks, has also created the 3DS protocol checks found in the 3DS option of its FS2800 DDR Detective product. [Figure: DDR Detective 3DS-specific violations.] These checks run continuously, never missing a clock edge, and can run for days to make sure no protocol errors occur that could corrupt data. What’s in your Server? Well, if it’s 3DS you will want to make sure you’re getting your money’s worth, as these DIMMs can be $4000 or more...

Data Centers: Don’t Throw that DIMM Away!

According to Facebook, DDR Memory is the #2 failure in the Data Center. A Carnegie Mellon paper that studied DDR Memory failures in Facebook’s data center reported that FB swapped DIMMs out of ~1% of their servers on a monthly basis. Given the number of servers that Facebook has, this suggests they are swapping DIMMs every hour of every week of every month all year long! So I asked around: what do they do with the DIMMs they swap out? The answer came back on a LinkedIn group response from someone who collects the ‘bad’ DIMMs. His response? They destroy them. With DRAM prices skyrocketing and larger-capacity, more expensive DIMMs becoming the norm, when will Data Centers get a clue? IT MIGHT NOT BE THE DIMM. Even Google’s 2009 study concluded that memory failures sometimes ‘followed the system’. So who is to blame? Well, I don’t want to point any fingers, but we have seen some pretty cheap motherboards (missing ground planes, bad connectors, routing issues), BIOS updates that program the memory controller incorrectly, and BIOS bugs that misinterpret the DIMM (or SODIMM) SPD (a small EEPROM holding timing and characteristics). See my article on ROUNDING. Oh, and occasionally a Memory Controller bug or two. Not to mention poor overall ground/power design that radiates noise from one memory channel to the other. So what’s in your Data Center? Today’s Data Centers are driving Server complexity up, but the market is driving Server price down. Thus suppliers are squeezed for margins and test/design validation gets neglected. Memory errors don’t scale well, so if you don’t want to be throwing good DIMMs away...

Speed up your Server’s Memory Performance by Understanding Rounding!

Those of you who read my previous Blog post on the new DDR4 Revision B spec know that DDR4 has a much better defined Rounding Algorithm. So why is this important?

The Early Days

Timing parameters in JEDEC DDR specs are sometimes listed in nanoseconds, microseconds or milliseconds. When testing or simulating logic that runs on the DDR bus, one often has to convert a time listed in ns, us or ms into clock cycles, and a non-integer number results. Since clock cycles come only in integers, we have to either round up or round down. As a carryover from spec to spec over the years, there were various notes in the spec that referred to a ‘simple round up’. As the specifications evolved, more timing parameters were added and the notes on how to handle all these new parameters became sparser and sparser. With the advent of DIMMs and SODIMMs, there needed to be a method for associating a specific value or an allowable range of values with a particular DIMM or SODIMM, and a method for listing the optional features implemented on that particular DIMM/SODIMM. Thus the SPD was created. SPD stands for Serial Presence Detect. This EEPROM-like part is on every DIMM and SODIMM and is read by the BIOS in order to properly configure the DIMM/SODIMM for the system it resides in. So what does the SPD have to do with Rounding? Well, the folks that work on that specification quickly realized the issue and the need for a specified rounding method. When the BIOS...
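To make the conversion concrete, here is a minimal Python sketch of turning a timing parameter into clock cycles. It is an illustration under stated assumptions, not the JEDEC algorithm: the function names, the example values (a 15.0 ns parameter with a 937 ps clock period) and the 1% tolerance are all hypothetical and chosen only to show why the rounding rule matters.

```python
def cycles_simple_round_up(param_ps: int, tck_ps: int) -> int:
    """'Simple round up': the smallest whole number of clocks covering param_ps."""
    # Integer ceiling division avoids floating-point surprises.
    return -(-param_ps // tck_ps)


def cycles_with_tolerance(param_ps: int, tck_ps: int, tol_permille: int = 10) -> int:
    """Round up unless the result exceeds a whole number of clocks by less
    than tol_permille/1000 of a clock.

    The default tolerance (1%) is an illustrative assumption, not a value
    taken from any specification.
    """
    scaled = (param_ps * 1000) // tck_ps  # clock count in thousandths, truncated
    return (scaled + 1000 - tol_permille) // 1000


# Hypothetical example: 15.0 ns parameter, 937 ps clock (15000 / 937 = 16.0085...)
simple = cycles_simple_round_up(15000, 937)
tolerant = cycles_with_tolerance(15000, 937)
```

With these example values the simple round up yields 17 clocks, while the tolerant rule yields 16, because the 0.0085 clock overshoot comes from the clock period itself having been rounded. One extra clock on a latency parameter is exactly the kind of avoidable performance loss a well-defined rounding method is meant to prevent.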

Is your DDR4 Memory Controller Compliant?

Finally! After 2½ years, FuturePlus Systems was successful in sponsoring JEDEC’s first document on protocol checks, JEP175 DDR4 Protocol Checks. But we didn’t do it alone! Many thanks to the other Test and Measurement vendors, EDA vendors and Silicon vendors who took the time to review, comment and contribute. This document was driven by the need to standardize the rules behind a memory controller’s accesses to the DDR4 DRAM. Apart from the Alert signal, which only asserts for Address/Command Parity or CRC errors, the DRAM has no way to tell the Memory Controller ‘hey, you just did an incorrect command sequence or you violated command timing’. The result of an incorrect access may not be apparent immediately, as that location or adjacent locations may not be accessed right away. The result can be data corruption. The document is the WHAT, not the HOW, as these measurements can be made with a Logic Analyzer, Mixed Signal Oscilloscope, Protocol Analyzer (think DDR Detective) or implemented as part of a simulation test bench. The figure below gives a quick overview of how the Protocol Checks are defined in the new JEP175 DDR4 Protocol Checks Document. [Figure courtesy of FuturePlus Systems] There are dozens of checks defined in the document, but they are in no way the definitive list of ALL possible DDR4 Protocol Checks. We had to start somewhere, so this is the list that was agreed upon. In order to assist in plugging in all the defined values for the various DDR4 B speed bins (1600, 1866, 2133, 2400, 2933 and 3200 MT/s), FuturePlus Systems has gone one step further and created...
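As a rough idea of what a command-timing protocol check does, here is a minimal Python sketch of one check run over a captured command trace. It does not follow the JEP175 definitions: the trace format, the function name, the command labels and the 17-clock minimum are hypothetical and serve only to illustrate the idea of flagging a command issued too soon after another to the same bank.

```python
from typing import Dict, List, Tuple

# Hypothetical trace entry: (clock cycle, command name, bank number)
Command = Tuple[int, str, int]


def check_min_spacing(
    trace: List[Command], first: str, second: str, min_cycles: int
) -> List[Tuple[int, int, int, int]]:
    """Flag every `second` command issued fewer than `min_cycles` clocks after
    the most recent `first` command to the same bank.

    Returns a list of (bank, first_cycle, second_cycle, gap) tuples.
    """
    last_first: Dict[int, int] = {}  # bank -> cycle of the latest `first` command
    violations = []
    for cycle, name, bank in trace:
        if name == first:
            last_first[bank] = cycle
        elif name == second and bank in last_first:
            gap = cycle - last_first[bank]
            if gap < min_cycles:
                violations.append((bank, last_first[bank], cycle, gap))
    return violations


# Illustrative trace: bank 0 is read only 10 clocks after its activate.
trace = [(0, "ACT", 0), (10, "RD", 0), (20, "ACT", 1), (37, "RD", 1)]
violations = check_min_spacing(trace, "ACT", "RD", min_cycles=17)
```

Here `violations` holds a single entry for bank 0, where the read follows the activate by 10 clocks instead of the assumed minimum of 17. A real checker would need one such rule per timing parameter, with the minimum taken from the speed bin in use.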
Request More Information/Quote or Call: (603) 472-5905