DDR4 3DS DIMMs: The next big thing in the Data Center

To give DDR4 a mid-life kicker, memory vendors are upping their game and producing 3DS DDR4 DIMMs. What is 3DS, you ask? It is 3-Dimensional Stacking of dies in a single package. It is not to be confused with ‘twin die’, which is just 2 dies placed next to each other rather than stacked. 3DS uses TSVs (through-silicon vias) to make the connection between the dies. 3DS is a game changer when it comes to density: DIMMs of 128GB, 256GB and possibly 512GB are enabled by this technology. RDIMMs or LRDIMMs can implement 3DS and have up to 4 ranks. The 3DS protocol works by introducing the concept of logical ranks in addition to physical ranks. The screen shot below from the DDR Detective shows what the traffic on a 3DS DDR4 memory bus looks like.

Figure: Waveform showing interleaved traffic between the different physical and logical ranks on a single DDR4 3DS DIMM.

The 3DS protocol is also different, as timing parameters between the physical ranks and the logical ranks have to be controlled. FuturePlus Systems, who took the lead role in JEP175 DDR4 Protocol Checks, has also created the 3DS protocol checks found in the 3DS option of its FS2800 DDR Detective product.

Figure: DDR Detective 3DS specific violations.

These checks run continuously, never missing a clock edge, and can run for days to make sure that no protocol errors occur that could lead to data corruption. What’s in your Server? Well, if it’s 3DS you will want to make sure you’re getting your money’s worth, as these DIMMs can be $4000 or more...

Is your DDR4 Memory Controller Compliant?

Finally! After 2 ½ years, FuturePlus Systems succeeded in sponsoring JEDEC’s first document on protocol checks, JEP175 DDR4 Protocol Checks. But we didn’t do it alone! Many thanks to the other Test and Measurement vendors, EDA vendors and Silicon vendors who took the time to review, comment and contribute.

This document was driven by the need to standardize the rules behind a memory controller’s accesses to the DDR4 DRAM. Apart from the Alert signal, which only asserts for Address/Command Parity or CRC errors, the DRAM has no way to tell the Memory Controller ‘hey, you just issued an incorrect command sequence or violated command timing’. The result of an incorrect access may not be apparent immediately, as that location or adjacent locations may not be accessed right away. The result can be data corruption.

The document is the WHAT, not the HOW, as these measurements can be made with a Logic Analyzer, Mixed Signal Oscilloscope or Protocol Analyzer (think DDR Detective), or implemented as part of a simulation test bench. The figure below gives a quick overview of how the Protocol Checks are defined in the new JEP175 DDR4 Protocol Checks document.

Figure courtesy of FuturePlus Systems

There are dozens of checks defined in the document, but they are in no way the definitive list of ALL possible DDR4 Protocol Checks. We had to start somewhere, so this is the list that was agreed upon. To assist in plugging in all the defined values for the various DDR4 B speed bins (1600, 1866, 2133, 2400, 2933 and 3200 MT/s), FuturePlus Systems has gone one step further and created...
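To make the idea of a protocol check concrete, here is a minimal sketch of two checks run over a recorded command trace: a Read or Write to a bank with no open row, and a Read or Write issued too soon after the Activate that opened the row. The trace format, the `Cmd` and `check_trace` names and the 4-cycle limit are all assumptions made for illustration; they are not taken from JEP175, and real values depend on the speed bin.

```python
from typing import NamedTuple

class Cmd(NamedTuple):
    cycle: int   # clock cycle at which the command was sampled
    name: str    # "ACT", "RD", "WR" or "PRE"
    bank: int

def check_trace(trace, min_act_to_rw_cycles):
    """Return a list of (cycle, message) violations found in the trace.

    min_act_to_rw_cycles is a hypothetical limit supplied by the caller.
    """
    open_since = {}   # bank -> cycle of the ACT that opened its row
    violations = []
    for cmd in trace:
        if cmd.name == "ACT":
            if cmd.bank in open_since:
                violations.append((cmd.cycle, "ACT to a bank that is already open"))
            open_since[cmd.bank] = cmd.cycle
        elif cmd.name in ("RD", "WR"):
            if cmd.bank not in open_since:
                violations.append((cmd.cycle, f"{cmd.name} to a closed bank"))
            elif cmd.cycle - open_since[cmd.bank] < min_act_to_rw_cycles:
                violations.append((cmd.cycle, f"{cmd.name} too soon after ACT"))
        elif cmd.name == "PRE":
            open_since.pop(cmd.bank, None)   # closing a closed bank is harmless
    return violations

# Made-up trace: a Read 2 cycles after Activate, then a Write after Precharge.
trace = [Cmd(0, "ACT", 0), Cmd(2, "RD", 0), Cmd(10, "PRE", 0), Cmd(20, "WR", 0)]
violations = check_trace(trace, min_act_to_rw_cycles=4)
for cycle, message in violations:
    print(cycle, message)
```

Neither violation would be reported by the DRAM itself, which is the gap a continuously running checker is meant to close.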

JEDEC DDR4 Revision B Spec: What’s different?

The JEDEC JC42.3 committee has issued the B version of the DDR4 specification. This version was several years in the making: the original JESD 79-4 DDR4 SDRAM specification was released in September 2012 and the A version was published in November 2013. Those of us in JEDEC have been using the task group version of this spec for several years; finally it is available to non-JEDEC members. So what is new and exciting about this B revision? Let’s take a look!

Ballouts
An x32 ballout was added so we can get a 32-bit bus in a single package. Provisions were also added for a 2-die stack in this configuration.

Editorial Updates
Several changes were made to make things clearer, in response to ambiguities and misunderstandings that have arisen over the years with the A version.

Single Rank Dual Die per package
This specifies how to combine 2 x8 dies to create a x16 configuration. 8GB, 16GB and 32GB addressing was also specified.

Targeted Row Refresh
Contrary to popular belief, TRR is NOT in the final B version. It was for a while, but was then removed. Originally, it was a response to the Row Hammer issues of DDR3. It looks like the DRAM vendors found a way to reduce RH failures for DDR4 other than making the Memory Controller babysit frequently accessed Rows. DDR4 does exhibit fewer Row Hammer failures... but it still has some (that’s a whole different article!).

3DS
Although there is a separate specification for 3DS, the B version does add additional CAS/CW latencies for 3DS.

Post Package Repair
This is enabled/disabled in...

Critical Memory Performance Metrics for DDR4 Systems: Page Hit Analysis

Page Hit and Miss are metrics often used to describe caching architectures. In this context a Hit is when the page was already open when the Read/Write transaction occurred. A Miss is when an Activate[1] had to occur just prior in order to open the page. Opening a page takes time and burns power. An Unused is when the page was opened and then closed with no transaction targeting it. Memory Controllers use various locality-of-data algorithms to keep pages open to improve performance. That is, they open pages ahead of time in order to improve the Hit rate. If they guess wrong and a page was not needed, it ends up being closed without being used. This not only hurts performance, because it takes time to open and close pages, but also wastes power, as an open page burns more than a closed one. To gain maximum insight this should be broken down on a per-direction (Read or Write), per-Channel, per-Rank or per-Bank basis. Below is an example measurement of this metric.

[1] Per the JEDEC DDR4 standard an Activate command opens a row (also referred to as a page) and a Precharge command closes it.

WHY Measure this?
A page is an allocated space in memory that the controller must ‘open’ prior to reading from or writing to it. If pages are allocated and never used, performance and power are wasted. Measuring this gives insight into various page allocation algorithms. Software targeting different applications can act very differently with regard to memory page allocation. By understanding this metric, different memory architectures and software can be designed for a better...
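The Hit, Miss and Unused definitions above can be sketched as a small counter over a command trace. This is only one reading of those definitions: the first Read/Write after an Activate is counted as a Miss, later accesses to the same open page as Hits, and a Precharge with no access in between as an Unused. The trace format and the `classify_pages` name are assumptions for illustration, and the counts are per bank only, not per direction, channel or rank.

```python
from collections import Counter

def classify_pages(trace):
    """Count Hit, Miss and Unused events in a list of (command, bank) tuples."""
    counts = Counter()
    accesses = {}   # bank -> number of RD/WR since its page was opened
    for command, bank in trace:
        if command == "ACT":
            accesses[bank] = 0
        elif command in ("RD", "WR"):
            if bank not in accesses:
                continue   # page state unknown (trace started mid-stream)
            counts["Miss" if accesses[bank] == 0 else "Hit"] += 1
            accesses[bank] += 1
        elif command == "PRE":
            # A page closed with zero accesses was opened for nothing.
            if accesses.pop(bank, None) == 0:
                counts["Unused"] += 1
    return counts

# Made-up trace: bank 0 is opened and used, bank 1 is opened and never used.
trace = [("ACT", 0), ("RD", 0), ("RD", 0), ("WR", 0), ("PRE", 0),
         ("ACT", 1), ("PRE", 1),
         ("ACT", 0), ("WR", 0)]
counts = classify_pages(trace)
hit_rate = 100.0 * counts["Hit"] / (counts["Hit"] + counts["Miss"])
print(dict(counts), f"hit rate {hit_rate:.0f}%")
```

Keying the dictionary on a (channel, rank, bank) tuple instead of the bank alone would give the finer breakdown the text recommends.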

Critical Memory Performance Metrics for DDR4 Systems: Bus Mode Analysis

For DDR4 there are 11 different modes, and these metrics are Rank based. The modes are: Reset, Idle, Active, Precharge Power Down, Active Power Down, Maximum Power Down Mode, Self-Refresh, DLL Disable, Write Leveling, MPR Mode (also known as Read leveling or Read training), and VREF Training Mode. To make the best use of this measurement, each mode should be reported as the amount of time spent in it, either as time (in seconds) or as a percentage (time spent in the mode divided by elapsed time).

WHY Measure this?
It gives engineers a relative measure of how often the various modes are entered and how long the system spends in these overhead states.
General verification of the JEDEC specified modes of operation.
To quickly look for infrequent events.
A quick analysis of no-boot scenarios.
To isolate problems in Memory Validation.

Power Management is included in this metric, so it may seem like a redundant measurement, but the new insight gained is in the additional modes and how they all interrelate. Below is an example measurement on our example DDR4 system. The real insight is gained from the second-by-second playback, which shows the system moving in and out of these various modes.

Summary
Due to the advancements in FPGA technology, FPGA-based test equipment can now count every cycle and transaction, as well as the time spent in almost all important events. This allows memory subsystem performance measurements to be expanded to give greater insight into DDR4 performance. Bus Mode Analysis is one of those new metrics that can be tracked on any DDR4 DIMM or SODIMM...
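As a rough sketch of the time-in-mode calculation described above (time spent in a mode divided by elapsed time), the following turns a list of timestamped mode transitions for one rank into seconds and percentages. The input format, the `mode_residency` name and the sample numbers are assumptions for illustration, not output from any instrument.

```python
from collections import defaultdict

def mode_residency(transitions, end_time):
    """Return {mode: (seconds, percent)} for one rank.

    transitions: list of (time_in_seconds, mode) sorted by time, where each
    entry marks the moment the rank entered that mode.
    end_time: time at which the capture stopped.
    """
    seconds = defaultdict(float)
    # Pair each transition with the start of the next one (or the capture end).
    following = transitions[1:] + [(end_time, None)]
    for (start, mode), (next_start, _) in zip(transitions, following):
        seconds[mode] += next_start - start
    elapsed = end_time - transitions[0][0]
    return {mode: (t, 100.0 * t / elapsed) for mode, t in seconds.items()}

# Made-up 10 second capture of a single rank.
transitions = [(0.0, "Reset"), (1.0, "Idle"), (2.0, "Active"),
               (6.0, "Self-Refresh"), (8.0, "Active")]
report = mode_residency(transitions, end_time=10.0)
for mode, (secs, pct) in report.items():
    print(f"{mode:<14}{secs:5.1f} s {pct:5.1f} %")
```

Running the same function over one-second windows of the capture would give the kind of second-by-second view the text mentions.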

Critical Memory Performance Metrics for DDR4 Systems: Power Management

If you are Facebook and it’s 3 am on the East Coast of the United States, you probably want to see your Servers in a low power state. This can save you money and make your server farms more ‘green’. For DDR4 there are several ways to help reduce power consumption. They are:

Precharge Power Down
Active Power Down
Self Refresh
Max Power Savings
Reducing the frequency and stopping the clock

The key metrics here are not only entry into these states but also how long the memory stays in them. Further power savings can be had if the clock is stopped during these power saving modes. Measurement and analysis of these events is key to maximum power savings. These metrics should show percentages (used cycles versus total cycles, or versus CKE qualified cycles) and be broken down on a Channel, Slot, and Rank basis. Another key metric would be the number of seconds or cycles spent in each mode.

Why measure this?
Cost Savings
Memory Controller code changes to increase power savings can be evaluated and verified
Software efficiency: two pieces of code that accomplish the same task functionally may differ with regard to power management.

Summary
The trade-off between Power Management and Performance is a never-ending tug of war. If you want high performance you will pay for it with more power consumption. So some applications, like Facebook, may be very aggressive with power management, but others, like high frequency traders, want high performance. So burn baby...
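A minimal sketch of the per-Channel, per-Slot, per-Rank breakdown described above might look like the following. It covers only the simpler of the two percentages (cycles in a state versus total cycles), and the input format, the `power_state_percentages` name and the cycle counts are all assumptions for illustration.

```python
from collections import defaultdict

def power_state_percentages(samples):
    """Return {(channel, slot, rank): {state: percent of total cycles}}.

    samples: iterable of (channel, slot, rank, state, cycles) tuples, where
    cycles is the number of clock cycles counted in that state.
    """
    cycles = defaultdict(lambda: defaultdict(int))
    for channel, slot, rank, state, n in samples:
        cycles[(channel, slot, rank)][state] += n
    report = {}
    for key, per_state in cycles.items():
        total = sum(per_state.values())   # total cycles seen for this rank
        report[key] = {state: 100.0 * n / total for state, n in per_state.items()}
    return report

# Made-up counts for two ranks on channel 0, slot 0.
samples = [
    (0, 0, 0, "Active", 600),
    (0, 0, 0, "Self Refresh", 300),
    (0, 0, 0, "Precharge Power Down", 100),
    (0, 0, 1, "Active", 250),
    (0, 0, 1, "Self Refresh", 750),
]
report = power_state_percentages(samples)
for (channel, slot, rank), states in sorted(report.items()):
    print(f"channel {channel} slot {slot} rank {rank}: {states}")
```

Comparing two such reports, one per software build or controller setting, is one way to evaluate the code changes listed under "Why measure this?".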
Request More Information/Quote or Call: (603) 472-5905