Memory Channel Validation Audit
FuturePlus Systems will use its extensive cache of test equipment, experience, and know-how to perform a Memory Channel Validation Audit.
Memory Channel Validation Audit: This test procedure is not meant to be a design validation. It is meant to be an audit confirming that a robust electrical and protocol validation of the DDR Memory Channel was done. As an added benefit, this procedure can also be used to:
- Spot check motherboards from manufacturing to ensure quality.
- Isolate failing memory channels in the field on servers displaying above-average memory errors.
- Check for BIOS bugs that program the Memory Controller incorrectly, thus causing JEDEC specification violations.
Procedure: This testing will be broken into five parts: the first is the Electrical Audit, the second is the Protocol and Timing JEDEC specification audit, the third is the Performance Timing Audit, the fourth is the SPD/Mode Register test, and the last is Row Hammer testing.
Electrical Audit: This testing ensures that the signals at the DDR DIMM connector are acceptable with regard to signal swing, alignment, and data valid eye size, and that none of the strobe, data, address, command, or control signals look appreciably degraded in form or function. It will be a qualitative measurement. Eye size will be measured at the DIMM connector as seen by a DIMM interposer.
Protocol Timing Audit: This test procedure ensures that the BIOS has programmed the memory controller correctly for key timing parameters of the DDR memory. It also ensures that, under heavy traffic loads, the memory controller adheres to the JEDEC specification. DRAM soft errors can result if commands violate the JEDEC specification by being issued too close together or too far apart.
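The "too close together or too far apart" check can be illustrated with a minimal sketch. It assumes a captured trace is available as a list of (clock, command name) records. The `Command` type, the `find_violations` function, and the limit values are hypothetical placeholders for illustration only; they are not the DDR Detective® interface and not actual JEDEC numbers.

```python
from typing import NamedTuple, Optional

class Command(NamedTuple):
    clock: int   # clock cycle at which the command was captured
    name: str    # command mnemonic, e.g. "ACT" or "REF"

# Hypothetical (min_gap, max_gap) limits in clocks between consecutive
# commands of the same type. Placeholder values, NOT JEDEC numbers.
LIMITS: dict[str, tuple[Optional[int], Optional[int]]] = {
    "ACT": (4, None),     # must not be closer together than 4 clocks
    "REF": (None, 9000),  # must not be further apart than 9000 clocks
}

def find_violations(trace, limits=LIMITS):
    """Return (name, clock, gap, reason) for each spacing violation."""
    last_seen = {}
    violations = []
    for cmd in trace:
        if cmd.name in limits and cmd.name in last_seen:
            gap = cmd.clock - last_seen[cmd.name]
            min_gap, max_gap = limits[cmd.name]
            if min_gap is not None and gap < min_gap:
                violations.append((cmd.name, cmd.clock, gap, "too close"))
            elif max_gap is not None and gap > max_gap:
                violations.append((cmd.name, cmd.clock, gap, "too far apart"))
        last_seen[cmd.name] = cmd.clock
    return violations

if __name__ == "__main__":
    trace = [Command(0, "ACT"), Command(2, "ACT"),
             Command(10, "REF"), Command(10010, "REF")]
    for violation in find_violations(trace):
        print(violation)
```

A real audit tracks many more parameters and distinguishes banks, bank groups, and ranks; this sketch only shows the shape of a minimum/maximum spacing check.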
Performance Timing Audit: The DDR Detective® margin testing feature will be used on each channel under heavy traffic load to see how fast the memory controller is issuing transactions. This will be compared to the JEDEC specification to see if any performance gaps can be found. For example, if the minimum Read to Read Same Bank Group spacing is 5 clocks and the system consistently operates at 10, that is 5 clocks of performance being left out of the system. Over billions of transactions this can add up to a considerable performance hit. This test can also be run while the system is executing the end user's application or a representative workload, which can give good insight into why a particular server does not do well under certain workloads.
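The arithmetic behind the 5-versus-10 clock example can be sketched as follows. The function names and the clock frequency used in the usage lines are assumptions for illustration. The model is deliberately simple: it treats every transaction as paying the full gap and ignores overlap across banks and channels.

```python
def wasted_clocks(min_spacing: int, observed_spacing: int, transactions: int) -> int:
    """Clocks lost when commands are spaced wider than the minimum allows."""
    per_transaction = max(0, observed_spacing - min_spacing)
    return per_transaction * transactions

def wasted_seconds(clocks: int, clock_hz: float) -> float:
    """Convert a clock count to seconds for an assumed memory clock frequency."""
    return clocks / clock_hz

if __name__ == "__main__":
    # Values from the example above: minimum 5 clocks, observed 10 clocks.
    lost = wasted_clocks(min_spacing=5, observed_spacing=10,
                         transactions=1_000_000_000)
    print(lost)  # 5000000000 clocks over one billion transactions
    # 1.2e9 Hz is an assumed clock frequency, chosen only for illustration.
    print(wasted_seconds(lost, clock_hz=1.2e9))
```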
SPD and Mode Register Settings: In many cases the BIOS can program the Memory Controller incorrectly due to errors in the Serial Presence Detect (SPD) EEPROM. In addition, the Memory Controller sets up each DIMM in each channel of the memory subsystem based on the information found in the SPD. The resulting Mode Register settings should be consistent and within the JEDEC specification. These will be checked for accuracy.
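A consistency check of this kind can be sketched as below, assuming the Mode Register settings observed for each DIMM have already been decoded into name/value pairs. The field names (`cas_latency`, `burst_length`), the DIMM labels, and the `inconsistent_fields` function are hypothetical. The sketch only flags fields that differ between DIMMs; it does not check values against the JEDEC specification.

```python
def inconsistent_fields(settings_by_dimm: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Return each field whose value differs across DIMMs, with per-DIMM values."""
    values_by_field: dict[str, dict[str, int]] = {}
    for dimm, settings in settings_by_dimm.items():
        for field, value in settings.items():
            values_by_field.setdefault(field, {})[dimm] = value
    return {
        field: per_dimm
        for field, per_dimm in values_by_field.items()
        if len(set(per_dimm.values())) > 1
    }

if __name__ == "__main__":
    # Hypothetical decoded settings for two DIMMs on one channel.
    observed = {
        "DIMM0": {"cas_latency": 16, "burst_length": 8},
        "DIMM1": {"cas_latency": 18, "burst_length": 8},
    }
    print(inconsistent_fields(observed))
```

Mixed DIMM populations can legitimately need different settings, so a flagged field is a prompt for review rather than proof of a BIOS error.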
Row Hammer Testing: As geometries shrink and capacities increase, DDR Memory cells become susceptible to leakage current from adjacent cells. In the case of DDR Memory, a ROW subjected to excessive ACTIVATE commands can leak current into adjacent ROWS. The hammered ROW is referred to as the ‘aggressor’. If the adjacent ROWS, called the victim ROWS, are at the tail end of the cyclical refresh cycle, their charge is low. Thus they are susceptible to leakage current that can cause a bit flip. The failure of a DDR Memory cell to hold its charge due to leakage current from an adjacent ROW that is targeted with excessive ACTIVATE commands is known as “Row Hammer”. The name was coined because the ROW is being ‘hammered’ with ACTIVATE commands. More information on this can be found on the DDR Detective web site.
FuturePlus Systems will run Memesis from ThirdIO and use the FS2800 DDR Detective® to verify that excessive ACT commands are being generated. If Memesis finds a Row Hammer failure, a report will be generated.
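The idea of verifying excessive ACT commands can be illustrated with a small sketch that counts ACTIVATE commands per ROW within one refresh window and lists the neighboring victim ROWS. The threshold value, the function names, and the assumption that physically adjacent ROWS have consecutive addresses are all hypothetical simplifications; this is not how Memesis or the FS2800 works internally.

```python
from collections import Counter

def find_aggressors(act_rows: list[int], threshold: int) -> dict[int, int]:
    """Return {row: ACT count} for rows activated more than `threshold` times.

    `act_rows` holds the row address of every ACT command captured
    within a single refresh window.
    """
    counts = Counter(act_rows)
    return {row: n for row, n in counts.items() if n > threshold}

def victim_rows(aggressor: int, num_rows: int) -> list[int]:
    """Rows assumed to be physically adjacent to the aggressor row."""
    return [r for r in (aggressor - 1, aggressor + 1) if 0 <= r < num_rows]

if __name__ == "__main__":
    # Hypothetical capture: row 100 is hammered, rows 7 and 8 see normal traffic.
    capture = [100] * 5000 + [7] * 3 + [8] * 2
    for row, count in find_aggressors(capture, threshold=1000).items():
        print(row, count, victim_rows(row, num_rows=65536))
```

Real devices may remap row addresses internally, so the address-adjacent neighbors computed here are only a first guess at the actual victim ROWS.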