Who is to blame for DDR Memory ECC errors?

Is it the DIMM or the System? For DDR4 DIMMs and SODIMMs that support ECC, the ECC (Error Correcting Code) is calculated by the Memory Controller on every write. The code amounts to one check bit per data byte (8 check bits for each 64 bits of data), and the check bits are stored in a different device than the data they protect. However, there is no checking of the write data once it reaches the DRAM. The ECC is really only used to protect the data on the Read. Once the data is read back, the Memory Controller checks the ECC and, if it is incorrect, attempts some kind of recovery. That recovery is system dependent and not specified by the JEDEC spec. In fact, the ECC calculations and algorithms are also not specified, and many system vendors do not release their ECC algorithms. If the controller finds a single bit error, it corrects the data and writes the corrected data back to the DRAM. Single bit errors are also called ‘Soft Errors’. If it detects a double bit error it cannot do any correction, because the ECC algorithm is mathematically limited: it can detect and correct a single bit error, but it can only detect a double bit error. You may have seen the acronym SECDED, and this is where it comes from: Single Error Correction, Double Error Detection. Double bit errors are sometimes referred to as ‘Hard Errors’ and they will usually cause a machine check and a system crash. System log files should show all of the soft errors and the address that each error occurred on. In addition, it should indicate...
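To make the SECDED limit concrete, here is a minimal Python sketch of a textbook extended Hamming code. This is an assumption for illustration only: as noted above, real memory controllers do not publish their ECC algorithms, so this is not any vendor's implementation, and the function names `encode` and `decode` are hypothetical.

```python
def encode(data_bits):
    """Build an extended Hamming (SECDED) codeword from a list of 0/1 data bits.

    Index 0 holds the overall parity bit; indices 1..n are Hamming positions,
    where power-of-two positions hold check bits and the rest hold data.
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:          # number of Hamming check bits needed
        r += 1
    n = m + r
    code = [0] * (n + 1)
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two: data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity                 # makes each parity group even
    code[0] = sum(code[1:]) % 2          # overall parity over the whole word
    return code


def decode(code):
    """Return (status, data_bits); status is 'ok', 'corrected' or 'uncorrectable'."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos              # XOR of the positions holding a 1
    overall = sum(code) % 2
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        fixed[syndrome] ^= 1             # single flip; syndrome 0 is the parity bit itself
        status = "corrected"
    else:
        status = "uncorrectable"         # even number of flips: detected, not fixable
    data = [fixed[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data
```

Flipping any one bit of a codeword comes back as `corrected` with the original data, while flipping two bits comes back as `uncorrectable`, which is the situation that typically ends in a machine check. With 64 data bits this construction yields a 72 bit codeword, the same width as an ECC DIMM data bus.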

What do you mean there is NO Validation Report?

In our Services department we see all sorts of systems: network switches, routers, medical devices, and more. They all share a common theme: the DDR Memory does not work right. The engineers sending us these problem systems are frustrated, and we often hear ‘we started getting failures in the field after having it work for years’ or ‘the applications now can’t tolerate any failures’. We even get the occasional ‘this memory stick fails but this one does not, can you tell us why?’. As we go through our Memory Channel Audit we often ask the customers ‘Where is the Validation report for this system?’ The customers almost always say ‘we have no idea!’. Call me old fashioned, but I recall working for a large enterprise vendor (DEC) where you had to thoroughly test and validate a system and produce a report that proved, at the very least, that you tested it and looked at the Signal Integrity. Given that our society is addicted to the internet, high speed communications, phones, laptops, air travel and online everything, you would think that the platforms and systems that run all of these applications and make all of these critical calculations would at least have some kind of Validation Report. But they don’t, and their customers are buying literally millions of them, and the general public has become overly reliant on them. The engineers who deploy these systems and are responsible for them in the field should not buy them unless the suppliers PROVE they are good. Given that we are so addicted to the online world we have created we should pressure...

Who won the Logic Analyzer Wars?

We learned recently that Tektronix has discontinued the last of its logic analyzer family. Sigh. Here at FuturePlus Systems we walked a careful balance between the two Whales, Tektronix and HP/Agilent/Keysight, mostly siding with the latter and only crossing over to the ‘dark side’ at the customer’s request. I remember vividly visiting the impressive Beaverton Tektronix campus, hat in hand, touting our superior interposers and hardware skills, looking for the elusive key to the Tektronix software development environment so we could sell into that market. Once we started down the Tektronix path we had to carefully dodge the wrath of our best friend Agilent. It was like the old Girl Scout song I sang as a child: “make new friends but keep the old, one is silver and the other gold”. Tektronix was Silver but Agilent was clearly Gold. The wars started in the early 2000s with one large vendor in particular pitting the two giants against each other. It was brutal, with every little technical spec thrown back in our faces as the One Large Vendor led us into the ring to chew each other to death. As it turns out, the One Large Vendor had made a costly mistake: they never thought that if they pushed too hard one vendor would walk. One of them did, and the remaining Whale charged big and delivered whatever they wanted. The One Large Vendor was able to entice the other Whale back into service a few years later and the wars heated up again around 2010. Then the Logic Analyzer business began a slow and steady decline. The reasons were...

Want to Ca$h in on Bitcoin, BlockChain and Cryptocurrency? Speed up your DDR Memory Accesses

Attention all Bitcoin and Ethereum Miners, Blockchain Fans and Distributed Ledger Technology experts. Do you REALLY understand the computing limits of your hardware? These applications are among the most compute intensive applications today. Like most compute intensive applications, DDR Memory is involved. There is some confusion over memory bandwidth versus memory latency. Latency is the time to first access. See below examples of a memory subsystem running well short of the minimum latency allowed by the JEDEC DDR4 specification for some parameters. Identifying these bottlenecks can dramatically reduce your memory access time and thus speed up your mining application. Tuning your system for minimum latencies can add $$ to your crypto wallets.

Figure 1: DDR4 Memory Latencies measured on every clock cycle continuously. Measurement made by the DDR Detective from FuturePlus Systems.

Bandwidth, on the other hand, is the amount of data that can be transferred over a certain time. This is the megabytes per second metric. See below. This metric is important as it determines the amount of data bandwidth that can be sustained over a longer period of time. If your latency can be improved this number will also improve.

Figure 2: DDR4 Memory Bandwidth measured on every clock cycle continuously on a per bank per rank basis. Measurement made by the DDR Detective from FuturePlus Systems.

If your mining hardware is using the latest DDR4 Memory there is another metric (over DDR3) that needs to be considered. That is Bank Group tuning. In DDR4, back to back transactions to the same Bank Group result in a performance penalty. Back to back accesses to different Bank Groups is...
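To put rough numbers on the bandwidth and Bank Group points above, here is a small Python sketch. It assumes a standard 64 bit (8 byte) data bus and a burst length of 8, and it treats the command-to-command spacing as a plain input. The spacing values you pass in are illustrative assumptions, not figures quoted from the JEDEC specification; the real same-group and different-group spacings (tCCD_L and tCCD_S) come from the timing tables for your speed grade. All function names are hypothetical.

```python
BUS_WIDTH_BYTES = 8  # assumption: 64 data bits per transfer, ECC bits excluded


def peak_bandwidth_mb_s(transfer_rate_mt_s):
    """Theoretical peak in MB/s: transfers per second times bytes per transfer."""
    return transfer_rate_mt_s * BUS_WIDTH_BYTES


def data_bus_utilization(cas_to_cas_clocks, burst_length=8):
    """Fraction of clocks carrying data when column commands are spaced
    cas_to_cas_clocks apart. A burst occupies burst_length / 2 clocks
    because data moves on both clock edges."""
    burst_clocks = burst_length / 2
    return min(1.0, burst_clocks / cas_to_cas_clocks)


def sustained_bandwidth_mb_s(transfer_rate_mt_s, cas_to_cas_clocks, burst_length=8):
    """Peak bandwidth scaled by how full the data bus is kept."""
    peak = peak_bandwidth_mb_s(transfer_rate_mt_s)
    return peak * data_bus_utilization(cas_to_cas_clocks, burst_length)
```

For example, at 3200 MT/s the peak works out to 3200 × 8 = 25,600 MB/s. If back to back reads can be issued every 4 clocks the data bus stays full, but if they must be spaced 8 clocks apart only half of the clocks carry data and the sustained figure drops to 12,800 MB/s. That gap is why spreading consecutive accesses across Bank Groups matters.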

Fast 3200MT/s DDR4 SODIMMs

FAST SODIMMs for DDR4 are here! Traditionally SODIMMs (Small Outline DIMMs) have been used in the mobile environment because of their smaller size. For DDR3, SODIMMs did not have ECC so they were not even considered for Servers. When DDR4 was created there were discussions within JEDEC for SODIMMs to be used in more robust environments, so ECC was added to the specification. However, due to the smaller mechanical size of an SODIMM, memory capacity is limited. Here at FuturePlus we make sure all our DDR Validation Tools work in a variety of systems and at all supported speeds. The best way to do that is to validate our tools in as many platforms as we can get our hands on. Which leads me to this little baby! Yes, this ASRock is water cooled! We had some fun setting it up and adjusting the mood lights it comes with to make the water look pretty. It has 4 SODIMM channels and each channel is a single slot. In addition, these SODIMMs are vertically mounted. Those of you who are true SODIMM fans know that in most cases the SODIMMs are mounted on an angle so as to reduce vertical height. So how does this memory bus look?

ASRock SODIMM 3200MT/s: Measurement made with a FuturePlus Systems FS2836 and a Keysight Logic Analyzer

Take a look at those EYES! This is a burst scan of both the read data and write data. You can see that the eyes allow for ample margin for signal capture. This ASRock system looks, well, rock solid!

ASRock SODIMM 3200MT/s: Measurement made with a FuturePlus Systems...

What is DDR4 Memory Gear-Down Mode?

Gear-down mode, a Reliability, Availability and Serviceability (aka RAS) feature more clearly documented in the new JEDEC DDR4 Rev B spec, allows the DRAM Address/Command and Control bus to use every other rising edge of the DDR4 Memory bus clock. The Memory Controller indicates that it wants the DRAM to operate in Gear-down mode by setting bit 3 in Mode Register 3 at boot time. The system then follows this operation with a sync pulse, which is a single clock assertion of Chip Select. The DRAM notes that sync pulse assertion and syncs to that rising clock edge. It then uses every other rising edge of the clock after that. So even though the memory controller clock frequency has not changed, the DRAM only uses every other edge. Normally the Address/Command/Control uses only the rising edge of the clock. This is called ½ rate. Since the data uses both edges of the clock and the DRAM Address/Command and Control now uses every other rising edge of the clock, they refer to Gear-down mode as ¼ rate or 2N. The screen shot below shows what the bus actually looks like from the memory controller’s point of view in Gear-down mode.

Waveform as seen on the FS2800 DDR Detective

To reflect what the DRAM is actually using, the test equipment needs to be able to adjust to gear-down mode and show what the DRAM is actually seeing on the DDR4 memory bus.

State Listing as seen on the FS2800 DDR Detective, what the DRAM sees for DDR4 bus operations while in gear-down mode.

Some little nuances come to light when a...
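The gear-down sequence described above can be modeled in a few lines of Python. This is only a toy sketch: the helper names are hypothetical, rising clock edges are simply numbered 0, 1, 2 and so on, and it assumes the DRAM samples the sync edge itself and then every second rising edge after it.

```python
GEARDOWN_BIT = 3  # bit 3 of Mode Register 3, as described in the text


def set_geardown(mr3_value):
    """Return the MR3 value with the gear-down bit set."""
    return mr3_value | (1 << GEARDOWN_BIT)


def geardown_enabled(mr3_value):
    """True if the gear-down bit is set in the given MR3 value."""
    return bool(mr3_value & (1 << GEARDOWN_BIT))


def edges_used_by_dram(num_rising_edges, sync_edge, geardown):
    """List the rising-edge indices on which the DRAM samples the
    Address/Command/Control bus, starting from the sync edge.
    Every edge is used normally; every other edge is used in gear-down mode."""
    step = 2 if geardown else 1
    return list(range(sync_edge, num_rising_edges, step))
```

With a sync pulse on edge 3 of a 10 edge window, the model samples on edges 3, 5, 7 and 9. Test equipment that keeps sampling on every rising edge would show the bus from the memory controller's point of view, not the DRAM's.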
Request More Information/Quote or Call: (603) 472-5905