FuturePlus Systems | Aug 16, 2017 5:50:29 PM | 4 min read

Speed Up Your Server's Memory Performance by Understanding Rounding!

Those of you who read my previous Blog post on the new DDR4 Revision B spec know that DDR4 has a much better defined Rounding Algorithm. So why is this important?

The Early Days
Timing parameters in JEDEC DDR specs are listed variously in nanoseconds, microseconds or milliseconds. When testing or simulating logic that runs on the DDR bus, one often has to convert a time listed in ns, us or ms into clock cycles, and the result is frequently not an integer. Since clock cycles only come in whole numbers, we have to round either up or down. As a carryover from spec to spec over the years, various notes in the spec referred to a ‘simple round up’. As the specifications evolved, more timing parameters were added, and the notes on how to handle all these new parameters became sparser and sparser.

With the advent of DIMMs and SODIMMs, there needed to be a way to associate a specific value, or an allowable range of values, with a particular DIMM or SODIMM, and a way to list the optional features implemented on that particular DIMM/SODIMM. Thus the SPD was created. SPD stands for Serial Presence Detect. This EEPROM-like part is on every DIMM and SODIMM and is read by the BIOS in order to properly configure the DIMM/SODIMM for the system it resides in.

So what does the SPD have to do with rounding? Well, the folks who work on that specification quickly realized the issue and the need for a specified rounding method. When the BIOS reads a number from the SPD and it's listed in ns, us or ms, the BIOS needs to convert it to clock cycles based on the speed the bus is running at.

 


Figure 1: The DDR4 Rounding Algorithm

 

These smart engineers also realized that always using a simple round up would cause a performance hit. This is especially true as clock cycles became shorter and performance increased. As a compromise of sorts, the JEDEC DRAM committee settled on a ‘guardband’ approach: take the result of the division, subtract the ‘guardband’, and then do a simple round up. The result is that if the fractional part of the division is small, a whole clock cycle is not lost on that parameter. Subtracting the guardband brings the value just below its whole-number part, so the round up after the subtraction lands on that whole number instead of the next one. The percentage they chose was 2.5% (up from 1% in previous versions of the SPD spec).
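As a rough illustration of the difference, here is a minimal Python sketch of the two approaches as described in this post: a simple round up, and a round up after subtracting a 2.5% guardband. The function names and the example values (a 15 ns parameter on a 937 ps clock) are assumptions made for illustration; this is not the normative text or equation from the JEDEC specifications.

```python
import math
from fractions import Fraction

# Guardband of 2.5% of one clock, the figure quoted in the text above.
GUARDBAND = Fraction(25, 1000)


def clocks_simple(t_ps: int, tck_ps: int) -> int:
    """Simple round up: any fractional clock costs a whole extra clock."""
    return math.ceil(Fraction(t_ps, tck_ps))


def clocks_guardband(t_ps: int, tck_ps: int) -> int:
    """Subtract the guardband from the raw quotient, then round up."""
    return math.ceil(Fraction(t_ps, tck_ps) - GUARDBAND)


# Illustrative values only (assumed, not taken from a datasheet):
# a 15 ns parameter on a bus with a 937 ps clock period.
t_ps, tck_ps = 15_000, 937
print(clocks_simple(t_ps, tck_ps))     # 15000 / 937 ~ 16.0085 -> 17
print(clocks_guardband(t_ps, tck_ps))  # 16.0085 - 0.025 ~ 15.9835 -> 16
```

Times are kept as integer picoseconds and the division is done with exact fractions, so a quotient that sits right on a whole number is not pushed over the edge by floating-point error.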


Figure 2: An example of how rounding affects the tFAW parameter. A smaller number means higher performance.

 

How I got dragged into all this Rounding Stuff!
During the specification process for JEP175 DDR4 Protocol Checks (championed by FuturePlus Systems) we had to write the actual equations for the various timing parameters. In doing this we quickly realized that many of the numbers needed to be rounded in order to achieve an integer result. So we started our journey to find which algorithm to use. Lo and behold, we found one in the SPD spec that contained a guardband, and we found a reference (in a note) in the DDR4 JESD79-4A spec that implied a simple round up with no guardband. We raised the question at the JEDEC committee meetings, and a host of opinions and ballots resulted. In the end the DRAM committee did adopt the SPD rounding algorithm. HOWEVER, once the committee members realized the change was going to be bigger than they had anticipated, there was a follow-up ballot (which passed) to keep the simple round up for non-SPD parameters and the 2.5% guardband for SPD parameters. HOWEVER, that ballot was not incorporated into the B spec (I do not know why). So what we have is a fairly big change to many of the latency parameters.

Why it matters for Server Performance
Once upon a time a very smart Server Engineer said to me, ‘Barb, if you can save me 1 clock tick out of 100, that is a 1% performance improvement and that is a big deal’. So although 1-2% may not seem big, it can make a difference in your workload. Check out these examples.

Figure 3: DDR4 guardband examples

 

So if the BIOS for your system is not using the correct rounding algorithm, you're getting cheated out of some clock ticks. For a timing parameter that is the sum of two parameters that rounding applies to, you're losing 2 clock ticks every time those transactions take place.
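To see how the loss stacks up when two rounded parameters are added, here is a small Python sketch. The helper names and the two 15 ns parameters on a 937 ps clock are hypothetical values picked so that each quotient lands just above a whole number; they are not taken from a real DIMM.

```python
import math
from fractions import Fraction

GUARDBAND = Fraction(25, 1000)  # 2.5% of one clock, as described above


def to_clocks(t_ps: int, tck_ps: int, guardband: Fraction = Fraction(0)) -> int:
    """Round up after subtracting an optional guardband (0 = simple round up)."""
    return math.ceil(Fraction(t_ps, tck_ps) - guardband)


def summed_clocks(params_ps, tck_ps: int, guardband: Fraction = Fraction(0)) -> int:
    """Round each parameter separately, then add the clock counts."""
    return sum(to_clocks(t, tck_ps, guardband) for t in params_ps)


# Hypothetical pair of parameters, each 15000 / 937 ~ 16.0085 clocks.
params_ps = [15_000, 15_000]
tck_ps = 937

simple = summed_clocks(params_ps, tck_ps)              # 17 + 17 = 34
guarded = summed_clocks(params_ps, tck_ps, GUARDBAND)  # 16 + 16 = 32
print(simple - guarded)                                # 2 clocks lost to simple round up
```

Each parameter is rounded on its own before the sum, which is why a small fractional overshoot on both of them costs two clocks rather than one.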

The Bottom Line
Rounding matters! Get it right! If your team needs help analyzing your system's memory performance, contact us. We have the tools and services that can help.
