Test Drive: Apple Mac Pro Memory
In the first edition of this month's Affordable HD enewsletter, we keep the focus on the Apple Mac Pro, specifically analyzing the optimal memory configurations for editing and encoding with Apple Final Cut Studio, Adobe Creative Suite 4, and Telestream Episode Engine. I looked at several scenarios.
First, I tested performance at 8GB, 12GB, and 24GB of memory to understand where the sweet spots might be for all involved applications. Then, to assist those trying to identify the optimal memory configuration for Mac Pros, I tested performance at 8GB and 12GB capacity, using different combinations of memory to achieve each total. Specifically, at 8GB, I compared the performance of two 4GB dual in-line memory modules (DIMMs) against four 2GB DIMMs, and at 12GB, I compared the performance of six 2GB DIMMs (three per side) against the combination of two 2GB DIMMs and two 4GB DIMMs.
Why bother with these latter tests? Because the Mac Pro's new Intel Nehalem processor has a 3-channel integrated memory controller, which, at least in theory, makes the optimum memory configuration equal amounts of RAM in each of the first three DIMM slots. To be clear, as you can see in Figure 1, the Mac Pro's memory tray, there are eight memory slots on a dual-CPU system (four on a single-CPU system). In the figure, there's a DIMM in the first three slots on each side, while the fourth is empty.
Several websites have reported that performance can actually drop when you configure RAM outside this optimal configuration. For example, Bare Feats reported that adding a fourth DIMM to the mix can actually slow performance by up to 33 percent. Why? In theory, because adding the fourth DIMM forces the CPU to split the memory bandwidth between the third and fourth DIMMs, which drops the effective throughput to those DIMMs by 50 percent.
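The bandwidth-splitting argument can be sketched as a toy model. The Python below assumes, following the description above, that slots 1 and 2 on each CPU have their own channel while slots 3 and 4 share the third. The slot-to-channel table, the function names, and the normalized bandwidth units are illustrative assumptions, not measurements.

```python
# Toy model of one Nehalem CPU's four memory slots.
# Assumption (from the article's description): slots 1 and 2 each get a
# dedicated channel, while slots 3 and 4 share the third channel.
SLOT_TO_CHANNEL = {1: 1, 2: 2, 3: 3, 4: 3}


def per_dimm_bandwidth(populated_slots):
    """Return {slot: share of one channel's bandwidth} for the given slots.

    Assumption: each channel delivers 1.0 unit of bandwidth, split evenly
    among the DIMMs installed on that channel.
    """
    dimms_on_channel = {}
    for slot in populated_slots:
        channel = SLOT_TO_CHANNEL[slot]
        dimms_on_channel[channel] = dimms_on_channel.get(channel, 0) + 1
    return {
        slot: 1.0 / dimms_on_channel[SLOT_TO_CHANNEL[slot]]
        for slot in populated_slots
    }


def channels_in_use(populated_slots):
    """Count how many distinct channels have at least one DIMM."""
    return len({SLOT_TO_CHANNEL[slot] for slot in populated_slots})


print(per_dimm_bandwidth([1, 2, 3]))     # three DIMMs: each has a full channel
print(per_dimm_bandwidth([1, 2, 3, 4]))  # four DIMMs: slots 3 and 4 get half each
```

In this model only the DIMMs in slots 3 and 4 are affected by the fourth module, which is the 50 percent drop described above. It says nothing about whether a real application ever pushes the channels hard enough to notice.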
That said, Bare Feats is clear that the test used to produce these results, the digLloydTools (DLT) stress test, is an artificial, worst-case test. It concludes: "DON'T PANIC: If you have memory installed in all eight slots of your eight-core Nehalem (or all four slots of a four-core Nehalem), it may not penalize your real-world application performance. The vast majority of real-world applications do not saturate the memory bandwidth."
I didn't want to duplicate these tests, but I was interested in how mixed-memory configurations would perform given the 3-channel memory controller and the fact that upgrading all three slots to identical RAM configurations may not be the most affordable approach. But I get ahead of myself. Let's begin at the beginning.
Apple sent the original Mac Pro to me with 12GB of RAM (six-2GB), and I was interested to see how upping that configuration to 24GB of RAM (six-4GB) would impact performance. To make a long story short, not much. To add some perspective, I threw in performance stats at 8GB using a four-2GB configuration. That means two 2GB DIMMs on each side of the Mac Pro's memory tray, with slots 3 and 4 open on both sides.
Apple sells the dual-CPU Mac Pro in the two-2GB-per-processor, 8GB total configuration, with the upgrade to 12GB (six-2GB) only $200 more. Apple doesn't offer the 24GB configuration, but the 32GB configuration costs $6,100 extra, which I extrapolated down to $4,000 for Table 1 as a rough guess of the price for a 24GB configuration. I know you can buy cheaper memory from other outlets, including Crucial and Trans International, but configurations are priced differently and always changing, so let's use the Apple numbers for comparison.
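One way to arrive at a figure near that $4,000 guess is to scale the 32GB upgrade price by the amount of RAM added over the 8GB base. The sketch below is an assumption about how such an estimate could be made, not necessarily the method used for Table 1. The function name and the proportional-cost model are illustrative.

```python
# Apple's listed upgrade prices quoted above (dual-CPU Mac Pro, 8GB base).
BASE_GB = 8
UPGRADE_PRICE = {12: 200, 32: 6100}  # total GB -> extra cost in dollars


def estimate_upgrade_price(target_gb, known_gb=32):
    """Scale a known upgrade price by the gigabytes added over the base.

    Assumption: cost is proportional to the RAM added, which is only a
    rough guess because real pricing depends heavily on DIMM size.
    """
    price_per_added_gb = UPGRADE_PRICE[known_gb] / (known_gb - BASE_GB)
    return price_per_added_gb * (target_gb - BASE_GB)


estimate = estimate_upgrade_price(24)
print(f"Estimated 24GB upgrade: ${estimate:,.0f}")
```

Under this assumption the estimate comes out a little under $4,100, close to the rounded figure used for comparison.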
I ran multiple tests using multiple programs. Let's start with the Final Cut Pro test, which involved a 30-minute wedding ceremony shot and edited by Carrie Cannaday, a shooter/editor of broadcast, wedding, and other event work in southwest Virginia. Cannaday shot in HDV with multiple camcorders, and I rendered the project to a QuickTime Reference Movie. I then input that file into Apple Compressor and produced an H.264 file for streaming and a DVD-compatible MPEG-2 file. The comparative times are shown in Table 1.
Note that the percentage differences in the 12GB column measure the difference between 8GB and 12GB configuration, while the percentage differences in the 24GB column measure the difference between 12GB and 24GB. Any way you look at it, the performance differences are fairly minor throughout and not always in favor of the higher memory configurations (and yes, where anomalies existed, I tested twice to confirm).
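The step-by-step comparison described above can be sketched in a few lines. The render times here are hypothetical placeholders, not the values from Table 1, and measuring each difference relative to the slower baseline's time is an assumption about how the percentages were computed.

```python
def percent_faster(baseline_seconds, new_seconds):
    """Positive means the new configuration finished sooner than the baseline."""
    return (baseline_seconds - new_seconds) / baseline_seconds * 100


# Hypothetical render times in seconds; NOT the values from Table 1.
times = {"8GB": 1000, "12GB": 950, "24GB": 940}

# The 12GB column is compared against 8GB; the 24GB column against 12GB.
diff_12 = percent_faster(times["8GB"], times["12GB"])
diff_24 = percent_faster(times["12GB"], times["24GB"])

print(f"12GB vs. 8GB:  {diff_12:.1f}% faster")
print(f"24GB vs. 12GB: {diff_24:.1f}% faster")
```

A negative result would flag the kind of anomaly mentioned above, where the larger configuration was actually slower.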
For example, the 12GB configuration was actually slower at H.264 encoding than the 8GB, though not by much. The lack of substantial difference is even more surprising given that the 12GB configuration was theoretically optimal (one DIMM per memory controller channel) while the 8GB wasn't (one channel open). Still, for $200 extra, the 12GB appears to be worth the price, while the 24GB configuration clearly isn't for these applications.
The second round of tests involved Adobe Media Encoder and two of my own real-world projects. The first was a 90-minute ballet shot with two HDV camcorders mixed via Adobe Premiere Pro's multicam feature and rendered to MPEG-2 for an SD DVD. The second and third tests involved a 10-minute single-camera DV shoot produced as an audition for America's Got Talent. I rendered that file to H.264 and to MPEG-2.
Again, some anomalies, though in the longest and most demanding test, the 12GB configuration was 10 percent faster than the 8GB configuration for only $200. I'd spend the money. I didn't see any results that even tempted me to go higher than 12GB.
I produce mostly in HDV with a smattering of AVCHD, though this latter component is increasing. Beyond my personal choices, I wondered how additional memory impacts working with different HD formats.
To test this, I used a series of multiformat synthetic benchmark tests, one a short project of 2 minutes or less, the other 10 minutes long. The short project involved multiple picture-in-picture effects, including an Adobe After Effects chroma key effect incorporated via Dynamic Link. My goal was to stress system memory and simulate the production of a heavily edited but short project such as a 60-second commercial.
The second round of tests involved 10 minutes of lightly edited source material, including color correction and a logo, but no picture-in-picture or Dynamic Link. This test was designed to assess pure throughput in the typical event-type production such as concerts, ballets, and sporting events. Note that I didn't run the long test on the Red Digital Cinema footage in all configurations because of time considerations and the fact that if you shoot with a Red camera, you can probably find the extra $200 for the 12GB of RAM. For the record, I rendered all formats to the outputs specified in Table 3.
Again, while the results aren't striking, I'd spend the $200 extra for the 12GB, and probably wouldn't opt for the 24GB configuration unless I was working on extremely time-sensitive materials.
The final tests involved Telestream Episode Engine, a streaming-media encoder with very efficient multiprocessor use. Here I ran two tests. The first involved encoding a single HD file to nine different streaming formats: a mix of VP6, H.264, and Windows Media output in different resolutions and data rates. The second involved encoding 16 1-minute SD source files to 14 output files in the same three streaming formats, plus MPEG-2.
Again, the time savings associated with the additional RAM weren't striking. I checked these results with the Telestream folks, who said that they recommend 1GB of RAM per CPU, and weren't surprised that the additional RAM beyond 8GB produced negligible benefits. Hey, efficient code is efficient code. Still, if you're configuring a Mac Pro for heavy-duty batch encoding use with Episode Engine, I'd spend the dough for 12GB, but no more.
So, the 8GB in a two-2GB-per-processor configuration performed quite well against two configurations with more memory in the supposed ideal configuration. Why would this be? Primarily because editing and encoding are far-less-than-realtime operations, so memory throughput isn't the primary performance bottleneck. For this reason, I wouldn't generalize these results to realtime applications such as a streaming server or large data set visualizations. Still, for most day-to-day editing and encoding chores, violating the Nehalem's ideal configuration seemed to be a non-event.
Since I had multiple memory configurations (and apparently way too much time on my hands), I decided to run some additional tests. The first was seemingly the most severe: two 4GB DIMMs (one on each side of the memory tray) compared to the four-2GB configuration used for the 8GB tests above. I'm not quite sure how you'd ever get to the first configuration (someone steals your original DIMMs, but not the computer itself, so you add back two 4GB DIMMs?), but here are the results.
In Final Cut Pro tests, the results actually appear statistically significant (unlike the Premiere Pro tests you'll see in a moment), though nowhere close to the 2X theoretical advantage you would expect with the four-2GB configuration if memory throughput were the actual bottleneck (see Table 5).
In contrast, with Premiere Pro, the memory configuration didn't significantly change the results in the real-world tests (see Table 6).
Or in the synthetic tests (see Table 7).
On the other hand, in the Episode Engine tests, which were probably the most demanding, some significant differences did appear, especially in the HD trials (see Table 8). Before we draw any conclusions, however, let's look at the next set of tests.
In this set of tests, I compared the 12GB in the ideal configuration (six-2GB) to a makeshift configuration of one 2GB and one 4GB DIMM per side. Again, not sure how you'd arrive at this latter configuration, but I was curious to see the results (Table 9).
In Final Cut Studio tests, the results were surprisingly mixed. Here, H.264 encoding was actually slower in the supposedly optimal configuration, while all other results were faster, though none significantly.
Premiere Pro real-world tests showed a similar anomaly, with MPEG-2 tests significantly slower in the optimal configuration, and other tests about the same (Table 10).
Premiere Pro synthetic tests were a big yawn, though where there was a significant difference, the “optimal configuration” was slower (Table 11).
Even Episode Engine showed little difference in comparative performance (Table 12).
Why would this be? Impossible to tell.
What does it all mean? I'd probably buy memory in the optimal configuration when purchasing the system new or making major upgrades. On the other hand, if I had a system with six 2GB DIMMs and found two 4GB DIMMs in my Chanukah stocking, I probably wouldn't lose any sleep worrying that, in my day-to-day editing and encoding chores, working outside the optimal configuration was costing me significant performance one way or the other.