A Technical Discussion on Megahertz Myths | www.creativeplanetnetwork.com

A Technical Discussion on Megahertz Myths

An interview with Mike Rockwell, Avid chief technology officer.


Bob Turner interviewed Avid CTO Mike Rockwell about what Apple Computer and AMD call the “megahertz myth.” Avid offers Xpress and Media Composers on both Apple and Windows operating system-based platforms, so a discussion of this subject, while subjective, does not carry inherent commercial prejudice.

Turner: Mike, I am trying to write an article explaining what is important when evaluating/selecting a postproduction platform. I would like to start with public positions from the major players.

First, in the PC World article (“AMD: Megahertz Isn't Everything; As Intel pushes to 2GHz, AMD argues performance, not speed, is what matters” by Tom Mainelli, August 24, 2001) you can find the following segment:

"On the eve of Intel's 2GHz Pentium 4 launch Monday, the folks at AMD are eager to make one point: Megahertz isn't everything. 'Our combination of Athlon and DDR technology outperforms their combination of P4 and Rambus technology,' says Tim Wright, director of desktop marketing at AMD. 'Megahertz is only part of the equation.'"

Please comment on Tim Wright's statement.

Rockwell: As with most things, this depends largely on your application. Some applications depend on throughput, some depend on integer computation, some depend on floating point calculation, some depend on data manipulation. Each system is good at some things and not as good at others. In general, though, if all other factors are equal, a processor with a higher megahertz will perform faster than one with a lower megahertz.

Turner: Fair enough. At the Apple Web URL http://www.apple.com/g4/myth, there is a QT video where Steve Jobs and Apple Senior VP of Hardware Jon Rubinstein detail why the clock speed of a computer isn't an accurate way to compare system performance. Overall system design and processor-architecture differences affect real-world application performance; otherwise you might be fooled by what he terms "the megahertz myth." This appears to reiterate Wright's position. What is your reaction to the QuickTime video?

Rockwell: I think that there is some truth to what they are saying, but there is some marketing FUD in there as well. One can see some corroborating evidence of what they are saying with the drop in performance of a 1.41GHz P4 vs. 1GHz P3 on the same operations.

Turner: In this presentation, they demonstrate why an 867MHz G4 performed 87% faster than a 1.8GHz P4. On their website, http://www.apple.com/powermac/specs.html, and on their PowerMac G4 handout (http://a1712.g.akamai.net/7/1712/51/fae8dc87c5abc2/www.apple.com/powermac/pdf/PowerMacG4_DS-e.pdf), they show this in chart form. Is this true? If not, why did it appear so? In your opinion, was this demonstration fair?

Rockwell: I don't know if the Photoshop demo used filters that were SSE optimized or not. If they weren't, then it was not a fair demo. SSE is Intel's equivalent to the AltiVec (Velocity Engine) instructions, and they are pretty comparable, especially SSE2, which is in the P4. The other thing is that for normal programs that don't do image processing, I think that the Pentium 4 can be faster in a number of cases.

Turner: Do you agree with Apple that there are four factors to performance: frequency, pipeline stages, number of functional units (number of instructions per cycle), and cache design (level 1, level 2, level 3 cache)?

Rockwell: There is also RAM architecture. For image processing, this is less important than for video processing, where you are pumping large amounts of data to and from memory. The latest PC-based Rambus memory has 3.2GB per second of memory bandwidth. DDR memory also performs well and is even cheaper. The Mac memory architecture, to my knowledge, currently has around 1GB per second of total memory bandwidth.

Turner: Can you explain in layman's terms the important aspects of cache design as it relates to the type of performance benefits a postproduction workstation will offer?

Rockwell: Today's processors spend a good chunk of their time waiting for data. The bandwidth of RAM has not kept up with the increase in processor speed. To give you a feel for this, look at the ratio of the clock speed of the processor to the clock speed of the memory. For example, on a Mac G4 the RAM memory bus clock speed is 133MHz, while its processor is running at 867MHz. This means that if the processor needs some data that is in main memory it has to wait around six (867/133) clock cycles before it can do anything.
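
The ratio described here is simple division, sketched below in Python. This is a deliberately simplified illustration: the function name `cpu_cycles_per_bus_cycle` is hypothetical, and the calculation ignores DRAM access latency, burst transfers, and cache effects, all of which change the real wait time.

```python
def cpu_cycles_per_bus_cycle(cpu_mhz: float, bus_mhz: float) -> float:
    """Processor clock cycles that elapse during one memory-bus clock cycle."""
    return cpu_mhz / bus_mhz


# Clock speeds quoted in the interview for the Mac G4.
g4_ratio = cpu_cycles_per_bus_cycle(cpu_mhz=867, bus_mhz=133)
print(f"G4: roughly {g4_ratio:.1f} processor cycles per memory-bus cycle")
```

The result is a lower bound on the stall, since a real memory access takes more than one bus cycle to complete.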

To help with this, system designers put caches in the system. A cache is basically a small amount of very high speed memory that sits between the processor and the main memory. Level 1 cache is usually directly on the CPU and generally runs at the same speed as the processor. If the data is in the level 1 cache, the processor can access it immediately. Level 2 cache can either be on-processor or off-processor. Level 3 cache is usually off the processor. By trying to anticipate the CPU's requirement for data, a cache can greatly improve the performance of a system.

With a fast main memory bus, the requirements for caches are decreased. So you need to look at both main memory performance and cache performance together to get an overall idea of how the system will perform. On a Pentium 4, the memory bus can run as fast as 400MHz, which gives it around three times the throughput and 1/3 the latency of the G4. It also has a 256K on chip level 2 cache. The G4 also has a 256K on chip cache, but since its memory bus is at 133MHz there is also a 2MB level 3 cache that is running at 216MHz to help with the latency issues.

The pattern of data access in some applications results in most of their information being in the cache. In this case, caches dramatically improve performance. Unfortunately, when processing video in realtime, you often get cache misses and you can end up degrading to the performance of the RAM system. In this case, the Pentium 4 has a decided advantage because of its faster main memory architecture. If you are processing video one frame at a time in non-realtime, then the performance differences become smaller because you are more likely to hit the cache.
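
One rough way to see why cache misses matter is the standard weighted-average model of memory access cost. The sketch below is illustrative only: the function name, the cycle counts (1 cycle for a cache hit, 10 for a trip to main memory), and the hit rates are all assumptions chosen for the example, not measurements of either platform.

```python
def average_access_cycles(hit_rate: float, cache_cycles: float,
                          ram_cycles: float) -> float:
    """Average cost of one memory access, weighted by the cache hit rate."""
    return hit_rate * cache_cycles + (1.0 - hit_rate) * ram_cycles


# Hypothetical costs: 1 cycle on a cache hit, 10 cycles on a miss.
CACHE_CYCLES = 1
RAM_CYCLES = 10

# Hypothetical hit rates, from cache-friendly work down to constant misses.
for hit_rate in (0.95, 0.50, 0.0):
    cost = average_access_cycles(hit_rate, CACHE_CYCLES, RAM_CYCLES)
    print(f"hit rate {hit_rate:.0%}: about {cost:.2f} cycles per access")
```

As the hit rate falls toward zero, the average cost approaches the main-memory cost, which is the "degrading to the performance of the RAM system" case described above.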

Turner: In layman's terms, can you comment on the accuracy and importance that Apple places on "pipeline stages," and if its claims are true, why does Intel use a 20-stage pipeline and Apple only a seven-stage pipeline?

Rockwell: A pipeline is like a bucket brigade of firemen. The last person in line won’t get any water until it has passed through the hands of each person in line. A bubble in the pipeline is like having each fireman pour out their bucket. They then have to wait for the bucket from the end of the line. In this analogy, actual processor performance is how much water reaches the end of the line.

In real programs, many things can cause a bubble. If the program can take two different paths based on whether some condition is true or false, the processor can only prepare the pipeline for one of those paths. If the program needs to take the other path, you have a bubble, and the pipeline has to be flushed and restarted from the beginning.

This is an interesting trade-off. Smaller pipeline stages let you run the processor at a higher clock speed. If you code an algorithm carefully and hand tune it, you will not see nearly the number of pipeline bubbles that Apple is describing. It's debatable which is better: a processor that runs most code fairly well or a processor that runs tweaked code like a banshee.

The consumer probably wants the former while the professionals want the latter (and the software to go with it). Ironically the consumer is more apt to just look at the megahertz number on the box when he buys. He never really appreciates that he's not getting the performance he expects.
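
The trade-off between pipeline depth and clock speed can be put into a toy model. This is only a sketch under stated assumptions: the function `run_time_us` is hypothetical, it assumes one instruction completes per cycle when the pipeline is full, it charges one full pipeline refill per mispredicted branch, and the instruction and mispredict counts are invented for illustration. The clock speeds and stage counts are the ones mentioned in this interview.

```python
def run_time_us(instructions: int, clock_mhz: float,
                pipeline_depth: int, mispredicts: int) -> float:
    """Run time in microseconds under a toy model: one instruction per
    cycle, plus pipeline_depth wasted cycles per mispredicted branch."""
    cycles = instructions + mispredicts * pipeline_depth
    return cycles / clock_mhz  # cycles / (cycles per microsecond)


INSTRUCTIONS = 1_000_000  # hypothetical workload size

# Hypothetical mispredict counts: hand-tuned code vs. branchy code.
for mispredicts in (0, 50_000):
    shallow = run_time_us(INSTRUCTIONS, 867, 7, mispredicts)    # 7-stage
    deep = run_time_us(INSTRUCTIONS, 1800, 20, mispredicts)     # 20-stage
    print(f"{mispredicts:>6} mispredicts: 7-stage {shallow:.0f} us, "
          f"20-stage {deep:.0f} us, ratio {shallow / deep:.2f}x")
```

In this toy model the deeper, faster-clocked pipeline loses part of its lead as mispredicted branches increase, which is why carefully tuned code benefits most from it.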

Turner: At Apple’s website URL http://www.apple.com/powermac/processor.html, and the supplementary URL http://www.apple.com/g4/, there is a graphic whose caption reads, "The PowerPC G4 Velocity Engine processes information at 128-bit chunks, compared to 32- or 64-bit chunks in traditional chips." The graphic shows data blocks passing easily through a wide gate on the G4 but bottlenecked in the Pentium.

The purpose of the graphic is to compare the design of the P4 and the G4 Velocity Engine. Video Systems readers will look at this graphic and believe that the Intel system has a severe bottleneck. Is this fair? What would you like them to know when seeing this graphic?

Rockwell: This is actually inaccurate now with the Pentium 4's SSE2. The statement may have been made prior to the introduction of SSE2. Or maybe the P4 is not considered a “traditional chip.” Still, Pentium 4’s SSE2 processes data in 128-bit chunks just like the G4. The G4 does have an advantage in that it has a separate set of dedicated memory for the Velocity Engine. On the P4, its memory is shared with the normal floating point operations.

Turner: Why should Video Systems readers not care about John E. Warnock's statement (featured on the Apple website): "Currently, the G4 is significantly faster than any platform we’ve seen running Photoshop"? What is the missing information that they should consider when reading that statement?

Rockwell: Still-image processing is different than video. Also, there are other items to consider when looking at performance. Take PCI, for example. The Mac has 33MHz/64-bit PCI, which allows for a total throughput of 256MB per second. Workstation class PCs have 66MHz/64-bit PCI, which allows for a total throughput of 512MB per second. Some PCs are coming out with 133MHz/64-bit PCI, which supports 1024MB per second. This becomes very important when looking at working with HD data rates.
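
The PCI figures follow from multiplying the bus clock by the bus width. Below is a minimal sketch with a hypothetical helper name. It computes the theoretical peak only, which lands slightly above the round numbers quoted here, and sustained throughput on a real bus is lower.

```python
def pci_peak_mb_per_s(clock_mhz: float, bus_bits: int) -> float:
    """Theoretical peak transfer rate: one bus-width transfer per clock."""
    bytes_per_transfer = bus_bits / 8
    return clock_mhz * bytes_per_transfer


# The three bus configurations mentioned in the interview.
for clock_mhz in (33, 66, 133):
    peak = pci_peak_mb_per_s(clock_mhz, bus_bits=64)
    print(f"{clock_mhz}MHz/64-bit PCI: about {peak:.0f}MB per second peak")
```

Each doubling of the clock doubles the peak rate, which is the point being made about HD data rates.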

Turner: Avid has selected Intel-based platforms for their Avid Xpress DV. The reason they give (according to Charles Russell, product marketing manager), is "megahertz matters." He goes on to say that with the faster processing available on Intel platforms, realtime transitions are available without the hardware acceleration boards that people with Macintosh computers need.

Please comment on this. What is missing from the simplicity of the statement? Surely there is more to this technology than that? Please offer your personal view.

Rockwell: As I stated previously, memory throughput is as much of a concern as processing speed when it comes to video. Also, with properly tuned algorithms using SSE2, you can do better on a Pentium 4 than on a G4. The reasons for choosing Intel for Xpress DV include more than just performance, but I'll leave that to the marketing folks to answer.

Turner: Does this imply that we will not see a Mac version of Xpress DV at NAB?

Rockwell: You know that Avid has a policy that we will not comment on any possible product before it is officially announced, and I am not going to say whether or not such a product is being developed. I will add that we have been advocates of cross-platform support for our editing software products.

Turner: Thank you, Mike Rockwell, for your candid replies, and for making technical concepts understandable for non-engineers.