
Wednesday, April 23, 2008

Computer Performance - More Than Just Clock Speed

If I were to ask you which processor had better performance: a 2.4 GHz Intel Celeron processor or a 1.8 GHz Core 2 Duo, many of you have heard enough about Intel's popular dual-core wonders to know that this was a trick question. Furthermore, many of you would even know the reasons why the dual-core architecture is a better performer and be able to explain that the Core 2 Duo is able to work on multiple tasks at a time. However, if that is the limit of your microprocessor knowledge, then this article is for you. There are four main hardware concepts to take into account when assessing the performance of a Central Processing Unit (CPU). They are:

  • Cache Memory
  • Clock Speed
  • Pipelining
  • Parallelism

Before getting into these topics, however, it is important to understand the basics of how a CPU works. Most computers have 32-bit processors, and "32-bit" is probably a term you've heard thrown around a lot. This basically means that the computer only understands instructions which are 32 bits long. In a typical instruction, the first six bits tell the CPU what type of task to perform and how to handle the remaining 26 bits of the instruction. For example, if the instruction was to perform addition on two numbers and store the result in a memory location, the instruction might look like this:

In this illustration, the first 6 bits form a code which tells the processor to perform addition, the following 9 bits specify the memory location of the first operand, the next 9 bits specify the memory location of the second operand, and the last 8 bits specify the memory location where the result will be stored. Of course, different instructions will have different uses for the remaining 26 bits and in some cases will not even use all of them. The important thing to remember is that these instructions are how work gets done by the computer, and they are stored together on the hard drive as a program. When a program is run, the data (including the instructions) gets copied from the hard drive to the RAM, and similarly, a section of this data is copied into the cache memory for the processor to work on. This way, all data is backed up by a larger (and slower) storage medium.
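The field layout described above can be sketched in a few lines of code. This uses the article's hypothetical format (6-bit opcode, two 9-bit operand addresses, one 8-bit result address); real instruction sets define their fields differently.

```python
# Decode the hypothetical 32-bit instruction described above:
# 6-bit opcode | 9-bit operand A | 9-bit operand B | 8-bit result.
def decode(instruction: int):
    opcode = (instruction >> 26) & 0x3F   # top 6 bits
    src_a  = (instruction >> 17) & 0x1FF  # next 9 bits
    src_b  = (instruction >> 8)  & 0x1FF  # next 9 bits
    dest   = instruction & 0xFF           # last 8 bits
    return opcode, src_a, src_b, dest

# Build an example "add" instruction: opcode 1, operands at
# memory addresses 5 and 9, result stored at address 3.
word = (1 << 26) | (5 << 17) | (9 << 8) | 3
print(decode(word))  # (1, 5, 9, 3)
```

Note that the four field widths add up to exactly 32 bits, which is what "32-bit processor" means in this context.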

Everyone knows that upgrading your RAM will improve your computer's performance. This is because more RAM means your processor makes fewer trips out to the slow hard drive to get the data it needs. The same principle applies to cache memory. If the processor has the data it needs in the extremely fast cache, then it won't need to spend extra time accessing the relatively slow RAM. Every instruction being processed by the CPU carries the addresses of the memory locations of the data that it needs. If the cache doesn't have a match for an address, the RAM will be signaled to copy that data into the cache, along with a group of neighboring data that is likely to be used in the following instructions. By doing this, the probability of having the data for the next instructions ready in the cache increases. The relationship of the RAM to the hard drive works in the same way. So now you can understand why a larger cache means better performance.
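Here is a toy sketch of that behavior: on a miss, the cache copies in a whole block of neighboring addresses, so later nearby accesses hit. The block size and addresses are made up for illustration.

```python
BLOCK_SIZE = 4      # how many neighboring words are copied on a miss
cache = {}          # address -> data
hits = misses = 0

def read(address, memory):
    global hits, misses
    if address in cache:
        hits += 1
    else:
        misses += 1
        # Copy the whole aligned block containing this address,
        # betting that nearby addresses will be needed soon.
        start = (address // BLOCK_SIZE) * BLOCK_SIZE
        for a in range(start, start + BLOCK_SIZE):
            cache[a] = memory[a]
    return cache[address]

memory = list(range(16))         # stand-in for RAM
for addr in [0, 1, 2, 3, 4, 5]:  # a sequential access pattern
    read(addr, memory)
print(hits, misses)  # 4 hits, 2 misses
```

Six sequential reads cost only two trips to "RAM" because each miss pre-loads the neighbors, which is exactly why a larger cache (and block-wise copying) pays off.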

The clock speed of a PC is what gives the computer a sense of time. The standard unit of time for a computer is one cycle, which can range from a few microseconds in length down to fractions of a nanosecond. Tasks that the instructions tell the computer to do are broken up and scheduled into these cycles so that components in the computer hardware are never trying to process different things at the same time. An illustration of what a clock signal looks like is shown below.

For an instruction to be executed, many different components of hardware must perform specific actions. For instance, one section of hardware will be responsible for fetching the instruction from memory, another section will decode the instruction to find out where the needed data is in memory, another section will perform a calculation on this data, and another section will be responsible for storing the result to memory. Rather than having all of these stages occur in one clock cycle (therefore having one instruction per cycle), it is more efficient to have each of these hardware stages scheduled in separate cycles. By doing this, we can cascade the instructions to take full advantage of the hardware available to us. If we didn't do this, then the hardware responsible for fetching instructions would have to wait and do nothing while the rest of the processes completed. The figure below illustrates this cascading effect:

This idea of breaking up the hardware into sections that can work independently of each other is known as "pipelining". By breaking the tasks into still smaller steps, additional pipeline stages can be created, and this generally increases performance. Also, less work being done in each stage means that the cycle won't have to be as long, which in turn increases clock speed. So you see, knowing the clock speed alone is not enough; it is also important to know how much work is being performed per cycle.
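The cascading effect described above is easy to put into numbers. With S hardware stages and N instructions, an unpipelined design spends S cycles per instruction, while a pipeline overlaps them so that, once full, it finishes one instruction every cycle. (The stage and instruction counts below are illustrative, and this simple model ignores real-world stalls.)

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # Each instruction occupies the hardware for all of its stages.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # The first instruction takes n_stages cycles to drain through;
    # every later instruction completes one cycle behind it.
    return n_stages + (n_instructions - 1)

print(unpipelined_cycles(100, 4))  # 400 cycles
print(pipelined_cycles(100, 4))    # 103 cycles
```

One hundred instructions through a 4-stage pipeline take 103 cycles instead of 400, because the fetch hardware never sits idle waiting for the rest of the stages to finish.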

Lastly, parallelism is the idea of having two processors working in parallel to theoretically double the performance of the computer (a.k.a. multiple cores). This is great because two or more programs running at the same time will not have to alternate their use of the processor. Additionally, a single program can split up its instructions and have some go to one core while others go to the other core, thus decreasing execution time. However, there are drawbacks and limitations to parallelism that prevent us from having 100+ core super-computers. First, many instructions in a single program require data from the results of previous instructions. If those instructions are being processed on different cores, however, one core will have to wait for the other to finish, and delay penalties will be incurred. Also, there is a limit to how many programs can be used by one user at a time. A 64-core processor would be inefficient for a PC since most of the cores would be idle at any given moment.
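The diminishing returns described above can be sketched with a simple speedup model (this is Amdahl's law, which the article doesn't name): if only a fraction of a program's work can be split across cores and the rest must run serially, adding cores only ever speeds up that fraction.

```python
# Predicted speedup when a fraction `parallel_fraction` of the work
# can be divided among `cores` cores and the rest runs on one core.
def speedup(parallel_fraction: float, cores: int) -> float:
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Assume (for illustration) that 80% of a program parallelizes well.
for cores in [2, 4, 64]:
    print(cores, round(speedup(0.8, cores), 2))
```

Even with 80% of the work parallelizable, 64 cores yield under a 5x speedup, which is the arithmetic behind the claim that most of a 64-core PC would sit idle.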

So when shopping for a personal computer, the number of pipeline stages probably won't be stamped on the case, and even the size of the cache might take some online research to discover. So how do we know which processors perform the best?

The short answer: benchmarking. Find a website that benchmarks processors for the type of application that you will be using your computer for, and see how the various competitors perform. Match the performance back to these four main factors, and you will see that clock speed alone is not the deciding factor in performance.

If you need some help finding benchmarking sites, let me know and I'll point you in the right direction.

Also, for a more in-depth article on this subject, read this article: andersondevs.com/machinePaper.htm

AndersonDevs (andersondevs.com)