Tuesday, April 7, 2009

optimize application for ibm cell processor

That is the title of my final year project (FYP). Finally, today I submitted everything: three copies of my final report plus a CD containing the source code! It was a long, two-semester project, but an interesting one, because I had a chance to explore a new processor architecture released in 2005: the IBM Cell Broadband Engine Architecture (IBM Cell BE, or Cell processor for short), jointly developed by Sony, Toshiba, and IBM. Development started in March 2001, and after its release the Cell processor was used to power Sony's PlayStation 3.

This processor was developed for multimedia and game applications as well as other numerically intensive workloads. It supports both highly parallel codes, which have high computation and memory requirements, and scalar codes, which need fast response times and a full-featured programming environment. The Cell BE architecture provides both flexibility and high performance by combining a 64-bit multithreaded Power Processor Element (PPE), with two levels of globally coherent cache, and eight Synergistic Processor Elements (SPEs), each consisting of a processor designed for streaming workloads, a local memory, and a globally coherent DMA engine.


an example of the rendering of a high dynamic range imaging (HDRI) tone-mapped image

So, this project attempted to optimize a non-trivial but interesting application, OpenEXR, a high dynamic range (HDR) image library, for the Cell platform, making use of all the resources the Cell processor provides. To achieve this, I applied several optimization methods (function inlining, branch predication, and loop unrolling), vectorized the code with SIMD instructions, and parallelized execution across multiple SPEs to keep all of them busy. The project achieved speedup factors of 1.94 and 1.26 for writing and reading pixel data, respectively; if the time for data initialization is assumed to be negligible, the speedup factors for the two processes are 3.25 and 1.88, respectively.
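To give a rough idea of what two of those techniques look like, here is a small hypothetical sketch in plain C. It is not the project's actual code and does not use SPU intrinsics; the function names and the "clamp pixel values to a maximum" use case are assumptions for illustration. `select_u32` picks between two values with a bit mask instead of an `if`, which is the scalar analogue of predication/select (useful on the SPEs, which rely on software branch hints rather than a dynamic branch predictor), and `clamp_pixels` unrolls its main loop by four with a scalar tail loop for the leftovers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: branch-free select.
   cond must be 0 or 1. The mask is all ones when cond == 1 and all zeros
   when cond == 0, so the result is a when cond is true and b otherwise. */
static inline uint32_t select_u32(uint32_t cond, uint32_t a, uint32_t b)
{
    uint32_t mask = 0u - cond;
    return (a & mask) | (b & ~mask);
}

/* Clamp x to at most hi without a data-dependent branch. */
static inline uint32_t clamp_hi(uint32_t x, uint32_t hi)
{
    return select_u32((uint32_t)(x > hi), hi, x);
}

/* Hypothetical pixel pass: clamp every value in px[0..n) to hi.
   The main loop is unrolled by 4 to cut loop overhead and expose
   independent work; the tail loop handles the remaining n % 4 values. */
void clamp_pixels(uint32_t *px, size_t n, uint32_t hi)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        px[i]     = clamp_hi(px[i], hi);
        px[i + 1] = clamp_hi(px[i + 1], hi);
        px[i + 2] = clamp_hi(px[i + 2], hi);
        px[i + 3] = clamp_hi(px[i + 3], hi);
    }
    for (; i < n; i++)
        px[i] = clamp_hi(px[i], hi);
}
```

For example, calling `clamp_pixels` on `{0, 100, 255, 256, 1000, 70000, 7}` with `hi = 255` leaves `{0, 100, 255, 255, 255, 255, 7}`. On the SPEs the same idea would be applied to whole SIMD vectors at once rather than one value at a time.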


the result of my FYP is captured in this one graph, hehe

So basically, that's it for the technical stuff. I feel that in the past two months I've done much more than I did from September last year to February this year, hahaha. Everything I did last semester seems like a waste, since most of it didn't make it into my final deliverable. However, I gained many interesting experiences from this FYP. Here's a list of things I did for the first time (in my life!) during this FYP period: fell asleep in the lab until the following morning, drank four cups of coffee in one day, spent most of my day in the lab, and didn't talk (chatting doesn't count :P) to anyone for a whole day. On average, I spent 12 hours a day in front of the computer and slept only 5 hours a day, hahaha what a life. Here are some photos that remind me of all the time I spent on this FYP...


the entrance to my lab..


see all the rubbish accumulated in front of my lab


my workstation in the lab


back to my room


delicious nastar from my mom which accompanied me for a week :P


I'm a coffee addict!


first draft..


got a PlayStation 3 and installed Linux on it haha


final submission!

So, I still have to prepare for my presentation in front of two examiners, which will be on 20 April. Some assignments are waiting to be "touched" :P, and then come the exams, my last three exams at NUS. Gotta enjoy them hahaha..
