I am a compiler engineering intern in the MATLAB Coder group. I currently work at the IR level, tuning the optimization passes to improve the quality (performance) of the generated code.

This is the sixth week of my internship.

How do you feel in general?

It was hard at first, but I enjoy the challenges.

I come from a computer graphics/simulation background, and I have relatively little work experience in the compiler field. My only compiler construction experience is with LLVM and ispc. That experience is useful, but I find it hard to apply directly to my daily work.

My coworkers are very friendly and help me a lot. Working side by side with colleagues feels very different, because I was used to working remotely with the open source community.

What was the most challenging task in the first part of your internship?

In fact, I made only two commits in the first part. I think the most challenging task is making a change that improves all test cases.

A common scenario is that one change improves some test cases but worsens others.

This happens because the passes are organized as a pipeline, so a change in one pass can affect every pass downstream of it.

I still have not found a systematic way to resolve this. My current approach is to examine the generated IR after each step and try to locate the earliest fault site. Later I plan to look at how the LLVM and GCC communities deal with this issue.
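One small tool that could speed up this step-by-step examination is a script that compares per-pass IR dumps from a baseline build and a modified build and reports the first pass where they diverge. That pass is where the change first takes effect, which narrows the search before inspecting the downstream passes by hand. This is a hypothetical Python sketch: it assumes the compiler can dump the IR after each pass, and the function name and the dump format (a list of pass name and IR text pairs) are made up for illustration, not part of any MATLAB Coder tool.

```python
from typing import List, Optional, Tuple

# Hypothetical dump format: (pass_name, ir_text) pairs in pipeline order.
PassDump = Tuple[str, str]


def first_divergent_pass(
    baseline: List[PassDump],
    modified: List[PassDump],
) -> Optional[str]:
    """Return the name of the first pass whose IR differs between two runs.

    Returns None if every per-pass dump is identical.
    """
    for (name_a, ir_a), (name_b, ir_b) in zip(baseline, modified):
        # Both runs must execute the same passes in the same order.
        if name_a != name_b:
            raise ValueError(f"pipelines differ: {name_a!r} vs {name_b!r}")
        if ir_a != ir_b:
            return name_a
    if len(baseline) != len(modified):
        raise ValueError("pipelines have different lengths")
    return None


if __name__ == "__main__":
    base = [("inline", "ir0"), ("cse", "ir1"), ("dce", "ir2")]
    mod = [("inline", "ir0"), ("cse", "ir1'"), ("dce", "ir2'")]
    print(first_divergent_pass(base, mod))
```

Textual equality is the simplest possible criterion; a real tool would probably normalize away noise such as temporary names before comparing.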

What have you learned in your internship?

Build handy tools to speed up your workflow.

My group has many debugging and profiling tools, some of them designed specifically for our needs. I think this is our biggest advantage over other compiler infrastructures such as LLVM.

I have also found that MATLAB has very different characteristics from other general-purpose programming languages. Its syntax carries more information that can be exploited for optimization, and some topics, such as dependency checking, require more careful consideration.

What do you want to improve in your remaining time?

I need to build my intuition for the IR. Analyzing the IR step by step from the beginning is sometimes too slow.

It would be much faster to first use intuition or experience to narrow the search down to a small part of the code, and then examine the details. In fact, this is what my coworkers do when they help me debug.

I have also realized that I am especially weak at analyses involving live ranges, so I need more practice there.

Do you have any suggestions for working on a large code base?

Our code base is large, but I don't think it is complex. The sources are well organized and the options/flags are designed “orthogonally”, which means I can work on one part of the code base without worrying about the others.

However, keeping the code base clean takes a lot of effort. Take orthogonality as an example: when you add a new option, you have to read through all the existing options to make sure your new option is not a “combination” of others.

As a suggestion, I would cite the papers an implementation is based on in the comments, and also document any trade-offs I made. This helps others understand how an algorithm was translated from a paper into code.
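To make this suggestion concrete, here is a hypothetical Python sketch of what such a comment might look like. The algorithm (the iterative dominance algorithm from Cooper, Harvey and Kennedy's “A Simple, Fast Dominance Algorithm”) is only a well-known example chosen for illustration, not code from our code base; the point is the docstring, which names the paper and records the trade-off behind the choice. The CFG representation (a dict from block name to successor list) is also an assumption.

```python
from typing import Dict, List


def immediate_dominators(succ: Dict[str, List[str]], entry: str) -> Dict[str, str]:
    """Compute the immediate dominator of every block reachable from entry.

    Reference: K. D. Cooper, T. J. Harvey, K. Kennedy,
    "A Simple, Fast Dominance Algorithm".

    Trade-off: chosen over Lengauer-Tarjan because it is much simpler to
    implement and maintain. Its worst-case complexity is worse, so it may
    need revisiting if very large CFGs become slow.
    """
    # Compute a postorder of reachable blocks with an iterative DFS.
    postorder: List[str] = []
    visited = {entry}
    stack = [(entry, iter(succ.get(entry, [])))]
    while stack:
        node, it = stack[-1]
        for s in it:
            if s not in visited:
                visited.add(s)
                stack.append((s, iter(succ.get(s, []))))
                break
        else:
            # All successors visited: node is finished.
            stack.pop()
            postorder.append(node)
    rpo = list(reversed(postorder))
    po_num = {n: i for i, n in enumerate(postorder)}

    # Predecessor lists, restricted to reachable blocks.
    preds: Dict[str, List[str]] = {n: [] for n in rpo}
    for n in rpo:
        for s in succ.get(n, []):
            preds[s].append(n)

    idom: Dict[str, str] = {entry: entry}

    def intersect(a: str, b: str) -> str:
        # The paper's "two finger" walk up the dominator tree,
        # comparing postorder numbers (entry has the highest).
        while a != b:
            while po_num[a] < po_num[b]:
                a = idom[a]
            while po_num[b] < po_num[a]:
                b = idom[b]
        return a

    # Iterate in reverse postorder until no idom changes.
    changed = True
    while changed:
        changed = False
        for n in rpo[1:]:
            processed = [p for p in preds[n] if p in idom]
            new_idom = processed[0]
            for p in processed[1:]:
                new_idom = intersect(p, new_idom)
            if idom.get(n) != new_idom:
                idom[n] = new_idom
                changed = True
    return idom


if __name__ == "__main__":
    cfg = {"entry": ["a", "b"], "a": ["c"], "b": ["c"]}
    print(immediate_dominators(cfg, "entry"))
```

A reader who wonders why the dominator computation is not the asymptotically faster Lengauer-Tarjan algorithm finds the answer right in the docstring instead of having to ask the original author.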