Code generation occurs once the abstract syntax trees have been created and checked against the backus naur rules. This is important to note as there is no point in generate machine code if the program has got syntax errors. In order to do code generation the compiler will do three things -

  • Convert the code into a intermediate language
  • Convert that into machine code
  • Perform register allocation.

Each command written in a high level language will be converted into many lines of machine code.We say that this is a one to many translation.

Intermediate language is used in order to aid the compilation process. Normally when a language is created it will be used on many different types of computer. As each has their own machine code instruction set we would need to create n compilers, where n is the number of OS's we are writing it for. If we use a intermediate representation then we only need to code the final parts specific to the architectures. That is only machine code generation and register allocation needs to be rewritten.

When we convert to IR (intermediate representation) we have effectively converted the code into a pseudo assembly language. The biggest thing which is different between IR and the abstract syntax trees is that the IR will be flattened out into a queue like structure. Once this has occurred it is a fairly easy task to map the IR representation to machine code instructions. Effectively we only have to code the difficult part of the process once and the easy part n times.

IR mapping is done by looking at matching typical abstract syntax tree patterns. Each command will match to a certain syntax tree pattern. This pattern matching keeps going until the whole process is finished. How this is done is out of the scope of the course.

Register allocation then occurs. This is important as variables used will still be held in the IR representation. However there is a good chance that the names originally used by the programmer have been removed to save memory. For each part of the code there is going to be a number of registers which will be required. When a register is no longer required it can go back to the pool of available registers. As such when it comes to decide which registers it can use it must look at which ones are free and then allocate them. When they are no longer needed they must be de-allocated. The process to do this is called graph colouring and again is out of the scope of this course. For the interested reader the algorithm for graph colouring is a NP-complete problem which can not be solved in polynomial time. As such we have to do a best guess for register allocation which means that we never get the most efficient allocation, just the next best thing,