
5. Compilation and Interactive Execution of C/C++ Programs


The C/C++ compiler family on kyu-cc is IBM XL C/C++ Enterprise Edition V7.0 for AIX.

For more details, please refer to the following online manuals

Interactive sessions run on a p570 (16 CPUs).  You can use at most 16 CPUs in an interactive session. Larger resources are available only through batch jobs.



5.1  C/C++ Compilers

You must choose an appropriate C/C++ compiler command according to the language standard and the parallelization approach you are going to use.

command   language standard    description                           file name convention (extension)
xlc       ANSI C89 compliant   sequential                            *****.c
cc        pre-ANSI C89         sequential                            *****.c
xlC       C++                  sequential                            *****.C / *****.cc
xlc_r     ANSI C89 compliant   OpenMP/automatic parallelization      *****.c
cc_r      pre-ANSI C89         OpenMP/automatic parallelization      *****.c
xlC_r     C++                  OpenMP/automatic parallelization      *****.C / *****.cc
mpcc      pre-ANSI C89         MPI                                   *****.c
mpCC      C++                  MPI                                   *****.C / *****.cc
mpcc_r    pre-ANSI C89         hybrid (MPI + OpenMP/auto-parallel)   *****.c
mpCC_r    C++                  hybrid (MPI + OpenMP/auto-parallel)   *****.C / *****.cc

Why do you have so many compiler commands?

Some other compiler commands are also available, such as c99 for ISO/IEC 9899:1999 compliant C programs, and commands supporting the extended long double type.  Please refer to the "Getting Started" online manual for more details.

File name convention

The file name extensions listed for C programs are mandatory.  You must rename your source file if it does not follow the required naming convention.  C++ programs, on the other hand, may use any of the suffixes .C, .cc, .cp, .cpp, .cxx, or .c++.

Default Optimization Level

If you do not specify an optimization level, the C/C++ compilers give the highest priority to syntax checking and debugging support. To obtain faster executable code, you must specify an appropriate optimization level.

In some optimization levels, however, the compilers may not preserve the original execution order of operations and may produce some undesirable side effects. Users should be aware of such aspects of compiler optimization, and should be careful about computation accuracy.  Please read Section 5.4 for more details.


5.2  Basic Operations

This subsection gives a simple example of how to compile and execute an ANSI C89 compliant C program.

First, compile and link the source program to create an executable with the xlc command. Suppose that you have a file named "example.c".

kyu-cc% xlc example.c

If the compilation is successful, an executable code will be stored in a file "a.out".

To execute this code, type the name of the created file as a command.
kyu-cc% ./a.out 

Note that "./" is prefixed to a.out. This means: "Execute the file located in the current working directory."

For security reasons, the default command search path on kyu-cc DOES NOT include "./". Without this, the shell is likely to complain as follows.
kyu-cc% a.out
a.out: Command not found.
Exit 1
kyu-cc%



5.3  Creating an Object File

By adding the compiler option -c, you can create an object file without creating an executable.
kyu-cc% xlc -c example.c
The file name extension for an object file is ".o". In the above example, "example.o" will be created.

Such an object file is useful when you keep your well-debugged subroutines in a separate file while editing half-finished programs in another file. Suppose that you are now editing a main program in "main.c" and you keep your subroutines in "sub.c".

First, you create an object file "sub.o" by compiling "sub.c" with the -c option.
kyu-cc% xlc -c sub.c ↓

Then, you compile your main program and link it with this object file.
kyu-cc% xlc main.c sub.o ↓ 

In this way, you can save compilation time if you modify the main program over and over again, since only one compilation is involved for your subroutines.

To create a single executable, you can also pass multiple source files and multiple object files to the compiler in one command.


5.4  Useful Compiler Options

The following table summarizes useful compiler options.

-c            Create an object file instead of an executable file.  The output file is a ".o" file for each source file.
-o filename   Store the output (executable or object) into the file specified by filename, instead of the default (*.o or a.out).
-lm           Link mathematical functions in the math library.  This option must be specified at the end of the command line.
-O            Apply basic optimizations only.
-O3           Apply deeper optimizations such as changing the execution order of operations.  This may cause some side effects.
-O4           Apply further optimizations in addition to those caused by -O3.
-O5           Try the deepest optimizations.
-qstrict      (With optimization option -O3, -O4, or -O5) Create an executable/object code which preserves the original execution order of operations specified in the source.

Most Recommended Optimization Option

In most cases, the following compiler options are expected to give you a sufficient performance improvement.  Note that these options will require a longer compilation time, and may cause side effects in the computation results.
kyu-cc% xlc -O3 -qarch=pwr5 -qtune=pwr5 main.c ↓
kyu-cc% ./a.out ↓


5.5  Measuring an Execution Time

You can measure the elapsed time and the CPU time by using the timex (/usr/bin/timex) command.
kyu-cc% timex xlc example.c
real 4.01 …Elapsed time (sec.)
user 1.70 …User CPU time (sec.)
sys 0.80 …System CPU time (sec.)

kyu-cc% timex ./a.out ↓
real 10.30 …Elapsed time (sec.)
user 10.24 …User CPU time (sec.)
sys 0.03 …System CPU time (sec.)

Cautions when you measure the execution time of an MPI program


5.6  Linking Numerical/Graphics Libraries with C Programs

Generally speaking, compiler option "-l" (l is lowercase L) must be added when you link a numerical/graphics library with your C program.   These "-l" options must be specified after all other compiler options.  This is because "-l" options are not used by the compiler itself, but they are just passed to ld command invoked by the compiler.
kyu-cc% xlc main.c -lessl 

To use IMSL C Library, however, a slightly different style is used.
kyu-cc% xlc $CFLAGS main.c $LINK_CNL

Currently available libraries are shown in the table below, with their compiler options.

library name     options
ESSL             -lessl (for the sequential version) *1
                 -lesslsmp (for the thread-parallel version) *2
IMSL C Library   $CFLAGS (for compilation) / $LINK_CNL (for link-editing) *3

*1
The sequential version of ESSL is thread-safe, that is, each library function can be called from a parallel execution part of an OpenMP or automatically parallelized program, as well as a sequential program.  In parallel programs, each thread can execute the function independently without destroying each other's variables.

*2
The thread-parallel version of ESSL provides some thread-parallel functions.  Such a function itself creates multiple threads and runs in parallel.  They can be called from a sequential program, or a sequential part of an OpenMP/auto-parallel program.  In this case, your source program must be compiled by a compiler command having "_r" suffix such as "xlc_r".

*3
To use IMSL C (Fortran) Library, these environment variables must be set properly.  Each user must execute a special shell script, cttsetup.csh.  The easiest way to do this automatically is to add the following one line into your .cshrc/.profile script.  It will run the shell script each time you log in or start a new shell process (window).
source /usr/appl/CTT6.0/ctt/bin/cttsetup.csh


5.7  Automatic Parallelization

Automatic parallelization can be applied to a for loop that operates on arrays, but only when the compiler can determine that the loop is safe to parallelize.  Consequently, some C/C++ programs cannot be parallelized automatically and will not gain any performance from this option.

Compiler Options for Automatic Parallelization

Automatic parallelization is enabled with the following compiler option.

-qsmp=auto Tell the compiler to perform automatic parallelization.

Environment Variable for Execution

Environment variable OMP_NUM_THREADS declares the number of threads to be invoked in parallel.

Warning:
The current charging system for a parallel program is based on the total CPU time of the program.  This means that most parallel programs cost more than their sequential versions.  You must carefully consider the tradeoff between the increased cost and the improved response time.

Example

This example compiles "test.c" with automatic parallelization enabled.  The compile command also requests the recommended level of optimizations.  Then it declares the number of parallel threads as 4, and executes the program.

kyu-cc% xlc_r -O3 -qarch=pwr5 -qtune=pwr5 -qsmp=auto test.c
…Compilation with automatic parallelization enabled

kyu-cc% setenv OMP_NUM_THREADS 4
…Setting up the number of threads as 4

kyu-cc% ./a.out
…Executing


5.8  Executing an OpenMP Program

Compiler Options for an OpenMP Program

The following compiler option is necessary to compile an OpenMP source program and create parallel executable code.

-qsmp=omp Tell the compiler to create a parallel executable code from the OpenMP source.

Environment Variable for Execution

Before execution, the number of parallel threads must be declared by an environment variable OMP_NUM_THREADS.

Warning:
The current charging system for a parallel program is based on the total CPU time of the program.  This means that most parallel programs cost more than their sequential versions.  You must carefully consider the tradeoff between the increased cost and the improved response time.

Example

An OpenMP program "test.c" is compiled with the recommended optimizations, then it is executed with 6 threads.

kyu-cc% xlc_r -O3 -qarch=pwr5 -qtune=pwr5 -qsmp=omp test.c
…Compiling an OpenMP source program.

kyu-cc% setenv OMP_NUM_THREADS 6
…Setting up the number of threads as 6

kyu-cc% ./a.out ↓
…Executing


5.9  Executing an MPI Program

Special Command for Executing an MPI Program

To compile an MPI program, use the mpcc/mpCC commands.  The individual compiler commands for each language standard are listed in Section 5.1, C/C++ Compilers.

The number of MPI processes (tasks) is specified by an execution option "-procs n" where n is the number of processes (the default is 1).

Example

An MPI program "test.c" is compiled with the recommended optimizations, and it is executed with 4 processes.

kyu-cc% mpcc -O3 -qarch=pwr5 -qtune=pwr5 test.c
…Compiling an MPI program
kyu-cc% ./a.out -procs 4
…Executing with 4 processes

