The C/C++ compiler family on kyu-cc is IBM XL C/C++ Enterprise Edition V7.0 for AIX.
For more details, please refer to the following online manuals.
Interactive sessions run on a p570 (16 CPUs), so an interactive session can use at most 16 CPUs. Larger resources are available only through batch jobs.
You must choose an appropriate C/C++ compiler depending on the language standard and the parallelization approach you are going to use.
| command | language standard | description | file name convention (extension) |
|---|---|---|---|
| xlc | ANSI C89 compliant | sequential | *****.c |
| cc | pre-ANSI-C89 | sequential | *****.c |
| xlC | C++ | sequential | *****.C / *****.cc |
| xlc_r | ANSI C89 compliant | OpenMP/automatic parallelization | *****.c |
| cc_r | pre-ANSI-C89 | OpenMP/automatic parallelization | *****.c |
| xlC_r | C++ | OpenMP/automatic parallelization | *****.C / *****.cc |
| mpcc | pre-ANSI-C89 | MPI | *****.c |
| mpCC | C++ | MPI | *****.C / *****.cc |
| mpcc_r | pre-ANSI-C89 | hybrid (MPI + OpenMP/auto-parallel) | *****.c |
| mpCC_r | C++ | hybrid (MPI + OpenMP/auto-parallel) | *****.C / *****.cc |
Some other compilers are also available, such as c99 for ISO/IEC 9899:1999 compliant C programs, and compilers supporting the extended long double type. Please refer to the "Getting Started" online manual for more details.
The file name restrictions listed for C programs are mandatory. You must rename your source file if it does not comply with the required naming convention. C++ programs, on the other hand, can have a .C, .cc, .cp, .cpp, .cxx, or .c++ suffix.
If you do not specify any optimization, the C/C++ compilers give the highest priority to syntax checking and debugging. To obtain faster executable code, you must specify an appropriate optimization level.
In some optimization levels, however, the compilers may not preserve
the
original execution order of operations
and may produce some undesirable side effects. Users should be aware of
such aspects of compiler optimization, and
should be careful about computation accuracy. Please read Section 5.4 for more details.
This subsection gives a simple example of how to compile and execute an ANSI C89 compliant C program with the xlc command.
Suppose that you have a file named "example.c".
kyu-cc% xlc example.c ↓
If the compilation is successful, the executable code is stored in the file "a.out".
To execute this code, type the name of the created file as a command.
kyu-cc% ./a.out ↓
Note that "./" is prepended to a.out.
This means: "Execute the file located in the current working directory." For security reasons, the default command search path on kyu-cc DOES NOT include "./". Without this prefix, the shell is likely to complain as follows.
kyu-cc% a.out ↓
By adding the compiler option -c, you can create an object file without creating an executable code directly.
kyu-cc% xlc -c example.c ↓
The name of the object file has the suffix ".o".
In the above example, "example.o" will be created.
Such an object file is useful when you keep your well-debugged
subroutines in a separate file while editing half-finished programs in
another file.
Suppose that you are now editing a main program in "main.c" and you have your subroutines in "sub.c".
First, you create an object file "sub.o" by compiling "sub.c" with the -c option.
kyu-cc% xlc -c sub.c ↓
Then, you compile your main program and link it with this object file.
kyu-cc% xlc main.c sub.o ↓
In this way, you can save compilation time if you modify the main program over and over again, since your subroutines are compiled only once.
To create a single executable code, you can also process multiple source files and multiple object files in one command.
The following table summarizes useful compiler options.
| option | description |
|---|---|
| -c | Create an object file instead of an executable file. The output file is a ".o" file for each source file. |
| -o filename | Store the output (executable or object) into the file specified by filename, instead of the default (*.o or a.out). |
| -lm | Link mathematical functions in the math library. This option must be specified at the end of the command line. |
| -O | Apply basic optimizations only. |
| -O3 | Apply deeper optimizations such as changing the execution order of operations. This may cause some side effects. |
| -O4 | Apply further optimizations in addition to those caused by -O3. |
| -O5 | Try the deepest optimizations. |
| -qstrict | (With optimization option -O3, -O4, or -O5) Create an executable/object code which preserves the original execution order of operations specified in the source. |
In most cases, the following compiler options are expected to give you a sufficient performance improvement. Note that these options will require a longer compilation time, and may cause side effects in the computation results.
kyu-cc% xlc -O3 -qarch=pwr5 -qtune=pwr5 main.c ↓
You can measure the elapsed time and the CPU time by using the timex (/usr/bin/timex) command.
kyu-cc% timex xlc example.c ↓
Cautions when you measure the execution time of an MPI program
Generally speaking, compiler option "-l" (l is lowercase L) must be added when you link a numerical/graphics library with your C program. These "-l" options must be specified after all other compiler options. This is because "-l" options are not used by the compiler itself, but they are just passed to ld command invoked by the compiler.
kyu-cc% xlc main.c -lessl
To use the IMSL C Library, however, a slightly different style is used.
kyu-cc% xlc $CFLAGS main.c $LINK_CNL
Currently available libraries are shown in the table below, with their compiler options.
| library name | options |
|---|---|
| ESSL | |
| IMSL C Library | $CFLAGS (for compilation) / $LINK_CNL (for link-editing) *3 |
*1
The sequential version of ESSL is thread-safe, that is, each library
function can be called from a parallel execution part of an OpenMP or
automatically parallelized program, as well as a sequential
program. In parallel programs, each thread can execute
the function independently without destroying each other's variables.
*2
The thread-parallel version of ESSL provides some thread-parallel functions. Each such function creates multiple threads by itself and runs in parallel. These functions can be called from a sequential program, or from a sequential part of an OpenMP/auto-parallel program. In this case, your source program must be compiled with a compiler command that has the "_r" suffix, such as "xlc_r".
*3
To use the IMSL C (Fortran) Library, these environment variables must be set properly. Each user must execute a special shell script, cttsetup.csh. The easiest way to do this automatically is to add the following line to your .cshrc/.profile script. It will run the shell script each time you log in or start a new shell process (window).
source /usr/appl/CTT6.0/ctt/bin/cttsetup.csh
Automatic parallelization is applied to a for loop containing array operations only when the compiler can determine that the loop can safely be executed in parallel. Therefore, some C/C++ programs cannot be parallelized automatically and gain no performance improvement.
Automatic parallelization is enabled with the following compiler option.
| option | description |
|---|---|
| -qsmp=auto | Tell the compiler to perform automatic parallelization. |
Environment variable OMP_NUM_THREADS
declares the number of threads to be invoked in parallel.
Warning:
The current charging system for a parallel program is based on the total CPU time of the program. This means that most parallel programs cost more than their sequential versions. You must carefully consider the tradeoff between the increased cost and the improved response time.
This example compiles "test.c"
with automatic parallelization enabled. The compile command also
requests the recommended level of optimizations. Then it declares
the number of parallel threads as 4, and executes the program.
kyu-cc% xlc_r -O3 -qarch=pwr5 -qtune=pwr5 -qsmp=auto test.c ↓
The following compiler option is necessary to compile an OpenMP source program and create a parallel executable code.
| option | description |
|---|---|
| -qsmp=omp | Tell the compiler to create a parallel executable code from the OpenMP source. |
Before execution, the number of parallel threads must be declared by
an environment variable OMP_NUM_THREADS.
Warning:
The current charging system for a parallel program is based on the total CPU time of the program. This means that most parallel programs cost more than their sequential versions. You must carefully consider the tradeoff between the increased cost and the improved response time.
An OpenMP program "test.c" is compiled with the
recommended optimizations, then it is executed with 6 threads.
kyu-cc% xlc_r -O3 -qarch=pwr5 -qtune=pwr5 -qsmp=omp test.c ↓
To compile an MPI program, mpcc/mpCC
commands are used. Individual compiler names for language
standards can be found in 5.1
C/C++ Compilers.
The number of MPI processes (tasks) is specified by an execution
option "-procs
n" where n is the number of processes (the
default is 1).
An MPI program "test.c"
is compiled with the
recommended optimizations, and it is executed with 4 processes.
kyu-cc% mpcc -O3 -qarch=pwr5 -qtune=pwr5 test.c ↓