An Introduction to Parallel Computing with MPI in C

Manolis Katopis
6 min read · Dec 2, 2022


When I began studying parallel computing with MPI, I did not know what to expect. I felt that I was diving deeper into computer science. Admittedly, it is a subject that has intrigued me for a long time, and that is what led me to write my first article.

A short explanation of parallel computing

Despite some difficulties, the fundamental logic is straightforward. The main challenges one faces when designing a parallel algorithm are synchronizing the parallel processes, the lack of an IDE that would help with the inevitable debugging, and the fact that not every parallel algorithm is compatible with every parallel machine.

Before going any further, let me explain what Parallel Computing is, and why it is needed.

There are two main computing disciplines:

1. Serial computing

The work is completed one step after another, in serial order.

2. Parallel computing

The work is divided among multiple processors and threads, therefore reducing both the time needed to complete a larger problem and the computational cost.

A graphic example of Parallel and Serial computing

What is MPI?

So, how is MPI implemented, and how does it work?

MPI (the Message Passing Interface) is one of the most widely used interfaces for writing parallel programs for distributed-memory systems. It provides many functions for sending messages between processes.

An MPI program is composed of processes that can communicate with each other. Each process has a unique ID, which it can learn at runtime by calling a specific function from the MPI interface. There are two main categories of MPI communication functions. Blocking functions block the process that calls them until the operation completes, while non-blocking functions, as the name suggests, allow the process to keep working uninterrupted. Both have their shortcomings. Blocking functions inevitably need more time to complete their work, and they can deadlock if two processes end up waiting for each other. Non-blocking functions are faster in execution, but the programmer must make sure an operation has actually completed before reusing its buffer, and must always keep such synchronization hazards in mind when developing a parallel program. In this article, we are going to see what the main blocking functions are, how they work, and what they accomplish.

Additionally, MPI functions give each process the ability to send a message to all other processes, to some of the other processes, or only to one specific process. When the communication takes place between two processes, it is called Point-to-Point Communication; in the other cases, it is called Collective Communication.

We are going to analyze Point-to-Point Communication in this article.

How does the communication take place?

The communication is implemented with built-in functions of the MPI interface, and specifically through a “communicator”. A communicator owns a group of processes that can send messages to each other. For two processes to communicate, they need each other’s rank, a tag (which will be explained in a few paragraphs), and they must be owned by the same communicator.

Not exactly the communicator that we are talking about

How is code written with MPI compiled and executed?

After installing mpich on your Linux OS, give the following command in the terminal

mpicc -o nameOfTheExecutableCode nameOfTheSourceFile.c

to compile the program that you have written with the MPI interface.

The bash command to run the compiled program is

mpirun -np (numberOfProcessesToBeInitialized) ./nameOfProgramToExecute

The “-o” argument in the mpicc command is used to name the executable file that is created.

The “-np” argument in the mpirun command determines the number of processes that will be launched when the program is executed.
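
For example, assuming a source file named hello.c (the file names here are just placeholders), the compile-and-run cycle looks like this:

mpicc -o hello hello.c

mpirun -np 4 ./hello

This launches four processes, all running the same executable.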

What are the MPI datatypes and how are they used?

The basic MPI datatypes are essentially the same as the basic data types of C, just with slightly different names. A few of the most common ones are shown below:
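
MPI_CHAR: char
MPI_SHORT: short
MPI_INT: int
MPI_LONG: long
MPI_UNSIGNED: unsigned int
MPI_FLOAT: float
MPI_DOUBLE: double
MPI_LONG_DOUBLE: long double

(This is only a selection; the full list can be found in the MPI documentation.)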

There are also a few special MPI types:

MPI_Comm: The communicator type. Processes must share the same communicator in order to communicate.

MPI_Status: A struct that contains information about the outcome of MPI calls (for example, of a receive).

MPI_Datatype: The type used to describe the data type of the elements in a message buffer.
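
As a small illustration of how these types typically appear in code (just a sketch, not a complete program):

MPI_Comm comm = MPI_COMM_WORLD; /* the predefined communicator that contains all processes */
MPI_Status status; /* filled in by MPI_Recv with details about a received message */
MPI_Datatype type = MPI_INT; /* describes the type of the elements in a message buffer */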

But what are the MPI functions?

The MPI interface provides a vast number of functions that make communication between processes easier. Below, we will see and analyze the basic blocking functions that are used to create a parallel program.

 MPI_Init(&argc, &argv);

Initializes the MPI environment. No MPI functions should be called before this one.

MPI_Finalize();

Finalizes the MPI environment. No MPI functions should be called after this one.

 MPI_Comm_size(MPI_COMM_WORLD, &size);

Gets the number of processes running within the given communicator and stores that number in the “size” argument.

MPI_Comm_rank(MPI_COMM_WORLD, &rank);

Lets any process learn its rank. The rank is stored in the “rank” argument and is unique for every process: it is the process’s position within a specific communicator, and processes are numbered with integers from 0 to n-1, with n being the total number of processes.

MPI_Get_processor_name(char *name, int *resultlen);

This function provides the name of the processor on which the calling process runs; it is rarely useful to know which processor the program uses. The name of the processor is stored in the “name” argument, and the length of that name is stored in the “resultlen” argument.
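
To see how these functions fit together, here is a minimal "hello world" sketch that uses all of them (the printed message is my own choice for illustration):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, resultlen;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv); /* start the MPI environment */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many processes are running */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* which one of them am I */
    MPI_Get_processor_name(name, &resultlen); /* which processor am I running on */

    printf("Hello from process %d of %d on %s\n", rank, size, name);

    MPI_Finalize(); /* no MPI calls after this point */
    return 0;
}

Compiled and run as shown earlier with mpirun -np 4, it should print one line per process, in no particular order.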

int MPI_Send(void *buffer, int count, MPI_Datatype datatype, int destination, int tag, MPI_Comm communicator);

With this function, a process can send a message. Until the message is sent, the sending process stops running, and it resumes once the message has been sent.

This function has a lot of arguments; their usage is described below:

1. buffer: The starting address of the buffer where the message to be sent is stored.

2. count: The number of elements in the buffer (not the size of the message in bytes).

3. datatype: The type of the data that is sent.

4. destination: The rank of the receiving process.

5. tag: Identifies the message and makes it unique. It helps the receiver and the sender match the message with the one they expect.

6. communicator: The communicator that owns both the sender and the receiver.

 int MPI_Recv(void *buffer, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status); 

With this function, a process can receive a message. Until the message is received, the receiving process stops running, and it resumes once the message has arrived.

1. buffer: The starting address of the buffer where the received message will be stored.

2. count: The number of elements in the buffer (not the size of the message in bytes).

3. datatype: The type of the data that is received.

4. source: The rank of the sending process.

5. tag: Identifies the message and makes it unique. It helps the receiver match the incoming message with the one it expects.

6. communicator: The communicator that owns both the sender and the receiver.

7. status: Returns information about the result of the receive operation, such as the actual source and tag of the message.
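
Putting MPI_Send and MPI_Recv together, here is a minimal sketch of a point-to-point exchange between process 0 and process 1 (the tag value and the number being sent are arbitrary choices for illustration, and the program assumes it is launched with at least two processes):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;
    int number;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        number = 42; /* arbitrary payload */
        MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD); /* send one int to rank 1 with tag 0 */
        printf("Process 0 sent %d to process 1\n", number);
    } else if (rank == 1) {
        MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status); /* receive one int from rank 0 with tag 0 */
        printf("Process 1 received %d from process 0\n", number);
    }

    MPI_Finalize();
    return 0;
}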

Those were the basic nuts and bolts of MPI. Of course, MPI offers a lot more options for communication between processes; in this short article, we only scratched the surface of the subject. I hope it helped you learn something more about MPI.

I was assisted in my studies by the following resources:

https://mpitutorial.com/

A very descriptive tutorial authored by Wes Kendall. There is also a GitHub repository with many examples.

https://www.open-mpi.org/doc/

The MPI documentation.

An Introduction to Parallel Programming, by Peter Pacheco.

A textbook that dives deeper into the algorithmic part of parallel computing.
