Lab 9: Multithreading and Thread Pools
CSCI 2122 - Fall 2020
1 Introduction
This lab is designed to introduce you to the basics of multithreading. By the end of this lab, you will be expected to
understand how to create threads, handle critical sections using semaphores, and create thread pools.
In this lab you are expected to perform the basics of cloning your Lab 9 repository from the GitLab course group.
A link to the course group can be found here and your repository can be found in the Lab9 subgroup. See the
Lab Technical Document for more information on using git. You will notice that your repository has a file in the
Lab9 directory named delete_this_file. Due to the limitations of GitLab, we are not able to push completely empty
directories. Before you push your work to your repository (which should be in the Lab9 directory anyway), make
sure to first use the git rm command to remove the extra file. If you do not, your pipeline could fail.
Be sure to read this entire document before starting!
2 Multithreading
Multithreading is a common practice in performance computing and is the primary way to get the most out of your
modern CPUs. As the size of components gets smaller and smaller, they are now very close to the smallest component
we can pass electricity through without causing interference with other components. Due to this limitation,
manufacturers are now (instead of making things smaller) putting multiple processing units on their CPUs, called
cores. These cores can act as independent processing units from the point of view of the operating system (and thus
your software), allowing you to create lightweight processes called threads, which can be processed in parallel.
Often you will hear about limitations in software design which keep CPUs from reaching their maximum potential on
a single process. Most software, and especially as you go back in time, tends to be single-threaded from the point of
view of the operating system, meaning very little of the software is capable of utilizing more than one core of a CPU
at a time, which is a leftover practice from times when CPUs weren’t capable of the kind of multiprocessing they’re
able to achieve now. Video games are a huge culprit, as many of the popular engines still run on a single “catch
all” frame calculation cycle which acts as a single monolithic main loop for all of the game’s logic and graphics handling.
In this lab we will discuss some basic principles for handling multithreaded software, then code some simple examples
before moving on to more heavy-duty applications.
2.1 Heavyweight vs. Lightweight Processes
The first thing to understand is the difference between heavyweight and lightweight processes, which is a very straightforward
distinction: programs are heavyweight processes and threads are lightweight processes.
When it comes to heavyweight processes, the operating system treats them as distinct entities in the memory space.
They’re cloned from a previous heavyweight process, and then the cloned data is overwritten with the new program
so that it is distinct relative to the code which it was loaded from. Once the heavyweight process is created by
the operating system, it has its own memory, its own address space, and its memory is protected so that no other
programs are able to access it without permission of the system or the process itself. These are the programs you see
if you open Task Manager in Windows, or htop on Timberlea. This is also how every one of your C programs
starts when you execute it.
A lightweight process is a member of a heavyweight process. In general, they’re viewed through the lens of a thread.
Threads are sub-programs of a heavyweight program and do not have their own memory assigned by the operating
system. Instead, they share the existing memory already allocated to the heavyweight program they belong to. This
means that threads are able to see any global variables, functions, and definitions held within a program and are
able to communicate through those mechanisms freely, as long as the code designer has allowed that kind of interaction.
Even though threads are considered a part of a heavyweight process, they are still scheduled independently
to ensure maximum resource utilization. You will learn more about process scheduling in the Operating Systems class.
When a heavyweight program is allowed to run by the operating system, the system also decides which of the
program’s threads it will place on the CPU to execute. Threads will run freely until their subroutine is completed,
at which point their return value is stored and the thread is destroyed. Since it is up to the operating system which
threads are allowed to execute on a CPU at any given time, there’s no guarantee which order the threads will be
executed in, or for how long, which is important to note when we’re trying to work with shared memory later.
2.2 The POSIX Thread Library
When working with threads in this course, we will be using the POSIX Thread Library, which is normally referred
to as pthreads. The pthreads library is available on Timberlea and will supply you with all of the functionality you
need to take advantage of multiprocessing in C. You can view the man page for the pthreads library by entering man
pthreads into the terminal on Timberlea. You will also find additional man pages for all of the commands specific
to the pthreads library by entering them into the man program accordingly. Note that we will not be using any of
the advanced features of pthreads, mostly because it’s not necessary under most use cases, but also because we don’t
want to make the concept of threads too complicated.
2.3 Importing and Compiling POSIX Threads
In order to import the pthreads library into your code, there are a few things to consider. The first thing to understand
is that pthreads do not like to be compiled without also being linked. This means that you will not be able
to easily apply the -c option in your gcc commands in your Makefile. When you compile a C source file (.c) which
includes pthreads, you will have to do it directly. For this reason, in this lab, you will not be required to make any
object files (.o) for any source file which imports the pthreads library.
To import the pthreads library into your code, you will need to use #include <pthread.h>. This will give you
access to all of the pthread functions outlined below. You will also need to include the library import option on your
gcc command, which is -lpthread. This should be the last thing attached to any gcc command which requires it to
ensure maximum compatibility with your files and other gcc options.
2.4 Creating a POSIX Thread
The basics of a pthread revolve around creating functions which your threads will execute to completion. These
functions need to be constructed in a very particular fashion before being passed to the pthread library to have the
thread created and executed.
In order to create and execute a pthread, you will need to use the pthread_create function. The pthread_create
function takes four parameters, which are outlined as follows:
1. A pointer to a thread ID, represented by the pthread_t data type.
2. A pointer to a struct of attributes which you wish to modify, represented by the pthread_attr_t data type. For our
purposes, this will always be set to NULL.
3. A function pointer to a function which the thread will execute when it is created.
4. A void* to any data you wish to pass to the function held by this thread when it begins execution. In this lab,
we will create structs for this purpose.
When the pthread is created successfully, it is also immediately executed. The pthread itself is held internally, in
the systems created by the pthread library, and thus you do not have direct access to it. However, when your
pthread_create function ends, it assigns an ID value to the provided pthread_t, which can be used later to
designate which thread you would like to reference in further function calls to the pthread library. In practice, the
pthread_t type is often an integer type, although its exact representation is implementation-defined, and thus it is not
appropriate to use integer types directly. You can see an example of how to create a few simple pthreads here:
// Compile with: gcc --std=c18 fileName.c -lpthread
#include <stdio.h>
#include <pthread.h>

void* example(void* args)
{
    pthread_t me = pthread_self();
    // pthread_t is opaque; this cast assumes it is an integer type, as on Linux.
    printf("This is inside thread %lu.\n", (unsigned long) me);
    return NULL;
}

int main(int argc, char** argv)
{
    pthread_t thread;

    // Each call starts a new thread and overwrites the ID stored in thread.
    pthread_create(&thread, NULL, example, NULL);
    pthread_create(&thread, NULL, example, NULL);
    pthread_create(&thread, NULL, example, NULL);
    pthread_create(&thread, NULL, example, NULL);
    pthread_create(&thread, NULL, example, NULL);

    return 0;
}
When this program executes, it will create five threads, each executing a single print statement which prints its
thread ID. Every thread in this case is created using the same pthread_t variable, so each time a new thread is created
the previous thread’s ID is lost. For the purposes of the example, this is sufficient. In a real execution scenario, creating
multiple threads is better done with an array of pthread_t values. Creating an array of pthread_t values is no different than
creating any other array. It can be iterated through with pthread_create calls to initialize and execute all of your threads.
You may notice that we used the pthread_self() function. This function, when used inside a thread, will return
the ID that thread has been assigned. This can be useful for determining which thread is which in situations where it may be
important to have their execution monitored or synchronized with other threads.
2.5 Passing Arguments to a POSIX Thread
To pass arguments to a pthread, you can convert a pointer to any type of data into a void* and pass it into the pthread_create
function as the final argument. This can be any data you choose, although we recommend using structs for anything
beyond a simple value, as they’re the easiest way to manage a variety of different data types that would normally be
associated with a single function.
An example of passing data to a pthread can be seen here:
#include <stdio.h>
#include <pthread.h>

typedef struct _Args
{
    char* this;
    int that;
    float other;
} Args;

void* example(void* args)
{
    // Convert the incoming void pointer back to the struct type we passed in.
    Args* arg = args;
    printf("My arguments are: %s %d %f\n", arg->this, arg->that, arg->other);
    return NULL;
}

int main(int argc, char** argv)
{
    pthread_t thread;
    Args arg;
    arg.this = "Hello!";
    arg.that = 13;
    arg.other = 815.0f;
    pthread_create(&thread, NULL, example, &arg);
    return 0;
}
As you can see, if you have a firm understanding of how void pointers work (and you should by now, after all of the
lists and collections you’ve had to set up with void pointers!), it should be fairly easy to pass arguments into your
thread’s function during creation. We can simply cast the incoming void pointer to whatever struct type we’re using
and have access to all of the fields, assuming it was properly allocated. In the above example you may notice that I
did not manually allocate the Args struct. You can allocate it manually if you so desire, but I did not for this example.
When you run this code, you may notice that sometimes it doesn’t print anything at all. What’s going on? It turns
out that, by default, your program will not wait for the individual threads to finish. If the program creates a thread
and then exits too quickly, the thread may not have time to properly execute and will be cancelled by the operating
system when the heavyweight process ends. How can we stop that from happening?
2.6 Joining a POSIX Thread
To ensure your threads finish their execution, you can perform a pthread_join on them. Joining a thread to your
program has two benefits. First, joining a thread stops the main program logic from continuing until the thread in
question stops. If you have multiple threads currently executing and you want them to be guaranteed to finish, you
can join each one in your code, one after the other. This can be done manually, with a series of individual lines, or
via a loop if you have to iterate through an array of thread IDs.
The second benefit of using a join is that you’re able to receive a return value from the function the thread is running.
You may have noticed in the previous code snippets that the example function has a very specific signature: it must
return a void*, and it must also accept a void* as a function parameter. We saw in the previous example that
we can pass a void* into the function via the pthread_create function. In order to retrieve data from the function via
a return statement, we must do so with a pthread_join function call.
A pthread_join takes in two parameters:
1. A thread ID value, represented by the pthread_t data type. Note that unlike pthread_create, this is not a
pointer.
2. A void** value for holding the returning value after the join has completed.
The second parameter can be a little strange at first. The reason it is a void** and not a void* is because the
pthread_join function has to be able to give you the pointer to the data inside it. If you only gave it a void*, it
would only be able to affect the data the pointer is pointing to. What the void** allows the function to do is not just
change the data in the pointed memory location, but it can change the whole void* to a totally different pointer
location. While this seems complicated, all you really need to do is create a pointer for the data type you’d like to
store the returned value in, then pass in the address of that variable. You can see an example of a return value with
a join here:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

typedef struct _Args
{
    char* this;
    int that;
    float other;
} Args;

void* example(void* args)
{
    Args* arg = args;
    printf("My arguments are: %s %d %f\n", arg->this, arg->that, arg->other);

    // Allocate the return value so it outlives this function.
    int* value = malloc(sizeof(int));
    *value = 15;

    return value;
}

int main(int argc, char** argv)
{
    pthread_t thread;

    Args arg;
    arg.this = "Hello!";
    arg.that = 13;
    arg.other = 815.0f;

    int* result = NULL;

    pthread_create(&thread, NULL, example, &arg);

    // Blocks until the thread finishes, then stores its return value in result.
    pthread_join(thread, (void**) &result);

    printf("Returned Value = %d\n", *result);

    // The thread allocated this memory, so it is freed here.
    free(result);

    return 0;
}
You will notice a few things in this code. First, we don’t allocate the result variable. This is not necessary, as the
thing being returned is already allocated inside the thread. It’s important to allocate the data you plan on returning and
store it in a pointer. Failure to do so could lead to your values being deallocated when your function ends. Always remember
that if you don’t allocate something yourself inside a function, C will automatically deallocate it when the function
ends. Normally this isn’t a problem because C will pass-by-copy, but there are situations where you can run into bad
copies. For example, if you try to create int value = 15 and then return &value, the compiler will warn you that you are
returning the address of a local variable, because int value is local to the function and will be freed automatically when
the function ends.
You should also notice that we specifically have to convert the result’s address (being passed into the join function)
to a void** in order for this to work. If you don’t include that type cast, C will complain that the address types do
not match.
2.7 Returning Values without Join
While the above section says you can receive values from the thread by calling a pthread_join and giving it a correct
double pointer to store the return value in, it’s also possible to return values in other (less clean) ways. Since you are
handing in a pointer to an argument struct, there’s nothing stopping you from creating a field in that struct which
is capable of storing an output value (or a value for determining ongoing status). Since you still have access to that
argument struct on the outside of the thread, having the thread update that struct while you periodically check it
from outside the thread could prove useful, easier, and more convenient (depending on the situation) than using a
join. Remember that using a join forces your code to stop and wait for the thread, and you may not necessarily want
to do that to see what’s happening inside!
2.8 Critical Sections and Race Conditions
Since your threads simply execute and are not directly controlled by you after they’re created, you can run into
problems with certain types of code where it’s possible for two threads to operate on the same piece of data simultaneously,
possibly creating instability in your data structures. Consider the following example:
You create an array list and decide that you want to add 100,000,000 integer values to it. Since it would take a
single thread a very long time to read in and add all of those values to the array list, you decide to create ten threads
to split the job up. That way each thread can add 10,000,000 values for you, and since they’re in parallel they should
take about 1/10 of the time.
However, because the operating system doesn’t know what the threads are doing and is likely to want to let every
thread have at least some execution time, it will only let each thread run for a limited slice of time. Your first thread starts running
(along with a few others) and everything seems fine, until it gets very close to the moment when your thread will be
moved off of the CPU to give another thread some time to execute. Your thread gets the value 27 and tries to add
it to the end of your array list. It manages to get the memory allocated, stores the 27 inside it, and just before it
manages to increase the size of your array list by 1, the operating system swaps it for another thread on the CPU.
That new thread then tries to add something to the array list, but because the first thread wasn’t able to increase the
size in time, this new thread adds something to the end of the array list, which it sees as the same index as the last
thread. It allocates new memory to the last index, overwriting the value 27 and leaking that memory (since we no
longer have a pointer to it) and then increments the size of the array. It eventually is switched out by the operating
system and the original thread is returned to running from the same place it left, where it increments the size and
moves on as though nothing happened.
So from this situation, we’ve lost one of our values (27), and the array list thinks that it has one more element inside
it than it actually does, meaning the stability of the array list is now broken. It’s possible that this situation could
happen multiple times and you could end up with some serious errors down the road.
This is referred to as a race condition, where each thread is attempting to change some shared data before the
others are able to do so. The place where this fight for shared data control occurs is referred to as a critical section in your
code, and it is important to protect your critical sections from the impact of multiple threads fighting for data control.
To avoid this problem, we will implement a type of code locking structure called a semaphore. A semaphore is a
simple piece of code which acts as a check-in or waiting area for your threads. In practice, a semaphore is a very
simple piece of code which acts as a number. The number is initially set equal to the number of threads you want
to allow access to your critical section. In this lab, that number will be 1, to ensure that the critical section is
entirely mutually exclusive, sometimes shortened to mutex. Mutually exclusive things, by definition, cannot happen
together, so when you see people talk about something mutually exclusive, it means that only one thing in the list
can happen at a time. In this case, threads in the critical section will be considered mutually exclusive (only one
thread can process at a time) if the semaphore is working correctly.
Every time a thread reaches a semaphore wait point, it checks to see if the semaphore is greater than 0. If it is, it
will automatically reduce the value of the semaphore by 1 and then proceed past the wait point. If the semaphore
is 0 or less, the thread will block. A block is what occurs when a process or thread has to wait for some event, such as
feedback from the user, but it can also be used to temporarily put a process to sleep. This forces a new thread to
be loaded while the previous thread waits for the semaphore to go back to positive. This can happen to multiple
threads, making them all stop and wait at the wait point.
When a thread enters a critical section, it is able to perform any calculations on the critical section it desires. When
it is done, it will move through a semaphore post point. A post point is where the thread lets the semaphore know
that it has completed its work inside the critical section and thus the next thread is free to move inside. When it
reaches the post point, it tells the semaphore to increase its value by 1. If this makes the semaphore positive, the
next thread will proceed past the wait point (decreasing the semaphore value by 1) and the process will repeat.
We can create a semaphore with the pthread library. This is done by importing the <semaphore.h> library (which is
included with the pthread library). This gives us access to the sem_t data type, the sem_init function, the sem_wait
function, and the sem_post function.
2.9 Using POSIX Semaphores
Similar to pthreads, semaphores are created using their own data type, sem_t. These are best used in a global scope
(outside of any function) and can be declared below your includes and defines. Once a semaphore is created, you will
need to initialize it before you start creating threads. This can be done with the sem_init function. This function
accepts three parameters:
1. A pointer to a sem_t value. The value of a semaphore is assigned by the operating system, but is generally a
semaphore value plus a waiting queue.
2. An integer flag for determining whether or not this semaphore should be shared by sub-processes. Leave this
set to 0.
3. An integer for setting the initial semaphore value. In this lab, setting this to 1 should suffice.
Once a semaphore is initialized, it can be freely used in your code. Once you have identified a critical section, you can
place a sem_wait function call before it. The sem_wait function accepts a pointer to a sem_t type, which determines
which semaphore the threads should be waiting in. If you have more than one critical section, you should also have
more than one semaphore, as each should be filtering threads into different blocks of code.
At the end of your critical section, you should include a sem_post function call, which accepts a single pointer to a
sem_t value. The pointer passed in should match the pointer passed into the original sem_wait call. Don’t mix these
up, and if you have multiple semaphores nested together, make sure you are posting them in the correct order.
An example of a semaphore can be seen here:
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

sem_t wait_here;

void* example(void* args)
{
    // Only one thread at a time may pass this wait point.
    sem_wait(&wait_here);

    printf("Sleeping for 2 seconds...\n");
    sleep(2);
    printf("Woke up! Leaving the critical section.\n");

    // Let the next waiting thread into the critical section.
    sem_post(&wait_here);
    return NULL;
}

int main(int argc, char** argv)
{
    // Not shared between processes (0), initial value of 1.
    sem_init(&wait_here, 0, 1);

    pthread_t threads[5];

    for (int i = 0; i < 5; i++)
        pthread_create(&threads[i], NULL, example, NULL);

    for (int i = 0; i < 5; i++)
        pthread_join(threads[i], NULL);

    sem_destroy(&wait_here);
    return 0;
}
When you run this program, you should find that a thread sleeps, and then wakes up, always in that order. Each
thread waits its turn to enter the critical section and thus there should never be a mixing of sleeps or a mixing of
wakes. Every thread should sleep, then wake, and thus do it five times in sequence. If you comment out the sem_wait
call, you might find a different behaviour.
3 Thread Pools
A thread pool is a specific type of program which allocates a specific number of threads to a given data task. Normally
thread pools are created to ensure that only a certain number of threads are created and running at any given
time. This is especially useful in shared resource systems (like Timberlea) or systems where stability is incredibly
important. It turns out that creating too many threads in rapid succession has the possibility of overwhelming any
system, and thus enforcing some restraint gives you the benefit of increasing the speed at which tasks are performed
without sacrificing system stability.
There are many ways to create thread pools, but we will perform a very simple pool where we create a queue of
Operations and as each thread finishes execution, we will dequeue another Operation and create a new thread in
place of the old one. This is a very simple model which still suffers from overhead of creating many threads, but
still provides us the ability to manage the number of concurrent threads very easily. Other types of thread pools
can be more efficient by never finishing execution of a thread while waiting for more tasks to be given. This has
the additional benefit of not having to constantly recreate threads at the cost of being more complicated to implement.
The thread pool requires a queue and an array. You will be given an array size for managing a certain number of
threads. You should never have more threads running than the given integer value. Since you have no direct means
of knowing whether or not a thread has completed processing, you will need to create an argument struct capable of
reporting when a thread is complete. Under normal circumstances you could join the thread immediately, but in the
case of a thread pool it would be inefficient to do so. Your goal is to loop through your currently running threads and
any time you find one which has completed processing, then you join it and retrieve its return value before dequeuing
the next Operation and creating a new thread. Every time a thread completes and its value is returned, you must
store the value in an array list to accumulate all of your data. When all of your Operations have completed, you will
return the array list.
Since the threads are not managed and the order of thread execution is outside of our control (controlled by the
operating system), the array list’s values will be in a somewhat random order. You will need to sort these values
and print them. You should already have the programs necessary to sort these values. We recommend looking back
through previous lab pipeline results to find a means by which you can sort the values in your array list.
4 Lab 9 Function Contracts
In this lab you will be responsible for fulfilling two lab contracts: the Threads contract and the Pool contract. Each
contract is designed to test you on some of the things you’ve learned throughout the instruction portion of this lab.
All contracts must be completed exactly as the requirements are laid out. Be very careful to thoroughly read the
contract instructions before proceeding. This does not, however, preclude you from writing more functions than you
are asked for. You may write as many additional functions as you wish in your C source files.
All contracts are designed to be submitted without a main function, but that does not mean you cannot write a main
function in order to test your code yourself. It may be more convenient for you to write a C source file with a main
function by itself and take advantage of the compiler’s ability to link files together by accepting multiple source files
as inputs. When you push your code to Gitlab, you don’t need to git add any of your extra main function source files.
For those of you who are concerned, when deciding which naming conventions you want to use in your code, favour
consistency in style, not dedication to a style that doesn’t work.
The contracts in this document are worth the following points values, for a total of 10.
Contract    Points
Threads     3
Pool        7
Total       10
4.1 Threads
4.1.1 Problem
You will create three programs for testing various types of thread features.
4.1.2 Preconditions
You are required to write three programs for creating and testing threads:
1. threads: You will create a program which accepts an array and squares each value in the array using threads.
2. unsafe: You will create a program which attempts to increment and print a variable without the use of
semaphores.
3. safe: You will create a program which attempts to increment and print a variable by protecting your critical
section with semaphores.
Each program must include a relevant .c file, which should contain all of your function implementations, and a relevant
.h file, which should contain your structure definitions, any necessary typedefs, and all of your forward function
declarations. When you compile, you will need to include the source file in your command in order to ensure the
functions exist during the linking process. You may include any additional helper functions as you see fit. Since you
are dealing with pointers, you will need to check all of your pointers to ensure they are not null. Trying to perform
operations on null will lead to segmentation faults or other program crashes at run-time.
Details on threads and semaphores can be found in the Multithreading section of this document. The bool type
referenced in this contract is found in <stdbool.h>. You are expected to do basic error checking (such as checking
for null pointers and correct index boundaries).
Your threads program must include the following structs (typedef-ed appropriately):
Structure Name         Fields       Functionality
Args (typedef Args)    int* arr     A pointer to an array that you will recalculate the values of.
                       int start    The starting index to perform calculations from.
                       int end      The ending index to perform calculations to (non-inclusive).
Your threads program must include the following functions:
Requirement         Conditions
Function            void* fill(void*)
Input Parameters    A void pointer to an Args struct.
Return Value        A void pointer (to fit the thread function requirement). NULL in practice.
Notes               This should be passed into the thread creation function. This should return NULL. When
                    executed, this function should iterate through the range [start, end) in the provided array
                    and square each value before returning NULL.

Requirement         Conditions
Function            void fill_memory(int*, int)
Input Parameters    An int pointer to an integer array, and an integer representing the number of threads to make.
Return Value        None.
Notes               This function should create the correct number of threads N, create an Args struct for every
                    thread, then divide the array into N equal ranges and execute a thread on each of those ranges.
                    The provided integer array is guaranteed to have 10,000,000 values.
Your unsafe program must include the following structs (typedef-ed appropriately):
Structure Name           Fields         Functionality
Count (typedef Count)    int counter    A value for counting upward to 1000.
