CS343: Concurrent and Parallel Programming

Hardest course in 3B. Taught by Peter Buhr and in uC++.

Course website: https://student.cs.uwaterloo.ca/~cs343/F24/index.shtml

Concepts

Control Flow
Exception Handling
Coroutine
Task
Mutual Exclusion
Busy Waiting
Barging
Lock
Dekker’s Algorithm
Racing
N-Thread Mutual Exclusion
Skipped two: Knuth/De Bruijn && Eisenberg and McGuire
Bakery Algorithm
Arbiter
Owner Lock
Synchronization Lock
Condition Lock
Barrier
await
_Accept
Monitor (Synchronization)
External Scheduling
Internal Scheduling
I skipped 9.5 External and Internal Scheduling : combining both (note)
uCondition
Deadlock
Future
Java Monitor
Lock-Free
ABA problem
GPU
Go

Final

Dec. 14 (Sat.) 2024 @ 9:00-11:30 am in STC 0010, 0020

Note, every instructor is unique emphasizing different points and teaching at different speeds. As a result, some previous exams may have questions about material not covered in the current term and/or missing topics relevant for this term’s exam.

The majority of the final exam covers the material in the course notes from Section 6.3.4 to Section 11.6, inclusive, the section in the course textbook on automatic/implicit-signal monitors, and assignments 4-6.

Goal: is 85%

Want to do well in this course because it’s interesting. Ended up getting 75% on the final. Good enough.

Prep: Final Study CS343, Long Answers CS343

F22
W19
F18
W23 (long answers)

Thoughts on final and class:

Long answer is literally the rafter problem from previous finals
Same format Peter always uses
- CAS: write pseudocode in C
- Instead of banker’s algorithm he asked the other one
- Semaphore for long answer: external, internal and automatic scheduling
I really enjoyed this course and going to lectures listening to Peter go on about concurrent programming. Amazing course that we have to take, I recommend taking it! 10/10 course. I enjoyed doing the assignments (for the most part) even if they were long asf. Just make sure to start them in advance so it wasn’t that bad looking back.
Tips:
- Read his textbook whenever you don’t understand the course notes (go to the relevant sections)
- I started reading the textbook for the first half of the course, but as the term went on and we had more and more assignments from other classes, I stopped reading it (only the parts that confused me)
- Start assignments early!
- Find a group of friends, so you can learn together (for the assignments)
- Do practice exams from previous years, it’s the same format every year. Practice, practice and practice

Midterm

Oct 31 (Thu.) 2024 @ 19:00 - 21:00 in BMH1689 / MC4064

The midterm covers the material in the course notes from Section 1 to Section 6.3.3, inclusive, and Assignments 1-3

Our midterm: https://student.cs.uwaterloo.ca/~cs343/F24/exams.shtml

$\to$ Midterm CS343

Assignments

To download files for an assignment use the following command, where D is an assignment number 1-6:

wget -m -np -nd -R 'index.html*' -P D https://student.cs.uwaterloo.ca/~cs343/assignments/D/

Lecture 1

Basically, Peter just randomly started roasting us.

Advanced Control Flow (Review)

while and for are interchangeable
make sure to give yourself eye-candy by outdenting the breaks within a loop
A loop exit NEVER needs an else clause.
Flag variables are the variable equivalent to a goto because they can be set/reset/tested at arbitrary locations in a program.

Multi-exit loop

Multi-exit loop (or mid-test loop) has one or more exit locations occurring within the body of the loop, not just top (while) or bottom (do-while).

Static multi-level exit

Static multi-level exit exits multiple control structures where exit point is known at compile time.
Labelled exit (break/continue) provides this capability.

Note: in Java and micro C++, label breaks exist!

Why is it good practice to label all exists?

Avoid making mistakes

Avoid all the flag variables with multi-level exit!

BUT occasionally a flag variable is necessary!
- When you want to retain a variable from another block
Other uses of multi-level exit to remove duplicate code.

Normal and labelled break are a goto with limitations.

Cannot loop (only forward branch) $\Rightarrow$ only loop constructs branch back.
Cannot branch into a control structure

Only use goto to perform static multi-level exit, e.g., simulate labelled break and continue. So basically, breaks are a more refined/controlled goto statement.

Understand goto

You need to understand the importance of goto’s. We have a bunch of goto in our program. More refined ones. Need to understand for example in a for loop or while, where the goto are.

Dynamic Memory Allocation

Do not use dynamic memory allocation if it can be done on the stack!

Dynamic memory allocations are for loosers.

Situations we need to use the stack:

When storage must outlive the block in which it is allocated (ownership change).
When the amount of data read is unknown.
When an array of objects must be initialized via the object’s constructor and each element has a different value.
When large local variables are allocated on a small stack.

Lecture 2

A catch is a sequel.

Lecture 3

Thu Sep 12

p.21 of notes:

Declare object before calling open(). It also separates normal code and exception handling.
Eiffel: contract programming
retry: only ends when your contract is fulfilled. So it will continue trying. Magically goes to try after retry block.
This is easy to simulate: trivial code which is demonstrate on the right

Resumption

A resumption handler is a corrective action so a computation can continue.
stack is not unwound! since dynamic return.

we choose how things are to be fixup by calling f using different parameters
Problem: need to pass down the fixup routines
But with resumption (left): find someone to handle it, then control returns after resume E(). Order n operation to find the call routine and order 1 to resume. Stack is not unwound in this case.

Exceptional Example:

resumption: goes back to B6 resume() if it were instead of throw
retry: Eiffel
terminate

Coroutine

A coroutine is a routine that can also be suspended at some point and resumed from that point when control returns.
State of coroutine:
- execution location
- execution state
- execution status - active or inactive or terminated
Coroutine doesn’t have to start from the top.

coroutine: fancy sequential program. Not concurrency!

two different approaches for coroutine:

a semi-coroutine
a full coroutine

Let’s understand by example:

Semi-Coroutine

_Coroutine_: new type in micro C++. It has a main()
no execution state

I will trash Python - Peter Bhur

In a coroutine, the program starts in main().
only want resume and suspend to access in the coroutine
All coroutines inherit from base type uBaseCoroutine

suspend and resume doesn’t change the stack: they context switch instead

You need to find zen.

Read until Nonlocal Exceptions to do assignment

TODO: write some notes for Coroutine

Nov 7, 2024

Why cannot use external scheduling??

Using internal scheduling: (if you have uCondition bench == internal scheduling)

signal: delayed/defered
girl sits down: now boy wakes up.
boys wake up at boys[ccode].wait() in the if clause (wakes up exactly where he sat down)
gets phone number and puts it in the uCondition boys[CCodes]
boy signals the chair
boy returns the girls phone number

Apparently we don’t need a chair $\to$ use signalBlock

replace signal to signalBlock and remove all the exchange.

8.5 Readers/Writer

dont need to check if its empty before calling signal. (does it for you)
uCondition tells us if someone is on bench (using .empty), so we dont need to have variables that counts compared to semaphore solution

just put it in a routine

everything is put inside the lock
since its nomutex, multiple threads can run the nomutex function at the same time

doenst make sense for multiple threads in nomutex write
we further simplify the monitor

μC++ uCondLock and uSemaphore also support shadow queues with type uintptr_t data.

2 counters
if reader arrives and there is a writer in the room (wcnt > 0). only thing that makes reader happy is for the writer to leave. so shut all the doors except the one that allows the writer to leave.
Why has the order of the member routines changed?
- definition before use

micro C++ throws a non-local exception to the person sitting on the chair to notify in the cooperation failed

8.7 Nested Monitor Calls

Nested monitor problem: acquire monitor (lock) M1, call to monitor M2, and wait on condition in M2.
Monitor M2’s mutex lock is released by wait, but monitor M1’s monitor lock is NOT released ⇒ potential deadlock.
Releasing all locks can inadvertently release a lock, e.g., incorrectly release M0 before M1.
Same problem occurs with locks.
Called lock composition problem.

monitor lock in endRead() and another in read()
NOT TESTED ON - as well as the semaphore version

Nov12

Intrusive Lists → use it for assignment 5 and 6 (not necessarily need to)

8.9 Counting Semaphore, V, P vs. Condition, Signal, Wait

Difference between counting semaphore and Monitor

multiple Vs may start multiple tasks simultaneously, while multiple signals only start one task at a time because each task must exit serially through the monitor

Can this simulation be reduced?
- Yes, use external scheduling (remove 2 to 3 lines)

8.10 Monitor Types

Monitor: We do some work and let the compiler do some work.

Explicit scheduling
- Done undercover by the monitor
Implicit scheduling

We write out explicit scheduling:

Monitor kicks in when it becomes empty, try to get more work (wait/exit)
We take care of who comes in, and signals

Theoretical Monitor:

It’s ready to rock n roll
Assigning different relative priorities to these queues creates different monitors (e.g., C < W < S).

Starvation for case 7-13
- things that got signalled starves if there are constantly new callers coming in
- All unsound
Case 5-6
- If you do signal and flips a coin, it can either be signalled task or the task that signals comes in
- Arbitrary selection
Case 3-4
- Flip a coin; either Call or Signalled
- Could need Barging Avoidance, can have starvation without avoidance
- Java monitors
Case 1-2
- Has barging prevention, no barging

Implicit Signal

Write boundedbuffer using a Monitor

Conditional critical region (in insert())
Not busy waiting
Use waitUntil for assignment 5
- simulate: build order n queue and service it
- 3 solutions for building a waitUntil monitor. 2 of them are based on signal or signalblock (with barging, but still can use it). 3rd solution has no barging
- waitUntil macro and leave macro to: build put you to sleep
- READ TEXTBOOK

first four comes from the cases
Last row comes from the BoundedBuffer Monitor
Priority
- prioritize things inside the monitor
No priority
- bargers can barge in (i.e., ahead of Ws)
2nd row is the 80% case / correct row
No idea which column is the correct. Still debating jesus

Coroutine Monitor

can use resume(), suspend(), condition variables (wait(), signal(), signalBlock()) or _Accept on mutex members
coroutine can now be used by multiple threads, e.g., coroutine print-formatter accessed by multiple threads.

8.11 Java Monitor

Java has _Mutex members (they used synchronized for mutual exclusion)
Only 1 condition queue
Has barging

Bounded Buffer:

Barrier with java monitor:

Don’t get on airbus 380

Fix by :

Instead of counting down, we oscillate between 0 and 1
Still wrong because of spurious wakeup!!!

Fixed using tickets (for while condition)

nested monitor problem. Still doesn’ twork

Nov 14, 2024

Last row, reject? is not safe

The two lines in insert and remove that are red should be blue. Typo
Monitor version of doing external scheduling
We want to remove the blue, and push it (synchronization and mutual exclusion) in the main.
_When provides synchronization powers
Go randomly picks one of insert or remove if both _When is true.
In miro c++ we have the power to determine which one to prioritize (which bench to look at)
- For example, here we look at insert bench first, then remove bench
No starvation can occur, because we cannot insert more when we have more than 20. So we have prevention
Once Accept One Call to satisfy your cooperation.
Do i need to have for loop in a5????
Different from monitor, when we create a Task, it is automatically running, all the doors re closed. Contrast to monitor, nothing is in, all doors are open
_Accept needs to have a stack, and signalled has a stack

allows to do try-accept
- dont use: it’s busy-waiting

reduce the time producer/consumer

Internal Scheduling

NOt a rendezvous failure: possibility that the rendezvous happens in the future. We are postponing it to the future. empty.wait() didnt leave, it just sat down in the chair.

push all the rest of the code down into main
Everything in insert/remove is to make synchronization work
Takeaway, you can push everything in main.
Code doesn’t work: logic is write, but technical error because peter used signals.

Your full time person is just sitting there! Must use signalBlock

Nothing in stop(), we just use it to break out in the _Accept
We put it at the first _Accept because we want it to stop right away, but we can control it by putting after other _Accept too.

doesn’t disappear: it’s a monitor (when it runs out the end)
we can just accept the destructor! → segments fault
- cannot return to the object when it’s already destructed
- make it work like a signal, we don’t put it on the chair

We can do everything on a5.

Goes all the way to 145:

7. Concurrent Errors

Race Condition

no laws → chaos

No Progress

Livelock

Massive amount of CPU used → they are all looking at their right

Starvation:

Every time it looks to the right, there is still someone there.
The guy is spinning, looking to its right

Big amount of CPU usage: either livelock or starvation

Deadlock: No CPU used.

Two kinds:

Synchronization
- signal before a wait
Mutual Exclusion
- All four drivers drives into the intersection at the same time.
- They need to come in at the same time

There are 5 conditions that must occur for a set of processes to deadlock.

A concrete shared-resource requiring mutual exclusion, i.e., exists without a task.
1. A task “wanting to drive across the intersection” is not a resource.
A process holds a resource while waiting for access to a resource held by another process (hold and wait).
Once a process has gained access to a resource, the runtime system cannot get it back (no preemption).
There exists a circular wait of processes on resources.
These conditions must occur simultaneously.

Tuesday Nov 19, 2024

Simple example using semaphores:

7.3 Deadlock Prevention

Deadlock can be prevented by eliminating one of the 5 conditions:

no mutual exclusion
- silly
no hold & wait: do not give any resource, unless all resources can be given
- acquire both locks technically
allow prevention
no circular wait: by controlling order of resource allocations
- ordered acquiring the locks → so no cycle
- ordered resource policy
  - useful in small embedded system, when there’s limited amount of resources that you can order
prevent simultaneous occurence:
- Show previous 4 rules cannot occur simultaneously

Very high level overview of deadlock prevention!!

7.4 Deadlock Avoidance

Monitor all lock blocking and resource allocation to detect any potential formation of deadlock.

Achieve better resource utilization, but additional overhead to avoid deadlock

Banker’s Algorithm:

Each Tasks have figured out worst case. which doesn’t make sense how?
T1 is trying to get another R1, to update 2 to a 3. So we ask oracle and runs the banker’s algorithm to check if its safe or not →deadlocK?

Allocation Graphs

oracle maintains this graph of resources.
task come and go in the graph
oracle cause T3 to block, red line , then check for cycle

Use Rick’s algorithm: graph reduction to locate deadlocks

pretend to run the task, by checking for only in arrows

Two algorithm that avoids deadlock!: Banker’s and Rick’s algorithm.

7.5 Detection and Recovery

Let it happen (deadlock), then try to recover from it
Build the graph every T seconds instead.
Decision is hard to make, how do we prevent?
We can restart a task: but it has already changed states.

Which Method To Chose?

Mac, windows do nothing!

Does go detects deadlock? Yes

Go to page 173 page:

9.3 Increasing Concurrency

171: For voteTask

move the work in the accept into the workers → Administrator

Administrator:

something that doesn’t do the real work. does management
accepts calls and returns values
It never calls anybody
Special kind of servers, never calls out, only accepts calls in
A6

Thu Nov 21, 2024

notifier worker: waits for a press down keyboard event. When it receives, it calls in to the administrator
timer: interacts with administrator
workers: calls in with result of last computation, and ask more work from administrator. if no work, it sits down on a bench in the admin
Trick: feels like there is two calls → but administrator cannot make calls.
- worker calls in, no work, so it sits down and wait(). this call is still active (in the middle of a call) → will be woken up
Return the results through the arguments, and work is returned
Simple workers: only interacts with administrator
Complex workers: can also interact with client and others potentially
How can administrators communicate between each other?
- Hire a courrier
- Only interact between each other from a special worker called a courier.
use signalBlock()!

A6: 4 admin, a pool of couriers (between watcard office and the bank) Bank is an admin. if no money, wait until parent wakes up to give student money. when you have enough money, courier comes back . finesse first worker call (since no argument with result)

Client Side:

Make an asynchronous call from client to server, that way the caller doesn’t wait for call to complete. puts a buffer → producer will put something in the buffer, that way when the admin decides to accept the call and just gets the info from the buffer.

not start/wait: we already have a running thread. need to connect to the server. The server thread never ends, keeps running.
time between these two calls is the magic concurrency
if result is not ready when second call is made
- caller blocks
- caller has to call again (poll)

Tickets: protocols:

how to match client with the work it dropped off earlier?
- some kind of protocol to identify the client with the work
- think of dry cleaning
- second call to pull the result by passing the ticket 🎟️

Call-Back Routine:

Don’t be unreasonable, server cannot stop for you
Advantage to server: push the results to you
Each client gets to write their own call-back → flexibility

Futures:

Combine two step protocol down to one
the future is returned immediately and it is empty.
caller “believes” the call completed and continues execution with an empty result value.
future will eventually be filled in some future time

Sketch of a general future:

link field: allows server (gives out futures), we need to find that future later on.
- one technique is have a chain of futures so we can scan the list and find it
server creates the future, which clients get ahold of, so at the end, it’s the client’s job to delete it. transfer of ownership and deletion
theory: future should be ok and don’t need to worry about it

Use Future_ISM<T>!!!!

Client:

create array of 10 future integers
- each one can hold an integer, but initially empty
pass integer in and will compute some value : for the server → will eventually give me a result x10 → server has 10 workers doing this
get the result: f[i]() → we either wait for result or get it right away
- two calls? first call does the heavyweight. second call just gets it.
- Problem in c++: << operator has equal priority, so we don’t know which one will call first, which one does the heavy lifting.
f[3]=3: we want to change the future. BUT you cannot change it. because you can share the future with other threads, want to guarantee others can get the right value. Only read only, write once variable.
You can use reset()
cancel(): really weak. not safe dont use it
After the future result is retrieved, it can be retrieved again cheaply (no blocking
acquire the lock on the stream osacquire(cout), then we go to sleep with the lock waiting!! with f[i]()
you can have a future pointer! server will eventually give me back answer to the pointer. pointer points to the place where future returns the answer. no one can change this pointer. you can change the thing that it is pointing at

Server:

Work unit: data passed, future we give back to client → both part of the call f[i] = server.perform(i)
perform routine: create work unit and push onto a list (to remember). and send back the future. I have a copy and client also have a copy. They are reference counts. That is if someone deletes theirs, you still have a copy of it.
server’s worker: take a work off the list, compute it, calls a delivery to stick it onto the future.
- either in server or worker, you need to decide
You can stick an exception in the future!! must dynamically allocate a future!

Complex Future Access (client side):

_Select allows you to select a future and do the work
- _Select( f1 || f2 && f3 );
What we want to do is pull it out to its long form:
- we can put statements after each one of them and put _When clauses!

f1 becomes available, executes statement-1 and can continue do other work after.
f2 becomes available, executes statement-2, then blocks until f3 becomes available.
f3 becomes available, executes statement-3, it executes statement-3 and continue

Tues Nov 26 2024

Pull futures into their long form so we can add statements afterwards!

Difference between future selector and accept?

Accept only have or, future can have or and and.

In vendingMachine, you want the student to go to sleep

Highly unlikely to have a program with futures like this: just an example
Go have select but doesn’t work with futures but with channels (I think it’s what threads are called in go)
He’s just showing off with the select statements. On assignment we only need to do tiny select statements an a bit of futures

10. Optimization

What is the -O flag for compilation?

Compiler tries to shove everything into one file. All the work we do gets thrown in the garbage. But we only do it for ourselves.

General forms of optimizations (-O):

reordering: data and code are reordered to increase performance in certain contexts
eliding: removal of unnecessary data, data accesses and computation (inlining code)
replication: everything can bed duplicated

Need to be isomorphic → generate the same result every time!

10.1 Sequential Optimizations

Compiler asks if we can flip the two and it will get the same thing?
- First one yes, second one no, third one no, fourth one no

Can flip all of them

Elide code:

compiler will get rid of the unnecessary loop with no body
compiler will try to see if the function is tail recursive, and convert it to a loop? Got rid of recursion and replaced it to looping

We have a integer and floating computation unit. So they can be computed at the same time on the chip. Changing sequential to a micro parallel program, with small bits of parallelism.

Compiler will utilize the hardware’s environment to optimize your program.

10.2 Memory Hierarchy

Virtual memory → cache (what he’s talking about)
Paging: from disk to memory
Arithmetic can only be done in the registers
1000 times difference between disk and memory, and another 1000 times difference between memory and CPU
once x is in the register, we lazily let → principle of locality. let it sit in the cache because maybe: lazily pull and eagerly push

Cache Review

Solution: use hardware cache to stage data without pushing to memory

apparently if you just print the x, y, z, the addressees are in reverse order
pull x into cache, then load it to register (256 bytes are pulled in, just like paging, bringing more than you need, you bring in the whole page)

socket
each core has registers
Registers are NOT shared, but cache/memory/disk are SHARED.
Cache: instruction and data cache? might be split
- 3-4 levels of cache to get to registers
- ssd on your disk drive
- hierarchy just keeps getting bigger

Stick structure for x for sequential program in memory
madness, which one is the real x??

Tree structure
If T1 needs to push the value into L1 from register, it will flow thorugh all of them

If we did this eagerly, we stop system, run protocol to update the cache. Does it work? no
What actually happens is that the cache updates lazily
- We are going to see stale data

10.3 Concurrent Optimizations

Concurrent execution: weak memory ordering → can read stale data

Only reordering disjoint reads doesn’t cause any problems

Problems:

compiler is breaking concurrent program

Compiler f it up

Example: double-check locking

changing from sequential to concurrent version
What can happen, you can have a race, enter and all see that the ip is null. Compete to the lock, one wins. Double check → if it has been done, before.
- BUT, the blue line is wrong
- But the compiler: sees two writes, and says that we can flip it.

Thu Nov 28, 2024

Eliding:

Elide reads value by copying into register: load by replicating

Replication:

Both compiler and hardware are moving things around

10.4 Memory Model

Compiler knows what kind of machine your running on, and will make it work
Don’t need to know the subtle differences (even Peter doesn’t remember)
Our laptop is running TSO
Arm macs have RC → harder to do concurrency

10.5 Preventing Optimization Problems

Moving shared variables can cause race conditions!
We are trying to write race free programs in this course
Peter accepting defeat
Two approaches
- add hoc
- formal
volatile → need to store it back to memory
When to use volatile?
- If program doesn’t work and put in volatile and it works. Then your program needs volatile
C/C++ vs. Java volatile
- They are different.
- Java volatile is stronger, it prevents disjoint reordering
  - disadvantages: too conservative? assuming all programs are sequential consistency (SC)

Dekker for TSO: (doubt we need this)

do green and red or blue and red.
if you don’t do red, it still works, it won’t run as fast

11 Other Approaches

11.1 Atomic (Lock-Free) Data Structures

You can make struct, queues, basically data structures concurrent friendly (lock free).

11.1.1 Compare and Set Instruction

CAS : compare-and-set $\to$ does not prevent ABA problem.

read, and sometimes you will write. conditional write: new atomic instruction
we can use it (CAS) to build mutual exclusion
- if lock is open, set it to close

How to push a node onto the stack using CAS and pop:

use copy to change value into $t$

11.1.3 ABA Problem

An interview question!

Context-switch happened and B is gone/popped. Stack is corrupted!

What is the name of the problem that occurs if only the CAS instruction is used to build a lock-free stack?
- ABA problem

11.1.4 Hardware Fix

Put a ticket counter.

counter with header node
compiler trick so type matches

We only count the pushes
Atomically update the new top pointer and new ticket counter value.
Pop doesn’t have anything special
How does this help?
- we start with 3 on the stack, that is 3 pushes
- context-switch, since they pushed A back on to remove B, ticket value is not 3, it is 4 since it pushed! so it fails.
- what lock free people do, to ensure they don’t have ABA problem with their data structures

One other problem that can happen: we dereference pointer to ask about the next field. this dereference might already be corrupted. reaching out for a random storage. → Segment fault can happen

11.1.5 Hardware/Software Fix

Trevor brown → lock free data structure

Keep track of data structures that got removed. If we access pointers and check if it’s on the hazard list.

Building micro garbage collection data structure.

There are c++ lock free data structures.

Locks v.s lock-free

lock-free has no ownership (hold-and-wait) $\Rightarrow$ no deadlock

lock-free can only handle limited set of critical sections lock can protect arbitrarily complex critical section versus

lock-free no panacea, performance unclear

combine lock and lock free?

11.2 Exotic Atomic Instruction

VAX computer → used to be the best computer. Has a atomic instruction to insert and remove from a doubly linked list. Hardware people can help the lock-free people!

Load link, and Store conditional! → someone does the work for you and watches the memory

If we go back to ABA problem. It fixes it.

In theory, we want multiple watchpoints to avoid ABA problem.

Intel came up with Software Transactional Memory → really slow no one uses it.

Tue Dec 3, 2024

STM → hard to do it: no one is working in this area (according to peter)

11.3 General-Purpose GPU (GPGPU)

Coprocessor → sits beside the CPU, connected by a bus. Very fast communication, high bandwidth connection

SIMD (GPU): everyone has to execute the same.

no cache, but a lot of registers.
threads, warp, kernel, etc they all have their own memory
we have to figure out the memory

compiler people use vector-processing to do faster memcopy for instance.
nobody is using the vector-processing unit that is sitting in our laptop

11.4 Concurrency Languages

Ada

protected type in ada is the monitor
cannot use ada’s waituntil to code up the dating service, since ada forces the waituntil at the beginning

Ada came up with the requeue statement, changing internal scheduling to external scheduling.

The problem: need to pick up all the computations if it is to be put on another accepted statement.

In contrast, waiting on a condition variable automatically saves the execution location and any partially computed state $\Rightarrow$ Coroutine

11.4.2 SR/Concurrent C++

Example of people trying to get rid of external scheduling.

Not allowed to use

introduced by clause → calculates for each true when clause and selects the minimum

11.4.3 Java

Need to call the start() routine. If you don’t call start(), it’s an object, not a Thread yet.

In microC++, the curly brackets do the join, then deletes it. But not in Java.

Java program:

In java: any class object has to be declared on the heap
Like μC++, when the task’s thread terminates, it becomes an object, hence allowing the call to result to retrieve a result.
see section 8.11 p.161 for monitors
Java is using heavy kernel threads

11.4.4 Go

MTV sanfran

C forall: Go on steroids, but backwards compatible with go

Non-object-oriented, light-weight (like microC++)
light-weight threads runs on top of kernel
go foo(3,f) // start thread in routine foo
- go routines are tasks
- go is just like a start in function foo
- no address given, we don’t know the name of the thread, we just start them
- no way to get a return value
- no direct communication!
instantly terminates… we need to wait for all the threads to finish, not necessarily easy to do that.
threads communicate through channel
- peter don’t like channel: you have a paradigm shift. but 90% time you do call/return, one paradigm call/return and works nicely.
- channel are just a bounded buffer
- operator ← send/receive
Go vs. microC++

11.4.5 C++11 Concurrency

2 way of start threads: callable and functor
main: C++ doesn’t do implicit join with the curly brackets like how microC++ does. you need to explicitly call join.
every threads need to have join: you can have a hidden parent that does all the join or something

thread keeps on running, but storage s is gone, you’ll have problems if the thread try to reach that memory

Locks:

have mutex, condition, but has barging, etc.

11.5 Threads & Locks Libraries

java.util.concurrent: Doug came up with this famous library

He built actors without the message passing: Executor/Future (actors with futures, instead of with message)

11.5.2 Pthreads

Created for C.

type unsafe
no external scheduling, only internal scheduling that has barging and spurious wake up probably

Insert function example in notes

we need while because of barging and spurious wake up

11.6 OpenMP

#pragma omp: BEGIN … END

#pragma mp parallel for: COFOR

also has barrier: …

The End.

🪴 Avril Chen

Explorer

CS343: Concurrent and Parallel Programming

Concepts

Final

Midterm

Assignments

Lecture 1

Advanced Control Flow (Review)

Multi-exit loop

Static multi-level exit

Dynamic Memory Allocation

Lecture 2

Lecture 3

Coroutine

Semi-Coroutine

8.5 Readers/Writer

8.7 Nested Monitor Calls

8.9 Counting Semaphore, V, P vs. Condition, Signal, Wait

8.10 Monitor Types

8.11 Java Monitor

Internal Scheduling

7. Concurrent Errors

Race Condition

No Progress

7.3 Deadlock Prevention

7.4 Deadlock Avoidance

7.5 Detection and Recovery

Which Method To Chose?

9.3 Increasing Concurrency

10. Optimization

10.1 Sequential Optimizations

10.2 Memory Hierarchy

10.3 Concurrent Optimizations

10.4 Memory Model

10.5 Preventing Optimization Problems

11 Other Approaches

11.1 Atomic (Lock-Free) Data Structures

11.1.1 Compare and Set Instruction

11.1.3 ABA Problem

11.1.4 Hardware Fix

11.1.5 Hardware/Software Fix

11.2 Exotic Atomic Instruction

11.3 General-Purpose GPU (GPGPU)

11.4 Concurrency Languages

11.4.2 SR/Concurrent C++

11.4.3 Java

11.4.4 Go

11.4.5 C++11 Concurrency

11.5 Threads & Locks Libraries

11.5.2 Pthreads

11.6 OpenMP

Graph View

Table of Contents

Backlinks