CS247 Lectures
Lecture 1
Allow us to hide complex details behind an interface.
A client only needs to understand the interface to use a class, they are free to ignore the underlying implementation.
Examples:
- BST - client will call insert/delete/search - don’t need to worry about tree balancing.
- Dictionary - can simply insert/lookup values - no worries about internals,hash function being used, etc.
Ideally, use the compiler to enforce certain properties of the ADT.
Example: Linked List ADT: One property we may wish to have is that all nodes are on the heap
Detecting issues at compiler-time ensure that users of our software never encounter this particular suite of run-time bugs.
We wish to prevent tampering and forgery of our ADTs.
Forgery:
- creating invalid instances of an ADT. Tampering:
- Modifying a valid instance of an ADT to become invalid in an unexpected way.
Example: Rational ADT
What is the explicit keywork? explicit keyword
- Used for single parameter constructors
- Prevents implicit conversions
Suppose we have the following function:
Without the explicit keyword, this will compile f(247)
. However without the explicit keyword, this won’t work. You will need to write:
Another example:
- Using explicit can be helpful to catch mistakes:
So here, without explicit constructors, the Node
constructor that takes an int
as a parameter can be implicitly used for converting int
values to Node
objects. This can lead to unintended conversions that might not be what you expect. The error is that with q(4)
, it will create a temp Node
object using the Node(int data)
constructor and then it will be passed to the q
function. This is not the intended behaviour. Using explicit
in front of the constructor declaration will disallow conversions from int
to Node
. So we would need to explicitly create a Node
object using the constructor and the implicit conversion form int
to Node
wouldn’t be allowed.
Another example:
- This works because
std::string
has a single-parameter constructor which takes in achar*
and is non-explicit. - The key part of the implicit conversion occurs when the string literal
"hello world"
is passed as an argument to the functionf
. Here’s how it works:
- The parameter type of the function
f
isstd::string
. - The argument passed to
f
is the string literal"hello world"
. - The compiler recognizes that the parameter type and the argument type do not match exactly.
- The compiler looks for a constructor in the target type (
std::string
in this case) that can accept the provided argument type (const char*
). - The
std::string
class has a non-explicit, single-parameter constructor that takes aconst char*
as an argument. This constructor is designed to convert C-style strings tostd::string
objects. - The compiler automatically invokes this constructor to create a temporary
std::string
object from the string literal"hello world"
. - The created
std::string
object is then passed to the functionf
.
This entire process is the implicit conversion in action. The compiler automatically handles the conversion from the argument’s type (const char*
) to the expected parameter type (std::string
).
Does the Rational ADT need to be a class?
- Not necessarily
- ADTs are abstract - not tied to one particular implementation
- Could use a struct
- Classes are nicer:
- Construction/destruction is guaranteed (on the stack)
- Can enforce legal value ranges via access specifiers (public/private/protected)
With a C struct
: someone can easily set denom = 0
.
Not possible with our class: denom
is private, we could add logic to our constructor to reject denom = 0
.
This is an example of an Invariant: something about a class or ADT that you always expect to be true.
We gave Rational a default constructor - constructor that can be called with no arguments.
Should all ADTs have a default constructor?
- Not necessarily - only if it makes sense.
- Consider a Student ADT with name, id, birthday fields - no sensible defaults.
If you do not write any constructors at all, you get a compiler provided default constructor: (can be called with no arguments)
- Leaves primitive fields uninitialized (garbage vales)
- Default constructs object fields. If you write any constructor yourself, the compiler provided constructor is not supplied.
Do we need 3 constructors?
- No - we can use default parameters to better code styles
- so default parameter of num is 0 and demon is 1.
We don’t need to write different Constructor for different number of arguments. We can just use default parameters!
Default parameters MUST be trailing Consider:
If we were to call Foo{5} - which ctor are we trying to call here? z = 5, or x = 0 and y = 5?
- Ambiguous - won’t compile - default parameters must be trailing.
- Additionally - default parameters only appear in the declaration, not the definition.
We’ve use the MIL - member initialization list to set our fields: What’s the difference between:
In general, using the MIL is considered better. To understand why - think about object construction. Happens in 4 steps:
- Space is allocated (stack or heap)
- Call the superclass constructor
- Initialize fiels via MIL
- Run the constructor body
Lecture 2
Last Time: ADT design, explicit, MIL.
This Time: MIL finished, operator overloading.
In MIL, we give it the value immediately whereas in setting things in the constructor body, you have to default construct an object field. It takes time because it will have a method call, then you immediately overwrite it in the body.
- Using the MIL is considered better style than assigning the fields in the constructor body.
- Always use it.
There are cases where using the MIL is necessary: (tested in Midterm)
- field id is const, need to use MIL
- References must always be initialized (have a number).
- Tries to default compile A, but it’s not in class B. So does not work.
- Stuck on step2: Since B inherits from A, it will want to store x too in itself, so it will want to initialize x but it’s not in the MIL of B. It will try to default construct it from inside class A. But there is no default constructor because we provided a constructor. Hence, it won’t compile.
What if we gave A a default constructor?
Breaks down for a different reason. Step 2: Didn’t specify how to initialize in superclass, it goes to A and default constructs. Attempts to initialize x
to 0. Step 3: fields are initialized, integer y
is a primitive field, it will be left with a garbage value. Step 4: constructor body runs. Cannot set the value of this->x = x
because x
is a private field. Cannot set private fields in the subclass, because they are not accessible. Only public and protected fields.
Correct way: to initialize B with MIL
We specified that we wanted to initialize the A
portion of this object using the value of x
. When we are creating our B
object, we need to initialize the A
portion, we call the constructor taking in one argument, calls the constructor in A. All goes well.
BUT since we don’t have default constructor for A, and you call when constructing in B A{} without arguments, it won’t compile.
The 4 Steps:
- Space is allocated (stack or heap).
- Call the superclass constructor.
- Initialize fields via MIL.
- Run the constructor body.
Now let’s support Rational Operations!
To support this functionality, we need Overloading.
- Overloading is implementing two functions / method with the same name, but differing in the number or types of arguments provided.
For example, the negate
function overloaded:
Note
Cannot overload based solely on return type.
To perform an operator overload we define a function with the name operator we define a function with the name operator concatenated with the operator symbol.
operator +, operator>>, operator!, operator==
The number of arguments, must match the arity of the operator.
Example:
+: binary operator - 2 args
!: unary operator - 1 args
To support cin >> r >> q
; where r
, q
are Rationals. Define the following operator overload:
cin
is passed to istream& in
, r
is passed to Rational& r
.
- Why is
istream
passed by reference &?- Because copying is disallowed for
istreams
.
- Because copying is disallowed for
- The Rational is passed by reference because we want changes to be reflected outside this function.
- We return
in
to allow for chaining.cin >> r >> q
cin >> r
is evaluated first.- If we return
in
, it simplifies tocin >> q
.
Problem
r.num
andr.denom
are private fields, cannot be accessed in the operator.
Solution 1: Provide accessor methods
- Provide methods
getNum
andgetDenom
which return references to the num anddenom
fields. - Sometimes paired with mutator methods
setNum
andsetDenom.
.- (Could enforce invariants like
denom != 0
) with these methods. These are sometimes called getters/setters
- (Could enforce invariants like
Solution 2: Declare operator >>
as a friend function
Now, support p = q + r
adding two rationals together.
Define operator+
for two Rationals:
Take in arguments via constant reference:
- Constant - don’t want lhs or rhs to change.
- Reference - quick, no copying.
Declaring all these overloads as friends is a pain!
Alternative: Define operator overloads as methods in Rational.
this
takes the place of the lhs- Essentially,
r + q == r.operator+(q);
, they are semantically equivalent.
Note
operator<<
andoperator>>
are usually defined as standalone functions (outside of the class). Becausecin
/cout
appear on the lhs.
What if we want r + 5
? Answer: Function Overloading
What about 5 + r
? Doesn’t work, because order of the arguments matters.
We want an integer on the lhs
, so we need a standalone function here:
What about p = q + r
?
Setting one Rational to another?
- Compiler provides a Copy Assignment Operator for you.
- Can also write our own copy assignment operator:
- Why is the return type
Rational&
, why do we return*this
?- Dereference? or something
We can also chain operator=
:
Lecture 3
Last time: CS247 Lecture 2 MIL, Overloading
This Time: Operator overloading finished. Value categories. Linked List, Abstract Data Type
Implementation as a method for operators
There are some operator overloads that MUST be defined as methods in the class. These are:
operator=
operator[]
operator->
operator()
operator T
where T is a type.
Supporting division of Rationals, e.g q/r
. (Exercise)
Supporting cout << q << endl
where q
is a Rational:
- The
rational1
object is first inserted into thecout
object, followed by a space, and then therational2
object is inserted. Theoperator<<
function is called multiple times in succession, and theostream
objectcout
is returned each time to allow for this chaining of<<
operators. - We want an
ostream
on the lhs→standalone function, need access toq.num
,q.denom
, need to declare as a friend. - Usually we don not provide
endl
in theoperator<<
definition. Don’t want to force our users to use new lines.
Consider:
ofstream
is anostream
, so here, we print out in the file. We pass file into out. Print out to the filer.num / r.denom
.
Our operator<<
and operator>>
definitions also work for reading from and printing to files (and other types of streams).
Finally, we have Rational z{q}
;
or Rational z = q;
or Rational z(q)
- Running the copy constructor. (because
z
is declared for the first time) - Compiler provides a copy constructor that simply copies all the fields. We can also write our own.
Having written all of these operator overloads our Rational ADT is easy to use from a client perspective.
Exercise: Preventing denom = 0
. Multiplication and subtraction arithmetic operators.
- Explain const keyword
const
after the function declaration: It indicates that theoperator-
function does not modify the internal state of theRational
object on which it is called. In other words, it promises not to modify the member variablesnum
anddenom
of the current object.const
after the parameter declaration (const Rational& other
): It indicates that theother
parameter is a constant reference. It means that theoperator-
function accepts aconst
reference to aRational
object as an argument. Theconst
qualifier ensures that theother
object cannot be modified within the function.- Overall, the
const
at the end of the member function declaration indicates that this particular member function is a constant member function, which can be called onconst
objects or non-const
objects.
Takeaways
- Overloading operators gives us a convenient syntax.
- Classes allow us to enforce invariants, good ADT design.
- Friends can be used (sparingly) to give access to private fields.
”As they say in C++, you should never have too many friends. Which shouldn’t be a problem for Waterloo students…"" Thanks Ross.
Value Categories
Every expression in C++ has both a type and a Value Category.
We will discuss 2 categories: lvalues and rvalues.
[! lnote] Lvalue An lvalue is any expression that you can take the address of.
For example:
Rvalue
An rvalue is a temporary values, it will be destroyed “soon”.
For example:
Another example:
We cannot run &f()
- the string isn’t stored anywhere permanently, just put in a temporary until it is saved into s.
Note
The references we have seen so far are lvalue references. Lvalue references (references declared with
&
) can only bind to lvalues. This is because lvalue references provide an alias or another name for an existing object, and it requires an actual object with a memory location to refer to.
For example:
rvalues cannot bind to lvalue reference
An exception: We can bind rvalues to const lvalue references (No complaints of changing anything!)
This is allowed, we won’t modify x, the Compiler creates a temporary memory location to store the 5 in.
We can create rvalue references. To bind an rvalue to a reference, you can use an rvalue reference (denoted by &&
) introduced in C++11.
Extend the lifetime of the rvalue to the lifetime of the reference. (References to temporary values)
- We can use the temporary value returned by
f
for as long ass
exists. - Usually we just do
string s = f();
Most commonly, used for overloading functions based on the value category of the expression:
Why is this useful? We’ll see shortly.
Finally, note type and value categories are independent properties.
- In this expression, although
s
references an rvalue, we can take s’s addresss
is an lvalue.
Linked List ADT
A slightly more complicated ADT example. Linked List ADT
Specifically I want to leverage lvalue/rvalue knowledge for efficiency. We’ll focus on invariants and encapsulation later. Now, we want correctness and efficiency.
I want the following client code:
First, copy constructor:
Next: Node p{n}
this executes the copy constructor - provided by compiler. Copies each field.
- But we just want
p
to be modified if we just use the compiler provided compiler!
Now, p.next->data = "z"
, modifies n
as well! Which we don’t want.
We have shared data. We’ll define our own copy constructor that recursively copies the data structure.
Custom Copy Constructor:
- Recursively calling itself with
new Node{*(other.next)}
.
Now Node p{n};
will perform a deep copy with our custom copy constructor:
- This is a deep copy, as opposed to a shallow copy.
Lecture 4
This time: Linked List ADT continued. Move Constructor and Move Assignment Operator. Elision. Encapsulation.
Issue: although n
is freed, the rest of the Nodes in the list are on the heap, must also be freed.
We will write a destructor, which runs when stack allocated objects go out of scope, and when heap allocated objects are freed.
Complaint: Efficiency
In memory it looks like the following:
- Here, we’ve copied 26 nodes, then delete the originals immediately.
We can observe that getAlphabet()
is an rvalue. It’s going to be deleted by the next line. No one else will use that memory. We can steal it.
We’ll write a Move Constructor for rvalues, which will be faster. Takes in an rvalue. Called on temporary values.
other
: is a rvalue that is passed in and other
references that rvalue, but once inside the function, other is an lvalue.
- Basically,
getAphabet()
comes in and givesn
its data, soa
, and also what the first node points at. We see thatn
points togetAlphabet()
’s second node. Then in the body, we setgetAlphabet
’snext
field tonullptr
. As demonstrated in the drawing, we quickly assignother
ton
. Son
points toother
now.
One more slight optimization:
other.data
has an address, it’s an lvalue. This invokes the copy constructor for the data string. Even thoughother.data
will die soon. We’d prefer to run the string move constructor.
std::move
is included in<utility>
, it forces an lvalue to be treated as an rvalue. So the Move Constructor runs.
We still have data sharing :
Compiler provided copy assignment operator which does shallow copies of pointers. We have data sharing, since p’s first node is changed to a and points to Node n’s b.
Data sharing again, leaked Node.
We’ll implement our own copy assignment operator.
Attempt # 1 :
Need to do some cleanup first when it is an assignment operator.
Example of breaking:
Here, we start by deleting the rest of the linked list. Therefore, n
no longer exists. Can’t do n = n
anymore.
To protect against self-assignment, add the following to the beginning of the assignment operator:
other
is an lvalue reference, we check if it’s the same thing in the memory location, then return *this
.
We’ll rearrange the copy assignment operator slightly: new
has the possibility to reject our request for more memory, we’ll then quit the function.
Call new
first so if it fails, we haven’t deleted anything yet.
Working copy assignment operator:
There is some code duplicating between copy constructor and copy assignment operator. Can address this using the Copy-and-Swap Idiom.
Basically, we create a temporary copy of the data from other
(by calling the copy constructor). Then we swap this newly copied data into our old data. In this case, we swap the first node data from newly copied temp
to p
ultimately what we want to return (p
contains a
). And also swap pointer pointing to tmp.next
. That way p
has the values we want to copy. The new data. Ans since, we’ve swapped the pointers, tmp will be deleted after this copy assignment operator. Since it is a temporary created inside of it.
Finally: Move Assignment Operator for more efficiency:
Important
So
getAlphabet
gets changed. And this is totally fine sincegetAlphabet
is an rvalue anyways. Basically, if ever the Move Assignment Operator gets called, it is copying everything from an rvalue, so we don’t care if it gets modified, which it will be.
Exercise: write this without swapping.
If you don’t define a move constructor/assignment operator but you do define a copy constructor/assignment operator, compiler will use copies in all scenarios.
Rule of Five / Big Five If you write one of the following, should usually write all: Rule of Five
- Destructor
- Move constructor operator
- Move assignment operator
- Copy constructor operator
- Copy assignment operator
Elision
In g++
, std = c++14
, neither move nor copy gets called! This is called Elision.
In certain cases, compiler can skip calling copy/move constructors, instead writing the object’s values directly into its final location.
Another example:
Here, {1, 5}
is written directly into r
.
Note: Elision is possible, even if it changes program output!
- Not expected to know all possible cases, just that it’s possible.
Disable with the flag --fno -elide -constructors
(slow down program though).
Lecture 5
Last time: Destructor, Move constructor, copy/move assignment operator, elision
This Time: Improving encapsulation, iterators.
How can the client misuse this ADT: Tampering and Forgery?
Forgery:
Mostly likely a segmentation-fault on any kind of use.
Tampering:
All these issues are from a common problem: Representation Exposure Every ADT has a representation in memory.
- e.g. Linked lists are implemented via a series of
Nodes
. If the client gains access to the memory, we call that representation exposure. (Client can use this knowledge to do things we don’t expect.)
We have expectations/rules for all linked lists:
- All next nodes are heap allocated (or
next
isnullptr
) - No sharing between separated lists
- No cycles in our list
Clients shouldn’t need to uphold these invariants, it’s our responsibility.
Solution: Encapsulation, provide public methods for the client to interact with our List.
This solves our representation exposure problem.
Client has no knowledge of Nodes
, no access to internal memory representation, no tampering or forgery allowed.
Lists are slow!
- th is not a constant time operation!
- Takes steps to find the -th node in the List
Total time is
We can’t just give clients access to Nodes again, because we would get Representation Exposure.
Design Patterns: Effective solutions to common problems.
Problem: Efficient looping over a data structure while maintaining encapsulation.
Pattern: Iterator Design Pattern: provide an abstraction for pointers.
Example:
Our clients use of an iterator will look like:
These are the functions we need to implement:
- for
List::begin/end
functions that return iterators - for
List::Iterator
:!=
to compare++
to go to the next Node*
to get the string at current Node
Implementation: (defined in .h
file)
post-fix vs. prefix ++
The teacher uses specifically prefix ++ so that the function overloaded is
Iterator& operator++()
. If we wanted it for post-fix ++, we would doIterator& operator++(int i)
…
If your implement an iterator like this, we can use the range-based for loop syntax: (shortcut syntax)
Note
When there is no reference(
&
) ⇒ it creates a copy fromcur->data
intos
. Under the hood, it callsl.begin()
andl.end()
to iterate through
Can also use auto
for type deduction:
Danger
auto
drops references, so if you want to modifyl
, useauto&
.
Iterator is a public nested class, with a public constructor.
Worry of Forgery: auto it = List::Iterator{nullptr};
(create invalid instance)
- creating an ending iterator
- This could be a problem for other kinds of Iterators where
nullptr
is not used.
Solution: Give iterator a private constructor
friend List
is a friend class so that the class List can use private copy constructor of Iterator class
Warning
In general: be cautious when using friends, weakens Encapsulation. Only make Friend (C++) if they can do something for you!
Subtle issue:
Doesn’t compile! because l
is a constant value. How should we know begin/end won’t modify the fields of this constant l
?
Solution: Declare begin/end as const
methods, promise we won’t modify fields.
- A constant object (or const reference to an object) may only have
const
methods called on it. - Now it compiles. (Actually this is bad, we will see in the next lecture)
In the next lecture, we will talk about physical vs logical constness.
Lecture 6
Last Time: Encapsulated List, Iterator Pattern
This Time: Physical vs. logical constness, const iterators, templates, separate compilation.
We saw we could modify a const List&
by using its iterator.
This is due to the difference between physical and logical constness.
- Physical constness is about a particular object’s implementation. Compiler guarantee’s this for const objects/const references. Ensures the fields won’t change.
- Logical constness asks not only for the fields to be immutable, but also that the memory associated with the representation of the ADT doesn’t change.
How to achieve this?
begin
andend
cannot beconst
methods, they return Iterators that return modifiable string references viaoperator*
We’ll write additional methods cbegin()
/cend()
which return const Iterators, which will only provide a const string&
when calling operator*
.
Now: Connect call begin
/end
on a const List&
. Can only call cbegin
/cend
, these don’t allow us to modify the list.
Complaint: duplication between the two iterators. Fix it using a template class. Template (C++)
Template classes are allowed to be parameterized via the type provided.
Classic example: C++ Standard Template Library
vector<int>
vs.vector<string>
is implemented using Template (C++).int
andstring
are type parameter denotes what the vector contains.
Specify the return type of operator*
as a template parameter, rest of the code is the same.
Note: Template definition of methods must be written in the header file. Node’s definition must also be moved into the header file, because the template needs to know the next field exists.
How do templates work? (briefly)
For each use of
MyIterator<T>
, compiler generates another copy of the class withT
substituted where necessary. Afterwards, compiles as usual.No run-time cost using templates, it’s as if we wrote our own
Iterator / ConstIterator
.
Separate Compilations
In List.h
(original version), we provided a forward declaration of struct Node
. Provided definition of the class in the .cc
file.
This is especially useful for separate compilation.
-c
compiler flag lets us produce an object file.
Think of object files as containing the machine instructions for that code. CS241 - Foundations of Sequential Programs with Linking.
Use object files to store the result of compilation, and reuse it if the .cc
files haven’t changed. With many files, significant speedup while developing. We prefer to put implementations in a .cc
file for the following reasons:
List.cc
changes recompileList.cc
intoList.o
. Link object files together to get an executable.List.h
changes all.cc
files that includeList.h
must recompile.A.h
includesList.h
any.cc
files includingA.h
must change. Relink all.o
file together.
We prefer forward declarations of classes where possible-minimizes recompilation.
When can we simply forward declare, when do we need to use
#include
?See example below:
- B needs to know how big A is,
#include
in order to determine their size and compile them. - C needs a field in A, need
#include
in order to determine their size and compile them. B and C require#include
in order to determine their size and compile them. - D all pointers are the same size, just a forward declaration suffices.
- E we don’t need to know the size of A, just that it exists for type-checking purposes.
- F we must
#include
A.h
to know thatdoMethod
exists.
Lecture 7
Last Time: Const Iterators, Templates, Separate compilation
This Time: Preprocessor, Makefiles, Tooling What is #ifndef
pattern?
The files we gave in A1 looked like:
- The above are Preprocessor directives: commands that allow transformations of the source code before it is compiled.
#include file
- replaces this line with the contents of file.
Consider an example: Linear Algebra Modules
We get a compilation error:
main.cc
includes Vec.h
main.cc
includes matrix.h
, includes Vec.h
Two definitions of struct Vec
in main.cc
not allowed.
To fix this issue of multiple definitions, we use “include guards”:
#ifndef VARIABLE
#define VARIABLE
defines a preprocessor variable. The code between#ifndef
and#endif
will only be seen by the compiler ifVARIABLE
is not defined.
Include Guard:
Works because once File.h
is included once, the variable becomes defined, so in all future includes
the block is omitted.
This doesn't fix all issues!
There is an issue with circular dependencies. Consider the following example:
The issue is that each class needs to know the other exists before it can be created.
Solution: Break circular dependency chain using Forward Declarations.
Separate Compilation
Speeds up compilation and development time: change one .cc
file update the .o
file, relink with other old .o
files.
Issue: if we change a .h
file, then many .c
files might need to be recompiled. Mental energy to figure out the dependencies and just recompile the relevant .cc
files. Might be faster to just recompile everything.
Solution: Use a build automation system. Keep track of what files have changed, keep track of dependencies in compilation, and just recompiles the minimal number of files to make a new executable.
We’ll discuss make: Create a Makefile.
Example: main.cc
(includes List.h
), List.h
, List.cc
(includes List.h
).
Makefile v1:
Need to identify the target and dependencies.
- target: myprogram
- dependencies:
main.o
List.o
- General format for first line is
target: dependencies
g++ main.o List.o -o myprogram
is called the recipe
This text is in a Makefile in our directory.
make creates the first target
Looks at the last modified time of dependencies. If last modified time is newer for a dependency than a target target is out of date, recreate it.
Still too much work! Still requires lots of updates to our makefiles.
Makefile v2:
CXX=g++
- is a special Makefile variable for compiler we’re using.CXXFLAGS = -std=c++14 -Wall -g -MMD
-Wall
for more warnings-g
for extra debugging support (need this for gdb and valgrind)CXX=g++
is a special Makefile variable-MMD
to include${DEPENDS}
OBJECTS=${CCFILES:.cc=.o}
- find and replace string operation {}
- include ${DEPENDS}
- compiles all object files with dependencies using
CXX
andCXXFLAGS
- compiles all object files with dependencies using
We’ve discussed a lot of programming, but we haven’t discussed methodology, other tooling yet.
Software development lifecycle: Waterfall Method:
- client → specs → program → build → test/debug it
- client (UML)→ specs → planning → programming (c++)→ build (makefiles)→ test/debug it (gdb/valgrind)→ use source control (git)→ release it → client (goes in a loop)
For the program, use source control and release it.
Lecture 8
Last Time: Makefiles, Preprocessor This Time: Methodologies, gdb/valgrind, source control, modelling
Agile Method: client → specs → planning → programming → building → testing/debugging → source control → releasing → client (goes in a loop)
- Client is looped into the process
- Typically, work is done in 1-2 weeks, called “sprints”. Basically no one uses waterfall method.
Next: Testing / Debugging
Debugging in C++
: Valgrind and gdb
Valgrind: Primary use is a memory checking tool.
- Provides a virtual CPU that it executes your program on - checks for memory access, etc.
Typically programs executed through valgrind are 3-4x slower than usual (don’t use it for performance monitoring).
It is important when using valgrind or gdb to compile with the -g
flag.
Using --leak-check=full
allows us to see where leaked memory was allocated.
Valgrind can only inform on memory leaks for one particular invocation of the program. Highlights importance of designing valgrind test cases that hit each part of the program.
Can also detect invalid reads: report the line of your source code where invalid read happened.
Invalid write - we get 3 pieces of information:
- Where the invalid write occurred
- Where the memory was freed
- Where the memory was initially allocated
Can also detect delete
/delete[]
.
GDB gdb = GNU debugger: Useful for inspecting variables seeing control flow of program.
gdb ./myProgram
run
- runs the program
runa arg1 arg2 arg3
run < test8.in
List of commands:
gdb ./myProgram
run - runs the program
run arg1 arg2 arg3
run < test1.in
break file : line-number
break fn-name - sets a breakpoint, allows us to stop program at this point
next - runs one line of the program
layout src - nicer display for source code
print variable - allows you to see value of that variable at that line
refresh - fix the layout src display
step - runs one instruction of the program
list - show surrounding lines of the program
backtrace - lists the sequence of function calls that got you to your current location
up/down - change the function in the call stack we are observingso as to inspect other variables.
set var - set variable to something else at run-time
continue - runs the program to the next breakpoint
display var - prints the value of that variable after each next step
undisplay 1 - stops displaying first var set with display
watch var - breaks the program whenever var is changed
delete breakPoint Number - remove breakpoint from use (watch and break)
Lecture 9
Last Time: Agile vs. Waterfall, gdb/valgrind This Time: Source Control, Modelling, Class Relationships, Inheritance revisited.
Source Control
- manage source code for your program (version1, version2)
- Wants: collaborate with others, have a version history
- uploading code to dropbox so coworkers can see
- see changes, collaborate w coworkers, track changes over time
- GIT - Linus Torvalds
- historically, people have used SVN, subversion, mercurial.
- most people nowadays just use git.
- git init from within a directory
- git has a staging area - prepare what files which parts will be committed.
- commit unit of work - commit early and commit often to understand how it is developed over time
Basic Commands
git add
- allows you to add first to the staging areagit status
- let’s us see which branch we’re goinggit add -p
- let’s you stage interactivelygit diff
- to see the difference between all (unstaged AND staged) files and the most recent commitgit diff --staged
- to see the difference between staged files and recent commit
One thing we may want to do is develop features or work refactors separately from other development.
git branch branchName
git checkout branchName
- to switch between branchesgit merge branchName
- creates a new commit with parents that are the latest commits on the current branch and branchNamegit merge feature
git log
- shows a history of commitsgit push
- send work to a remote branchgit pull
- pulls work from remote branch, merge as well
Example: Currently on master branch
git branch feature
git checkout feature
git commit
git checkout master
git commit
System Modelling
Goal: Visualize classes and the relationships between them at a high level.
Popular Standard: UML - Unified Modeling Language
Modelling Class Relationships:
Composition, “owns-a” relationship. If A “owns-a” B, then generally:
- B does not have an independent existence from A
- If A dies, then B dies
- If A is copied, B is copied as well (deep copy)
Typically, this is implemented in C++
via object fields.
Aggregation, “has-a” relationship. If A “has-a” B, then generally:
- B has an independent existence outside of A
- If A dies, B lives on
- If A is copied, B isn’t (shallow) Generally Implemented via references or non-owning pointers. (one you don’t delete)
Example: Quest-like System
- If a student dies, university lives on. If we copy a student, don’t copy whole uni.
Finally, specialization: “is-a” relationship
- One class is a “version” of another class.
- Typically subclassing or subtyping relationship
- If B “is-a” A, then we should be able to substitute something of type B, wherever we expect something of type A.
In C++
, achieve this via public Inheritance
Why Book{title ,author, length}
? Recall: 4 steps of object construction (from Constructor)
- Space is allocated
- Superclass constructor runs
- Initialize fields via a member initialization list (MIL)
- Constructor body runs
We cannot set title, author, length because they are private to Book. And if we do not specify how or initialize the Book portion, it will use Book’s default constructor, won’t compile if default constructor does not exist.
C++
Virtual Method and Overriding
virtual
keyword is used for achieving dynamic dispatch for polymorphic types.
- Dynamic dispatch: Choosing which function to run at run-time as opposed to compile time.
- Polymorphic types: one type that can take on multiple different object types.
Consider the following:
b
has “two types” associated with it. Static vs. Dynamic Type Pointer
- Static type:
b
is aBook*
, compiler knows this. - Dynamic type: describes what type
b
is pointing to at run-time. May depend on user-input, cannot be determined by the compiler.
Lecture 10
Last Time: Git, UML, Inheritance This Time: virtual, override, pure virtual, destructors
How do we know which method call in the hierarchy is invoked for b.isHeavry()
or b->Heavy()
?
(Notes copied into Virtual Method)
- Is the method being called on an object? If so, always use the static type to determine which method is called.
- First example: In the first example,
b.isHeavy()
, the objectb
is of typeBook
, and sinceisHeavy()
is a non-virtual method, the method called is determined by the static type ofb
, which isBook
. Therefore, it callsBook::isHeavy()
. - If we call
t.isHeavy()
, exhibits dynamic dispatch. Dynamic dispatch, also known as runtime polymorphism, is the mechanism in C++ by which the appropriate method to be executed is determined at runtime based on the actual type of the object being referred to. This is achieved through the use of virtual functions and the virtual function table (vtable). When you callt.isHeavy()
wheret
is an object of theText
class, the method that gets executed depends on the actual type of the object, which isText
in this case. SinceisHeavy
is declared asvirtual
in the base classBook
and is overridden in the derived classText
, the method call is dynamically dispatched to the version ofisHeavy
defined in theText
class. - In the case of
Book b = Text{...};
, theisHeavy
function call onb
would indeed call theBook::isHeavy()
method. This is because the dynamic type ofb
isBook
even though it was originally assigned aText
object. This scenario demonstrates the principle of using the static type to determine which method is called.Text{...}
creates aText
object. TheText
object is used to initializeb
, which is of typeBook
. This involves a process called object slicing, where only the base class part of the derived object is used to initialize the base class object. Whenb.isHeavy()
is called, it’s based on the static type ofb
, which isBook
. Sinceb
is of typeBook
, theBook::isHeavy()
method is called. Even thoughb
was originally created as aText
object, the static type determines which method is called, and it calls the version ofisHeavy
defined in theBook
class.
When a method is called on an object, the determination of which method to invoke is based on the static type of the object.
- Is the method called via pointer or reference?
a. Is the method NOT declared as
virtual
? Use the static type to determine which method is called:
b. Is the method virtual
? Use the dynamic type to determine which method is called:
We can support:
Each iteration calls a different isHeavy()
method.
What about
(*book).isHeavy()
?
(*book),isHeavy()
calls the correct version/method as well. Why? Because*book
yields aBook&
(i.e. a reference).
What is the purpose of the override
keyword?
- It has no effect on the executable that is created! (WHAT DOES THAT MEAN????????)
- However, it can be helpful for catching bugs.
isHeavy()
is missing a const
. This won’t override Book’s virtual isHeavy
because the signatures do not match.
Specifying override will have the compiler warn you if the signature does not match a superclass’s virtual method.
Compiler will always choose to call a const method to guarantee optimization and correctness.
Why not just declare everything as
virtual
for simplicity?Declaring
doSomething
as virtual doubles the size of our Vec object, program consumes more RAM, slower in general. This extra 8 bytes is storing the vptr - virtual pointer. vptr allows us to achieve dynamic dispatch with virtual functions.
- Declaring
doSomethin()
asvirtual
doubles the size of our Vec object. Program consumes more RAM, slower in general. - This extra 8 bytes is storing the
vptr
- virtual pointer.vptr
allows us to achieve dynamic dispatch withvirtual
functions.
Remember: In MIPS, function calls use the JALR instruction, it saves a register, jumps PC to a specific memory address, hardcoded in the machine instruction.
With dynamic dispatch, which function to jump to could depend on user input. Cannot be hardcoded.
Depending on the dynamic type of the v vptr
it will call Vec2
or Vec3
’s doSomething()
.
When we create a Vec2
or Vec3
, we know what type of object we’re creating, so we can fill in the appropriate vptr
for that object.
vptr
always points to the vtable
which points to the function address.
Now, in either case, we can simply follow the vptr
, get to the vtable
, and find the function address for the doSomething()
method.
Extra running time cost in the time it takes to follow the vptr
and access the vtable
.
C++
philosophy: Don’t pay for costs unless you ask for it.
Destructors Revisited
Which of these leaks memory?
Because the destructor is non-virtual, for pxy
, we invoke ~X
, not the ~Y
, so this array b
is leaked, since the Y
object does not get destroyed.
Solution: declare virtual ~X();
, so delete pxy
will call ~Y()
.
Unless you are sure a class will never be subclassed, then always declare you destructor virtual
.
If you are sure, enforce it via the final
keyword.
Now, the program won’t compile if anyone tries to subclass it.
Object destruction sequence:
- Destructor body runs
- Object fields have their destructors run in reverse declaration order
- Superclass destructor runs
- Spare is reclaimed
Lecture 11
Last Time: Virtual, Override, destructors, vptr, vtables This Time: Pure virtual, polymorphic arrays, Polymorphic Big 5
Very Interesting!!:
If we do not provide an implementation for Shape::getArea()
, code won’t link “undefined reference to vtable error”.
We could make Shape::getArea()
return 0, or -1 to indicate “no area”, but it’s not natural. Really we want to avoid a definition for Shape::getArea()
entirely.
Solution: Declare Shape::getArea()
as a pure virtual function. A pure virtual function is allowed to not have an implementation. See pure virtual method.
Declares getArea()
as pure virtual by adding = 0
. Classes that declare a pure virtual function are called Abstract Class. Abstract classes cannot be instantiated as objects.
A class that overrides all of its parents pure virtual functions is called a concrete class. Concrete classes can be instantiated.
The purpose of abstract classes is to provide a framework to organize and define subclasses.
Polymorphic Arrays
What does f expect:
What it actually looks like:
All of Vec2{7,8}
is written into myArray[0]
. Half of Vec2{9,10}
is written into myArray[0]
. the other half is myArray[1]
.
Note
Lesson: Be very careful when using arrays of objects polymorphically. Use an array of pointers:
Vec3* myArray[2];
Other solution: vector ofVec3* s
Polymorphic Big 5
Let’s consider Book hierarchy again.
Compiler still provides us a copy constructor, that works as expected.
Let’s look at copy/move constructor and assignment operator to see their definition.
Copy Constructor:
Calls the copy constructor for the Book portion of Text. t
is a const Text&
, this is implicitly converted to a const Book&
.
Book does not have a default copy constructor. So we need it in the MIL.
Move Constructor:
t
and t.topic
are lvalues, so we’d invoke the copy constructor if we didn’t use std::move
. So we use move!
t is an rvalue reference so we know it is safe to steal title, author, length and topic via using std::move
to invoke move constructors. (Since rvalue will be gone, ok to modify them and return a random thing.)
Copy Assignment:
Book::operator=(t);
calls the Book assignment operator for the Book portion of this
Move Assignment:
All of these implementations are what the compiler gives by default.
Customize as necessary - for example, if doing manual memory management, you will need to write your own versions of these.
BUT: Are the compiler provided definitions actually that good?
Title, author and length are set, but topic remains unchanged
is defined as non-virtual. We’re calling operator=
method on a reference. We use the static type, call Book::operator=
, even though br1
and br2
are referencing texts. ????????????????
Some fields are copied, but not all. This is the partial assignment problem.
Our usual signature for Text is the following:
Can we just slap an override on the end of this? NO: signature don’t match in 2 places: return type, parameter type.
- Return type: this is actually okay: A subclass’s override method can return a subclass of the virtual function’s return type (if it’s a pointer or reference)
- Parameter type for overridden functions must match exactly
Signature must be the following:
Problem 1) can’t access other’s topic, because it’s a Book
, and Book
don’t have topics, only Texts
do.
Problem 2) other is a Book&
, so now this is legal:
We can set a Text to a Comic on RHS can be implicitly converted to a const Book&
.
This is the mixed assignment problem, which is where you can set subclass siblings to each other.
Non-virtual operator=
leads to partial assignment
virtual operator=
mixed assignment
To fix this, restructure the book hierarchy. Arrow to AbstractBook UML: Abstract classes and virtual methods are italicized (with stars).
Post Midterm!!
Lecture 12
Last Time: Pure virtual, polymorphic arrays, polymorphic Big 5 This Time: Polymorphic assignment problem finished, exceptions
Continuing our implementation of AbstractBook to fix both the mixed and partial assignment problem.
To make this abstract, we need a pure virtual method. If no other methods make sense to be pure virtual, we can always use destructor.
How does this fix the two problem?
- Mixed assignment:
operator=
is non-virtual and the implicitly provided copy assignment operator only acceptsText
. - Partial assignment problem:
Note: only works since AbstractBook
is an abstract class. If we set Book::operator=
to be protected
then we couldn’t assign books to one another. ??????????? (What does this sentence mean)
Consider Text’s destructor. Implicitly, the following happens
- Destructor body runs (empty)
- Object fields we destructed in reverse decl. order
- Superclass destructor runs
- Space is reclaimed
Because in step 3, Text
’s destructor calls AbstractBook’s destructor, we have a problem: we’ve called a method with no implementation.
Solution: give it an implementation:
Nnote
Pure virtual methods don’t have to be implemented, but they still can be. They require an implementation if they will be called.
AbstractBook
is still abstract, only subclasses of classes which define a pure virtual method may be concrete. (Doesn’t mean that the destructor that is pure virtual has an implementation will make it concrete. It is still an abstract class)
One possible recommendation: If you care about assignment, consider making your superclasses abstract. (What does care about assignment mean??)
Error Handling
Example from STL vector
Vectors: dynamically allocated resizing arrays. Handles memory management so we don’t screw it up.
Unfortunately, we can’t idiot proof everything.
- Ross complains about cs138
v.at()
andv[i]
being checked or unchecked.
How to handle errors:
Option 1: Sentinel values. Reserve some values, -1
, INT_MIN
to signal errors:
- Problem: reduces what we can return, can’t return
-1
in a regular scenario. Not clear for a general type what values we should pick as sentinels.
Option 2: global variables. Create some global variable that it set when an error occurs (in C, int errno
, which can be queried for errors with standard functions).
- Also not ideal: Limited to the number of errors, might be overwritten.
Option 3: Bundle in a struct
:
- Best so far, but still not ideal. Wrap our return types in this
struct
, all return types are larger than needed. Awkward to access data field.
These are all approaches that C
users end up using. C++
however has a language feature for dealing with errors: exceptions.
v.at(100)
fetches v[100]
if the value exists, otherwise throws an exception.
r
is just an object, class type isstd::out_of_range
(included in<std except>
)- The
.what()
method returns a string describing the exception.
Force the programmer to deal with the error because the control flow jumps. Vector knows the error happened but not how to fix it. We know how to fix it, but not how the error occurred non locality error handling.
To raise an exception ourself, we use the “throw” keyword. We can throw any value, but keep in mind that <stdexcept>
has objects for common scenarios like out_of_range
, logic_error
, invalid_argument
.
When an exception is raised, control flow steps. (control flow is interrupted). Program starts to search through the stack upwards (reverse order) looking for a handler for this type of exception “Stack unwinding”. As the control flow moves up the call stack, destructors are run for objects stored on the stack during the process of stack unwinding. (Ensures that any resources held by those objects are properly released and cleaned up). If a handler is found, we jump to that point. If no handler is found, program crashes.
call stack
A call stack is a data structure that keeps track of function calls and their respective contexts.
Summary
When an exception is thrown in C++, the control flow jumps to find an appropriate handler for the exception. If a handler is found, the program continues execution from that point. If no handler is found, the program crashes. Stack unwinding takes place during the search for a handler, executing destructors to clean up objects on the stack. Exception handling allows programmers to deal with errors and exceptional conditions in a structured and controlled manner.
Example:
Main calls h
, h
calls q
, q
calls f
, throws, stack unwinding through q
, h
, jump to catch block in main.
Multiple errors may be handled via multiple catch blocks:
One handler can also deal with part of an error, re-throw the exception to allow someone else to deal with it.
The design of having multiple handlers allows for different parts of the code to handle specific types of exceptions. In this scenario, the DataStructureHandler()
function deals with errors related to the DataStructure
object, while the main()
function handles errors related to invalid user input. By re-throwing exceptions, the program can delegate the responsibility of handling specific exceptions to different parts of the code.
Lecture 13
Last Time: Polymorphic assignment, exceptions This Time: Exception safety, RAII, smart pointers.
We saw in the last lecture, we can throw different type of exceptions from catch blocks. We can also re-throw the same exception to deal with:
Why here do I just say throw
rather than throw e
?
- “throw e” performs a copy in order to throw this exception
- Remember that copy constructors (any type of constructor) cannot be virtual. Therefore, the static type is used for the copy.
If you throw a range_error
and catch via std::exception&
catch block,
throw
re-throwsrange_error
.throw e
catch astd::exception
copied from therange_error
, we lose the dynamic type.
Summary
In the previous catch block, we are catching exceptions of type
std::exception
by reference, which means it can catch expressions ofstd::exception
type and any of its derived classes. If an exception of typerange_erro
(derived fromstd::exception
) is thrown in the try block, it will catch it.throw vs. throw e
- Using
throw
without specifying the exception object, we are essentially re-throwing the currently caught exception. The dynamic type of the exception is preserved, meaning that the original type of the expression that was thrown is still recognized and can be caught by a matching catch block further up the call stack. Allows the program to handle the specific exception type with the appropriate catch block.- Using
throw e
: we are creating a new exception of the same type ase
(which is of typestd::exception
) and throwing that new exception (we don’t have specific type anymore!). While the exception is derived from the basestd::exception
type, the dynamic type information is lost because we constructed a new object. As a result, only thestd::exception
part of the exception is considered, and we lose access to any additional behaviour or information provided by the derived exception type, such asrange_error
. Can lead to incorrect or incomplete error handling.
Generally, catch blocks should catch by reference to avoid copies.
Never let a destructor throw an Exception Handling!
Default behavior: Program immediately craches. That is if an exception is thrown during the destruction of an object and not caught within the destructor itself, the program terminates abruptly. This behavior is known as stack unwinding.
noexcept
is a tag associated with methods (like const), which states the method will never raise an exception.
By default, destructors are implicitly tagged with noexcept
, meaning that they are expected not to throw any exceptions. That is throwing an exception during stack unwinding can lead to multiple exception being active at the same time, making program’s state unpredictable and hard to handle correctly.
We can allow exceptions thrown from destructors by tagging them noexcept (false)
. This can happen when you want to provide more information about an exceptional condition occurring during the destruction process or when you have specific error handle mechanism in place to handle such exceptions.
For example:
Danger
If we throw an exception, stack unwinding occurs, during this process destructors are running for stack allocated objects. If one of these destructors throws, now we have 2 active exceptions!
- 2 active exceptions = program crash (this behavior cannot be changed)
Exception Safety
Under normal circumstances, f
does not leak memory. But, if g
throws, then we do not execute delete p
, so we do leak memory!
Thing to recognize: exceptions significantly change control flow! We no longer have the guarantee fo sequential execution.
Let’s fix f
, so we can handle its memory leak.
Complaints about the fix above:
- Repeated code.
delete p
twice, a little annoying. - Could get complicated for many pointers, many function calls, etc.
In other languages, some have “finally” clause, which always runs, either after a successful try, a successful catch, or before a catch returns. Unfortunately, C++ doesn’t have support for this.
- The only guarantee is that during stack unwinding, stack allocated objects have their destructors run.
- Therefore, use stack allocated objects instead no leak
What if we need a pointer? For example, to achieve Polymorphism
One solution: wrap the pointer in a stack allocated object that will delete it for us during stack unwinding. ???????? How do I do that??????? (Smart pointers)
C++ Idiom: RAII (resource acquisition is initialization)
Example:
Resource (file handler) is acquired in the constructor (initialization). Resource is freed (closing the file) in the destructor.
Apply this RAII concept to dynamic memory. std::unique_ptr<T>
(in <memory>
library)
- contains a
T*
passed via the constructor - deletes the
T*
in the destructor
If we leave f
normally, or if we leave f
during stack unwinding, either way, p
’s destructor runs, deletes heap memory (because of implement of unique_ptr
).
In between, we can use p
like a regular pointer thanks to operator overload.
????? Why thanks to operator overload??????
A: It’s just that we can use p
as a regular pointer thanks to the fact that std::unique_ptr
overloads operators such as ->
and *
to mimic pointer behaviour.
Generally, we don not call the constructor directly with a pointer, because of the following issues:
std::unique_ptr<T> p{newT{}};
new is not paired with a delete (not one that we see, at least) If an exception is thrown in this scenario after thenew
but before thestd::unique_ptr
is constructed, the dynamically allocated memory might not get properly deallocated, leading to memory leak. We can avoid this by directly initializing thestd::unique_ptr
with anew
-allocated object. (Initialize astd::unique_ptr
usingstd::make_unique
):
- Causes a double delete:
If we use raw pointer to create an object and then use that pointer to initialize more than one std::unique_ptr
, we’ll end up with a double deletion problem because each std::unique_ptr
will try to delete the same memory. So it is unsafe.
g()
can potentially throw in the example below, heap-allocated objects does not get deleted
If g()
throws an exception, the std::unique_ptr
created with new T()
would be lost, and the memory allocated by new
would not be properly deallocated, leading to memory leak. This can be avoided by using std::make_unique
or manually handling exceptions and memory cleanup.
This is because of the order of operations and exception handling. If g()
throws an exception:
- If
g()
throws an exception before the temporarystd::unique_ptr<T>
object is constructed, then no memory leak will occur because the memory allocated bynew T()
will not be lost; it has never been assigned to astd::unique_ptr
yet. - If
g()
throws an exception after the temporarystd::unique_ptr<T>
object is constructed but beforef
is called, then thestd::unique_ptr<T>
object will go out of scope and properly delete the dynamically allocated memory (because its destructor will be called during the stack unwinding caused by the exception). - However, if
g()
throws an exception after thef
function is called and the temporarystd::unique_ptr<T>
object is passed tof
, then the dynamically allocated memory will be leaked because the exception prevents proper cleanup. thestd::unique_ptr<T>
object’s destructor will not be called because the exception unwinds the stack. The temporarystd::unique_ptr<T>
object’s destructor will not be called automatically because it goes out of scope when the full expression is evaluated (afterg()
is called), not at the exact point whereg()
throws. Since we throw wheng()
is called, we will never get to finish and call the destructor.
Summary
If we use
new
to allocate memory separately and then assign the raw pointer to astd::unique_ptr
, you could encounter memory leaks or double-deletion issues if the code is not designed carefully. Directly initializing astd::unique_ptr
using its constructor or usingstd::make_unique
is recommended
One potential ordering (in C++14) (obscure scenario)
new T()
g()
unique_ptr constructor
f()
Preferred alternative: std::make_unique<T>
(…)
- Constructs a
T
object in the heap with arguments …, and returns aunique_ptr<T>
pointing at this object.
Something else to consider:
Call copy constructor to copy p
into q
.
What happens? Doesn’t compile. Copying is disabled for unique_ptrs
. They can only be moved. (Compile time error if it does happen)
This is achieved by setting copy constructor / assignment operator to = delete
.
C++ ensures that we cannot copy std::unique_ptr
instances by implementing = delete
(which is a trick we can also use). Only able to move them, allowing proper ownership transfer instead.
unique_ptr
are good for representing ownership, since when one object dies, its unique_ptr
fields will run and clean up the associated object.
Has-a relationship: Use raw pointers or references.
- You can access the underlying raw pointer of a smart pointer via
.get()
.
Note
The
.get()
method is a member function provided by C++ smart pointer classes, such asstd::unique_ptr
,std::shared_ptr
, etc. This method allows you to retrieve the raw pointer that the smart pointer is managing. In other words,.get()
returns the actual pointer value stored by the smart pointer.
Lecture 14
Last Time: Exception safety, RAII, Smart pointers This Time: Shared_ptrs, Decorator Pattern, Observer Pattern
What if we want shared ownership, where same memory is deleted only if all the pointers pointing to it are deleted?
We can achieve this via std::shared_ptr<T>
.
shared_ptr
maintains a reference count, which keeps track of how many shared pointers point at this memory. When it hits 0, memory is freed.
Shared Pointers are not perfect.
There are still problems with shared_ptr
s:
- Susceptible if you use the constructor
q
has no knowledge ofp
’s existence
Danger
The problem here is that you’re using the constructor of
std::shared_ptr
to initialize bothp
andq
with the same raw pointercp
. This means bothp
andq
think they own the same object. When bothp
andq
go out of scope, they will each try to delete the object, leading to a double delete and undefined behaviour.To avoid this issue, you should construct
std::shared_ptr
usingstd::make_shared
or by directly passing the constructor arguments:
- Cyclical reference issue
- If we have multiple shared pointers that point at each other, we have the issue that some pointers may never be deleted
myEdges
: Thisshared_ptr
points to a collection ofEdge
objects associated with the currentNode
. TheseEdge
objects might also contain references to otherNode
objects, creating a graph-like structure.neighbors
: Thisshared_ptr
points to otherNode
objects that are considered neighbors of the currentNode
. This establishes a connection between differentNode
instances in the graph.
full example through ChatGPTt:
In general, shared ownership is somewhat rare. Take care in constructing your programs. Usually, it suffices to use the following:
unique_ptr
or object field - ownership/Composition- raw pointer or reference - Aggregation/has-a
shared_ptr
- shared ownership
Exercise: Implement shared_ptr
CS247 final practice
We’ll come back to exception safety later.
- Provide some standard “good” solution to a common problem
- One common problem: adding / removing behaviour at run-time with a polymorphic class
- One example: Windowing system - GUI
If we attempt to make a class for each possible feature, for features, there are possible configurations. Combinatorial explosion!
Solution: Decorator Pattern (Linked List of functionalities)
Abstract Component - Gives the interface for our classes Concrete Component - Implements the “basic” version of the interface
Abstract Decorator - Organize decorators and has-a decorator or the concrete component.
Concrete Decorators - Implement the interface, call operation()
on the next object in this linked list.
ScrollBar Tabbing Basic Window
Example: Pizza
Lecture 15
Last Time: Shared_ptr, Decorator pattern This Time: Observer Pattern, Casting
Design Patterns: Effective solutions to common problems.
Observer pattern: Publisher / Subscriber relationship.
Publisher: Generates some data, change in state Subscriber: Can dynamically subscribe or unsubscribe for various publishers
Should react when data is changed.
Example: Spreadsheet application.
Publishers: Cells. (Subjects) Subscribers: Charts (Observers)
When a cell changes, charts that are observing that cell must re-render, display new information.
May have different types of publishers / subscribers, e.g. - different charts require different logic for re-rendering.
- Subject has Observers
- Observer has a reference to subject
Steps:
ConcreteSubject
has its state updatednotifyObservers
is called - either byConcreteSubject
or some controller (like themain
function)- notify is called on each observer in Subject’s observer list
- This calls
ConcreteSubject::notify
, by the assumption it is pure virtual ConcreteObserver
callsgetState
onConcreteSubject
(notify observers), uses info as necessary
Example: Twitter clone. Tweeters are subjects. Followers which are observers, can only follow one Tweeter.
Note: this only works because destruction happens in reverse order!
Also Note: Design patterns we demonstrate - simplistic implementations. Only get value at scale - multiple types of subjects/observers.
Other variants also exists - for example: passing a Subject&
via notify method. (LOOK INTO THAT)
That’s all you need to complete assignment 3. Review assignment 3 question 3, with the song library. Implemented with Observer pattern.
Casting
Casting allows us to take one type, treat as another.
In C:
(int*)
is a casting operator - treat this as anint*
This is dangerous., generally avoid casting unless necessary, subverts type system.
If necessary, use one of the 4 C++ casts instead of C-style casting.
static_cast
- “well-defined” conversions between two types e.g:
g
callsint
version ofg
Another example:
- Trust me bro, I know it’s a
Text
. - Undefined behaviour if it’s not actually pointing at a
Text
reinterpret_cast
- allows for poorly-defined, implementation dependent casts. Most uses ofreinterpret_cast
are undefined behaviour
-
const_cast
-
dynamic_cast
Lecture 16
Last Time: Observer Pattern, Casting This Time: Casting, Coupling vs. Cohesion, SOLID
const_cast
- the only type of cast that can “remove” constness. Example: using some library that gives us:
Let’s say we know g
doesn’t modify the int
pointed to by p
in any way. Also, I have a const int*
I’d like to call g
with.
Compiler will prevent us from calling g
, because it might modify p
. (that is it might modify our const int*
, which is p
, since we pass it in as an argument).
I can use const_cast
to call g
in the following way:
Generally, const_cast
should be avoided.
Another example, working on legacy codebase that doesn’t use const anywhere. We want to add consts
, make our program const-correct.
Issue: const- poisoning 🍎👎. Adding const
in one location often means we must add it to other locations to allow it to continue to compile.
We can use const-cast to bridge between const-correct and non-const-correct parts of our program. Make small independent parts of the program const-correct, use const-cast to allow the program to compile as the work is done.
- Dynamic_cast: Used for safely casting between pointers/references in an inheritance hierarchy.
Only safe if pb
actually points to a Text
.
Instead,
If the cast succeeds (i.e dynamic type is Text
), then pt
points at the Text.
Otherwise, pt
is set to nullptr
.
Also can be used on references. (Not just on pointers)
What if br
is actually referencing a comic?
Cannot set it to null, no such thing as a null reference.
If dynamic_cast
fails with a reference: throw a std::bad_cast
exception.
There exist smart pointer versions of each of these casts:
Cast shared_ptrs
to other shared_ptrs
. ??????? Don’t really understand ?????
Those allow us to make decisions based on RTTI (run-time type information).
Note*: This function is poor OOD. Don’t do this.
Note
Also Note:
dynamic_cast
only works if you have at least one virtual method.
Recall: polymorphic assignment problem. We considered making operator=
virtual.
- operator= non-virtual: partial assignment
- operator= virtual: mixed assignment
Let’s make operator=
virtual in Book. ???????? WTF
Measures of Design Quality
Question: How can we evaluate the quality of our code - what is good, what is bad, when we’re not just following a particular design pattern?
We’ll discuss a number of ways:
- code smells - refactoring
- coupling/cohesion - how do my classes interact?
- SOLID design principles
First: Coupling + cohesion
Coupling: Description of how strongly different modules depend on one another.
Low:
- Function calls with primitive args/results
- Function calls with structs/arrays as results
- Modules affect each other’s control flow
- Modules sharing global data
High:
- Modules have access to each other’s implementation (friend)
Ideally, we desire low coupling:
- easier to reason about our program
- easier to make changes
We can cheat: If I put everything in one class, super low coupling!
Counterbalancing force: Cohesion: How closely do the parts of my module relate to one another?
Low:
- parts of module are completely unrelated, e.g.
<utility>
- module has a unifying common theme, e.g.
<algorithm>
- Elements manipulate state over lifetime of an object, e.g.
<fstream>
didn’t understand this at first, but Ross went over this again in the Final Review CS247
High:
- Elements are cooperating to perform exactly on a task
Cheat at cohesion:
- put every function in its own class, then they each do only one thing! ?????? What does that mean????
The Goal
We strive for: low coupling, high cohesion
Ex: whatIsIt
function - exhibits high coupling with the Book
hierarchy.
Any new classes in Book
hierarchy necessitate changes to this function.
Which is generally not good right?
SOLID Design Principles
Acronym of 5 principles: purpose is to reduce software rot (the property that long lasting codebase become more difficult to maintain)
- Single Responsibility Principle
- Open-Closed Principle
- Liskov Substitution Principle
- Interface Segregation Principle
- Dependency Inversion Principle
Single Responsibility Principles (SRP): “A class should only have one reason to change”.
i.e. a class should do one thing, not several.
A change to a program spec requires a change to the source.
If changes to different parts of the spec require changes in the same class, then SRP is violated.
Example: Consider ChessBoard
Class
This is actually somewhat bad design. ChessBoard should not print to the screen.
What if I instead want to print to a file? Cannot use the same class without replacing cout
s with printing to a file.
Little more flexible, but still has issues.
What if we want a graphic display? Or to change the language, or to communicate over the internet?
We have low cohesion, violating SRP. ChessBoard is doing multiple things: manipulate state, implement logic, control rendering to the screen.
Instead: ChessBoard should communicate via results, parameters, exceptions. Allow other classes to handle communication with the user.
Lecture 17
Last Time: Casting, coupling/cohesion, S in SOLID This Time: SOL in SOLID
Should main be performing communication? No - limits reusability.
Another possibility: Use the MVC architecture well-suited for SRP.
MVC: Model-View-Controller
- Model: Handles logic + data of your program.
- View: Handles program output / display
- Controller: Handles input, facilitates control flow between classes
Model: May have multiple views, or just one - different types of views: text, graphical view, etc. Very good to use it in a chess engine for the text and graphic display, something similar to the Observer pattern which is a good place to use it for.
- Model doesn’t need to know anything about the state of the views and implementation of the views.
- Picture the Observer pattern Model as the Subject and View being the Observer.
- Sometimes we structure interactions between Model and View via an Observer pattern.
Controller: Mediates control flow.
- May encapsulate things like turn-taking or some portion of the game rules.
- Some of this logic may go in the Model instead: judgement call.
- May communicate with user for input(sometimes done by the View).
By decoupling presentation, input, and data / logic we follow SRP, and promote re-use in our programs.
On the other hand, there maybe parts of this specifications that are unlikely to change, and so adding layers of abstraction just to follow SRP may not be worthwhile.
Note: Avoiding needless complexity is also worthwhile your best judgement.
Open / Closed Principle
Classes should be open to extension, but closed to modification.
Changes to a program should occur primarily by writing new code, not changing code that already exists.
Example: Hero has Sword.
- What if we wanted our Hero to use a different type of weapon? Bow or magic wand?
- Requires us changing all the places we referenced the sword class - lots of modification, and not much extension.
Fix: Hero has an abstract Weapon Class, deriving from abstract weapon, we have concrete Sword, Bow, Want. And we can add as much weapons as we want without changing the weapon class.
Strategy Pattern
Open / closed principle: Closely related to the concept of inheritance and virtual functions.
For example:
This function is open to extension: we can add more functionality by defining new types of books, and closed to modification. Since isHeavy() is overloaded.
Compare this to WhatisIt
:
Not open to extension: Cannot get new behaviour without modifying the source code not closed to modification either.
Open/closed principle is an ideal = writing a program will require modifications at some point.
Plan for things that are most likely to change, account for those.
High Cohesion vs. SRP
High Cohesion: High cohesion refers to the degree to which the responsibilities and functionalities of a software component are related and focused. In simpler terms, it means that a component should have a single, well-defined purpose and should perform tasks that are closely related to that purpose.
Single Responsibility Principle (SRP): The Single Responsibility Principle is one of the SOLID principles of object-oriented design. It states that a class should have only one reason to change, or in other words, a class should have only one responsibility. This principle aligns closely with the idea of high cohesion but is applied at the class or module level. Each class should have a single, well-defined responsibility and should encapsulate only one aspect of functionality.
Liskov Substitution Principle
Enforces something discussed so far strictly:
- public inheritance should model an is-a relationship.
If class B
inherits from class A
: we can use pointers / references to B
objects in the place of pointers / references to A
objects: C++ gives us this.
Liskov substitution is stronger: not only can we perform the substitution, but we should be able to do so at any place in the program, without affecting its correctness.
So precisely:
- If an invariant is true for class
A
, then it should also be true for classB
. - If an invariant is true for a
method A::f
, andB
overridesf
, then this invariant must be true forB::f
. - If
B::f
overridesA::f
:- If
A::f
has a precondition P and a post condition Q, thenB::f
should have a precondition P’ such that PP’ and a post condition Q’ such that Q’ Q.B::f
needs a weaker precondition and a stronger postcondition.
- If
Example: Contra-variance problem https://piazza.com/class/lh4qjengjtl5zy/post/736
- Happens wherever we have a binary operator where the other parameter is the same type as
*this
.
As we’ve seen before, we must take in the same parameter when overriding. C++ enforces this, which actually enforces LSP (Liskov Substitution Principle) for us.
- A circle is-a shape
- A shape can be compare with other Shapes.
- A circle can be compared with any other Shape.
To satisfy LSP, we must support comparison between different types of shapes.
We already performed typeid on circle, so we know that it is a circle, don’t need dynamic_cas
t so we use static_cast
typeid
returns std::type_info
objects that can be compared.
dynamic_cast
: tells us Is other
a Circle
or a subclass of Circle
?
typeid
: Is other exactly a Circle
.
typeid uses the dynamic-type so long as the class has at least one virtual method.
Ex: Is a Square
a Rectangle
?
4th grade will tell us yes, a square is a rectangle.
A square has all the properties of a rectangle.
But:
Issue: we expect our postcondition Rectangle setWidth
to be that the width is set and nothing else changes. Violated by our Square
class.
We basically violate the postcondition of the Rectangle::setWidth
method, which specifies that only the width should change.
Violates LSP, does not satisfy an is-a relationship. Conclusion: Rectangle is not a square. Squares are not Rectangles.
Summary
The Liskov Substitution Principle states that objects of a superclass should be replaceable with objects of its subclasses without affecting the correctness of the program. In other words, if you have a derived class, it should behave in a way that is consistent with the base class.
In the example, we have a base class
Rectangle
and a derived classSquare
. Correct to say that all squares are rectangles. However, problem arises when you try to model this relationship using inheritance and override the methods in the derived class.Main issue is violation of the LSP. The overridden
setWidth
andsetLength
methods in theSquare
class modify both the width and the length, which is not consistent with the behaviour expected from aRectangle
. Leads to unexpected behaviour when you pass aSquare
object to thef
function as observed.In the LSP-compliant code, a
Square
should not inherit directly fromRectangle
, because the specific behaviour ofSquare
’s setters conflicts with the behaviour expected fromRectangle
. One way to design this relationship correctly would be to create an abstract base class that defines the interface for a quadrilateral and then have separate classes forRectangle
andSquare
that adhere to their respective behaviours: (code provided below)
Note
By separating the behavior of
Rectangle
andSquare
into their own classes and avoiding the problematic overrides, you ensure that you’re adhering to the Liskov Substitution Principle. This design also makes it clear that aSquare
is not a special case of aRectangle
in terms of behavior, and theQuadrilateral
base class captures the common interface for both shapes.
Possibility: Restructure inheritance hierarchy to make sure LSP is respected. (basically same thing as in the code above)
In general: How can we prevent LSP violations?
Restrain what our subclasses can do, only allow them to customize what is truly necessary.
We can use a design pattern: Template method pattern or Visitor pattern https://piazza.com/class/lh4qjengjtl5zy/post/736. We’ll provide a “structure” or “recipe” or “template” for what this method will do, allow subclasses to customize portions of this method by performing overrides.
General: When a subclass should be able to modify some parts of a method’s behaviour, but not all. (All Template Method Pattern is about)
Lecture 18
Last Time: SOL in SOLID This Time: LI in SOLID -Template Method Pattern, NVI, Adapter pattern, Multiple Inheritance
Template Method Pattern: In video game, I have two types of Turtles: Red Turtle + Green Turtle
Note: draw method is public and non-virtual. Nobody who subclasses Turtle can override draw.
Note
But whenever a Turtle calls draw(), it will always call from the base class, and notice how the other function drawHead, drawShell, and drawLegs are virtual.
(GreenTurtle and RedTurtle can define draw, but it won’t be an override
, won’t be called if we use Turtle’s polymorphically.) Which makes total sense, since we didn’t specify virtual
for the draw()
method.
Note as well: drawShell()
is private, but we can still override it! Access specifiers (public, protected, private) only describe when methods can be called, not how they may be overwritten. So it sill can be overridden.
NVI - Non-Virtual Idiom (Highly related to Template Method Pattern)
Addresses the challenges posed by public virtual methods by separating the public interface of a class from its customizable behaviour.
Key Idea
To use non-virtual public methods to define the public interface and use private or protected virtual methods to provide customizable behaviour for subclasses. It helps maintain invariants, pre-conditions, and post-conditions while supporting customization.
public virtual methods are trying to do two things at once:
public: Providing an interface to the client:
- will uphold invariants
- Respect pre-conditions / post-conditions
virtual: Providing an interface to subclasses:
- Virtual methods are those that may be customized while an object is used polymorphically
- Provide “hooks” to allow customization of behaviour
- Satisfying the responsibilities of “public” and “virtual” simultaneously is difficult
- How are we sure public virtual method will satisfy its invariants, pre-conditions / post-conditions. Will it even do its job?
- What if we want to add more code to the public virtual method without changing the interface?
Non-Virtual Idiom states the following:
- All public methods are non-virtual
- All virtual methods should be private (or protected)
- Exception for the Destructor
Example: Digital Media
No NVI here.
With NVI:
Now: We can add code that must run for all types of DigitalMedia before or after the doPlay call.
- We could add copyright checking before
doPlay
, this will always run. - Or: we can update a play count after
doPlay
- now subclasses don’t have to worry about maintaining this invariant.
Flexible: can provide more “hooks” for customization simply by adding more private virtual method calls.
E.g.: showArt()
before doPlay()
- private virtual display poster for movie, album cover for song, etc.
All can be done without changing the public interface. Which is good for minimal recompilation, open / closed principle.
- In general - easier if we constrain what subclasses do from the beginning, as opposed to wrestling back control. Supporting Liskov Substitution principle.
- Any decent compiler will optimize the extra function call out - so no cost at runtime! ?????? What extra function call?
Interface Segregation Principle
No code should depend on methods that it doesn’t use.
- related to cohesion, if our class is cohesive
- We prefer many small interfaces over one larger, more complicated interface
- If a class has many functionalities, each client of the class should only see the functionality that it needs
Example: (No NVI for simplicity) Video Game
Imagine we make a change to our drawing interface
battlefield.cc
must still recompile, even though it’s not using any part of the drawing interface.
Needless coupling between UI and BattleField via Enemy (when both UI and Battlefield contains Enemy
)
Note
The issue arises when the
UI
andBattleField
classes each have a vector of pointers toEnemy
objects. Both classes are now dependent on both methods of theEnemy
class. This creates unnecessary coupling between classes that might not need both methods.Since both UI and BattleField contains a reference to Enemy, if we ever decide to change something in the draw() funciton, we need to recompile both UI and Battlefield.cc. But they don’t even use the draw method. This is because the change to
Enemy
’s interface could potentially affect the way it’s used in other parts of the program, includingbattlefield.cc
.
The Interface Segregation Principle suggests that the Enemy
class should have separate interfaces for its combat-related functionality and its UI-related functionality. This way, UI
and BattleField
would only depend on the methods that are relevant to their contexts, reducing unnecessary coupling and recompilation dependencies.
One solution: Multiple Inheritance
Enemy would need to override both draw and strike in order to become a concrete class.
Somewhat similar to the Adapter Pattern - Used when a class provides an interface different from the one you need.
Setup
Draw and Combat define separate interfaces, and Enemy implements both interfaces. Allows Enemy to provide distinct implementations for both drawing and combat.
UI and Battlefield only depend on the specific interface they need (Draw or Combat). Reduces unnecessary coupling.
Multiple Inheritance can help avoid the issues of violating Interface Segregation Principle.
For Example: Library for our window - expects objects to inherit from “Renderable” and provide a “render” method.
- Don’t want to change all of our inheritance hierarchy, or all our draw calls to render - violates open / closed, and also pain. Use an adapter class.
GO READ ON ADAPTER DESIGN PATTERN!!! ????????????
- Satisfy the Renderable interface by calling the methods we’ve already defined.
- We might not even need our Adapter to provide this “draw” method anymore, just render. In which case, we could use private inheritance. If the adapter only needs to use the methods from your class hierarchy but not expose them publicly, using private inheritance is a good choice.
What is private inheritance?
Under protected inheritance: class B: protected A{...}
- x remains inaccessible
- y remains protected
- z becomes protected - can only be accessed in B and subclasses of B B is not an is-a relationship with A.
Under private inheritance (class B: private A{..}
or class B: A{..}
)
- x remains inaccessible
- y and z becomes private - can only be accessed in B methods. No subclasses of B can access y and z.
Protected and private inheritance are not is-a (specialization) relationship.
Lecture 19
Last Time: LI in SOLID, Multiple inheritance This Time: More multiple inheritance, ID in SOLID
As such, we cannot use classes with private or protected inheritance polymorphically.
Multiple inheritance CAN have some tricky semantics.
Ambiguous - it’s not clear whether A1::foo
or A2::foo
should be called - b.food()
doesn’t compile.
Disambiguate: b.A1::foo()
or b.A2::fooo();
Question
Can you do this if
b
inherits from a single class, say class B: public A, can you doB.A::foo
?Although not needed, yes you can do this! It’s using the Scope Resolution Operator.
Another tricky aspect of multiple inheritance - shared Base class. “Diamond Problem”, “deadly diamond of death”.
Because a D
is-a B
and a C
, it ends up having 2 a
fields - one from the B
portion of the object and one from the C
portion of the object.
Must instead disambiguate - which a
field are we talking about? The one from the B
portion, or the one from the C
portion?
Wha if we wanted to disable this (somewhat strange) behaviour, and have only copy of A
in our hierarchy?
Solution: Use Virtual Inheritance.
Syntax:
struct B: virtual A{...}
class C: virtual public A{...}
"virtual" inheritance
The keyword “virtual” indicates that this base class will be shared among all others who inherit from it virtually in the inheritance hierarchy.
Now: dObj.a
→ unambiguous, a
field is now shared between both portions of the object.
Constructors for virtual bases must run before any other constructors. (Just remember this.)
Note
In this example, class
A
is virtually inherited by both classesB
andC
. When a class virtually inherits from a base class, the virtual base class is constructed before any non-virtual derived class constructors. This ensures that there is only one instance of the virtual base class shared among all the classes that virtually inherit from it.Why do constructors for virtual bases run before any other constructors? This is because the compiler needs to guarantee that the shared base class is initialized properly and that its members are ready to be used by the constructors of all the classes that virtually inherit from it.
Typically:
Virtual inheritance:
- This shows the order in which the objects is constructed for virtual inheritance.
Virtual inheritance (idk what’s going on…)
- Call’s D constructor first
- D calls A’s constructor
- Return to D
- Call’s B constructor
- Return to D
- Call C
- Return to D look at picture above.
Fixed:
In this corrected version:
- Class
B
andC
are derived virtually fromA
, ensuring that only a single instance ofA
is present in the inheritance hierarchy. - In class
B
andC
, the constructors call the constructor ofA
using their member initialization lists. - In class
D
, the constructor initializes the virtual base classA
explicitly withA{5}
, and the default constructors ofB
andC
are called usingB{}
andC{}
respectively.
This code follows the correct order of constructor execution, ensuring that the virtual base class A
is constructed before the constructors of derived classes (B
and C
) are executed. This maintains the integrity and correctness of the inheritance hierarchy.
Destruction sequence/steps
The destruction steps for the classes
A
,B
,C
, andD
would follow the reverse order of construction, with the destructor of each class being called in the reverse order of their constructor execution. Here’s the order in which the destructors would be called:
- Destructor of class
D
.- Destructor of class
B
(via the virtual inheritance fromA
).- Destructor of class
C
(via the virtual inheritance fromA
).- Destructor of class
A
.
What about object layout with multiple virtual inheritance???
B object:
To treat a B*
like an A*
, simply create a pointer copy, prints at the same location, ignore B
fields. Same strategy cannot be used for multiple virtual inheritance.
C++/clang
:
Note
virtual Base(A) is actually stored at the end of the object! Not the beginning.
If bp
points at a D
object, see above memory diagram.
If bp
points at a B
object,
If w’re pointing at a D
object, then bp->a
is 40 bytes below the pointer address.
If we’re pointing at a B
object, then bp->a
is 24 bytes below the pointer address.
Note
Now, finding superclass fields depends on the dynamic type.
Solution: Store the offset to the superclass fields inside the vtable
.
This is where the name “virtual” inheritance comes from.?
Note
The
D
object does not look like anA
object, aC
object or aB
object simultaneously, but portions of it do.
Doing pointer assignment can change where the pointer is pointing to.
- This is really IMPORTANT, be very careful
static_cast
/ dynamic_cast
will also adjust pointer locations for you as necessary.
reinterpret_cast
won’t - underscores danger associated with the cast.
Finally, both Google C++
guide, and ISOCPP both recommend when using multiple inheritance, interface classes are best.
Interface classes are those that:
- Define all methods as pure virtual
- Contain no state (fields)
Lecture 20
Last Time: Multiple Inheritance This Time: D in SOLID, Visitor Pattern
Dependency Inversion Principle: High level modules (closer to main) should not depend on low-level modules (simpler tasks)- instead, each should depend on an abstraction.
Abstract classes should not depend on concrete classes.
Example: Microsoft word. Word counter rely on some sort of keyboard input. Breaks dependency inversion principle because counting words is “high-level”, getting input “low-level”.
Another example: Timer, lights/bell that goes off
Apply dependency inversion once ⇒ Timer should no longer rely directly on Bell.
Abstract Responder class shouldn’t depend on concrete Timer. So apply dependency inversion again.
This is simply the observer pattern - where Source
s are subjects, Responder
s are observers.
We could apply dependency inversion again introduce another layer of abstraction between Timer and bell/lights - but at some point, usefulness runs out. Too many layers of abstraction creates needless complexity.
Visitor Design Pattern (AMAZON INTERVIEW QUESTION) Problem: How do we write a method that depends on two polymorphic types?
Example:
We might want different behaviours for each combination of these concrete classes.
We can override this in Rock
and Stick
- but we don’t know which enemy we’re hitting. Same problem exists in reverse, if we implement it inside Enemy
. We don’t know what kind of weapons we are using.
Solution: Visitor Pattern - Combine overloading with overriding.
beStruckBy
is a virtual function → either callingMonster::beStruckBy
orTurtle::beStruckBy
(runtime).- Call
w.strike
on theWeapon&
. Strike is virtual→Rock::Strike
orStick::Strike
(runtime). - Are we calling
Rock::Strike(Turtle&)
orRock::Strike(Monster&)
? We know the type of this. Ifthis
is aTurtle*
→ call turtle version, if this is aMonster*
→ call monster version. (compile time) This achieves double dispatch. Very powerful technique that combines overloading with overriding.
Basically the dynamic type of Weapon
and Enemy
will both decide which exact function will be called.
Double Dispatch
Double dispatch is a way to dynamically determine the operation to perform based on both the runtime types of two objects. In your example, it’s used to determine the correct operation (
strike
) to perform based on the combination of anEnemy
and aWeapon
.Another use of the Visitor Pattern - add behaviour and functionality to classes, without changing the classes themselves.
Compile-time Dispatch
Compile-time Dispatch: The compile-time dispatch is achieved by calling the correct
Weapon
’sstrike
method based on the dynamic type ofthis
. Ifthis
is aTurtle*
, theTurtle
version ofstrike
is called, and if it’s aMonster*
, theMonster
version is called.
Example: Compiler Design
I might want to collect information about my AST. I could add a virtual method in ASTNode, override in concrete classes. … But → lots of changes scattered over many classes. Difficult to understand, harder to implement incrementally.
Instead: provide nodes an accept method, taking in a polymorphic AST Visitor.
Now, we can write different types of visitors to work on the parse tree. For example:
- String Visitor Collect string literals in program, that must go in the DATA segment.
- Scope Visitor Match variables with associated scope.
- CodeGeneration Visitor Generates assembly for each given Node.
Solution improves cohesiveness all code for a particular visiter is located in one class. New visitors are easy to create, and can be made with zero changes to the AST.
Lecture 21
Last Time: D in SOLID, Visitor Pattern This Time: CRTP, Polymorphic Cloning
I’m still not satisfied with this visitor pattern. Too much code!
There is going to be a lot of methods: If we have subclasses of and subclasses of different methods to write.
Annoying part: Boilerplate - write the following in each DerivedEnemy
:
- Must be done for every single
DerivedEnemy
- Cannot put it in
Enem
y:
Doesn’t work - type of *this
is wrong. It’s not telling us what the dynamic type is.
Solution to fixing the boilerplate code: CRTP Curiously Recurring Template Pattern
Template our superclass with a type parameter - inherit AND substitute the derived class type.
How do we use this to fix the boilerplate code? ()
- created a template base class
Enemy
that takes the derived class type as a template parameter. - then derive enemy classes from the
Enemy
template and provide the derived class type as template argument.
This sort of works:
- when we call
beStruckBy
on an enemy instance, the CRTP pattern ensures that the correctw.strike
method is called on the dynamic type of the enemy.
Cast *this
from type Enemy<Turtle>*
to Turtle*
allows us to override into Rock::Strike(Turtle&)
(or stick).
Explanation
Key here is that by using CRTP, the
this
pointer static type isEnemy<Turtle>*
but its dynamic type isTurtle*
. The caststatic_cast<T*>(this)
effectively changes the static type toTurtle*
, which allows the correctw.strike
method to be called based on the dynamic type.
Issue: Now, we have different superclasses for each Enemy
:
Because Enemy<Turtle>
and Enemy<Monster>
are different classes, we can no longer use Enemys polymorphically.
- No
vector<Enemy*>
allowed!
Solution: Add another layer of inheritance. This solution aims to provide a common interface for all concrete enemy types while maintaining the desired behaviour of the Visitor Pattern.
- The
Enemy
class defines a pure virtual functionbeStruckBy
, creating a common interface for all enemies. - The
EnemyBeStruck<T>
class inherits publicly fromEnemy
and implements the virtualbeStruckBy
function by using CRTP. It also provides a virtual destructor, which should be defined outside the class as you’ve shown. - The concrete enemy classes (
Turtle
andMonster
) specialize theEnemyBeStruck
template class by providing the derived type (Turtle
orMonster
) as the template argument. This allows them to inherit the behaviour ofEnemyBeStruck
and provide their own specific implementations.
Now we have a public interface by which all our concrete enemies follow: they can all beStruckBy
weapons.
We use this virtual method in Enemy
to resolve beStruckBy
to either EnemyBeStruck<Turtle>
or EnemyBeStruck<Monster>
(when we have a pointer to Enemy)
Then just static_cast
to T*
- and we’re good.
Another problem CRTP can solve: Polymorphic cloning
Recall abstract book hierarchy:
Say I have:
I want a deep copy of whatever b
points to. I cannot just do this:
This attempts to create an AbstractBook
by invoking its constructor. Wrong for 2 reasons:
AbstractBook
is abstract, we cannot instantiate those objects- Ignoring what we’re actually pointing at, we actually want to invoke a constructor that depends on the dynamic type of
b
.
We can provide a virtual clone method for the purpose of solving this.
Instead use the copy constructor in each of our clone methods to simplify the implementation.
Once again, we can use CRTP.
b->clone()
is virtual, so if b
points at a Comic, we call BookClonable<Comic>::clone
- static_cast
this into a Comic*
and invoke the Comic copy constructor with the Comic&
.
Provided for all subclasses - reduces boilerplate code.
Command Pattern
About using objects to encapsulate behaviour of some “action” within a system.
Example: Consider writing an IDE. Using MVC - we might have Model/Control relationship where Controller calls Model methods, like :
This is a decent decision - but it doesn’t allow for some features we might desire:
- Macros - replaying sequences of instructions
- Undo / Redo
Command Pattern: Instead of manipulating the model directly - we pass Commands
to an Invoker
.
Command object has whatever information it needs to perform its given action - maybe the Model, maybe sub objects within the model
- Controller would create command objects, supply with info they need
- Sends abstract commands to the Invoker for processing
- Invoker calls each action method to execute the command.
Invoker can hold additional information!
Invoker can maintain additional state - stack of Commands for undo / redo, or mapping of a macro keyword to a list of commands.
Lecture 22
Last Time: CRTP, Polymorphic cloning This Time: Exception Safety, Template functions?
Recall: We saw that reasoning about programs with exceptions is tricky.
Why do we care if the program is going to crash anyways?
We care about memory leaks even when a program is going to crash due to factors like:
- Resource Management: The operating system might not reclaim all resources, affecting other program’s performance.
- Debugging and Maintenance: Memory leaks can be indicative of other code problems.
- Reliability and Robustness: Repeated crashes and memory leaks in larger system can cause system-wide issues.
- Graceful Failure: Even in a crash, proper resource cleanup provides better debugging information and minimizes impact on the system.
To avoid such leaks, use exception-safe practices like smart pointers in C++ (std::unique_ptr
, std::shared_ptr
), which manage memory automatically even when exceptions occur.
We can be more specific about how safe our program is given an exception is thrown.
4 Levels of exception safety:
- No exception safety - If an exception is thrown - no guarantees about program state - object has invalid memory, memory is leaked, program crashes
- Basic guarantee - If an exception is thrown - program is in a valid, but unspecified state. Ex: Class invariants are maintained, no memory is leaked.
- Strong guarantee - If an exception is thrown, the program reverts back to its state before the method was called. Ex:
vector::emplace_back
- Either it succeeds, or if exception is thrown, vector is in its prior state
- Nothrow guarantee - exceptions are never propagated outside of the function call - always does its job. Ex:
vector::size
givesnothrow
guarantee, always returns size of vector without fail.
What is the exception safety of f
?
A::g
and B::g
both provide the string guarantee.
f
- only satisfies the basic guarantee
If a.g()
throws, then because it provides strong guarantee - it’s as if it was never called
- when exception propagates from
f
,f
has not done anything
If b.h()
throws - it provides the strong guarantee, so it does undo any of its effects. BUT: it has no knowledge that a.g()
ran just prior. As a result, when exception propagates from f
, the effects of a.g()
remain in place. ???????? (what does it mean effects of a.g()
remain in place??????)
Program is in a valid yet unspecified state basic guarantee
Right now, there may be effects of a.g()
that are impossible to undo writing to a file, printing to the screen, sending a network request.
For simplicity, we’ll assume A::g
and B::h
have only local side effects (i.e. only the a
and b
objects and any associated memory may change).
Now, how may we try to rewrite f
to achieve the strong guarantee?
Idea: Use Copy-and-Swap Idiom, only work on temporary objects rather than real ones.
If aTemp.g()
or bTemp.h()
throw - no changes are made to the C
object, it’s as if f
was never called.
This is unfortunately, still the basic guarantee. Because, if b = bTemp
throws, we propagate an exception having modified a
.
What we really need for this is a non-throwing swap or assignment. Assigning a pointer will never throw. One possible solution: PImpl Idiom, “pointer to implementation” idiom
Dereferencing pImpl
gives us a CImpl&
. Then we call make_unique<CImpl>
with this CImpl&
. make_unique
creates an entirely new unique pointer for us, and it will invoke the copy constructor for CImpl
(because make_unique
takes a CImpl
’s constructors arguments, and the constructor that takes in a const CImpl&
is the copy constructor).
Remember for unique_ptr
The copy ctor and copy assignment operator of unique_ptr is disabled by the standard library.
This is the strong guarantee - if a.g()
or b.h()
throw - all work is done on temp, so we’re fine.
Otherwise, swap will always succeed f
will do its job.
Note
In this version of
C::f()
, you create a temporaryCImpl
object (temp
) and perform all modifications on it. This ensures that no changes are made to the actualC
object if an exception occurs. Thestd::swap
operation at the end is guaranteed not to throw exceptions, and it swaps the contents oftemp
andpImpl
, effectively committing the changes only if everything succeeds without exceptions.This code achieves the strong exception safety guarantee because if
a.g()
orb.h()
throws, theC
object remains unchanged, and if the swap operation throws, it’s done after the modifications are isolated in the temporary objecttemp
.
PImpl Idiom can also be used in some scenarios for reducing compilation dependencies.
typically:
C.h
On the other hand:
CImpl.h
C.h
Now, if cImpl
changes only C.cc
must recompile. Because we only use a forward declaration in C.h
no recompilation.
Explanation
With the PImpl idiom, you encapsulate the private implementation details in a separate structure (
CImpl
) that’s defined in its own header file (CImpl.h
). You only forward declareCImpl
inC.h
, which means you’re not exposing the actual implementation details to the code includingC.h
.
Finally - we could make CImpl
a polymorphic class - swap out implementation at runtime - the Bridge Pattern.
Back to Exception Safety - vector::emplace_back
. Gives us the strong guaranteed. How?
Easy case: No resize, just put the object in the array. When resizing:
- Allocate a new, larger array.
- Invoke copy constructor objects (of type T) to new array.
- If copy constructor throws: delete the new array, old array is left intact.
- Then delete old array, and set array pointer to new array (nothrow).
Complaint: This is a lot of copies when all I really wanted was to resize my array.
Better, would be to move:
- Allocate a new array
- Move objects from the old array to the new array
- If a move throws, move all our objects back from new array to old array, delete new array.
- Delete old array, set pointer to new array.??? Wrong right
Problem: If move throws once, it might also when moving objects back. Once we’ve modified our old array, no guarantee we can restore it.
Solution: If move constructor is declared as “noexcept”, then emplace_back
will perform moves. Otherwise, it will copy over every item, and do this try-catch block to delete old items if necessary.
If possible, moves and swaps should provide the nothrow guarantee - and you should make this explicit to the compiler via the noexcept tag.
If you know a function will never propagate an exception - declare it noexcept to facilitate optimization.
Moves and swaps at the minimum should be noexcept.
Lecture 23
I skipped this class, will update the notes. This lecture is not on the final.
- Last Time: Exception Safety
- This Time: Multithreading
This isa topic discussed in SE350, CS343.
Thread: A sequence of instructions to be executed. Typically: just the program.
Most CPUs in the modern day have more than one core.
This can sometimes allow extraction of extra efficiency.
Example: Matrix Multiplication
Typically, compute entries in order:
In a multithreaded program: Create separate threads to perform the computations simultaneously.
This allows us to multiply each row for our product .
Why do we need ref/cref, the thread constructor implicitly copies / moves its parameters.
cref, ref: create reference-wrapper object types, which can be copies / moved.
- distributed the workload. Each thread works independently on a separate part of the problem.
This is an example of an “embarrassingly parallelizable” problem.
What about synchronization? . First: Compute . Then compute gives us .
We need to compute in entirety, before we may compute .
Simply:
- Spawn threads for , join them.
- Spawn new threads for , join them.
But: creating threads is expensive. What if we reuse the same threads?
Let’s do a barrier example: 2 functions -f1
and -f2
: both must complete phase 1, then they can proceed to phase 2.
Why are the outputs different?
You should run this yourself, sometimes the output will be differing. This is because
<<
is actually a function call. Since the threads run simultaneously, you actually don’t have the guarantee that an entire line runs alone and prints together (because there are actually two function calls in that single line).
Is there a solution? Yes, use \n
instead of endl
.
Use std::atomic<bool>
to prevent incorrect compiler optimizations with multithreaded code.
Conclusion, what was CS247 about?
This is not actually a course about C++. Purpose?
High-level Thinking:
- How well do my programs accommodate changes?
- How do I balance simplicity vs. abstraction?
- Where is abstraction useful, and where it is cumbersome?
Low-level Thinking?
- What is the compiler doing?
- What are the efficiency tradeoffs in abstractions.
- Do I have a clear understanding of what my language compiles.
Learning more program paradigms in detail will make you a better programmer.
THE END.