# Chapter 1 ## Efficiency Concerns: 1. Number of operations 2. Processor speeds 3. Storage space ## Interfaces * Interface / abstract data type ### Queue interface * `add(x)` (aka `queue`): add `x` to the queue * `remove()` (aka `dequeue`): remove the next value from queue and return it * Normal queue: the first element inserted is removed first * Priority queue: elements are inserted with a priority, and the smallest element is removed. This function is usually called `deleteMin`. * LIFO queue: a stack; add and remove are called `push` and `pop`. * Deque: generalisation of these * `addFirst(x)` * `removeFirst(x)` * `addLast(x)` * `removeLast(x)` * Stack: addFirst, removeFirst * Queue: addLast, removeFirst ### List interface The List interface subsumes the Queue interface. A list is just a sequence of values, and a Queue becomes a special case of it. Interface: * size() * get(i): get i'th element * set(i, x): set the i'th element to x * add(i, x): insert x at position i * remove(i): remove the i'th element ### USet (unordered sets) USets are a collection of unique items in no particular order; this mimics a mathematical set. Interface: * `size()`: returns the number of elements in the set * `add(x)`: add x to the set if it doesn't already exist * `remove(x)`: remove x from the set if it doesn't already exist * `find(y)`: membership test Note that y and x may be distinct objects, and only need to satisfy an equality test. For example, a dictionary or hashmap is created using a tuple `(key, value)`; `find` compares on `key` and two objects are considered equal if their keys match. ### SSet (sorted set) A USet where order matters. Its interface only changes in the `find` function: * `find(x)`: find the smallest y s.t. y >= x. thereby returning a useful value even if x isn't in the set. AKA successor search. Difference between USet and SSet: sorting requires more steps (run time) and complexity. A USet should be used unless an SSet is explicitly required. ## Mathematical background (See notebook). ## The model of computation Proper analysis requires a mathematical model of computation. The model in the book is on a w-bit word-RAM model. * we can access cells of memory, each of which stores a w-bit word * basic operations (arithmetic and logical) take constant time * cells can be read or written in constant time * the memory manager allows allocating a block of k cells of memory in O(k) time * size constraint: w >= log(n) where n is the number of elements stored in a data structure * data structures use a generic type T such that T occupies one word ## Correctness, time complexity, and space complexity Three factors for analysing a data structure: * correctness: data structure must implement the interface * time complexity: run times of operations on the data structure should be as small as possible * space complexity: the storage space used by a data structure should be as small as possible Run times come in three flavours: 1. Worst-case: an operation never takes longer than this 2. Amortized: if a data structure has an amortized run time of f(n), then a sequence of m operations takes at most m f(n) time. 3. Expected: the actual run time is a random variable, and the expected value of this run time is at most f(n). ## Exercises 1. See src/ch01ex01.cc --- note that the last three exercises were skipped for time. 2. A Dyck word is a sequence of +1’s and -1’s with the property that the sum of any prefix of the sequence is never negative. For example, +1,−1,+1,−1 is a Dyck word, but +1,−1,−1,+1 is not a Dyck word since the prefix +1 − 1 − 1 < 0. Describe any relationship between Dyck words and Stack push(x) and pop() operations. A +1 corresponds to a push, and a -1 corresponds to a pop. At any point, the stack must not overflow. 3. A matched string is a sequence of {, }, (, ), [, and ] characters that are properly matched. For example, “{{()[]}}” is a matched string, but this “{{()]}” is not, since the second { is matched with a ]. Show how to use a stack so that, given a string of length n, you can determine if it is a matched string in O(n) time. The program should push each opening character onto a stack. When a closing character is encountered, the top of the stack should be the matching opening character. See src/ch01ex03.cc. 4. Suppose you have a Stack, s, that supports only the push(x) and pop() operations. Show how, using only a FIFO Queue, q, you can reverse the order of all elements in s. See src/ch01ex04.cc: you just pop each element from the stack → queue, then pop the elements off the queue. If you wanted to reverse the stack itself, you'd just push the elements back onto the stack. 5. Using a USet, implement a Bag. A Bag is like a USet—it supports the add(x), remove(x) and find(x) methods—but it allows duplicate elements to be stored. The find(x) operation in a Bag returns some element (if any) that is equal to x. In addition, a Bag supports the findAll(x) operation that returns a list of all elements in the Bag that are equal to x. In progress: src/ch01ex05.cc. 6. From scratch, write and test implementations of the List, USet and SSet interfaces. These do not have to be efficient. They can be used later to test the correctness and performance of more efficient implementations. In progress: src/ch01ex06.cc, src/ods/simplist.h, src/ods/uset.h. 7. Work to improve the performance of your implementations from the previous question using any tricks you can think of. Experiment and think about how you could improve the performance of add(i,x) and remove(i) in your List implementation. Think about how you could improve the performance of the find(x) operation in your USet and SSet implementations. This exercise is designed to give you a feel for how difficult it can be to obtain efficient implementations of these interfaces Some initial ideas: 1. A linked list could improve add --- instead of having to move elements in place, you would just insert in between. For a forward-iterating insertion (e.g. for j := 0; j < i; j++), there are decreasing benefits as you insert elements farther in the list. A potential improvement might be a doubly-linked list with the iteration direction determined by distance from the centre, though this comes at the cost of additional complexity. The same improvements would apply to remove. 2. For the sorted list, a basic improvement for find would be to pick the middle of the list and do an equality check, basically bisecting to find where the value *would* be.