sandbox/ods/notes/chapter1.txt

# Chapter 1

## Efficiency

Concerns:
1. Number of operations
2. Processor speeds
3. Storage space

## Interfaces

* Interface / abstract data type

### Queue interface

* `add(x)` (aka `queue`): add `x` to the queue
* `remove()` (aka `dequeue`): remove the next value from queue and return it

* Normal queue: the first element inserted is removed first
* Priority queue: elements are inserted with a priority, and the smallest
  element is removed. This function is usually called `deleteMin`.
* LIFO queue: a stack; add and remove are called `push` and `pop`.
* Deque: generalisation of these
  * `addFirst(x)`
  * `removeFirst(x)`
  * `addLast(x)`
  * `removeLast(x)`
* Stack: addFirst, removeFirst
* Queue: addLast, removeFirst

### List interface

The List interface subsumes the Queue interface. A list is just a sequence of
values, and a Queue becomes a special case of it.

Interface:

* size()
* get(i): get i'th element
* set(i, x): set the i'th element to x
* add(i, x): insert x at position i
* remove(i): remove the i'th element

### USet (unordered sets)

USets are a collection of unique items in no particular order; this mimics a
mathematical set.

Interface:

* `size()`: returns the number of elements in the set
* `add(x)`: add x to the set if it doesn't already exist
* `remove(x)`: remove x from the set if it doesn't already exist
* `find(y)`: membership test

Note that y and x may be distinct objects, and only need to satisfy an
equality test. For example, a dictionary or hashmap is created using a tuple
`(key, value)`; `find` compares on `key` and two objects are considered equal
if their keys match.

### SSet (sorted set)

A USet where order matters. Its interface only changes in the `find` function:

* `find(x)`: find the smallest y s.t. y >= x. thereby returning a useful value
  even if x isn't in the set. AKA successor search.

Difference between USet and SSet: sorting requires more steps (run time) and
complexity. A USet should be used unless an SSet is explicitly required.

## Mathematical background

(See notebook).

## The model of computation

Proper analysis requires a mathematical model of computation. The model in the
book is on a w-bit word-RAM model.

* we can access cells of memory, each of which stores a w-bit word
* basic operations (arithmetic and logical) take constant time
* cells can be read or written in constant time
* the memory manager allows allocating a block of k cells of memory in O(k)
  time
* size constraint: w >= log(n) where n is the number of elements stored in a
  data structure
* data structures use a generic type T such that T occupies one word

## Correctness, time complexity, and space complexity

Three factors for analysing a data structure:

* correctness: data structure must implement the interface
* time complexity: run times of operations on the data structure should
  be as small as possible
* space complexity: the storage space used by a data structure should be
  as small as possible

Run times come in three flavours:

1. Worst-case: an operation never takes longer than this
2. Amortized: if a data structure has an amortized run time of f(n), then
   a sequence of m operations takes at most m f(n) time.
3. Expected: the actual run time is a random variable, and the expected
   value of this run time is at most f(n).

## Exercises

1. See src/ch01ex01.cc --- note that the last three exercises were skipped for
   time.

2. A Dyck word is a sequence of +1’s and -1’s with the property that the
   sum of any prefix of the sequence is never negative. For example,
   +1,−1,+1,−1 is a Dyck word, but +1,−1,−1,+1 is not a Dyck word since the
   prefix +1 − 1 − 1 < 0. Describe any relationship between Dyck words and
   Stack push(x) and pop() operations.

   A +1 corresponds to a push, and a -1 corresponds to a pop. At any point,
   the stack must not overflow.

3. A matched string is a sequence of {, }, (, ), [, and ] characters that are
   properly matched. For example, “{{()[]}}” is a matched string, but this
   “{{()]}” is not, since the second { is matched with a ]. Show how to use a
   stack so that, given a string of length n, you can determine if it is a
   matched string in O(n) time.

   The program should push each opening character onto a stack. When a closing
   character is encountered, the top of the stack should be the matching
   opening character. See src/ch01ex03.cc.

4. Suppose you have a Stack, s, that supports only the push(x)
   and pop() operations. Show how, using only a FIFO Queue, q, you can
   reverse the order of all elements in s.

   See src/ch01ex04.cc: you just pop each element from the stack → queue,
   then pop the elements off the queue. If you wanted to reverse the stack
   itself, you'd just push the elements back onto the stack.

5. Using a USet, implement a Bag. A Bag is like a USet—it supports the add(x),
   remove(x) and find(x) methods—but it allows duplicate elements to be
   stored. The find(x) operation in a Bag returns some element (if any) that
   is equal to x. In addition, a Bag supports the findAll(x) operation that
   returns a list of all elements in the Bag that are equal to x.

   In progress: src/ch01ex05.cc.

6. From scratch, write and test implementations of the List, USet and SSet
   interfaces. These do not have to be efficient. They can be used later to
   test the correctness and performance of more efficient implementations.

   In progress: src/ch01ex06.cc, src/ods/simplist.h, src/ods/uset.h.

7. Work to improve the performance of your implementations
   from the previous question using any tricks you can think of. Experiment
   and think about how you could improve the performance of add(i,x) and
   remove(i) in your List implementation. Think about how you could improve
   the performance of the find(x) operation in your USet and SSet
   implementations. This exercise is designed to give you a feel for how
   difficult it can be to obtain efficient implementations of these interfaces

   Some initial ideas:
   1. A linked list could improve add --- instead of having to move elements
      in place, you would just insert in between. For a forward-iterating
      insertion (e.g. for j := 0; j < i; j++), there are decreasing benefits
      as you insert elements farther in the list. A potential improvement
      might be a doubly-linked list with the iteration direction determined by
      distance from the centre, though this comes at the cost of additional
      complexity. The same improvements would apply to remove.

      On a randomised benchmark (random operation, random value, random index),
      the array-backed list still outperforms a from-scratch linked list and
      an STL vector wrapper. The linked list performs the worst, with wildly
      varying run times.

   2. For the sorted list, a basic improvement for find would be to pick the
      middle of the list and do an equality check, basically bisecting to
      find where the value *would* be.