sandbox/ods/notes/chapter1.txt

177 lines
6.9 KiB
Plaintext
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Chapter 1
## Efficiency
Concerns:
1. Number of operations
2. Processor speeds
3. Storage space
## Interfaces
* Interface / abstract data type
### Queue interface
* `add(x)` (aka `queue`): add `x` to the queue
* `remove()` (aka `dequeue`): remove the next value from queue and return it
* Normal queue: the first element inserted is removed first
* Priority queue: elements are inserted with a priority, and the smallest
element is removed. This function is usually called `deleteMin`.
* LIFO queue: a stack; add and remove are called `push` and `pop`.
* Deque: generalisation of these
* `addFirst(x)`
* `removeFirst(x)`
* `addLast(x)`
* `removeLast(x)`
* Stack: addFirst, removeFirst
* Queue: addLast, removeFirst
### List interface
The List interface subsumes the Queue interface. A list is just a sequence of
values, and a Queue becomes a special case of it.
Interface:
* size()
* get(i): get i'th element
* set(i, x): set the i'th element to x
* add(i, x): insert x at position i
* remove(i): remove the i'th element
### USet (unordered sets)
USets are a collection of unique items in no particular order; this mimics a
mathematical set.
Interface:
* `size()`: returns the number of elements in the set
* `add(x)`: add x to the set if it doesn't already exist
* `remove(x)`: remove x from the set if it doesn't already exist
* `find(y)`: membership test
Note that y and x may be distinct objects, and only need to satisfy an
equality test. For example, a dictionary or hashmap is created using a tuple
`(key, value)`; `find` compares on `key` and two objects are considered equal
if their keys match.
### SSet (sorted set)
A USet where order matters. Its interface only changes in the `find` function:
* `find(x)`: find the smallest y s.t. y >= x. thereby returning a useful value
even if x isn't in the set. AKA successor search.
Difference between USet and SSet: sorting requires more steps (run time) and
complexity. A USet should be used unless an SSet is explicitly required.
## Mathematical background
(See notebook).
## The model of computation
Proper analysis requires a mathematical model of computation. The model in the
book is on a w-bit word-RAM model.
* we can access cells of memory, each of which stores a w-bit word
* basic operations (arithmetic and logical) take constant time
* cells can be read or written in constant time
* the memory manager allows allocating a block of k cells of memory in O(k)
time
* size constraint: w >= log(n) where n is the number of elements stored in a
data structure
* data structures use a generic type T such that T occupies one word
## Correctness, time complexity, and space complexity
Three factors for analysing a data structure:
* correctness: data structure must implement the interface
* time complexity: run times of operations on the data structure should
be as small as possible
* space complexity: the storage space used by a data structure should be
as small as possible
Run times come in three flavours:
1. Worst-case: an operation never takes longer than this
2. Amortized: if a data structure has an amortized run time of f(n), then
a sequence of m operations takes at most m f(n) time.
3. Expected: the actual run time is a random variable, and the expected
value of this run time is at most f(n).
## Exercises
1. See src/ch01ex01.cc --- note that the last three exercises were skipped for
time.
2. A Dyck word is a sequence of +1s and -1s with the property that the
sum of any prefix of the sequence is never negative. For example,
+1,1,+1,1 is a Dyck word, but +1,1,1,+1 is not a Dyck word since the
prefix +1 1 1 < 0. Describe any relationship between Dyck words and
Stack push(x) and pop() operations.
A +1 corresponds to a push, and a -1 corresponds to a pop. At any point,
the stack must not overflow.
3. A matched string is a sequence of {, }, (, ), [, and ] characters that are
properly matched. For example, “{{()[]}}” is a matched string, but this
“{{()]}” is not, since the second { is matched with a ]. Show how to use a
stack so that, given a string of length n, you can determine if it is a
matched string in O(n) time.
The program should push each opening character onto a stack. When a closing
character is encountered, the top of the stack should be the matching
opening character. See src/ch01ex03.cc.
4. Suppose you have a Stack, s, that supports only the push(x)
and pop() operations. Show how, using only a FIFO Queue, q, you can
reverse the order of all elements in s.
See src/ch01ex04.cc: you just pop each element from the stack → queue,
then pop the elements off the queue. If you wanted to reverse the stack
itself, you'd just push the elements back onto the stack.
5. Using a USet, implement a Bag. A Bag is like a USet—it supports the add(x),
remove(x) and find(x) methods—but it allows duplicate elements to be
stored. The find(x) operation in a Bag returns some element (if any) that
is equal to x. In addition, a Bag supports the findAll(x) operation that
returns a list of all elements in the Bag that are equal to x.
In progress: src/ch01ex05.cc.
6. From scratch, write and test implementations of the List, USet and SSet
interfaces. These do not have to be efficient. They can be used later to
test the correctness and performance of more efficient implementations.
In progress: src/ch01ex06.cc, src/ods/simplist.h, src/ods/uset.h.
7. Work to improve the performance of your implementations
from the previous question using any tricks you can think of. Experiment
and think about how you could improve the performance of add(i,x) and
remove(i) in your List implementation. Think about how you could improve
the performance of the find(x) operation in your USet and SSet
implementations. This exercise is designed to give you a feel for how
difficult it can be to obtain efficient implementations of these interfaces
Some initial ideas:
1. A linked list could improve add --- instead of having to move elements
in place, you would just insert in between. For a forward-iterating
insertion (e.g. for j := 0; j < i; j++), there are decreasing benefits
as you insert elements farther in the list. A potential improvement
might be a doubly-linked list with the iteration direction determined by
distance from the centre, though this comes at the cost of additional
complexity. The same improvements would apply to remove.
On a randomised benchmark (random operation, random value, random index),
the array-backed list still outperforms a from-scratch linked list and
an STL vector wrapper. The linked list performs the worst, with wildly
varying run times.
2. For the sorted list, a basic improvement for find would be to pick the
middle of the list and do an equality check, basically bisecting to
find where the value *would* be.