-
Notifications
You must be signed in to change notification settings - Fork 0
Exercises on Data Structures
Rough notes about data structures to look at, to see if it is worth while to make exercises about them.
There are basically four "families" of abstract data types, and a number of interesting data structures which are members of the families.
- Arrays
-
Bit Array, pretending an
int64is an array of 64 bits, for example - Circular Buffer
- Dynamic Arrays when you need to add an unknown amount of elements to an array "on the fly"
-
Bit Array, pretending an
- Linked Structures, when arrays just aren't good enough
- Linked List
- Association List, a linked list of key-value pairs
-
Trees
-
Search Trees useful for searching for stuff fast
- B-Tree used all the time in database implementations
- Binary Search Tree
-
Trie
- Hash Trees -- c.f., paper on Ideal Hhttps://en.wikipedia.org/wiki/Binary_decision_diagramash Trees
- Hash Array Map Trie -- c.f., paper
- Heap
-
Search Trees useful for searching for stuff fast
-
Graphs
- Binary Decision Diagram, very useful for automated theorem proving
There are, of course, abstract data types which do not fall under these four families. The exceptions are:
- List
- Stack
- Queue
- Set
I suppose one approach is to notice how useful arrays are.
First, we move away from the concrete syntax of array[index] to the more abstract array_get(array, index) and array_set(array, index, value). (Exercise 1)
But it would be nice if the array could resize automatically as needed, which is precisely the dynamic array data structure. (Exercise 2)
We observe how nice it is to have an array, and treat the index as somehow meaningful (e.g., in a census program, the index could be the age in years, and the value would be the number of people at this age). It would be nice if we could let the index be structured, in the sense of making it a string or a custom data structure. This gives us association lists (Exercise 3)
But then we can do some formal analysis of the association list (exercise 4) and notice how terrible it is. This gives motivation for...trying anything else.
This gives a natural motivation for hash tables (Exercise 5), tries (exercise 6), and so forth (exercises 7 through infinity).
Remark: Sets for Free. Actually, what's really interesting is...once we allow "anything" to be used as the index ("key"), the array is called an "associative array" (to each "key" is an associated "value"), a "dictionary", or more generally "map". The interesting thing is we can use this to implement a set, by making the key equal to the value we guarantee there are no duplicate entries.
So, whenever discussing a new data structure, the basic questions one should ask are:
- What are the basic operations on it? (What are the signatures of those operations, i.e., what are the inputs and outputs?)
- What are the axioms for those operations?
The next questions one should ask, when implementing the thing:
- What does the
structlook like? (What fields does it have, and what type are those fields?) - What functions implement the operations defined on the type?
- Can we implement the axioms as preconditions/postconditions?
The first batch of questions (what operations are defined, and what are their axioms) are hard to jump into...even asking what the struct looks like, and what functions are defined on it is hard.
So it seems to me the best approach is to gradually get the student to think about such things. Initially, a generic exercise should look like: given the data structure
typedef struct { /* ... */ } fooPlease implement the following functions
-
foo* new_foo()creates a newfooobject in memory, and returns a pointer to it (ornullif there is no memory left) -
void free_foo(foo *obj)will delete thefooobject in memory, and haveobjbe a null pointer; ifobjis null already, do nothing -
char* foo_toString(foo *f)will produce a string representation off -
bool foo_equals(foo *lhs, foo *rhs)tests if the twofooobjects are equal, intuitively iflhs == rhs; if either point isnull, returns false always -
hash_t foo_hashCode(foo *obj)returns a hash code forobj; null pointers return 0 by convention - etc.
Basically, give the function signature, and specify the contract expected. Perhaps add some more axioms, like
- if
foo_equals(x, y)thenfoo_hashCode(x) == foo_hashCode(y)(equivalentfooobjects have equivalent hash codes) - if
foo_equals(x, y)thenstrcmp(foo_toString(x), foo_toString(y)) == 0(equalfooobjects have identical string representations) - etc.
After a few of these, perhaps the struct details can be left to the reader to decide, but more explanation of the data structure be given. (Having diagrams like the ones used in the interpreter handbook may be really useful, too.)