# Searching trees

Find tree structures and use them to access data more elegantly

But first:

possible solutions for the last assignment

## An array of strings

```1 char *month_names[]= {
2   "January",
3   "February",
4   "March",
5   ...
6 }
```

## An array of structs

```1 struct month_info_t {
2   char *name;
3   uint8_t num_days;
4 };
5
6 struct month_info_t month_info[]= {
7   {.name= "January",   .num_days= 31},
8   …
9 }
```

## Birthday countdown

```1 struct date_t pivot;
2 struct date_t birthday;
3
4 for(;;) {
5   uint16_t days_next_bd=
6     date_days_between(&pivot, &birthday);
7   date_print(&pivot);
8   date_increment(&pivot);
9 }
```

Pointerized countdown

```1 struct date_t {
2   uint8_t day_of_month;
3   struct month_info_t *month;
4 };
```

Back to the trees

# Guess game

```1 if(num_cur_guess == num_to_guess)
3
4 if(num_cur_guess < num_to_guess)
5   Serial.print("too small");
6
7 if(num_cur_guess > num_to_guess)
8   Serial.print("too big");
```

Upload the example above and follow the instructions in the serial monitor

Hint: Add a `.` or another symbol after the numbers you enter

Hint: The number is between 0 and 31

# Strategy #1

```Enter an estimate for the random number:
Your guess (1) is too small try again
Enter an estimate for the random number:
Your guess (2) is too small try again
Enter an estimate for the random number:
Your guess (3) is too small try again
Enter an estimate for the random number:
Your guess (4) is too small try again
…
```

# Strategy #1

Can take up to `n` tries for a range of `n` numbers.

Can we do better?

# Strategy #2

```Your guess (16) is too big try again
Enter an estimate for the random number:
Your guess (8) is too big try again
Enter an estimate for the random number:
Your guess (4) is too small try again
Enter an estimate for the random number:
Your guess (6) is too small try again
Enter an estimate for the random number:
Your guess is correct! The number was:7
You needed 5 steps to find the number
```

Split the interval into two subintervals per step

# Strategy #2

First Guess: 16

``` ╭─(>16)─┄
─┤
╰─(<16)─┄
```

# Strategy #2

`Your guess (16) is too big, try again`

Second Guess: 8

``` ╭─(>16)─┄
─┤
│       ╭─(>8)─┄
╰─(<16)─┤
╰─(<8)─┄
```

# Strategy #2

`Your guess (8) is too big, try again`

Third Guess: 4

``` ╭─(>16)─┄
─┤
│       ╭─(>8)─┄
╰─(<16)─┤
│      ╭─(>4)─┄
╰─(<8)─┤
╰─(<4)─┄
```

# Strategy #2

`Your guess (4) is too small, try again`

Fourth Guess: 6

``` ╭─(>16)─┄
─┤
│       ╭─(>8)─┄
╰─(<16)─┤             ╭─(>6)─┄
│      ╭─(>4)─┤
╰─(<8)─┤      ╰─(<6)─┄
│
╰─(<4)─┄
```

# Strategy #2

`Your guess (6) is too small, try again`

Fifth Guess: 7

``` ╭─(>16)─┄
─┤
│       ╭─(>8)─┄
╰─(<16)─┤             ╭─(>6)─╼ 7
│      ╭─(>4)─┤
╰─(<8)─┤      ╰─(<6)─┄
│
╰─(<4)─┄
```

Complete decision tree

```                      ╭─(<)──╼ 1
╭─(<)──┤ 2
│      ╰─(>)──╼ 3
╭─(<)──┤ 4
│      │      ╭─(<)──╼ 5
│      ╰─(>)──┤ 6
│             ╰─(>)──╼ 7
╭─(<)──┤ 8
│      │             ╭─(<)──╼ 9
│      │      ╭─(<)──┤ 10
│      │      │      ╰─(>)──╼ 11
│      ╰─(>)──┤ 12
│             │      ╭─(<)──╼ 13
│             ╰─(>)──┤ 14
│                    ╰─(>)──╼ 15
─┤ 16
│                    ╭─(<)──╼ 17
│             ╭─(<)──┤ 18
│             │      ╰─(>)──╼ 19
│      ╭─(<)──┤ 20
│      │      │      ╭─(<)──╼ 21
│      │      ╰─(>)──┤ 22
│      │             ╰─(>)──╼ 23
╰─(>)──┤ 24
│             ╭─(<)──╼ 25
│      ╭─(<)──┤ 26
│      │      ╰─(>)──╼ 27
╰─(>)──┤ 28
│      ╭─(<)──╼ 29
╰─(>)──┤ 30
╰─(>)──╼ 31
```

# Strategy #2

Observation: By splitting the interval in half for each iteration we are actually following a binary tree structure

For `n` elements we need at most `log_2(n)` steps to find the correct element

Using that observation

# Student register

As part of a student database we need a way to search for names by matriculation number

# Student register (flat)

```1 struct student_t {
2   char *name;
3   uint32_t mat_num;
4 };
5
6 struct student_t student_register[MAX_STUDENTS];
```

One way to structure the data is to use a simple array

This is a flat structure

# Student lookup (flat)

```1 struct student_t *
2 sr_flat_lookup(struct student_t *flat,
3                uint32_t mat_num)
```

Download the database sourcecode and find the `sr_flat_lookup` function

Use a `for` loop to iterate through all elements of `flat[]` while checking if the matriculation number matches `mat_num`

Return the current array element if the matriculation number matches

Test your implementation in the serial monitor

Hint: Add a `.` or another symbol after the numbers you enter into the monitor

# Student lookup (flat)

```1 for(size_t i=0; flat[i].name; i++) {
2   if(flat[i].mat_num == mat_num)
3     return(&flat[i]);
4 }
```

In the worst case (student not in database) we would have to look at every single entry

Can we do better?

# Student register (tree)

```1 struct student_t {
2   char *name;
3   uint32_t mat_num;
4   struct student_t *child_lt;
5   struct student_t *child_gt;
6 };
```

To make lookups more efficient we can hook the database entries into a tree structure

# Student register (tree)

```1 void sr_print_subtree(struct student_t *root,
2                       uint16_t indent)
3 {
4   sr_print_subtree(root->child_lt, indent+1);
5   Serial.print(root->name);
6   sr_print_subtree(root->child_gt, indent+1);
7 }
```

Change the `DEBUG_TBUILD` and `DEBUG_LOOKUP` `#define` at the top of the file from `false` to `true`

Watch the output in the serial monitor and try to find out how new elements are inserted into the tree

# Finding elements

Now enter some numbers into the serial monitor and try to find out how elements are looked up in the tree

Hint: Add a `.` or another symbol after the numbers you enter

# Comparing speed

Change the `DEBUG_TBUILD` and `DEBUG_LOOKUP` `#define` at the top of the file back from `true` to `false`

Copy your implementation of `sr_flat_lookup` into the file

Compare the displayed runtimes for the two lookup algorithms

How does the position of an entry in the array influence the lookup times?

# Tree balance

```           ╭──┾ Denver Dolph(4037371)
╭──┾ Angela Applewhite(4034097)
╭──┾  Cyrus Carbonell(4031913)
│  │        ╭──┾ Elois Edmond(4030462)
│  │        │  ╰──┾ Norman Napoleon(4030020)
│  │        │     │  ╭──┾ Tracey Thibodeau(4029685)
│  │        │     ╰──┾ Rea Reineck(4027146)
│  │     ╭──┾ Joshua June(4026817)
│  │     │  ╰──┾ Matilde Micheal(4026688)
│  │     │     ╰──┾ Magnolia Meader(4025231)
│  │  ╭──┾ Clarisa Coulson(4023839)
│  │  │  ╰──┾ Alix Alvarenga(4019661)
│  ╰──┾ Janet Joynes(4015054)
╭──┾ Ona Odonoghue(4013096)
│  ╰──┾ Song Sandler(4011005)
──┾ Clifford Calahan(4009336)
│  ╭──┾ Juliet Jaquez(4007416)
│  │  │  ╭──┾ Shelton Sarkisian(4006478)
│  │  │  │  ╰──┾ Marilee Meng(4006268)
│  │  ╰──┾ Tyree Troche(4004535)
╰──┾ Paulette Petrone(4001246)
```

In the tree above you can see that there are far more elements behind the `child_lt`-branch of "Clifford Calahan" than behind the `child_gt`-branch

This indicates that the tree is not well balanced

Tree balance influences the lookup speed

# Tree balance

```1 {.name= "Paulette Petrone",  .mat_num= 4001246, …
2 {.name= "Juliet Jaquez",     .mat_num= 4007416, …
3 {.name= "Clifford Calahan",  .mat_num= 4009336, …
4 …
```

Edit the `student_register` array so that the matriculation numbers of the first eight entries are in ascending order

Do not change the other elements

`#define` `DEBUG_TBUILD` to be `true`

Does sorting the elements before inserting them improve the balance of the tree?