Searching trees

Find tree structures and use them to access data more elegantly

But first:

possible solutions for the last assignment

Task 1

An array of strings

1 char *month_names[]= {
2   "January",
3   "February",
4   "March",
5   ...
6 }

Task 2

An array of structs

1 struct month_info_t {
2   char *name;
3   uint8_t num_days;
4 };
5 
6 struct month_info_t month_info[]= {
7   {.name= "January",   .num_days= 31},
8   
9 }

Task 3

Birthday countdown

1 struct date_t pivot;
2 struct date_t birthday;
3 
4 for(;;) {
5   uint16_t days_next_bd=
6     date_days_between(&pivot, &birthday);
7   date_print(&pivot);
8   date_increment(&pivot);
9 }

Task 4

Pointerized countdown

1 struct date_t {
2   uint8_t day_of_month;
3   struct month_info_t *month;
4 };

Back to the trees

Guess game

1 if(num_cur_guess == num_to_guess)
2   Serial.print("Your guess is correct!");
3 
4 if(num_cur_guess < num_to_guess)
5   Serial.print("too small");
6 
7 if(num_cur_guess > num_to_guess)
8   Serial.print("too big");

Upload the example above and follow the instructions in the serial monitor

Hint: Add a . or another symbol after the numbers you enter

Hint: The number is between 0 and 31

Strategy #1

Enter an estimate for the random number:
Your guess (1) is too small try again
Enter an estimate for the random number:
Your guess (2) is too small try again
Enter an estimate for the random number:
Your guess (3) is too small try again
Enter an estimate for the random number:
Your guess (4) is too small try again
…

Strategy #1

Can take up to n tries for a range of n numbers.

Can we do better?

Strategy #2

Your guess (16) is too big try again
Enter an estimate for the random number:
Your guess (8) is too big try again
Enter an estimate for the random number:
Your guess (4) is too small try again
Enter an estimate for the random number:
Your guess (6) is too small try again
Enter an estimate for the random number:
Your guess is correct! The number was:7
You needed 5 steps to find the number

Split the interval into two subintervals per step

Strategy #2

First Guess: 16

 ╭─(>16)─┄
─┤
 ╰─(<16)─┄

Strategy #2

Your guess (16) is too big, try again

Second Guess: 8

 ╭─(>16)─┄
─┤
 │       ╭─(>8)─┄
 ╰─(<16)─┤
         ╰─(<8)─┄

Strategy #2

Your guess (8) is too big, try again

Third Guess: 4

 ╭─(>16)─┄
─┤
 │       ╭─(>8)─┄
 ╰─(<16)─┤
         │      ╭─(>4)─┄
         ╰─(<8)─┤
                ╰─(<4)─┄

Strategy #2

Your guess (4) is too small, try again

Fourth Guess: 6

 ╭─(>16)─┄
─┤
 │       ╭─(>8)─┄
 ╰─(<16)─┤             ╭─(>6)─┄
         │      ╭─(>4)─┤
         ╰─(<8)─┤      ╰─(<6)─┄
                │
                ╰─(<4)─┄

Strategy #2

Your guess (6) is too small, try again

Fifth Guess: 7

 ╭─(>16)─┄
─┤
 │       ╭─(>8)─┄
 ╰─(<16)─┤             ╭─(>6)─╼ 7
         │      ╭─(>4)─┤
         ╰─(<8)─┤      ╰─(<6)─┄
                │
                ╰─(<4)─┄

Complete decision tree

                      ╭─(<)──╼ 1
               ╭─(<)──┤ 2
               │      ╰─(>)──╼ 3
        ╭─(<)──┤ 4
        │      │      ╭─(<)──╼ 5
        │      ╰─(>)──┤ 6
        │             ╰─(>)──╼ 7
 ╭─(<)──┤ 8
 │      │             ╭─(<)──╼ 9
 │      │      ╭─(<)──┤ 10
 │      │      │      ╰─(>)──╼ 11
 │      ╰─(>)──┤ 12
 │             │      ╭─(<)──╼ 13
 │             ╰─(>)──┤ 14
 │                    ╰─(>)──╼ 15
─┤ 16
 │                    ╭─(<)──╼ 17
 │             ╭─(<)──┤ 18
 │             │      ╰─(>)──╼ 19
 │      ╭─(<)──┤ 20
 │      │      │      ╭─(<)──╼ 21
 │      │      ╰─(>)──┤ 22
 │      │             ╰─(>)──╼ 23
 ╰─(>)──┤ 24
        │             ╭─(<)──╼ 25
        │      ╭─(<)──┤ 26
        │      │      ╰─(>)──╼ 27
        ╰─(>)──┤ 28
               │      ╭─(<)──╼ 29
               ╰─(>)──┤ 30
                      ╰─(>)──╼ 31

.py

Strategy #2

Observation: By splitting the interval in half for each iteration we are actually following a binary tree structure

For n elements we need at most log_2(n) steps to find the correct element

Using that observation

Student register

As part of a student database we need a way to search for names by matriculation number

Student register (flat)

1 struct student_t {
2   char *name;
3   uint32_t mat_num;
4 };
5 
6 struct student_t student_register[MAX_STUDENTS];

One way to structure the data is to use a simple array

This is a flat structure

Student lookup (flat)

1 struct student_t *
2 sr_flat_lookup(struct student_t *flat,
3                uint32_t mat_num)

Download the database sourcecode and find the sr_flat_lookup function

Use a for loop to iterate through all elements of flat[] while checking if the matriculation number matches mat_num

Return the current array element if the matriculation number matches

Test your implementation in the serial monitor

Hint: Add a . or another symbol after the numbers you enter into the monitor

Student lookup (flat)

1 for(size_t i=0; flat[i].name; i++) {
2   if(flat[i].mat_num == mat_num)
3     return(&flat[i]);
4 }

In the worst case (student not in database) we would have to look at every single entry

Can we do better?

Student register (tree)

1 struct student_t {
2   char *name;
3   uint32_t mat_num;
4   struct student_t *child_lt;
5   struct student_t *child_gt;
6 };

To make lookups more efficient we can hook the database entries into a tree structure

Student register (tree)

1 void sr_print_subtree(struct student_t *root,
2                       uint16_t indent)
3 {
4   sr_print_subtree(root->child_lt, indent+1);
5   Serial.print(root->name);
6   sr_print_subtree(root->child_gt, indent+1);
7 }

Download the tree based database sourcecode

Change the DEBUG_TBUILD and DEBUG_LOOKUP #define at the top of the file from false to true

Watch the output in the serial monitor and try to find out how new elements are inserted into the tree

Finding elements

Now enter some numbers into the serial monitor and try to find out how elements are looked up in the tree

Hint: Add a . or another symbol after the numbers you enter

Comparing speed

Change the DEBUG_TBUILD and DEBUG_LOOKUP #define at the top of the file back from true to false

Copy your implementation of sr_flat_lookup into the file

Compare the displayed runtimes for the two lookup algorithms

How does the position of an entry in the array influence the lookup times?

Tree balance

           ╭──┾ Denver Dolph(4037371)
        ╭──┾ Angela Applewhite(4034097)
     ╭──┾  Cyrus Carbonell(4031913)
     │  │        ╭──┾ Elois Edmond(4030462)
     │  │        │  ╰──┾ Norman Napoleon(4030020)
     │  │        │     │  ╭──┾ Tracey Thibodeau(4029685)
     │  │        │     ╰──┾ Rea Reineck(4027146)
     │  │     ╭──┾ Joshua June(4026817)
     │  │     │  ╰──┾ Matilde Micheal(4026688)
     │  │     │     ╰──┾ Magnolia Meader(4025231)
     │  │  ╭──┾ Clarisa Coulson(4023839)
     │  │  │  ╰──┾ Alix Alvarenga(4019661)
     │  ╰──┾ Janet Joynes(4015054)
  ╭──┾ Ona Odonoghue(4013096)
  │  ╰──┾ Song Sandler(4011005)
──┾ Clifford Calahan(4009336)
  │  ╭──┾ Juliet Jaquez(4007416)
  │  │  │  ╭──┾ Shelton Sarkisian(4006478)
  │  │  │  │  ╰──┾ Marilee Meng(4006268)
  │  │  ╰──┾ Tyree Troche(4004535)
  ╰──┾ Paulette Petrone(4001246)

In the tree above you can see that there are far more elements behind the child_lt-branch of "Clifford Calahan" than behind the child_gt-branch

This indicates that the tree is not well balanced

Tree balance influences the lookup speed

Tree balance

1 {.name= "Paulette Petrone",  .mat_num= 4001246, 
2 {.name= "Juliet Jaquez",     .mat_num= 4007416, 
3 {.name= "Clifford Calahan",  .mat_num= 4009336, 
4 

Edit the student_register array so that the matriculation numbers of the first eight entries are in ascending order

Do not change the other elements

#define DEBUG_TBUILD to be true

Does sorting the elements before inserting them improve the balance of the tree?