Fixing the Firmware
Ryan Torvik - July 10, 2023 - 6 minute read
Now that we are able to run our tests in an automated way, we can fix our algorithm that uses a weird calculator device to compute a Fibonacci number.
This is the 7th part in a series demonstrating what we can do with our full system emulation framework. Our overall goal is to show how, by using our emulator, you can take advantage of advances in modern software development to write better firmware faster.
The original firmware
Here is the original firmware.
#include "lib.h"
#define COUNT 10
int main(void) {
int nums[10] = {1};
int last = 0, current = 1;
const int LEN = sizeof(nums)/sizeof(int);
for (int i = 0; i < LEN - 1; i++) {
load_register(nums[i]);
select_add();
nums[i+1] = read_result();
}
return nums[LEN - 1];
}
It is returning the number 256 instead of 55. Either the device is wrong, our test is wrong, or our firmware is wrong. Let’s take another look at the device.
The device
We described our calculator device in our previous post here.
It keeps an internal accumulator, takes an operation, and finally takes in a number to run the calculation. The device calculates the new accumulator value by combining the existing accumulator value with the given number using the given operation.
This is not how we do math in our heads or on paper. But, it is often how physical devices do operations. This difference in method is exactly why it’s so important to have robust testing when developing firmware.
It looks like our firmware is computing F[n+1] = 2 * F[n]
instead of F[n+1] = F[n] + F[n-1]
The fix
Here’s an update to the code that uses the device correctly.
#include "lib.h"
#define COUNT 10
int main(void) {
int nums[10] = {0,1};
const int LEN = sizeof(nums)/sizeof(int);
for (int i = 1; i < LEN - 1; i++) {
load_accumulator(nums[i-1]);
select_add();
load_accumulator(nums[i]);
nums[i+1] = read_result();
}
return nums[COUNT-1];
}
In each iteration of the loop, we send F[n-1]
to the device. This uses the selected operation to set the accumulator. Since we have not selected an operation, it just sets the accumulator to F[n-1]
. Next we select the add operation, this leaves the accumulator alone. Then we send F[n]
to the device which adds that value to previous value and stores it in the accumulator. This also resets the selected operation to just do assignment next time. Finally, we call read_result
which just gets the value of the accumulator.
This looks like it’s going to work!
We use our compilation method from our [previous post] to get an executable to run.
Fixing the test
Since we have rewritten the code, we need to update our test to set a breakpoint at the appropriate location and check the value in the appropriate register.
We use the same methods to disassemble the compiled firmware as we did in this previous blog.
Here is binja’s High Level Intermediate Language. You can see that we are going to care about the value in var_10
Here is the Assembly of the end of the main function. You can see that the high level IL hides the instructions at the beginning of the function, so main actually starts at 0x630
. Also, at 0x6d4
, the value of var_10
gets put into $v0
.
Let’s set our breakpoint before the return happens. Binja loads the main function at 0x630
. The last load instruction before the jr ra
instruction is at 0x6e0
, the difference is 0xb0
bytes. Since Emerson is configured to load the main function at 0x1fc0_0000
we will set our breakpoint at 0x1fc0_00b0
and then check the value of v0
.
def test_tenth_fibonacci_number(machine: Machine):
'''
set a breakpoint, run the fibonacci generator, verify the value at the end is correct
'''
expected_value = 55
assert "Paused" == machine.state
# set a breakpoint on the address where we are finished running our fibonacci algorithm
debug_point = machine.break_on_address(0x1fc000b0)
assert debug_point.id == 0
# run the machine
assert machine.go()
assert "Paused" == machine.state
assert machine.pc == 0x1fc000b0
# since we're paused at the breakpoint, check the value in v0
assert machine.get_register("v0") == expected_value`
Rerunning the test
Now we can rerun the test just like we did in the last blog. We have a docker compose file that starts the Emerson server docker image, waits for it to be up and running, then launches the Emerson python docker image with our test script inside of it.
$ docker compose up
[+] Building 0.0s (0/0)
[+] Running 2/0
✔ Container tenkey-server-1 Created 0.0s
✔ Container tenkey-python_tester-1 Created 0.0s
Attaching to tenkey-python_tester-1, tenkey-server-1
tenkey-server-1 | ackackack
tenkey-server-1 | [2023-07-06T17:39:43Z INFO actix_server::builder] starting 12 workers
tenkey-server-1 | [2023-07-06T17:39:43Z INFO actix_server::server] Actix runtime found; starting in Actix runtime
tenkey-python_tester-1 | wait-for-it.sh: waiting 60 seconds for server:10314
tenkey-server-1 | [2023-07-06T17:39:44Z INFO actix_web::middleware::logger] 172.18.0.3 "GET / HTTP/1.1" 200 621 "-" "curl/7.74.0" 0.000400
tenkey-python_tester-1 | wait-for-it.sh: server:10314 is available after 0 seconds
tenkey-python_tester-1 | ============================= test session starts ==============================
tenkey-python_tester-1 | platform linux -- Python 3.9.17, pytest-7.3.2, pluggy-1.0.0
tenkey-python_tester-1 | rootdir: /tenkey_test
tenkey-python_tester-1 | collected 1 item
tenkey-python_tester-1 |
tenkey-server-1 | [2023-07-06T17:39:44Z INFO actix_web::middleware::logger] 172.18.0.3 "GET /is_project_running HTTP/1.1" 200 5 "-" "-" 0.000042
tenkey-server-1 | [2023-07-06T17:39:45Z INFO actix_web::middleware::logger] 172.18.0.3 "GET /run_project/tenkey HTTP/1.1" 200 17 "-" "-" 0.394373
tenkey-server-1 | [2023-07-06T17:39:45Z INFO actix_web::middleware::logger] 172.18.0.3 "GET /is_project_running HTTP/1.1" 200 4 "-" "-" 0.000038
tenkey-server-1 | [2023-07-06T17:39:45Z INFO actix_web::middleware::logger] 172.18.0.3 "GET /is_project_running HTTP/1.1" 200 4 "-" "-" 0.000040
tenkey-server-1 | [2023-07-06T17:39:45Z INFO actix_web::middleware::logger] 172.18.0.3 "POST /run_command HTTP/1.1" 200 73 "-" "-" 0.000198
tenkey-server-1 | [2023-07-06T17:39:45Z INFO actix_web::middleware::logger] 172.18.0.3 "POST /run_command HTTP/1.1" 200 34 "-" "-" 0.000182
tenkey-server-1 | [2023-07-06T17:39:45Z INFO actix_web::middleware::logger] 172.18.0.3 "POST /run_command HTTP/1.1" 200 49 "-" "-" 0.000158
tenkey-server-1 | [2023-07-06T17:39:45Z INFO actix_web::middleware::logger] 172.18.0.3 "POST /run_command HTTP/1.1" 200 73 "-" "-" 0.000163
tenkey-server-1 | [2023-07-06T17:39:45Z INFO actix_web::middleware::logger] 172.18.0.3 "POST /run_command HTTP/1.1" 200 42 "-" "-" 0.000201
tenkey-server-1 | [2023-07-06T17:39:45Z INFO actix_web::middleware::logger] 172.18.0.3 "POST /run_command HTTP/1.1" 200 34 "-" "-" 0.000154
tenkey-server-1 | [2023-07-06T17:39:45Z INFO actix_web::middleware::logger] 172.18.0.3 "GET /stop HTTP/1.1" 200 9 "-" "-" 0.000131
tenkey-python_tester-1 | test_tenkey.py F [100%]
tenkey-python_tester-1 |
tenkey-python_tester-1 | =================================== FAILURES ===================================
tenkey-python_tester-1 | _________________________ test_tenth_fibonacci_number __________________________
tenkey-python_tester-1 |
tenkey-python_tester-1 | machine = <builtins.Machine object at 0x7f761857aa30>
tenkey-python_tester-1 |
tenkey-python_tester-1 | def test_tenth_fibonacci_number(machine: Machine):
tenkey-python_tester-1 | '''
tenkey-python_tester-1 | set a breakpoint, run the fibonacci generator, verify the value at the end is correct
tenkey-python_tester-1 | '''
tenkey-python_tester-1 |
tenkey-python_tester-1 | expected_value = 55
tenkey-python_tester-1 |
tenkey-python_tester-1 | assert "Paused" == machine.state
tenkey-python_tester-1 | # set a breakpoint on the address where we are finished running our fibonacci algorithm
tenkey-python_tester-1 | debug_point = machine.break_on_address(0x1fc000b0)
tenkey-python_tester-1 | assert debug_point.id == 0
tenkey-python_tester-1 |
tenkey-python_tester-1 | # run the machine
tenkey-python_tester-1 | assert machine.go()
tenkey-python_tester-1 | assert "Paused" == machine.state
tenkey-python_tester-1 | assert machine.pc == 0x1fc000b0
tenkey-python_tester-1 |
tenkey-python_tester-1 | # since we're paused at the breakpoint, check the value in v0
tenkey-python_tester-1 | > assert machine.get_register("v0") == expected_value
tenkey-python_tester-1 | E AssertionError: assert 0 == 55
tenkey-python_tester-1 | E + where 0 = <built-in method get_register of builtins.Machine object at 0x7f761857aa30>('v0')
tenkey-python_tester-1 | E + where <built-in method get_register of builtins.Machine object at 0x7f761857aa30> = <builtins.Machine object at 0x7f761857aa30>.get_register
tenkey-python_tester-1 |
tenkey-python_tester-1 | test_tenkey.py:44: AssertionError
tenkey-python_tester-1 | =========================== short test summary info ============================
tenkey-python_tester-1 | FAILED test_tenkey.py::test_tenth_fibonacci_number - AssertionError: assert 0...
tenkey-python_tester-1 | ============================== 1 failed in 0.54s ===============================
tenkey-python_tester-1 exited with code 1
Gracefully stopping...
Aborting on container exit...
[+] Stopping 2/2
✔ Container tenkey-python_tester-1 Stopped 0.0s
✔ Container tenkey-server-1 Stopped
Oh no! It still failed!!
Let’s take a look. After commenting out the second container in my docker compose file, I can run just the server and connect to it with my browser. I set the breakpoint, ran until the breakpoint hit, and am looking at the values on the stack.
Here you can see that we have computed these values 0
1
1
2
3
5
8
13
21
and 34
! Those are the first 10 fibonacci numbers!
000006d4 8fbf0054 lw $ra, 0x54($sp) {__saved_$ra}
000006d8 8fa20044 lw $v0, 0x44($sp) {var_14}
000006dc 8fb10050 lw $s1, 0x50($sp) {__saved_$s1}
000006e0 8fb0004c lw $s0, 0x4c($sp) {__saved_$s0}
000006e4 03e00008 jr $ra
But we are setting v0
to esp+0x44
which is the 4 byte value AFTER the 0x22.
return nums[COUNT];
OH, we are returning the element AFTER the end of our array! We can change that to
return nums[COUNT-1];
And, it turns out that 55 is the 11th fibonacci number, not the 10th. So, we can update our test to look for 34 instead of 55.
Let’s make those mods and rerun!
$ docker compose up
[+] Building 0.0s (0/0)
[+] Running 2/0
✔ Container tenkey-server-1 Created 0.0s
✔ Container tenkey-python_tester-1 Created 0.0s
Attaching to tenkey-python_tester-1, tenkey-server-1
tenkey-server-1 | ackackack
tenkey-server-1 | [2023-07-06T20:59:37Z INFO actix_server::builder] starting 12 workers
tenkey-server-1 | [2023-07-06T20:59:37Z INFO actix_server::server] Actix runtime found; starting in Actix runtime
tenkey-python_tester-1 | wait-for-it.sh: waiting 60 seconds for server:10314
tenkey-server-1 | [2023-07-06T20:59:37Z INFO actix_web::middleware::logger] 172.18.0.3 "GET / HTTP/1.1" 200 621 "-" "curl/7.74.0" 0.000455
tenkey-python_tester-1 | wait-for-it.sh: server:10314 is available after 0 seconds
tenkey-python_tester-1 | ============================= test session starts ==============================
tenkey-python_tester-1 | platform linux -- Python 3.9.17, pytest-7.3.2, pluggy-1.0.0
tenkey-python_tester-1 | rootdir: /tenkey_test
tenkey-python_tester-1 | collected 1 item
tenkey-python_tester-1 |
tenkey-server-1 | [2023-07-06T20:59:37Z INFO actix_web::middleware::logger] 172.18.0.3 "GET /is_project_running HTTP/1.1" 200 5 "-" "-" 0.000073
tenkey-server-1 | [2023-07-06T20:59:38Z INFO actix_web::middleware::logger] 172.18.0.3 "GET /run_project/tenkey HTTP/1.1" 200 17 "-" "-" 0.399656
tenkey-server-1 | [2023-07-06T20:59:38Z INFO actix_web::middleware::logger] 172.18.0.3 "GET /is_project_running HTTP/1.1" 200 4 "-" "-" 0.000042
tenkey-server-1 | [2023-07-06T20:59:38Z INFO actix_web::middleware::logger] 172.18.0.3 "GET /is_project_running HTTP/1.1" 200 4 "-" "-" 0.000025
tenkey-server-1 | [2023-07-06T20:59:38Z INFO actix_web::middleware::logger] 172.18.0.3 "POST /run_command HTTP/1.1" 200 73 "-" "-" 0.000166
tenkey-server-1 | [2023-07-06T20:59:38Z INFO actix_web::middleware::logger] 172.18.0.3 "POST /run_command HTTP/1.1" 200 34 "-" "-" 0.000392
tenkey-server-1 | [2023-07-06T20:59:38Z INFO actix_web::middleware::logger] 172.18.0.3 "POST /run_command HTTP/1.1" 200 49 "-" "-" 0.000188
tenkey-server-1 | [2023-07-06T20:59:38Z INFO actix_web::middleware::logger] 172.18.0.3 "POST /run_command HTTP/1.1" 200 73 "-" "-" 0.000168
tenkey-server-1 | [2023-07-06T20:59:38Z INFO actix_web::middleware::logger] 172.18.0.3 "POST /run_command HTTP/1.1" 200 42 "-" "-" 0.000153
tenkey-server-1 | [2023-07-06T20:59:38Z INFO actix_web::middleware::logger] 172.18.0.3 "POST /run_command HTTP/1.1" 200 35 "-" "-" 0.000208
tenkey-server-1 | [2023-07-06T20:59:38Z INFO actix_web::middleware::logger] 172.18.0.3 "GET /stop HTTP/1.1" 200 9 "-" "-" 0.000095
tenkey-python_tester-1 | test_tenkey.py . [100%]
tenkey-python_tester-1 |
tenkey-python_tester-1 | ============================== 1 passed in 0.54s ===============================
tenkey-python_tester-1 exited with code 0
^CGracefully stopping... (press Ctrl+C again to force)
Aborting on container exit...
[+] Stopping 2/2
✔ Container tenkey-python_tester-1 Stopped 0.0s
✔ Container tenkey-server-1 Stopped
It worked!! The test passed!! We now have a functioning piece of firmware and an automated test to verify it’s functionality!
Conclusion
We finally have a piece of firmware that we could deploy to a piece of hardware and have faith that it will function just as we expect it to. All of this development has been done WITHOUT any hardware. Next, we’ll create a pipeline in a CI/CD. Then we can run our test, and any other tests, each time the firmware is updated and make sure the functionally stays the same.