固件汇编器 ASM 初学者 C 中级开发

8051 的快速整数平方根

CPallini

5.00/5 (7投票s)

2020 年 12 月 28 日

CPOL

2分钟阅读

11224

8051 汇编语言中的快速整数平方根计算

下载汇编和 C 源代码 - 4 KB

引言

一些单片机 (MCU) 应用需要快速 (即避免除法) 且使用少量指令来计算整数平方根 (sqrt) 函数。本技巧展示了 Ross M. Fossler 在 Microchip 的应用笔记 TB040 中描述的“快速整数平方根”算法的实现。

虽然该应用笔记报告了 Microchip P18C252 MCU 的源代码，但此处显示了 8051 MCU 的实现。

请注意，这里仅介绍算法的草图，因为确切的规范由（提供的）C 源代码给出。无论如何，我强烈建议您阅读原始的 Fossler 论文。

目标

目标是找到一个 16-bit 无符号数的整数平方根的近似值，将其表示为一个无符号 8-bit (byte) 数。

也就是说，16 位数字将是汇编程序的参数，数字平方根的近似值将是程序的返回值。

算法

该算法非常简单：从最高有效位（字节的位 7）开始，迭代地将“附加”位添加到 guess 数字中，然后将 squared guess 与程序参数（例如 arg）进行比较。

如果 squared guess小于 arg，那么我们将添加的位保留在 guess 中，将附加位右移并转到下一次迭代。
另一方面，如果 squared guess大于 arg，那么我们从 guess 中删除添加的位，将附加位右移并转到下一次迭代。

当 squared guess等于 arg 或没有更多附加位时（它已移出其字节），该过程终止。

代码

可以使用 C 代码轻松地制定该算法。

uint8_t fast_sqrt_c(uint16_t n)
{
  uint8_t g = 0x80; // initial guess
  uint8_t b = 0x80; // 'additional' bit is in position 7
  for (;;)
  {
    uint16_t g2 = g*g;
    if ( g2 == n )
      break;        // an exact match is found
    if ( g2 > n )
      g ^= b;       // here g*g > n, remove the added bit
    b >>= 1;        // shift right the 'addional' bit
    if ( b == 0 )
      break;        // the 'additional' bit was shifted out, the iteration is complete
    g |= b;         // add the current 'additional bit'
  }
  return g;
}

C 很有用，因为它

表示该算法的精确规范。
可以与标准 C 库的 sqrt 函数进行比较，以便完全确定该算法的正确性。
可以使用“合适的”编译器进行编译，以生成 8051 汇编。然后，可以将生成的汇编与手工制作的算法实现进行比较（以了解正确性、速度和 MCU 使用情况）。

实际上，步骤 2 是在 Linux box 上使用 GCC 执行的（提供了源代码 fast_sqrt_c_test.c 代码）。

实际上，步骤 3 是使用 SDCC 编译器执行的。

生成的代码（作为 fast_sqrt_c_sdcc.asm 提供）有点混乱，因此这里报告了一个“重新排列”的列表，在某种程度上进行了清理

; Implementation of Ross M. Fossler 'Fast Integer Square Root' algorithm 
; (Microchip's application note TB040)
;------------------------------------------------------------
;Allocation info for local variables in function 'fast_sqrt_c'
;------------------------------------------------------------
;n                         Allocated to registers r6-r7 
;g                         Allocated to registers r5 
;b                         Allocated to registers r4 
;g2                        Allocated to registers r2-r3 
;------------------------------------------------------------
;  function fast_sqrt_c
; -----------------------------------------
_fast_sqrt_c:
  mov r6,dpl      ; function argument low byte
  mov r7,dph      ; function argument high byte
  mov r5,#0x80    ; g = 0x80
  mov r4,#0x80    ; b = 0x80

C_Loop:
  mov a,r5
  mov b,a
  mul ab          ; a-b = g*g

  mov r2,a        ; here start the comparison of g2 and n ( g2 == n)
  mov r3,b        ; g2 = g*g
  cjne  a,ar6,C_NotEqual
  mov a,r3
  cjne  a,ar7,C_NotEqual
  sjmp  C_TheEnd  ; break if g2 matches n

C_NotEqual:
  clr c           ; here start the check to establish if g2 is bigger or smaller than n
  mov a,r6
  subb  a,r2
  mov a,r7
  subb  a,r3
  jnc C_NotGreater
  mov a,r4
  xrl ar5,a       ;  here (g2 > n), emove the 'added' bit

C_NotGreater:
  mov a,r4
  clr c
  rrc a           ; shift the additional bit right
  mov r4,a
  jz  C_TheEnd    ; no more 'additional' bit to add, exit
  mov a,r4
  orl ar5,a       ; here the 'additional' bit is actually added
  sjmp  C_Loop
C_TheEnd:

  mov dpl,r5
  ret ; return g;
;------------------------------------------------------------

我认为 SDCC 的实现非常好。无论如何，我们可以使用手工制作的汇编做得更好（源代码作为 fast_sqrt_hand_crafted.asm 提供）

; Hand crafted Implementation of Ross M. Fossler 'Fast Integer Square Root' algorithm 
; (Microchip's application note TB040)
; function fast_sqrt_hc
; r6-r7 is n, the function argument
; r5 is g, the guess
; r4 is b, the 'additional' bit
; r2-r3 is g2 = g*g
;
_fast_sqrt_hc:
mov r6, dpl
mov r7, dph
mov r4, #0x80
mov r5, #0x80     ; initialization: function argument n = r6-r7, g = 0x80, b = 0x80

mov a,r5

HC_Loop:
  mov b,a
  mul ab          ; a-b is g2
  clr c
  subb a,r6
  xch a,b
  subb a,r7
  jc HC_Less      ; this means g2 < n
  orl a,b
  jz HC_TheEnd    ; jump to end if (g2 == n)

  ; here g2 > n, remove the added bit
  mov a, r5
  xrl a, r4       ; the XOR operration removes the bit
  mov r5, a
HC_Less:
  clr c
  mov a,r4
  rrc a           ; here the shift of the 'additional' bit is performed
  jz HC_TheEnd    ; if a is 0 then there is no more 'additional' bit in the byte
  mov r4,a 
  orl a, r5       ; here the 'additional bit' is added
  mov r5,a
  sjmp HC_Loop
HC_TheEnd:
  mov dpl, r5
  ret

一些数字

使用 EdSim51DI 模拟器对 SDCC 生成的代码和手工制作的代码进行汇编和比较。

结果显示在下表中

程序	代码大小	平均 MCU 周期数
SDCC 生成	48 字节	170.9
手工制作	39 字节	140.4

有用的 8051 资源

历史

2020 年 12 月 27 日：首次发布