Skip to content

analogRead - reading ADC result faster #344

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
VitSh7 opened this issue May 23, 2020 · 18 comments · Fixed by #345
Closed

analogRead - reading ADC result faster #344

VitSh7 opened this issue May 23, 2020 · 18 comments · Fixed by #345

Comments

@VitSh7
Copy link
Contributor

VitSh7 commented May 23, 2020

In current core adc result is read by combining two bytes ADCH and ADCL, that way adds two unnecessary commands in assembler code. But avr-gcc compiler provides a way to access 16-bit ADC register through ADC or ADCW as long as it is defined. avr/io libraries provide those defines.

#ifndef __ASSEMBLER__
#define ADC     _SFR_MEM16(0x78)
#endif
#define ADCW    _SFR_MEM16(0x78)

It is shown below the difference in reading adc result as a combining two bytes or reading directly. As you can see there are two more command LDI and OR when combining two bytes, which are completely unnecessary.

x= (ADCH << 8) | ADCL; 
000004B0  LDS R24,0x0078	Load direct from data space 
000004B2  LDS R18,0x0079	Load direct from data space 
000004B4  LDI R25,0x00		Load immediate 
000004B5  OR R25,R18		Logical OR
00000086  STS 0x0123,R25	Store direct to data space 
00000088  STS 0x0122,R24	Store direct to data space 

x=ADC; 
000004B0  LDS R24,0x0078		Load direct from data space 
000004B2  LDS R25,0x0079		Load direct from data space
00000086  STS 0x0123,R25		Store direct to data space 
00000088  STS 0x0122,R24		Store direct to data space

I can take care of this fix, if it is needed.

@matthijskooijman
Copy link
Collaborator

Would be nicer if the compiler would just optimize it. I wonder if the volatile nature of these variables somehow prevent this...

One thing to consider: Is this register protected against partial reads? Some registers require that you first read the lower half and then the upper (or the other way around, dunno) to prevent updates while reading. If so, using separate registers might have more guarantees (though maybe only when both are read in two different statements, dunno what guarantees volatiles give exactly).

@VitSh7
Copy link
Contributor Author

VitSh7 commented May 23, 2020

Well, in analogRead variables "high" and "low" are not declared as a volatile. But the compiler doesn't optimize the equation (high << 8) | low anyway. You can make sure of it by disassembling the output .elf file.

And yeah, ADCL must be read first, but the compiler deals with it good when using macros _SFR_MEM16(0x78). You can see it from the code above, first instruction is LDS R24,0x0078 which loads in R24 data stored in register with address 0x78 (which is the ADCL in case of atmega328). All needed defines for each avr mc have already been provided by avr/io library. So only wiring_analog.c should be changed.

@matthijskooijman
Copy link
Collaborator

Well, in analogRead variables "high" and "low" are not declared as a volatile.

Yeah, they are. _SFR_MEM16 resolves to _MMIO_WORD, which is:

#define _MMIO_WORD(mem_addr) (*(volatile uint16_t *)(mem_addr))

See https://www.nongnu.org/avr-libc/user-manual/sfr__defs_8h_source.html

And yeah, ADCL must be read first, but the compiler deals with it good when using macros _SFR_MEM16(0x78). You can see it from the code above,

Yeah, it does it correctly in this particular compilation, but I' wondering if it makes any guarantees (I suspect not). If not, it might end up loading them in a different order on another sketch, or a future compiler version maybe. Though maybe the language does not make any guarantees, but gcc could intentionally implement it like this to make these macros work as expected, of course, so maybe that's enough. Just saying that this is something that might need to be verified.

@VitSh7
Copy link
Contributor Author

VitSh7 commented May 24, 2020

Oh, I'm afraid we were talking about different variables. Macros like ADC, ADCL, ADCH are indeed volatile pointers to memory area. But I was talking about variables high and low declared in analogRead function as uint8_t. So in function analogRead these variables contain data from ADCL and ADCH. And it would be great if the compiler optimize the equation (high << 8) | low, but he doesn't, even though variables are non-volatile.

uint8_t low, high;

low  = ADCL;
high = ADCH;

return (high << 8) | low;

I have investigated and found out that gcc has had feature of proper reading and writing access order to 16-bit registers for AVR-architecture since 2005. Here is the topic related with this fix. According to the ChangeLog:

Modified files:
gcc : ChangeLog
gcc/config/avr : avr.c avr.md
Log message:
PR target/20288
*config/avr/avr.c (print_operand): Add 'p' and 'r'.
(out_movhi_r_mr): Read low byte of volatile MEM first.
(out_movhi_mr_r): Write high byte of volatile MEM first.

In file avr.c in the definition of the function out_movhi_r_mr is said

"volatile" forces reading low byte first, even if less efficient,
for correct operation with 16-bit I/O registers

I have not dug deeper, but I think, it's clear that as long as the pointer to 16-bit register declared as volatile there would be proper order of reading.

@matthijskooijman
Copy link
Collaborator

Hm, thanks for digging up that bug report. It does seem that gcc intentionally gets the ordering right, but it also clearly shows what I was afraid of: That the gcc folks are not really keen on having such exceptions in the codebase, so there seems a small but realistic risk of this guarantee being removed (or forgotten) at some point...

Also, I wondered if we could not just get the optimizer to handle this correctly somehow. I tested with gcc 5.4.0 (Debian default), which indeed seems stubborn, but it seems that gcc 7.3.0 (used by current avr core) does get it right here, without any additional massaging (though with just a minimal example):

matthijs@grubby:~$ cat foo.c
#include <avr/io.h>

unsigned int foo() {
    unsigned char h = ADCH;
    unsigned char l = ADCL;
    return (h << 8 | l);
}
matthijs@grubby:~$ ~/.arduino15/packages/arduino/tools/avr-gcc/7.3.0-atmel3.6.1-arduino5/bin/avr-gcc -mmcu=atmega328p -Os foo.c -c && avr-objdump -d foo.o

foo.o:     file format elf32-avr


Disassembly of section .text:

00000000 <foo>:
   0:   90 91 79 00     lds     r25, 0x0079     ; 0x800079 <__SREG__+0x80003a>
   4:   80 91 78 00     lds     r24, 0x0078     ; 0x800078 <__SREG__+0x800039>
   8:   08 95           ret

What compiler are you using for testing?

@VitSh7
Copy link
Contributor Author

VitSh7 commented May 25, 2020

I'm afraid that my first example was from atmel studio where there is gcc 5.4.0. But then i tested the output elf. files from arduino IDE with gcc 7.3.0 and this version actually does make it worse! See what happened with your example if you try to read this expression into the variable. Here is the code

#include <avr/io.h>

unsigned int x=0;

int main() {
    unsigned char l = ADCL;
    unsigned char h = ADCH;
    x=(h << 8 | l);
}

And here is the output, see that part with tripple eor, that's what I'm talking about.

C:\Users\Vitve>"C:\\Users\\Vitve\\AppData\\Local\\Arduino15\\packages\\arduino\\tools\\avr-gcc\\7.3.0-atmel3.6.1-arduino5/bin/avr-gcc" -mmcu=atmega328p -Os "D:/foo.c" -c && avr-objdump -d foo.o

foo.o:     file format elf32-avr


Disassembly of section .text.startup:

00000000 <main>:
   0:   90 91 78 00     lds     r25, 0x0078     ; 0x800078 <__SREG__+0x800039>
   4:   80 91 79 00     lds     r24, 0x0079     ; 0x800079 <__SREG__+0x80003a>
   8:   89 27           eor     r24, r25
   a:   98 27           eor     r25, r24
   c:   89 27           eor     r24, r25
   e:   90 93 00 00     sts     0x0000, r25     ; 0x800000 <__SREG__+0x7fffc1>
  12:   80 93 00 00     sts     0x0000, r24     ; 0x800000 <__SREG__+0x7fffc1>
  16:   90 e0           ldi     r25, 0x00       ; 0
  18:   80 e0           ldi     r24, 0x00       ; 0
  1a:   08 95           ret

Actually function like x=foo() works fine. But anyway it's better to see how it works on real application. I have next sketch

void setup() {
  Serial.begin(9600);
}

void loop() {
  int x=analogRead(0);
  Serial.println(x); 
}

I have compiled this sketch in arduino IDE 1.8.12 then disassembled output .elf file with following commands

set PATH=%PATH%;C:\Program Files (x86)\Arduino\hardware\tools\avr\bin\

avr-objdump -C -d "C:\\Users\\Vitve\\AppData\\Local\\Temp\\arduino_build_488828/sketch_1.ino.elf"

Here is the piece of the output code.

 54c:   90 91 78 00     lds     r25, 0x0078     ; 0x800078 <__TEXT_REGION_LENGTH__+0x7e0078>
 550:   80 91 79 00     lds     r24, 0x0079     ; 0x800079 <__TEXT_REGION_LENGTH__+0x7e0079>
 554:   89 27           eor     r24, r25
 556:   98 27           eor     r25, r24
 558:   89 27           eor     r24, r25
 55a:   2c 01           movw    r4, r24
 55c:   09 2e           mov     r0, r25

As you can see between reading ADC values and putting them somewhere else there are three eor instructions. These three instructions swap the contents of the r24 and r25 that seems pretty pointless. It make even less sense than it was by gcc 5.4.0 with those instructions

000004B0  LDS R24,0x0078	Load direct from data space 
000004B2  LDS R18,0x0079	Load direct from data space 
000004B4  LDI R25,0x00		Load immediate 
000004B5  OR R25,R18		Logical OR
00000086  STS 0x0123,R25	Store direct to data space 
00000088  STS 0x0122,R24	Store direct to data space

@matthijskooijman
Copy link
Collaborator

And here is the output, see that part with tripple eor, that's what I'm talking about.

Wow, that's quite horrible. I think the triple eor is a way to swap two registers without needing a third one, too bad that there is absolutely no need...

Hm, so much for my optimism. And using the ADC register makes all of this go away? :-S

@VitSh7
Copy link
Contributor Author

VitSh7 commented May 25, 2020

Yeah, with ADC this works fine. I have changed function analogRead to return ADC and disassembly looks good - it reads value properly without any eor. To make sure I have made another example with sketch like that

void setup() {
}

volatile int x=0;

void loop() {
  x=analogRead(0);
}

So I have declared variable as volatile to make sure that compiler writes ADC value somewhere at the data area. Here is the output result with original wiring_analog.c.

 1c6:   90 91 78 00     lds     r25, 0x0078     ; 0x800078 <__TEXT_REGION_LENGTH__+0x7e0078>
 1ca:   80 91 79 00     lds     r24, 0x0079     ; 0x800079 <__TEXT_REGION_LENGTH__+0x7e0079>
 1ce:   89 27           eor     r24, r25
 1d0:   98 27           eor     r25, r24
 1d2:   89 27           eor     r24, r25
 1d4:   90 93 01 01     sts     0x0101, r25     ; 0x800101 <_edata+0x1>
 1d8:   80 93 00 01     sts     0x0100, r24     ; 0x800100 <_edata>

And here it is with changed analogRead to return ADC instead of return (high << 8) | low.

 1c6:   80 91 78 00     lds     r24, 0x0078     ; 0x800078 <__TEXT_REGION_LENGTH__+0x7e0078>
 1ca:   90 91 79 00     lds     r25, 0x0079     ; 0x800079 <__TEXT_REGION_LENGTH__+0x7e0079>
 1ce:   90 93 01 01     sts     0x0101, r25     ; 0x800101 <_edata+0x1>
 1d2:   80 93 00 01     sts     0x0100, r24     ; 0x800100 <_edata>

@ArminJo
Copy link

ArminJo commented Oct 30, 2020

On my windows PC with current Arduino 1.8.13
analog read compiles to:

	low  = ADCL;
 1b6:	80 91 78 00 	lds	r24, 0x0078	; 0x800078 <__DATA_REGION_ORIGIN__+0x18>
C:\Users\Armin\AppData\Local\Arduino15\packages\arduino\hardware\avr\1.8.3\cores\arduino/wiring_analog.c:89
	high = ADCH;
 1ba:	90 91 79 00 	lds	r25, 0x0079	; 0x800079 <__DATA_REGION_ORIGIN__+0x19>
C:\Users\Armin\AppData\Local\Arduino15\packages\arduino\hardware\avr\1.8.3\cores\arduino/wiring_analog.c:98
	high = 0;
#endif
	// combine the two bytes
	return (high << 8) | low;
}
 1be:	08 95       	ret

geenrated with ...avr-gcc\\7.3.0-atmel3.6.1-arduino7/bin/avr-objdump" --disassemble --source --line-numbers --demangle --section=.text ".../AnalogInOutSerial.ino.elf" > ".../AnalogInOutSerial.ino.lss"

So what is the problem?

Please close the issue and the PR.

@matthijskooijman
Copy link
Collaborator

Hm, it might be that the sketch around it makes a big difference? I.e. depending on how you use the result of analogRead, the problem might or might not occur? This is likely when it is inlined, since then the surrounding code can make a big difference in register usage and might force the register allocator to do weird things maybe?

When I compile the AnalogInOutSerial example like @ArminJo (using a fairly recent arduino IDE from git and the same 1.8.3 AVR core and 7.3.0-atmel3.6.1-arduino7 gcc as @ArminJo), I get:

        low  = ADCL;
 70c:   80 91 78 00     lds     r24, 0x0078     ; 0x800078 <__DATA_REGION_ORIGIN__+0x18>
        high = ADCH;
 710:   a0 91 79 00     lds     r26, 0x0079     ; 0x800079 <__DATA_REGION_ORIGIN__+0x19>
        low  = 0;
        high = 0;
#endif

        // combine the two bytes
        return (high << 8) | low;
 714:   ba 2f           mov     r27, r26
 716:   a8 2f           mov     r26, r24
  Serial.begin(9600);
}

void loop() {
  // read the analog in value:
  sensorValue = analogRead(analogInPin);
 718:   b0 93 38 01     sts     0x0138, r27     ; 0x800138 <sensorValue+0x1>
 71c:   a0 93 37 01     sts     0x0137, r26     ; 0x800137 <sensorValue>

Which is slightly better than the original cases (no 3x eor), but still swaps registers using mov.

@ArminJo, I'm surprised that you're not seeing the same result. Also, in your case I see a ret instruction, which suggests analogRead was not inlined. Did you modify the AnalogInOutSerial example in any way? I see only one analogRead call in it here, which should pretty much guarantee to get it inlined, which happens here, but not for you for some reason?

When I compile the last example from @Vitve4, I get the same 3xeor output as them. That's this example:

void setup() {
}

volatile int x=0;

void loop() {
  x=analogRead(0);
}

@ArminJo
Copy link

ArminJo commented Oct 30, 2020

Sorry, yes you are right, I also had the same result with the original AnalogInOutSerial but if you add another analogRead, the code will not be inlined and then you get my result.
Inlined code generated by the GCC often looks strange, but I doubt, the proposed change will cure this.

@matthijskooijman
Copy link
Collaborator

Sorry, yes you are right, I also had the same result with the original AnalogInOutSerial but if you add another analogRead, the code will not be inlined and then you get my result.

Well, it does seem to. And I can imagine it does, since using the ADC register does a single read from a volatile uint16_t*, rather than two volatile uint8_t* reads that are then combined into a uint16_t (and the former probably makes it easier for gcc to figure out what's happening and remap the registers to make things simpler).

Note that I think gcc should just be able to figure this out and optimize this, so in essence this is a missed optimization in gcc. But if using the ADC register makes it better, and the correctness is still (de facto) guaranteed, then changing this code to use ADC instead doesn't sound bad to me (in essence it's just using this 16-bit version of the register access as intended and the code does not become less readable IMHO).

@matthijskooijman
Copy link
Collaborator

I just reviewed the PR, looks good to me.

In addition to fixing this here, it would still be good if gcc could fix this as well. Anyone care to look through the gcc bugzilla to see if this issue is maybe already reported and/or report it if not? And if anyone has a newer avr-gcc version to test, that would also be valuable.

@ArminJo
Copy link

ArminJo commented Oct 31, 2020

I would like to put my veto on it.
What is the sense of saving 4 to 6 bytes for programs, that use only one analogRead. I can not imagine a huge program that has only 1 analogRead, but really requires the 6 bytes we save with this PR.

@per1234
Copy link
Contributor

per1234 commented Oct 31, 2020

It's common for Arduino sketches to use a call to analogRead() as a seed for the pseudo-random number generator:

randomSeed(analogRead(A0));

That sketch may well have no other use of analogRead(), and just as well may be huge (keeping in mind that it doesn't take much to be "huge" relative to the amount of memory on some of the parts supported by this core).

@matthijskooijman
Copy link
Collaborator

matthijskooijman commented Oct 31, 2020

I would like to put my veto on it.

@ArminJo Do you see any downside for this change? You seem to be heavily against this, but I see only the argument that the gain would little, but if there is no cost, then why not make the change? Is it just the cost of making the change that you see, or is there something else?

@ArminJo
Copy link

ArminJo commented Oct 31, 2020

After forther testing and with per1234's argument in mind and sleeping on it I changed my mind and think it is a useful change!
So I hope that it will be accepted by the gods of Arduino. 😀

VitSh7 added a commit to VitSh7/ArduinoCore-avr that referenced this issue Nov 1, 2020
7.3.0-atmel3.6.1-arduino7 gcc fails to optimize separate reading from ADCL and ADCH. It produces additionally three eor commands or in some cases two mov commands in the assembly code (see discussion arduino#344). These commands swap register contents before store them to data area. So they are completely unnecessary. Reading ADC result with ADC macro fixes it and gcc generates the right code..
VitSh7 added a commit to VitSh7/ArduinoCore-avr that referenced this issue Nov 1, 2020
7.3.0-atmel3.6.1-arduino7 gcc fails to optimize separate reading from ADCL and ADCH. It produces additionally three eor commands or in some cases two mov commands in the assembly code (see discussion arduino#344). These commands swap register contents before store them to data area. So they are completely unnecessary. Reading ADC result with ADC macro fixes it and gcc generates the right code..
@matthijskooijman matthijskooijman linked a pull request Nov 2, 2020 that will close this issue
@matthijskooijman
Copy link
Collaborator

Fixed by #345

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants