Don't worry, it'll be just as easy. Each controller has three bytes of memory defined as labels: NewButtons, OldButtons and JustPressed for controller 1, and NewButtons2, OldButtons2 and JustPressed2 for controller 2. Each controller is handled the same way, so I'll explain it for a single controller first.
NewButtons = $41
OldButtons = $42
JustPressed = $43
NewButtons2 = $46
OldButtons2 = $47
JustPressed2 = $48
The very first thing he does in the controller_test routine is to copy NewButtons over to OldButtons. This happens each time at the very start of the routine - before polling for new values from the controller - because OldButtons represents the buttons that were pressed the last time the routine ran.
controller_test:
LDA NewButtons
STA OldButtons
Next, he polls the first controller 8 times to get each serial bit and stores all 8 bits as a single byte in NewButtons: LSR moves bit 0 of each read into the carry flag, and ROR rotates the carry into NewButtons. He does the same for the second controller, reading $4017 instead of $4016, and stores the result in NewButtons2. (This is what my example above shows; I poll each controller 8 times and store those values in memory locations labeled con1Buttons, con2Buttons, con3Buttons, and con4Buttons.)
ConLoop:
LDA $4016
LSR
ROR NewButtons
INX
CPX #$08
BNE ConLoop
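For context, here is a minimal sketch of the whole read for controller 1. It assumes the standard strobe sequence (write 1, then 0, to $4016) comes right before the loop and that X is used only as the loop counter. The strobe and the LDX line are my assumptions about the surrounding code, not lines quoted from his routine.
LDA #$01
STA $4016 ; strobe high: latch the current button states
LDA #$00
STA $4016 ; strobe low: buttons can now be read one bit at a time
LDX #$00 ; reset the loop counter (assumed)
ConLoop:
LDA $4016 ; bit 0 holds the next button
LSR ; shift that bit into the carry flag
ROR NewButtons ; rotate the carry into bit 7 of NewButtons
INX
CPX #$08
BNE ConLoop ; repeat until all 8 buttons have been read
The buttons arrive in the order A, B, Select, Start, Up, Down, Left, Right, and each ROR pushes the earlier bits one place to the right, so A ends up in bit 0 and Right in bit 7.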
He then takes the values from OldButtons, inverts them, and does a logical AND operation with NewButtons. This produces a value - stored at JustPressed - that shows only buttons that have been pressed since the last time the routine was run.
LDA OldButtons
EOR #$FF
AND NewButtons
STA JustPressed
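A quick worked example with made-up values may help. Suppose OldButtons = %10000001 and NewButtons = %11000000 (these numbers are purely illustrative):
LDA OldButtons ; A = %10000001
EOR #$FF ; A = %01111110 (every bit flipped)
AND NewButtons ; A = %01000000 (%01111110 AND %11000000)
STA JustPressed ; only bit 6 is set
Bit 7 was already down last time, so it is masked out. Bit 0 was released, so it is gone too. Only bit 6, which went from 0 to 1, survives.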
Finally, he applies a bitmask to this JustPressed value and checks each bit individually to see if the corresponding button has been pressed. If it has, execution falls through to the "do stuff here" area; otherwise BEQ branches to the next check.
CheckRight:
LDA #%10000000
AND JustPressed
BEQ CheckDown
; Do stuff here
CheckDown:
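The remaining checks follow the same pattern with a different mask. Here is a sketch of how the next one might look, assuming the bit order produced by the LSR/ROR loop (A in bit 0 through Right in bit 7). The CheckLeft label and the mask for Down are inferred from that order, not quoted from his code.
CheckDown:
LDA #%00100000 ; Down is bit 5 under the assumed bit order
AND JustPressed
BEQ CheckLeft ; not newly pressed: skip to the next check
; Do stuff for Down here
CheckLeft:
LDA #%01000000 ; Left is bit 6 under the assumed bit order
AND JustPressed
; ...and so on for the remaining buttons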
If you compare the code for the first and second controllers, you'll see it's identical apart from the labels. So, to add polling for 2 more controllers you just need to duplicate all of that twice more. Obviously, you'll need to define 6 more bytes in memory, for example:
NewButtons3 = $58
OldButtons3 = $59
JustPressed3 = $5a
NewButtons4 = $5b
OldButtons4 = $5c
JustPressed4 = $5d
And you'll need to copy each NewButtonsX into its OldButtonsX at the start of the controller_test routine:
controller_test:
LDA NewButtons
STA OldButtons
LDA NewButtons2
STA OldButtons2
LDA NewButtons3
STA OldButtons3
LDA NewButtons4
STA OldButtons4
Etc, etc.