Example with MIPS, Pipelining and Branch Delay Slot

J. Doe Published at Dev

J. Doe

I am preparing for a test and have such example. Following code:

1: SLL $1, $1, 2
2: LW $2, 1000($1)
3: BEQL $2, $0, END
4: ADDI $3, $2, 1
5: MULT $3, $2
6: MFLO $4
END:
7: J QUIT
...
QUIT:
100: NOP

is executed on RISC processor (with quasi MIPS instruction set) with

five-stage pipeline
no bypassing
no dynamic scheduling
Branch Delay Slot
Additionally we know, that branch won't be taken

My task is to understand how the Branch Delay Slot works in this situation and build the correct Pipeline Diagram.

I have an official solution and it gives the following diagram with no explanation:

1: SLL $1, $1, 2         IDEMW  
2: LW $2, 1000($1)        I---DEMW  
3: BEQL $2, $0, END           I---DEMW  
4: ADDI $3, $2, 1                 IDx
5: MULT $3, $2                       IDEMW
6: MFLO $4                            I---DEMW

As far as I understand, ADDI is executed in Branch Delay Slot and is stopped after processor understands, that branch is not taken, what leads us to wrong result. My Questions here are

Am i right?
When yes, why is ADDI executed in Branch Delay Slot and not Jump?

Ped7g

The CPU keeps reading instructions sequentially, i.e. during execution (was already fetched, decoded and the remaining phases are now being processed, I don't know your exact phases, so this is just general description) of beql it will get the other part of pipe free to fetch next instruction, but the branch was not finished yet, so the PC is still pointing to the next instruction after branch -> that's the "branch delay slot".

On classic MIPS this next instruction is fetched, decoded, and executed, and meanwhile the branch may or may not modify the PC to the branch target, so the branch-delay slot instruction will get executed every time. The next instruction after it gets executed only when branching didn't happen, i.e. the PC continues after the "branch delay slot" position sequentially. In case the branch did modify the PC, the fetch+decode will take notice and decode the next instruction from new destination, so on classic MIPS the branch delay slot is only 1 instruction "big" (I have no idea if more complex MIPS CPUs can have more stages and more delay slots available, technically with 5 stage pipeline even 5 instructions delayed sounds HW possible, but it would be probably very difficult to use practically and sounds like it would create more problems than help).

The BEQL is more complex instruction, killing the delay slot instruction halfway into execution if the branching condition fails.

See http://math-atlas.sourceforge.net/devel/assembly/mips-iv.pdf page 45 for detailed description of BEQL.

So that "NullifyCurrentInstruction()" is probably that "x" in the diagram. Remaining things in diagram, I'm just guessing as I didn't study your 5 stage details, but second LW after fetching and decoding(?) finds out it depends on $1, so it waits in depend-stage for previous instruction W phase. Etc...The ADDI doesn't depend on anything, so it is executed almost parallel to BEQL, and gets killed toward the end.

But I don't understand why there's no "I" phase every time the "I" stage gets freed, looks like the "I" waits for something and in the end you have like at most 2 instructions going on at the same time.

Anyway, this is quite undecipherable without studying the technical details of the CPU used in your question, and I don't want to study it, I'm not even sure what kind of CPU you have, and where to get it's technical documentation.

edit: I will try to extract relevant part of pdf also here, to make this answer not "just link", but copying pdf may be tricky...

BEQL instruction docs of MIPS IV CPU:

Description: if (rs = rt) then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs and GPR rt are equal, branch to the target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.

Operation:
I:
  tgt_offset ← sign_extend(offset || 0²)
  condition ← (GPR[rs] = GPR[rt])
I+1:
if condition then
  PC ← PC + tgt_offset
else
  NullifyCurrentInstruction()
endif

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at2020-11-27

Comments

0 comments

TOP Ranking

Article

Example with MIPS, Pipelining and Branch Delay Slot

Example with MIPS, Pipelining and Branch Delay Slot

pump.io port in URL

Loopback Error: connect ECONNREFUSED 127.0.0.1:3306 (MAMP)

Can't pre-populate phone number and message body in SMS link on iPhones when SMS app is not running in the background

How to import an asset in swift using Bundle.main.path() in a react-native native module

Failed to listen on localhost:8000 (reason: Cannot assign requested address)

Spring Boot JPA PostgreSQL Web App - Internal Authentication Error

ngClass error (Can't bind ngClass since it isn't a known property of div) in Angular 11.0.3

Using Response.Redirect with Friendly URLS in ASP.NET

Can a 32-bit antivirus program protect you from 64-bit threats

Double spacing in rmarkdown pdf

How to fix "pickle_module.load(f, **pickle_load_args) _pickle.UnpicklingError: invalid load key, '<'" using YOLOv3?

3D Touch Peek Swipe Like Mail

Bootstrap 5 Static Modal Still Closes when I Click Outside

Assembly definition can't resolve namespaces from external packages

Vector input in shiny R and then use it

Emulator wrong screen resolution in Android Studio 1.3

Svchost high CPU from Microsoft.BingWeather app errors

Graphics Context misaligned on first paint

Python connect to firebird docker database

Is this docker-for-mac password dialog legit?

How to save models trained locally in Amazon SageMaker?