Sunday, December 21, 2008

What ARM Instruction Helps Atomic Operations?

See answers at:

http://www.arm.com/support/faqdev/14979.html

SWP has been deprecated since architecture v6. On a v6 or later core (ARM11 onwards) you should use the LDREX/STREX instructions instead.

LDREX and STREX (load and store exclusive) are documented at: http://www.keil.com/support/man/docs/armasm/armasm_cihbghef.htm

SWP (v3 onwards) and LDREX/STREX (v6 onwards) are instructions designed to support multi-master systems, e.g. systems with multiple cores or systems with other bus masters such as a DMA controller. Their primary purpose is to maintain the integrity of shared data structures during inter-master communication by preventing two masters from making conflicting accesses at the same time.

SWP provides an atomic load and store operation which can be used as the basic building block for mutexes, semaphores etc.
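To illustrate how an atomic swap can serve as the building block for a mutex, here is a minimal sketch in C11. The names swap_lock, swap_lock_try, swap_lock_acquire and swap_lock_release are hypothetical and do not come from the ARM FAQ. How the compiler lowers atomic_exchange (SWP, an LDREX/STREX loop, or a library call) depends on the compiler and target, so treat this as a model of the idea rather than of the exact instruction.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 = free, 1 = taken. */
typedef struct { atomic_int state; } swap_lock;

/* One attempt: atomically write 1 and inspect the value that was there.
 * If the old value was 0, the lock was free and now belongs to us. */
static bool swap_lock_try(swap_lock *l) {
    return atomic_exchange_explicit(&l->state, 1, memory_order_acquire) == 0;
}

/* Spin until an attempt succeeds. */
static void swap_lock_acquire(swap_lock *l) {
    while (!swap_lock_try(l)) {
        /* busy-wait; a real system might yield or sleep here */
    }
}

/* Release by storing the 'free' value. */
static void swap_lock_release(swap_lock *l) {
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

Swapping in 1 while the lock is already taken is harmless, because it only overwrites 1 with 1.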

LDREX/STREX allow a bus master to detect that another master has written to an address it wanted exclusive access to. Again, this can be used to build various higher-level locks. The advantage of LDREX/STREX is that they do not block other transactions on the bus, whereas a core executing SWP takes over the bus until the instruction completes by asserting the HLOCK signal.

These instructions are also useful on a single-master system to implement mutexes, semaphores, etc. without needing to disable interrupts. In the same way, they are also useful for multi-threaded systems.

Examples

    ; LockAddr stands for a register that holds the address of the lock
        MOV     r1, #0x1                ; load the 'lock taken' value
    try
        LDREX   r0, [LockAddr]          ; load the lock value
        CMP     r0, #0                  ; is the lock free?
        STREXEQ r0, r1, [LockAddr]      ; try and claim the lock
        CMPEQ   r0, #0                  ; did this succeed?
        BNE     try                     ; no: try again
        ....                            ; yes: we have the lock
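The same claim loop can be sketched in C11, assuming a compiler that provides <stdatomic.h>. The function names claim and release are hypothetical. The weak compare-exchange is allowed to fail even when the lock reads as free, which resembles a STREX that fails because exclusivity was lost, so the loop retries in both cases just as the assembly does.

```c
#include <stdatomic.h>

/* Spin until the lock value changes from 0 (free) to 1 (taken) under
 * our hand. Retries when the lock is taken and when the store fails. */
static void claim(atomic_int *lock) {
    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(
               lock, &expected, 1,
               memory_order_acquire, memory_order_relaxed)) {
        /* A failed exchange overwrites 'expected' with the value it saw,
         * so reset it before the next attempt. */
        expected = 0;
    }
}

/* Give the lock back by storing the 'free' value. */
static void release(atomic_int *lock) {
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

The acquire and release orderings ask the compiler to emit whatever barriers the target needs around the critical section, something the assembly fragment leaves to the reader.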

Another example:

This code segment atomically adds 'delta' to the value stored in shared memory at 'ptr'. It returns the original value at 'ptr' in r0.


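    ; r0 = result (original value at ptr)
    ; r1 = ptr
    ; r2 = delta
    ; r3, r4 = scratch
    spin
        ldrex   r0, [r1]        ; load the current value exclusively
        add     r3, r0, r2      ; compute the new value
        strex   r4, r3, [r1]    ; try to store it; r4 = 0 on success
        cmp     r4, #0          ; did the store succeed?
        bne     spin            ; no: reload and retry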
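For comparison, here is a hedged C11 sketch of the same fetch-and-add, written as an explicit retry loop so that its shape matches the assembly. The name fetch_add_loop is hypothetical. In practice the standard atomic_fetch_add does the same job in one call.

```c
#include <stdatomic.h>

/* Adds delta to *ptr and returns the value that was stored before.
 * Unsigned arithmetic is used so that wrap-around is well defined. */
static unsigned fetch_add_loop(atomic_uint *ptr, unsigned delta) {
    unsigned old = atomic_load_explicit(ptr, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               ptr, &old, old + delta,
               memory_order_relaxed, memory_order_relaxed)) {
        /* On failure 'old' is refreshed with the value just observed,
         * so the next attempt recomputes old + delta from it. */
    }
    return old;
}
```

Relaxed ordering is chosen here because the assembly fragment contains no barriers either. Code that uses the counter to publish other data would need stronger ordering.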

ARM Processor Architecture Knowledge

From http://www.cse.unsw.edu.au/~cs9244/06/seminars/08-leonidr.pdf

One major concern associated with memory protection is the cost of address space switching. On ARM a context switch requires switching page tables. The complete cost of a page table switch includes the cost of flushing page tables, purging TLBs and caches, and then refilling them. Two mechanisms were introduced to enable operating system designers to eliminate this cost in some cases.
  1. The first mechanism is protection domains. Every virtual memory page or section belongs to one of sixteen protection domains. At any point in time, the running process can be either a manager of a domain, which means that it can access all pages belonging to this domain bypassing access permissions; a client of the domain, which means that it can access pages belonging to the domain according to their page table access permission bits; or it can have no access to the domain at all. In some situations, it is possible to perform a context switch by simply changing domain access permissions, which means writing a new value to the domain access register of coprocessor 15.
  2. The second mechanism, present in newer ARM cores, is the fast context switch extension (FCSE), which allows multiple processes to use identical address ranges while ensuring that the addresses they present to the rest of the memory system differ. To that end, virtual addresses issued by a program within the first 32 megabytes of the address space are effectively augmented by the value of the process identifier (PID) register. FCSE avoids the overhead of purging caches when performing a context switch; however, it is still necessary to flush TLBs.
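Both mechanisms come down to simple bit manipulation, which the following C sketch models in plain arithmetic without touching any hardware. The function and constant names are hypothetical. The field layout is not given in the excerpt above and reflects my reading of the ARM architecture documentation: two bits per domain in the domain access register (0 = no access, 1 = client, 3 = manager), and a 7-bit process ID placed in address bits [31:25] by FCSE.

```c
#include <stdint.h>

/* Assumed two-bit access codes for one domain field. */
enum { DOMAIN_NO_ACCESS = 0, DOMAIN_CLIENT = 1, DOMAIN_MANAGER = 3 };

/* Returns 'dacr' with the field for 'domain' (0..15) set to 'access'.
 * The result is the value a kernel would write to coprocessor 15. */
static uint32_t dacr_set(uint32_t dacr, unsigned domain, uint32_t access) {
    unsigned shift = 2u * (domain & 15u);
    return (dacr & ~(UINT32_C(3) << shift)) | ((access & 3u) << shift);
}

/* FCSE model: an address below 32 MB has the 7-bit process ID placed
 * in its top bits; addresses at or above 32 MB pass through unchanged. */
static uint32_t fcse_modified_va(uint32_t va, uint32_t pid7) {
    const uint32_t limit = UINT32_C(1) << 25;   /* 32 MB */
    return (va < limit) ? (va | ((pid7 & 0x7Fu) << 25)) : va;
}
```

For example, under these assumptions two processes with process IDs 1 and 2 that both use virtual address 0x1000 would present 0x02001000 and 0x04001000 to the caches, which is why the caches need not be purged on a switch.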