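A Neural Turing Machine (NTM) extends a neural network into a differentiable computer by giving it read/write access to an external memory. Because every memory operation is differentiable, the whole system can be trained end to end with gradient descent.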
Controller
The controller is a neural network, usually an LSTM. Like an ordinary network it maps an input to an output, but it also interacts with the memory through read and write heads.
Architecture
Memory
The memory is a large N × M matrix of real numbers, where N is the number of memory locations and M is the size of the vector stored at each location.
Read/write heads
A read head retrieves data from memory and returns it to the controller, while a write head takes data from the controller and uses it to modify memory.
Reading/Writing
Read vector
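The read vector is a weighted sum of the memory rows:
$$r_t \leftarrow \sum_i w_t(i)\, M_t(i), \qquad \sum_i w_t(i) = 1, \quad 0 \le w_t(i) \le 1$$
where $M_t$ is the contents of the N × M memory matrix at time t, $M_t(i)$ is its i-th row, and $w_t$ is the vector of weightings over the N locations emitted by a read head at time t.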
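A minimal pure-Python sketch of the read equation, assuming memory is stored as a list of N rows of length M and the weighting is already normalised. The function name `read` is hypothetical and not taken from any library.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # N rows (locations), each of length M


def read(memory: Matrix, weights: Vector) -> Vector:
    """Return r_t = sum_i w_t(i) * M_t(i), a weighted sum of memory rows."""
    if len(weights) != len(memory):
        raise ValueError("need one weight per memory location")
    width = len(memory[0])
    r = [0.0] * width
    for w_i, row in zip(weights, memory):
        for j in range(width):
            r[j] += w_i * row[j]
    return r


memory = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 4.0],
]
print(read(memory, [0.25, 0.75]))  # [0.25, 0.75, 3.5]
```

A one-hot weighting such as `[1.0, 0.0]` returns a single row unchanged. A spread-out weighting blends rows, which is what keeps the read differentiable with respect to $w_t$.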
Write: erase vector
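Given the weighting $w_t$ emitted by a write head at time t and an erase vector $e_t$, each memory location is first partially erased:
$$\tilde{M}_t(i) \leftarrow M_{t-1}(i)\left[\mathbf{1} - w_t(i)\, e_t\right]$$
where $\mathbf{1}$ is a row vector of all ones, and the multiplication by the erase vector (whose M elements lie in the range (0, 1)) acts point-wise on the memory location. The larger both $w_t(i)$ and an element of $e_t$ are, the more of that entry is cleared. If either is zero, the entry is left unchanged.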
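A small pure-Python sketch of the erase step under the same list-of-rows assumption. `erase` is a hypothetical helper name. For readability the example uses weights and erase values of exactly 0 and 1, the limits of the ranges in the text.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # N rows (locations), each of length M


def erase(memory: Matrix, weights: Vector, erase_vec: Vector) -> Matrix:
    """Return M~_t(i) = M_{t-1}(i) * (1 - w_t(i) * e_t), applied point-wise."""
    return [
        [m * (1.0 - w_i * e) for m, e in zip(row, erase_vec)]
        for w_i, row in zip(weights, memory)
    ]


memory = [[1.0, 2.0], [3.0, 4.0]]
print(erase(memory, [1.0, 0.0], [1.0, 0.5]))  # [[0.0, 1.0], [3.0, 4.0]]
```

Row 0 is fully weighted, so its first entry is cleared and its second is halved. Row 1 has zero weight and is untouched.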
Write: add vector
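Each write head also emits a length-M add vector $a_t$, which is added after the erase step:
$$M_t(i) \leftarrow \tilde{M}_t(i) + w_t(i)\, a_t$$
where $\tilde{M}_t$ is the memory after erasure. Memory is therefore first erased according to the weighting, and new information is then added at the locations the weighting selects.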
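A pure-Python sketch of the add step. It assumes the input matrix is the already-erased memory $\tilde{M}_t$. `add` is a hypothetical helper name.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # N rows (locations), each of length M


def add(erased: Matrix, weights: Vector, add_vec: Vector) -> Matrix:
    """Return M_t(i) = M~_t(i) + w_t(i) * a_t."""
    return [
        [m + w_i * a for m, a in zip(row, add_vec)]
        for w_i, row in zip(weights, erased)
    ]


erased = [[0.0, 1.0], [3.0, 4.0]]
print(add(erased, [1.0, 0.0], [5.0, 6.0]))  # [[5.0, 7.0], [3.0, 4.0]]
```

When a location has weight 1 and the erase vector is all ones, the erase step zeroes the row and this step writes $a_t$ into it. Together, the two steps then act as an overwrite.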
Addressing
Content-Based Addressing
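Each head first produces a length-M key vector $k_t$ that is compared to each row $M_t(i)$ by a similarity measure $K[\cdot,\cdot]$. The similarities are scaled and normalised with a softmax to give the content weighting:
$$w^c_t(i) \leftarrow \frac{\exp\left(\beta_t K[k_t, M_t(i)]\right)}{\sum_j \exp\left(\beta_t K[k_t, M_t(j)]\right)}$$
where $\beta_t$ is a positive scalar called the key strength, which sharpens or softens the focus. In the original NTM, $K$ is cosine similarity:
$$K[u, v] = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$$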
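A pure-Python sketch of content-based addressing with cosine similarity. The small `eps` term and the max-subtraction inside the softmax are numerical-safety choices added here; they are not part of the equations above. Function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # N rows (locations), each of length M


def cosine_similarity(u: Vector, v: Vector, eps: float = 1e-8) -> float:
    """K[u, v] = u.v / (|u| |v|); eps avoids division by zero for zero rows."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def content_weighting(memory: Matrix, key: Vector, beta: float) -> Vector:
    """w^c_t(i) = softmax over i of beta * K[k_t, M_t(i)]."""
    scores = [beta * cosine_similarity(key, row) for row in memory]
    top = max(scores)  # subtracting the max does not change the softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(content_weighting(memory, [1.0, 0.0], beta=1.0))
print(content_weighting(memory, [1.0, 0.0], beta=50.0))
```

Both calls put the most weight on row 0, which matches the key exactly. With $\beta_t = 50$, almost all of the mass lands there. With $\beta_t = 1$, the weighting stays spread across the similar rows.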
Location-Based Addressing
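Interpolation
Each head blends the previous weighting $w_{t-1}$ with the content weighting $w^c_t$:
$$w^g_t \leftarrow g_t\, w^c_t + (1 - g_t)\, w_{t-1}$$
where $g_t$ is a scalar interpolation gate between 0 and 1. With $g_t = 1$ the previous weighting is ignored and only the content weighting is used. With $g_t = 0$ the previous weighting is kept as-is.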
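A pure-Python sketch of the interpolation gate. `interpolate` is a hypothetical helper name. Both input weightings are assumed to be normalised and of equal length N.

```python
from typing import List

Vector = List[float]


def interpolate(w_content: Vector, w_prev: Vector, g: float) -> Vector:
    """Return w^g_t = g_t * w^c_t + (1 - g_t) * w_{t-1}."""
    if not 0.0 <= g <= 1.0:
        raise ValueError("interpolation gate must lie in [0, 1]")
    return [g * c + (1.0 - g) * p for c, p in zip(w_content, w_prev)]


print(interpolate([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], 0.25))  # [0.25, 0.0, 0.75]
```

A convex combination of two normalised weightings is itself normalised, so $w^g_t$ still sums to 1.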
Shifting
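After interpolation, each head emits a shift weighting $s_t$ that defines a normalised distribution over the allowed integer shifts. The gated weighting is then rotated by a circular convolution, with all indices computed modulo N:
$$\tilde{w}_t(i) \leftarrow \sum_{j=0}^{N-1} w^g_t(j)\, s_t(i - j)$$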
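A pure-Python sketch of the circular-convolution shift. For illustration it assumes that only shifts of -1, 0 and +1 are allowed, and it represents $s_t$ as a hypothetical dict mapping each shift to its probability. Setting $i = j + k$ in the sum above gives the update used in the loop.

```python
from typing import Dict, List

Vector = List[float]


def shift(w_gated: Vector, shift_weights: Dict[int, float]) -> Vector:
    """Return w~_t(i) = sum_j w^g_t(j) * s_t(i - j), indices modulo N."""
    n = len(w_gated)
    out = [0.0] * n
    for j, w_j in enumerate(w_gated):
        for k, s_k in shift_weights.items():
            # Mass at location j moves to location j + k, wrapping around.
            out[(j + k) % n] += w_j * s_k
    return out


print(shift([0.0, 1.0, 0.0, 0.0], {-1: 0.0, 0: 0.0, 1: 1.0}))  # [0.0, 0.0, 1.0, 0.0]
print(shift([1.0, 0.0, 0.0, 0.0], {-1: 0.1, 0: 0.8, 1: 0.1}))  # [0.8, 0.1, 0.0, 0.1]
```

In the second call, the -1 shift wraps from location 0 to location N - 1. The output is also blurred across three locations, which the sharpening step is meant to counter.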
Sharpening
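Each head emits one further scalar $\gamma_t \ge 1$ whose effect is to sharpen the final weighting as follows:
$$w_t(i) \leftarrow \frac{\tilde{w}_t(i)^{\gamma_t}}{\sum_j \tilde{w}_t(j)^{\gamma_t}}$$
This counteracts the dispersion of the weighting that the shift convolution can cause over time.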
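A pure-Python sketch of the sharpening step. `sharpen` is a hypothetical helper name. It assumes the shifted weighting has at least one non-zero entry, because an all-zero input would make the denominator zero.

```python
from typing import List

Vector = List[float]


def sharpen(w_shifted: Vector, gamma: float) -> Vector:
    """Return w_t(i) = w~_t(i)^gamma / sum_j w~_t(j)^gamma."""
    if gamma < 1.0:
        raise ValueError("gamma must be >= 1")
    powered = [w ** gamma for w in w_shifted]
    total = sum(powered)
    return [p / total for p in powered]


blurred = [0.8, 0.1, 0.0, 0.1]
print(sharpen(blurred, gamma=1.0))  # essentially unchanged
print(sharpen(blurred, gamma=2.0))  # mass concentrates on location 0
```

Raising the weights to a power greater than 1 shrinks small entries faster than large ones. Renormalising then moves mass toward the peak.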