I have a NumPy array (A) and a weights matrix (say m, which acts as a sort of filter). I want to apply this filter at each element of A and get, for each element of A, an array of its neighbors multiplied by m.
For example, if m is a 3x3 kernel, then we get:
for each (i,j), A[i,j] -> array([A[i-1,j-1]*m[0,0], A[i-1,j]*m[0,1], ..., A[i+1,j+1]*m[2,2]])
So the output will have one more dimension than A.
For border cases, I would preferably apply a partial filter (equivalent to padding with zeros). Is there any way to do this efficiently?
Here's an approach using skimage's view_as_windows, which gives us sliding windows of the required kernel shape:
import numpy as np
from skimage.util import view_as_windows as viewW
# Pad with one layer of zeros around input array
a1 = np.pad(a, (1,1), 'constant', constant_values=0)
# Create 3x3 sliding windows for each elem and multiply with m.
# Reshape each window as a 9 elem list as per requirement.
out = (viewW(a1,[3,3])*m).reshape(a.shape + (9,))
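If you'd rather not depend on skimage, the same idea can probably be written with NumPy alone. This is a minimal sketch, not the approach above: it swaps in `sliding_window_view` from `numpy.lib.stride_tricks` (available in NumPy 1.20 and later) and assumes a 2D input and a kernel with odd dimensions. The function name `neighbor_products` is made up for illustration, and the values of `m` are not shown in the post; they are inferred here by dividing the sample `out[1,1,:]` by `a[:3,:3]`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def neighbor_products(a, m):
    """For each element of 2D `a`, return its zero-padded neighborhood
    multiplied elementwise by kernel `m`, flattened along a new last axis.

    Hypothetical helper; assumes both kernel dimensions are odd.
    """
    kh, kw = m.shape
    ph, pw = kh // 2, kw // 2
    # Zero-pad so that every element of `a` has a full kh x kw neighborhood.
    a1 = np.pad(a, ((ph, ph), (pw, pw)), mode="constant", constant_values=0)
    # 4D view of shape a.shape + (kh, kw); no window data is copied here.
    windows = sliding_window_view(a1, (kh, kw))
    # The multiplication allocates the result, which is then flattened per window.
    return (windows * m).reshape(a.shape + (kh * kw,))


# Sample input from the post.
a = np.array([[75, 46, 74, 72, 96],
              [44, 72, 41, 81, 50],
              [16, 70, 22, 19, 49],
              [87, 74, 78, 66, 49]])
# Assumption: kernel values inferred from the sample output, not given in the post.
m = np.array([[5, 4, 6],
              [6, 6, 6],
              [8, 5, 7]])

out = neighbor_products(a, m)
print(out.shape)
print(out[1, 1])
```

For a 3x3 kernel this pads one layer of zeros, just like the skimage version, so the two should agree.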
Sample run :
1] Input array 
In [64]: a
Out[64]:
array([[75, 46, 74, 72, 96],
[44, 72, 41, 81, 50],
[16, 70, 22, 19, 49],
[87, 74, 78, 66, 49]])
2] Input array padded 
In [65]: a1
Out[65]:
array([[ 0, 0, 0, 0, 0, 0, 0],
[ 0, 75, 46, 74, 72, 96, 0],
[ 0, 44, 72, 41, 81, 50, 0],
[ 0, 16, 70, 22, 19, 49, 0],
[ 0, 87, 74, 78, 66, 49, 0],
[ 0, 0, 0, 0, 0, 0, 0]])
3] 3D Output array 
In [66]: out
Out[66]:
array([[[ 0, 0, 0, 0, 450, 276, 0, 220, 504],
[ 0, 0, 0, 450, 276, 444, 352, 360, 287],
[ 0, 0, 0, 276, 444, 432, 576, 205, 567],
[ 0, 0, 0, 444, 432, 576, 328, 405, 350],
[ 0, 0, 0, 432, 576, 0, 648, 250, 0]],
[[ 0, 300, 276, 0, 264, 432, 0, 80, 490],
[375, 184, 444, 264, 432, 246, 128, 350, 154],
[230, 296, 432, 432, 246, 486, 560, 110, 133],
[370, 288, 576, 246, 486, 300, 176, 95, 343],
[360, 384, 0, 486, 300, 0, 152, 245, 0]],
[[ 0, 176, 432, 0, 96, 420, 0, 435, 518],
[220, 288, 246, 96, 420, 132, 696, 370, 546],
[360, 164, 486, 420, 132, 114, 592, 390, 462],
[205, 324, 300, 132, 114, 294, 624, 330, 343],
[405, 200, 0, 114, 294, 0, 528, 245, 0]],
[[ 0, 64, 420, 0, 522, 444, 0, 0, 0],
[ 80, 280, 132, 522, 444, 468, 0, 0, 0],
[350, 88, 114, 444, 468, 396, 0, 0, 0],
[110, 76, 294, 468, 396, 294, 0, 0, 0],
[ 95, 196, 0, 396, 294, 0, 0, 0, 0]]])
4] Let's verify the results. The first sliding window on the unpadded region would be a[:3,:3]. Let's multiply that by m. After multiplication, it should be the same as out[1,1,:].

In [67]: a[:3,:3]*m
Out[67]:
array([[375, 184, 444],
[264, 432, 246],
[128, 350, 154]])
In [68]: out[1,1,:]
Out[68]: array([375, 184, 444, 264, 432, 246, 128, 350, 154])
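The single-window check above can be extended to the border cases with a direct loop that follows the definition from the question. This is only a reference sketch for checking, not something to use for speed. The name `neighbor_products_loop` is hypothetical, and the kernel values are again an assumption inferred from the sample output rather than taken from the post.

```python
import numpy as np


def neighbor_products_loop(a, m):
    """Loop-based reference: out[i, j, k] is the k-th neighbor of a[i, j]
    times the matching kernel entry, with out-of-bounds neighbors as zero.

    Hypothetical helper; assumes a 2D `a` and odd kernel dimensions.
    """
    kh, kw = m.shape
    ph, pw = kh // 2, kw // 2
    rows, cols = a.shape
    out = np.zeros((rows, cols, kh * kw), dtype=np.result_type(a, m))
    for i in range(rows):
        for j in range(cols):
            for di in range(kh):
                for dj in range(kw):
                    r, c = i + di - ph, j + dj - pw
                    # Neighbors outside the array stay zero (partial filter).
                    if 0 <= r < rows and 0 <= c < cols:
                        out[i, j, di * kw + dj] = a[r, c] * m[di, dj]
    return out


a = np.array([[75, 46, 74, 72, 96],
              [44, 72, 41, 81, 50],
              [16, 70, 22, 19, 49],
              [87, 74, 78, 66, 49]])
# Assumption: kernel inferred from out[1,1,:] / a[:3,:3] in the sample run.
m = np.array([[5, 4, 6],
              [6, 6, 6],
              [8, 5, 7]])

ref = neighbor_products_loop(a, m)
print(ref[0, 0])   # top-left corner, compare with the first row of Out[66]
print(ref[3, 4])   # bottom-right corner, compare with the last row of Out[66]
```

Comparing the corner entries against the sample output is a quick way to confirm that the zero padding handles the borders as intended.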
It's worth mentioning here that the 4D array of sliding windows is simply a view into the padded array, so creating it is cheap and no window data is copied; only the multiplication with m allocates a new array for the output:
In [75]: np.may_share_memory(a1,viewW(a1,[3,3]))
Out[75]: True
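To make the view-versus-copy distinction concrete, here is a small sketch. It uses NumPy's `sliding_window_view` as a stand-in for skimage's view_as_windows (an assumption made to keep the example free of extra dependencies), and the array contents are placeholders.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Placeholder padded array with the same shape as a1 in the sample run.
a1 = np.arange(6 * 7).reshape(6, 7)
m = np.ones((3, 3), dtype=a1.dtype)

windows = sliding_window_view(a1, (3, 3))  # strided view, shape (4, 5, 3, 3)
product = windows * m                      # new array holding the results

shares_view = np.shares_memory(a1, windows)      # the view reuses a1's buffer
shares_product = np.shares_memory(a1, product)   # the product has its own buffer

print(windows.shape, shares_view, shares_product)
```

So the windows themselves cost almost nothing to create, while the final output still needs memory for every window element.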