Matrix Operations

An interactive guide to understanding fundamental linear algebra operations

01 — Scaling

Scalar Multiplication

Multiply every element by a single value. c × A

Scalar (0D) × Any Tensor (nD) → Same Shape (nD)
Common in NN/Transformers:
Attention scaling — divide by √d_k to stabilize softmax
Learning rate — weights -= lr × gradients
Dropout scaling — multiply by 1/(1-p) during training
Temperature — logits / T to control softmax sharpness
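The temperature entry above can be sketched directly; the logits below are made-up values for illustration:

```python
import torch

# Dividing logits by a temperature T is plain scalar multiplication:
# T > 1 flattens the softmax, T < 1 sharpens it.
logits = torch.tensor([2.0, 1.0, 0.5])

sharp = torch.softmax(logits / 0.5, dim=0)  # low temperature -> peaky
flat = torch.softmax(logits / 2.0, dim=0)   # high temperature -> closer to uniform

# The top class gets more probability mass at low temperature.
assert sharp.max() > flat.max()
```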
Example: c = 3, A is 2×3
Bij = c × Aij
Scalar c = 3
Matrix A = [[1, 2, 3], [4, 5, 6]]
Result B = [[3, 6, 9], [12, 15, 18]]
🔥 Used in Transformers: Attention Scaling
In self-attention: scores = (Q @ K.T) / √d_k
The 1/√d_k is scalar multiplication to prevent large dot products from pushing softmax into tiny gradients.
# PyTorch
import math
B = c * A
# Transformer attention scaling
scores = (Q @ K.T) / math.sqrt(d_k)
02 — Hadamard Product

Element-wise Multiplication

Multiply corresponding elements. Shapes must match exactly. A ⊙ B or A * B

Tensor (m×n) ⊙ Tensor (m×n) → Tensor (m×n)
Same shape required (or broadcastable)
Common in NN/Transformers:
Gating mechanisms — LSTM/GRU: gate ⊙ candidate
Dropout — activations ⊙ binary_mask
Attention masking — scores ⊙ causal_mask
GLU/SwiGLU — σ(Wx) ⊙ (Vx) in FFN layers
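A minimal GLU-style gate, sketching the σ(Wx) ⊙ (Vx) pattern from the list above (the weights here are random placeholders, not a real FFN layer):

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 8)        # 4 positions, 8 features
W = torch.randn(8, 8)        # gate projection (illustrative)
V = torch.randn(8, 8)        # candidate projection (illustrative)

gate = torch.sigmoid(x @ W)  # values in (0, 1), shape (4, 8)
candidate = x @ V            # shape (4, 8)
out = gate * candidate       # Hadamard product: shapes must match

assert out.shape == (4, 8)
```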
Example: 2×2 matrices
Cij = Aij × Bij
Matrix A = [[1, 2], [3, 4]]
Matrix B = [[5, 6], [7, 8]]
Result C = [[5, 12], [21, 32]]
e.g. C[0,0]: A[0,0]=1 × B[0,0]=5 = 5
# PyTorch
C = A * B
# or
C = torch.mul(A, B)
03 — Inner Product

Dot Product

Multiply elements, then sum all. Returns a scalar. a · b or aᵀb

Vector (n,) · Vector (n,) → Scalar
Matrix (1,n) @ Matrix (n,1) → Matrix (1,1)
torch.dot() requires 1D; use aᵀ@b for 2D column vectors
Common in NN/Transformers:
Attention score — single q · k pair similarity
Single neuron — weights · inputs + bias
Cosine similarity — (a · b) / (‖a‖ × ‖b‖)
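The cosine-similarity formula above can be built straight from the dot product (vectors reused from this section's example):

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

dot = torch.dot(a, b)              # 1*4 + 2*5 + 3*6 = 32
cos = dot / (a.norm() * b.norm())  # (a . b) / (||a|| * ||b||)

# Matches PyTorch's built-in implementation.
assert torch.isclose(cos, torch.cosine_similarity(a, b, dim=0))
```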
Example: length-3 vectors
a · b = Σ(ai × bi) = aᵀb
Vector a = [1, 2, 3]
Vector b = [4, 5, 6]
a · b = 1×4 + 2×5 + 3×6 = 4 + 10 + 18 = 32
# 1D vectors
result = torch.dot(a, b)
# Column vectors (n,1): aᵀb
result = (a.T @ b).item()
04 — Tensor Product

Outer Product

Every element of a is multiplied by every element of b, producing a matrix. a ⊗ b

Vector (m,) ⊗ Vector (n,) → Matrix (m×n)
1D tensors, lengths can differ
Common in NN/Transformers:
LoRA — low-rank adaptation: rank-1 updates are outer products (ΔW = b ⊗ a)
Attention patterns — visualizing q ⊗ k relationships
Embedding lookup — one-hot ⊗ embedding_matrix
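A hedged sketch of the LoRA-flavored idea above: a rank-1 weight update formed from an outer product (shapes and names are illustrative, not a LoRA library's API):

```python
import torch

torch.manual_seed(0)
W = torch.randn(4, 3)      # frozen base weight (illustrative shape)
b = torch.randn(4)         # learned column factor
a = torch.randn(3)         # learned row factor

delta = torch.outer(b, a)  # rank-1 matrix, same shape as W
W_adapted = W + delta      # base weight plus low-rank update

assert delta.shape == W.shape
assert int(torch.linalg.matrix_rank(delta)) == 1
```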
Example: len(a) = 3, len(b) = 4
Cij = ai × bj
Vector a (column) = [1, 2, 3]
Vector b (row) = [4, 5, 6, 7]
Result Matrix (3×4):
[[ 4,  5,  6,  7],
 [ 8, 10, 12, 14],
 [12, 15, 18, 21]]
Each cell C[i,j] = a[i] × b[j]
# PyTorch
C = torch.outer(a, b)
# or using einsum
C = torch.einsum('i,j->ij', a, b)
05 — Matrix Product

Matrix Multiplication

Each output is a dot product of a row from A and a column from B. A @ B

Matrix (m×n) @ Matrix (n×p) → Matrix (m×p)
Inner dimensions must match (n = n)
Common in NN/Transformers (THE core operation!):
Linear layers — y = X @ W + b (every dense layer)
Q, K, V projections — Q = X @ W_q
Attention scores — scores = Q @ K.T
Attention output — output = softmax(scores) @ V
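All four bullets above can be strung together into a tiny single-head attention forward pass (illustrative shapes; no batching, masking, or multiple heads):

```python
import math
import torch

torch.manual_seed(0)
seq, d_model, d_k = 5, 16, 8
X = torch.randn(seq, d_model)           # input sequence
W_q = torch.randn(d_model, d_k)         # projection weights (illustrative)
W_k = torch.randn(d_model, d_k)
W_v = torch.randn(d_model, d_k)

Q, K, V = X @ W_q, X @ W_k, X @ W_v     # linear projections: (seq, d_k)
scores = (Q @ K.T) / math.sqrt(d_k)     # attention scores: (seq, seq)
out = torch.softmax(scores, dim=-1) @ V # weighted sum of values: (seq, d_k)

assert out.shape == (seq, d_k)
```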
Example: (2×3) @ (3×2) → (2×2)
Cij = Σk Aik × Bkj (row i of A · column j of B)
Matrix A (2×3) = [[1, 2, 3], [4, 5, 6]]
Matrix B (3×2) = [[7, 8], [9, 10], [11, 12]]
Result C (2×2) = [[58, 64], [139, 154]]
e.g. C[0,0] = row 0 of A · col 0 of B = 1×7 + 2×9 + 3×11 = 7 + 18 + 33 = 58
# PyTorch
C = A @ B
# or
C = torch.matmul(A, B)
06 — Vector Product

Cross Product

Returns a vector perpendicular to both inputs. a × b

Vector (3,) × Vector (3,) → Vector (3,)
3D vectors only!
Use cases (rare in LLMs, common in 3D):
3D graphics/NeRF — surface normals, camera rays
Robotics — torque, angular momentum
Physics simulations — magnetic force F = qv × B
⚠️ Not used in standard transformers/LLMs
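A small 3D-graphics-style sketch of the surface-normal use case (triangle coordinates are illustrative):

```python
import torch

# Surface normal of a triangle: cross product of two of its edges.
p0 = torch.tensor([0.0, 0.0, 0.0])
p1 = torch.tensor([1.0, 0.0, 0.0])
p2 = torch.tensor([0.0, 1.0, 0.0])

normal = torch.linalg.cross(p1 - p0, p2 - p0)  # perpendicular to the triangle

assert torch.equal(normal, torch.tensor([0.0, 0.0, 1.0]))
# Perpendicularity check: dot product with both edges is zero.
assert torch.dot(normal, p1 - p0) == 0
assert torch.dot(normal, p2 - p0) == 0
```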
a × b = [a₂b₃ - a₃b₂, a₃b₁ - a₁b₃, a₁b₂ - a₂b₁]
Example: unit vectors
Vector a = [1, 0, 0]
Vector b = [0, 1, 0]
Result a × b = [0, 0, 1]
x: a[1]×b[2] - a[2]×b[1] = 0×0 - 0×1 = 0
y: a[2]×b[0] - a[0]×b[2] = 0×0 - 1×0 = 0
z: a[0]×b[1] - a[1]×b[0] = 1×1 - 0×0 = 1
# PyTorch (torch.cross without an explicit dim is deprecated)
result = torch.linalg.cross(a, b)
07 — Transpose

Matrix Transpose

Flip rows and columns. Row i becomes column i. Aᵀ

Matrix (m×n) → Matrix (n×m)
2D tensor (or specify dims for higher)
Common in NN/Transformers:
Attention — Q @ Kᵀ (transpose K for dot products)
Weight tying — output_embed = input_embed.T
Backpropagation — gradients use W.T
Batch reshaping — swap batch/sequence dims
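The weight-tying entry above can be sketched in a few lines (illustrative shapes; `embed` is a made-up stand-in for a model's embedding table):

```python
import torch

vocab, d_model = 10, 4
embed = torch.randn(vocab, d_model)  # input embedding: token -> vector

h = torch.randn(3, d_model)          # hidden states for 3 positions
logits = h @ embed.T                 # reuse embed transposed as the output
                                     # projection: (3, vocab) scores

assert logits.shape == (3, vocab)
```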
Example: 2×3 matrix
Bji = Aij (swap row and column indices)
Matrix A (2×3) = [[1, 2, 3], [4, 5, 6]]
Result Aᵀ (3×2) = [[1, 4], [2, 5], [3, 6]]
Mapping: A[0,0]=1 → B[0,0], A[0,1]=2 → B[1,0], A[0,2]=3 → B[2,0],
         A[1,0]=4 → B[0,1], A[1,1]=5 → B[1,1], A[1,2]=6 → B[2,1]
# PyTorch
B = A.T
# or
B = torch.transpose(A, 0, 1)