Lecture 4: Backpropagation and Neural Networks part 1

Backpropagation
- RNN
  
  → (FP, Forward Pass)
  
  ←(BP, Backward Pass, Backpropagation)
  
  Chain Rule
  
  df/dy = df/dq(global gradient) x dq/dy(local gradient)
  
  fn : 1 layer, 1 gate
  
  local gradient : can be computed directly during the FP. Compute it and store it in memory right away.
  
  global gradient : can only be computed during the BP.
  
  gradient = global gradient x local gradient (chain rule)
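The chain rule above can be traced on a tiny two-gate graph. This is a minimal sketch, assuming the function f = (x + y) * z and the input values below (both are illustrative choices, not taken from these notes).

```python
# Forward pass (FP): compute each gate's output.
x, y, z = -2.0, 5.0, -4.0   # assumed example inputs
q = x + y                   # add gate
f = q * z                   # multiply gate

# Local gradients: known as soon as the forward pass has run.
dq_dx, dq_dy = 1.0, 1.0     # d(x + y)/dx and d(x + y)/dy

# Backward pass (BP): the gradient arriving from the output side
# (called the global gradient in these notes) times the local gradient.
df_dq = z                   # gradient of f with respect to q
df_dz = q                   # gradient of f with respect to z
df_dx = df_dq * dq_dx       # chain rule
df_dy = df_dq * dq_dy       # chain rule

print(f, df_dx, df_dy, df_dz)
```

Here `df_dy = df_dq * dq_dy` is exactly the df/dy = df/dq x dq/dy pattern written above.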
  
  Q. What if z feeds into multiple nodes?
  
  A. Sum all the gradients flowing back from those nodes.
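The summation rule can be checked numerically. The sketch below assumes a hypothetical graph in which z feeds two branches, a = 2z and b = z * z, with f = a + b; the function and the value of z are made up for illustration.

```python
z = 3.0                      # assumed example value

# Forward pass: z is used by two different nodes.
a = 2.0 * z
b = z * z
f = a + b

# Backward pass: each branch sends its own gradient back to z.
grad_from_a = 1.0 * 2.0      # df/da * da/dz
grad_from_b = 1.0 * 2.0 * z  # df/db * db/dz
df_dz = grad_from_a + grad_from_b   # gradients from all branches are added

# Numerical check with a central difference.
def f_of(v):
    return 2.0 * v + v * v

eps = 1e-6
numeric = (f_of(z + eps) - f_of(z - eps)) / (2 * eps)
print(df_dz, numeric)
```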
  - f(w, x) has almost the same form as the sigmoid function
  - Differentiating the sigmoid fn h(x) gives (1 - h(x))h(x)
  - So the gradient at the sigmoid gate = (1 - 0.73) x 0.73 = 0.27 x 0.73 = 0.1971
    - This is very close to 0.2.
    - It can be obtained very easily this way.
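The sigmoid gate shortcut can be reproduced in a few lines. This is a sketch only: the weights and inputs below are assumed values, chosen so that the sigmoid output comes out near the 0.73 used in the note.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Assumed example values (not given in these notes).
w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0

s = w0 * x0 + w1 * x1 + w2   # linear part, equals 1.0 here
h = sigmoid(s)               # forward output of the sigmoid gate, about 0.73

# Local gradient of the whole sigmoid gate in one step: (1 - h) * h
local_grad = (1.0 - h) * h   # about 0.20

print(round(h, 2), round(local_grad, 2))
```

Using the unrounded h gives a value slightly below the hand calculation with 0.73, but both round to 0.2.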
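  The add operation is a gradient distributor: it passes the same gradient to every input.
  
  The multiply operation is a gradient switcher: each input receives the other input's value. ex. L = w0 x x0 → dL/dw0 = x0, dL/dx0 = w0
  
  The max operation is a gradient router: it routes the gradient to whichever input was larger.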
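The three gate patterns can be written as small backward functions. The function names and example numbers here are hypothetical, used only to make the behavior concrete.

```python
def add_backward(upstream):
    # Distributor: both inputs receive the same upstream gradient.
    return upstream, upstream

def mul_backward(a, b, upstream):
    # Switcher: each input's gradient is the other input times upstream.
    return b * upstream, a * upstream

def max_backward(a, b, upstream):
    # Router: only the larger input receives the upstream gradient.
    return (upstream, 0.0) if a >= b else (0.0, upstream)

upstream = 2.0                                  # assumed upstream gradient
add_grads = add_backward(upstream)
mul_grads = mul_backward(3.0, -4.0, upstream)
max_grads = max_backward(2.0, -1.0, upstream)
print(add_grads, mul_grads, max_grads)
```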
  
  During the forward pass, compute what the local gradients need in advance and store it in memory.
  
  During the backward pass, the memory filled during the forward pass is consumed.
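The forward/backward memory pattern might look like the gate object below. The class name `MultiplyGate` and its method signatures are assumptions for illustration, not an API defined in these notes.

```python
class MultiplyGate:
    """A single multiply gate with a forward/backward interface."""

    def forward(self, x, y):
        # Cache the inputs: they are the local gradients needed later.
        self.x = x
        self.y = y
        return x * y

    def backward(self, dz):
        # Consume the cached values: upstream gradient times local gradient.
        dx = self.y * dz
        dy = self.x * dz
        return dx, dy

gate = MultiplyGate()
z = gate.forward(3.0, -4.0)   # forward fills the cache
dx, dy = gate.backward(1.0)   # backward reads it
print(z, dx, dy)
```

Calling `backward` before `forward` would fail here because nothing has been cached yet, which mirrors the ordering described above.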
  
  dL/dz = vector
  
  dz/dx = jacobian matrix
  
  Q. what is the size of the jacobian matrix?
  
  A. [4096 x 4096] (input, output : [4096 x 1])
  
  If the mini-batch size is 100, the Jacobian matrix would actually be [409,600 x 409,600].
  
  Q. what does it look like?
  
  A. Similar to an identity matrix (sparse structure)
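The Jacobian structure can be inspected on a small vector. This sketch assumes the layer is an element-wise max(0, x), which these notes do not state explicitly, and uses 4 elements instead of 4096 so the matrix stays readable.

```python
def relu_jacobian(v):
    # Entry [i][j] is d out_i / d in_j for out = max(0, v), element-wise.
    n = len(v)
    return [[1.0 if (i == j and v[i] > 0) else 0.0 for j in range(n)]
            for i in range(n)]

x = [1.5, -2.0, 0.3, -0.7]            # assumed example input
J = relu_jacobian(x)
n = len(x)

diagonal = [J[i][i] for i in range(n)]
off_diagonal_nonzero = sum(
    1 for i in range(n) for j in range(n) if i != j and J[i][j] != 0.0
)

# Because J is diagonal, the backward pass never needs the full matrix.
upstream = [0.1, 0.2, 0.3, 0.4]       # assumed dL/dz
dx_full = [sum(upstream[i] * J[i][j] for i in range(n)) for j in range(n)]
dx_fast = [u if a > 0 else 0.0 for u, a in zip(upstream, x)]

# Size arithmetic from the notes: 4096 features, mini-batch of 100.
jacobian_side = 4096 * 100
print(diagonal, off_diagonal_nonzero, dx_full == dx_fast, jacobian_side)
```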
  - Assignment
  - Summary
- NN
  
  x : input layer
  
  W1 : weights 1
  
  h : hidden layer
  
  W2 : weights 2
  
  s : score, output layer
  
  The number of nodes has to be found through experimentation.
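The x → h → s structure can be sketched as a forward pass. The non-linearity max(0, ·) between the layers, the helper names, and all sizes and weight values below are assumptions; the notes only name x, W1, h, W2 and s.

```python
def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def two_layer_scores(W1, W2, x):
    h = [max(0.0, a) for a in matvec(W1, x)]   # hidden layer (assumed ReLU)
    s = matvec(W2, h)                          # output layer: class scores
    return h, s

x = [1.0, -2.0]                                # input layer, 2 features
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # 3 hidden nodes x 2 inputs
W2 = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0]]       # 2 classes x 3 hidden nodes

h, s = two_layer_scores(W1, W2, x)
print(h, s)
```

The hidden size (3 here) is the quantity that has to be tuned by experiment; changing it only changes the shapes of `W1` and `W2`.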
  - non parametric approach : Nearest Neighbor
    - There is one classifier per class.
    - one class - one classifier
  - parametric approach : Neural Network, CNN
    - There are multiple classifiers per class.
    - one class - multi classifier
  - Assignment
  cell body : soma
  
  dendrites : input
  
  axon : output
  
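  cell body : performs a simple sum, applies an activation function to make the result non-linear, and passes it through the axon to the next neuron
  
  w0, w1, w2 determine how much influence the data arriving from each dendrite has.
  
  Among activation functions, the sigmoid activation function has traditionally been used.
  - Because, whatever the value of x, the output always ends up between 0 and 1.
  - Because this makes it easy to express a neuron's influence as a value between 0 and 1, like a probability.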
  - Code
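A single neuron as described above might be sketched as follows. The class name `Neuron`, the separate bias term, and the numeric values are hypothetical and chosen only for illustration.

```python
import math

class Neuron:
    def __init__(self, weights, bias):
        self.weights = weights   # one weight per dendrite (w0, w1, w2)
        self.bias = bias         # assumed extra term, not named in the notes

    def forward(self, inputs):
        # Cell body: weighted sum of the dendrite inputs.
        cell_body_sum = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        # Sigmoid activation squashes the sum into (0, 1).
        firing_rate = 1.0 / (1.0 + math.exp(-cell_body_sum))
        # The result is what travels down the axon to the next neuron.
        return firing_rate

neuron = Neuron(weights=[0.5, -1.0, 2.0], bias=0.0)
out = neuron.forward([1.0, 2.0, 0.5])
print(round(out, 4))
```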