Transfer learning builds on the same idea as a deep neural network (DNN). Image processing gives the clearest example: the earlier layers of a DNN, specifically a convolutional neural network (CNN), meaning all hidden layers except the last one, extract features such as edges, colours, combinations of colours, combinations of edges, areas of focus and so on. This stage can be thought of as preprocessing, and the resulting “features” are usually very hard to interpret. With the last layer being the output layer, the second-to-last layer works like a domain mapping, whereas the output layer serves as the discriminating layer.

What exactly is domain mapping? As an example, a single person could have features such as age, interests, favorite movies, preferred genres of songs, etc. This information could be used to predict the person's emotional state; it could also be used to estimate their income group. Not all of these features are relevant to each prediction, however. In the case of images, there is only a narrow range of exploitable features, as mentioned above (edges and so on), and these features are highly reusable across different problem domains: for instance, predicting the type of attire (dress, shorts), whether an item is female or male attire, how fashionable it is, and so on. The second-to-last layer learns how these colours, shapes and edges relate to the problem domain, whereas the output layer learns the decision boundary between the data points in that domain space.

The benefit of transfer learning is then the reusability of the hidden layers, which could be very expensive to retrain. One just needs to swap out the last two layers when applying the model to a different problem domain. Depending on how well the model fits, one can vary the number of layers to swap out; the last two layers are just the textbook example.
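
The layer-swapping idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the layer sizes are made up, random weights stand in for a genuinely pretrained network, the helper names (`make_layer`, `forward`, `swap_head`) are hypothetical, and no training is performed. It shows only how the earlier layers are reused while the last two are replaced.

```python
import random

def make_layer(n_in, n_out, rng):
    """Return a dense layer as a weight matrix (n_out rows of n_in weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def forward(layers, x):
    """Run x through each layer: matrix-vector product followed by ReLU."""
    for weights in layers:
        x = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]
    return x

def swap_head(pretrained, head_sizes, rng, n_swap=2):
    """Reuse all but the last n_swap layers and build freshly initialised ones.

    The reused layers are shared by reference and never modified here, which
    plays the role of "freezing"; only the new head would be trained.
    """
    backbone = pretrained[:len(pretrained) - n_swap]
    n_in = len(backbone[-1])  # output width of the last reused layer
    head = []
    for n_out in head_sizes:
        head.append(make_layer(n_in, n_out, rng))
        n_in = n_out
    return backbone, head

rng = random.Random(0)

# Assumed stand-in for a pretrained network: 8 inputs -> 6 -> 6 -> 4 -> 3 outputs.
sizes = [8, 6, 6, 4, 3]
pretrained = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# New problem domain with 2 outputs: keep the first two layers, replace the last two.
backbone, head = swap_head(pretrained, head_sizes=[5, 2], rng=rng)
new_model = backbone + head

x = [rng.random() for _ in range(8)]
features = forward(backbone, x)   # reusable features, shared by both tasks
old_out = forward(pretrained, x)  # 3 scores for the original task
new_out = forward(new_model, x)   # 2 scores for the new task
```

Changing `n_swap` (and `head_sizes` accordingly) varies how many layers are replaced, which corresponds to the remark that swapping the last two layers is only the textbook case. In a real framework the reused layers would additionally be excluded from gradient updates while the new head is trained.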