Find feature importance in Deep Learning

8 years ago

If I train a deep neural network on standard tabular data (a CSV file or similar, with labeled features), is there a good way to gauge, after training, how important each feature is to the prediction for a particular new instance?

What I would like is something similar to linear regression, where each feature has a weight, plus perhaps a bias. These would of course have to be local weights, valid only in the neighborhood of the instance being explained, since the NN function is nonlinear.
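To make "local weights" concrete, here is a rough sketch of one way I imagine getting them: sample points close to the instance, ask the model for its outputs, and fit an ordinary least-squares line to those outputs. Everything named here is an assumption for illustration: `toy_model` is a made-up stand-in for a trained network, and `local_linear_weights`, `scale` and `n_samples` are hypothetical names and settings, not part of any library.

```python
import numpy as np

def toy_model(X):
    """Hypothetical stand-in for a trained network: (n, 3) inputs -> (n,) outputs."""
    return np.tanh(2.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 - 0.1 * X[:, 2]

def local_linear_weights(predict, x, scale=0.05, n_samples=500, seed=0):
    """Fit y ~ w . (x' - x) + b on samples x' drawn near x; return (w, b)."""
    rng = np.random.default_rng(seed)
    # Small Gaussian perturbations around the instance being explained.
    deltas = rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    y = predict(x + deltas)
    # Design matrix: one column per feature offset, plus a column of ones for the bias.
    A = np.hstack([deltas, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

x0 = np.array([0.1, 1.0, -0.5])
weights, bias = local_linear_weights(toy_model, x0)
print("local weights:", weights)
print("local bias (approx. the prediction at x0):", bias)
```

The weights only describe the model near `x0`; a different instance, or a larger `scale`, can give quite different numbers, and features measured in different units are not directly comparable unless they are standardized first.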

The idea would be to explain the prediction of a specific instance by saying "The network predicted x mostly because of feature y".

Is it possible to do this by taking the partial derivative of the output with respect to each input? If so, then why did these researchers go to the trouble of generating an artificial dataset around a NN prediction to approximate the local NN function with a linear regression? https://github.com/marcotcr/lime
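To make the partial-derivative idea concrete, here is a minimal sketch using a tiny one-hidden-layer network written in NumPy. The weights are random placeholders standing in for a trained model, and `forward` and `input_gradient` are hypothetical helper names; a real framework would compute the same gradient with automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical weights standing in for a trained network (4 features, 8 hidden units).
W1 = rng.normal(size=(8, 4))
b1 = rng.normal(size=8)
w2 = rng.normal(size=8)
b2 = 0.3

def forward(x):
    """Network output y = w2 . tanh(W1 x + b1) + b2 for a single instance x."""
    h = np.tanh(W1 @ x + b1)
    return float(w2 @ h + b2)

def input_gradient(x):
    """Partial derivatives dy/dx_i, obtained with the chain rule through tanh."""
    h = np.tanh(W1 @ x + b1)
    return W1.T @ (w2 * (1.0 - h ** 2))

x0 = np.array([0.5, -1.2, 0.0, 2.0])
grad = input_gradient(x0)
# Features ordered from largest to smallest absolute local slope.
ranking = np.argsort(-np.abs(grad))
print("gradient at x0:", grad)
print("features by |gradient|:", ranking)
```

The gradient is only the slope at exactly `x0`. Where a hidden unit saturates, the slope can be close to zero even if a larger change in that feature would move the output a lot, which may be part of why a purely derivative-based explanation is not the whole story.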

My instinct tells me it's not as simple as just taking the partial derivatives of the output with respect to a certain set of inputs.

What if you were to just nudge each input up and down a little to get its influence on the output and gauge each feature's significance that way?
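As a sketch of the nudging idea, assuming the model can be called as a black-box function on a single instance: move one feature up and down by a small step, keep the others fixed, and look at how much the output changes. `toy_model`, `nudge_sensitivities` and `step` are hypothetical names chosen for illustration.

```python
import numpy as np

def toy_model(x):
    """Hypothetical stand-in for a trained network's prediction on one instance."""
    return 3.0 * x[0] - 2.0 * x[1] + x[2] ** 2

def nudge_sensitivities(predict, x, step=1e-2):
    """Central differences: (f(x + step*e_i) - f(x - step*e_i)) / (2*step) per feature."""
    sens = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        up, down = x.copy(), x.copy()
        up[i] += step    # nudge feature i up, others unchanged
        down[i] -= step  # nudge feature i down, others unchanged
        sens[i] = (predict(up) - predict(down)) / (2.0 * step)
    return sens

x0 = np.array([1.0, 1.0, 2.0])
sens = nudge_sensitivities(toy_model, x0)
print("per-feature sensitivity at x0:", sens)
```

With a very small `step` this is just a numerical estimate of the partial derivatives. The result depends on the step size and on the units of each feature, and nudging one feature at a time cannot reveal effects that only appear when several features change together.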