A pre-print released today entitled "Seven Myths in Machine Learning Research"
should give some pause to those who think that the success of AlphaGo/Zero and other
AI systems indicates that The Singularity is nigh.
https://arxiv.org/pdf/1902.06789.pdf
Late response, but I didn't even notice this thread until now (can we not have an AI thread in OT?). Anyway, I don't think the "singularity is nigh" at all, but I also don't think this paper is a very alarming rebuke of the state of ML research.
Myth 1: TensorFlow is a Tensor manipulation library - At worst, this is an implementation problem that the TensorFlow and PyTorch teams can fix (and may already have fixed). But the issue also strikes me as greatly overstated. It's extremely rare for anyone to use Newton's method or anything else that needs the Hessian. They also mention SVM optimization, but everyone uses LIBLINEAR or LIBSVM for that, and those are extremely well-designed and efficient toolboxes. And no one uses deep learning libraries for things like Lasso regression. The severity and relevance of this point seem off to me.
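To make the Hessian point concrete, here's what a Newton step actually needs, sketched in plain NumPy on a toy quadratic (the matrix A and vector b are made-up toy values, nothing framework-specific):

```python
import numpy as np

# Minimize f(x) = 0.5 * x^T A x - b^T x; gradient = A x - b, Hessian = A.
# A and b are arbitrary toy values (A symmetric positive definite).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

x = np.zeros(2)
for _ in range(5):
    grad = A @ x - b
    hess = A                             # for a quadratic, the Hessian is constant
    x = x - np.linalg.solve(hess, grad)  # Newton step: x <- x - H^{-1} g
```

On a quadratic, the first Newton step already lands on the exact minimizer A^{-1} b, which is exactly why the method needs second-order information that first-order-focused autodiff frameworks don't prioritize.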
Myth 2: Image datasets are representative of real images found in the wild - They point out some important limitations of deep learning, but this isn't really a myth. Everyone knows this isn't true. And ML people are generally quite concerned with the distributional limitations of their datasets.
Myth 3: Machine Learning researchers do not use the test set for validation - This one's pretty bad, and I think the paper is largely right here. A lot of people, both inadvertently and deliberately, leak information from the test set in one form or another, causing overfitting to the test set and optimistic performance estimates.
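A toy illustration of the kind of leakage I mean, in plain NumPy (all the sizes here are made-up toy numbers): if you select your "best" model using the test set, even purely random models look good on it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_test, n_fresh = 1000, 100, 100_000

# Labels are pure coin flips, so no model can genuinely beat 50% accuracy.
test_y = rng.integers(0, 2, n_test)

# Each "model" is just a rule that makes random predictions.
test_preds = rng.integers(0, 2, (n_models, n_test))
test_acc = (test_preds == test_y).mean(axis=1)

# "Model selection" done directly on the test set...
best = int(np.argmax(test_acc))
reported = test_acc[best]   # optimistic: well above 50% by chance alone

# ...versus the same selected model evaluated on fresh data.
fresh_y = rng.integers(0, 2, n_fresh)
honest = (rng.integers(0, 2, n_fresh) == fresh_y).mean()  # back near 50%
```

The gap between `reported` and `honest` is pure test-set overfitting; no model ever saw any signal.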
Myth 4: Every datapoint is used in training a neural network - I guess this one is kind of a myth? But I'm not sure the statement "Shockingly, 30% of the datapoints in CIFAR-10 can be removed, without changing test accuracy by much" is actually shocking. It's also something a lot of people study. If I recall correctly, the "forgetting" phenomenon they mention has gotten a lot of attention through things like the "information bottleneck" literature. And sample importance is a hot topic - for example, computing Shapley values for individual training samples. So this isn't really an "alarming" issue.
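For the curious, here's a tiny sketch of what data-Shapley-style sample importance means: each point's value is its average marginal contribution to model performance over all orderings of the data. The `utility` function below is a made-up stand-in for "validation accuracy when trained on this subset"; in real data-valuation work it would involve actually retraining a model.

```python
from itertools import permutations

points = ["a", "b", "c"]  # three toy training samples

def utility(subset):
    # Hypothetical utility: diminishing returns in subset size,
    # with sample "a" contributing a small extra bonus.
    base = {0: 0.0, 1: 0.6, 2: 0.8, 3: 0.9}[len(subset)]
    return base + (0.05 if "a" in subset else 0.0)

def shapley(point):
    # Exact Shapley value: average marginal contribution over all orderings.
    perms = list(permutations(points))
    total = 0.0
    for order in perms:
        idx = order.index(point)
        before = set(order[:idx])
        total += utility(before | {point}) - utility(before)
    return total / len(perms)

values = {p: shapley(p) for p in points}
```

By construction the values sum to utility(all) - utility(none), and the "bonus" sample gets a higher score than the interchangeable ones, which is exactly the kind of ranking people use to find removable datapoints.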
Myth 5: We need (batch) normalization to train very deep residual networks - I don't know much about ResNets or how important batch normalization is for them. Overall, this myth seems pretty minor and esoteric to me.
Myth 6: Attention > Convolution - I think you can interpret convolution as a type of attention. But otherwise... so what? Attention clearly is a hugely important feature of recent models (transformers, for example), and nobody denies that convolution is one of the most important building blocks we have.
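On "convolution as a type of attention": in 1-D you can literally write a convolution as attention with a fixed, position-determined weight matrix. A quick NumPy sketch with a toy signal and kernel:

```python
import numpy as np

# Attention computes out[i] = sum_j W[i, j] * x[j] with content-dependent W;
# a convolution is the special case where W[i, j] depends only on (j - i).
x = np.arange(8.0)               # a toy length-8 signal (the "values")
k = np.array([0.25, 0.5, 0.25])  # a fixed 3-tap kernel
r = len(k) // 2

# Build the banded Toeplitz "attention" matrix the kernel implies
# (zero padding at the edges).
n = len(x)
W = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if 0 <= j - i + r < len(k):
            W[i, j] = k[j - i + r]

as_attention = W @ x
# W @ x is cross-correlation with k, i.e. convolution with the flipped kernel.
as_convolution = np.convolve(x, k[::-1], mode="same")
```

The two outputs match exactly; the difference in the general case is only that attention learns W from the inputs while convolution hard-wires it by relative position.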
Myth 7: Saliency maps are robust ways to interpret neural networks - I think this is one of those cases where by the time someone has written a paper like this, a lot of the field's insiders are well aware of the criticism.