Suppose you and I were talking, face-to-face, and I said I was going down to the bank.
It's very unlikely that you would think I was going to a building to withdraw money if we
were near a river and I was carrying a fishing rod at the time. Nor would you even
bother to ask which I meant - building or river bank. I might not even be carrying the
rod that day, but you saw me with it yesterday. I'm not sure how a machine would
cope with that missing context.
I was thinking about an AI that translates text. Face-to-face conversations are a level above that. However, it would still be (theoretically) possible to pick up these clues. For example, I could use videos of conversations, from which image recognition could spot the fishing rod. Or I could collect your movement profile, from which I could discern that you go from this spot to the river almost every time and almost never to a bank building. Such cues can mislead and the algorithm might be wrong, but there are plenty of misunderstandings between humans as well.
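A toy sketch of what such a movement profile could look like in practice; the trip log and place names below are invented for illustration, and the "model" is nothing more than counting past destinations:

```python
# Predict where someone is heading from where they start, using nothing
# but past observations. The trip log is made up for this example.
from collections import Counter, defaultdict

past_trips = [
    ("home", "river"), ("home", "river"), ("home", "river"),
    ("home", "bank_building"), ("office", "bank_building"),
]

# Build a per-starting-point tally of observed destinations.
profile = defaultdict(Counter)
for start, destination in past_trips:
    profile[start][destination] += 1

def likely_destination(start):
    """Return the most frequent past destination from this starting spot."""
    return profile[start].most_common(1)[0][0]

print(likely_destination("home"))  # -> river (3 of 4 past trips)
```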
I'm beginning to see a misunderstanding here. Data selection isn't about data: it's about selection. (And algorithms don't 'know' anything. Someone needs to tell them what to do. This is called programming.) It's similar to how a news program isn't about news, but primarily about news selection. There is no shortage of news; the program has to select which (minute) part of the overall collection of newsworthy material will be shown.
And any AI algorithm that deserves the name makes that selection with data. As I will explain below, the algorithm ends up knowing more about how to select than the programmer does, because it is able to process vast amounts of data.
I gather you've never tried to use Google Translate, or worked as a translator. The problem with translation is that most words have multiple meanings, and the specific meaning of a word depends on the context. (Also, you may note that a translation program actually uses existing translations. In other words, it uses the work already done by actual translators.)
I have done both, and I know about the problems that exist. But if you take a text and translate it, you have the same input as the AI, and the problem is to extract the context from the surrounding text. This is a very hard problem, and I do not claim that there is an AI that can do this yet, but I see no particular reason why it should be impossible (it might be limited by available computing power, of course).
I'm not even sure what a 'language algorithm' is supposed to be. But you are right that language is a difficult subject. Algorithms can't translate texts; what they can do is select a meaning from a fixed list of meanings. That selected meaning is as likely to be wrong as right. (In fact, more likely to be wrong, but let's leave that aside for the sake of argument.) The problem is with understanding the context within which a word is used, as that determines its actual meaning. In other words, the meaning of any given word is determined by the words surrounding it (as well as the order of those words). My best guess is that even a linguist couldn't program a translation algorithm (assuming that linguist has programming skills). Even if the program had a list of the most probable meanings of any given word, that would not be particularly helpful for a translation. In short: in no way would the result be more intelligent than the person doing the programming. (The program might have a wider vocabulary, though, since that would be a list, i.e. calculable.)
By 'language algorithm' I mean rules that can compare two texts (in the same language) and say how close they are; that say how you can (or cannot) rearrange words; how to spot the structure of a sentence, and so on.
Anyway, I think you have no idea how AI works. For an AI, you do not get a linguist to formalize everything he knows about the language and then put it into an algorithm. Rather, you program an algorithm that can analyze texts and deduce these rules from them. With the former approach you are obviously limited by the knowledge of the linguist, but with the latter you can feed more and more texts into the algorithm and improve it beyond my own understanding of language.
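As a minimal illustration of deducing rules from texts instead of asking a linguist, here is a sketch that reads word-order "rules" off a made-up corpus. Nothing about the language is programmed in; the only instruction is to record which word follows which:

```python
# Learn which words may follow which from a corpus. The corpus here is
# a few invented sentences; a real system would use far more text.
from collections import defaultdict

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased the dog",
]

# Record every observed word-to-next-word transition.
follows = defaultdict(set)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current].add(nxt)

# The "rule" that words like cat/dog can follow "the" was never
# programmed; it was read off the data, and it grows with more text.
print(sorted(follows["the"]))  # -> ['cat', 'dog', 'mat', 'rug']
```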
That's one. Another is, of course, that languages tend to be 'updated': the meanings of actual words tend to shift over time. So any translation program would need regular updates. And again, you'd need an intelligence to execute that (or even to find the updated word meanings).
Of course you would need to update it (and the date when a text was written would be an important piece of context when trying to translate it). But if you have the learning algorithm in place, there is no additional intelligence needed. You would feed the new texts into the same algorithm as the old texts and let the AI gather the new meanings from those.
Lastly, the argument seems to be that the calculus is 'more intelligent' than the inventor of the calculus. That is patently absurd.
It is not, if you think about it: the best chess program can beat any human. Surely it is more "intelligent" at playing chess than its programmers (who would probably be easily beaten by the world champion). It has not been conclusively demonstrated yet, but I suspect the situation will soon be the same with Go.
Since my argument about translation has met so many (not entirely invalid) objections, let me offer a less hypothetical scenario, one where I actually know that it works:
Suppose I have a lot of devices, each of which provides a bunch of technical parameters, and I want to know which ones are broken. I have no idea what these parameters mean or how they are connected with broken devices. I have sent someone to look at a fraction of these devices and check them, and he has provided me with a list of which ones are broken and which are not (let's say he checked whether a light was blinking, something which cannot be observed from far away). I can take the parameters and the list of broken devices, feed them into an AI, and let it learn from these data sets. If I do this correctly, the AI now has a model of which of these parameters signify a broken device. Because I only supplied it with the learning algorithm, I have no idea about that model. The guy I sent to check the devices never saw these parameters, so he cannot know anything about the model either. Therefore, the AI is now more "intelligent" at recognizing broken devices from far away than either of us. Of course, I can now try to understand the model the AI has generated, but first, I do not have to for the thing to work, and second, if the model is complicated enough, I might not even be able to understand it.
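Here is a sketch of how that could look with a standard machine-learning library (scikit-learn in this case, though any learner would do). The parameters, the hidden failure rule, and the data are all invented; the point is only that the rule is never written into the program:

```python
# Minimal sketch of the broken-device scenario. All data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Each device reports some technical parameters (here: 4 made-up readings).
n_devices = 1000
params = rng.normal(size=(n_devices, 4))

# The hidden rule the inspector's blinking light follows. We pretend not
# to know it; the learner must recover it from labelled examples alone.
broken = (params[:, 0] + 0.5 * params[:, 2] > 1.0).astype(int)

# The inspector only checked a fraction (20%) of the devices.
X_checked, X_unchecked, y_checked, _ = train_test_split(
    params, broken, train_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_checked, y_checked)

# The model now predicts "broken" for devices nobody has inspected.
print(model.predict(X_unchecked[:10]))
```

Neither the programmer nor the inspector ever states which parameters matter; the fitted model contains that knowledge, and with a forest of deep trees it may not be humanly readable at all.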
So now back to language (ugh):
The difference, of course, is that a child has actual intelligence, even at 2. A program, however, has no means to determine how a specific meaning derives from a specific context. That's because a program - unlike a child of 2 - lacks the capability of understanding anything. (Simply put: it doesn't see the connection between context and meaning.)
That is a bold statement that is easily disproven. Let's take the "bank" example from above. I feed an AI several texts that use both concepts, along with accurate translations of these texts into a language that uses different words for the two concepts. From a dictionary, the algorithm can know which words in the other language can translate "bank". I program the learning algorithm in such a way that it only considers those usages of the word "bank" where it is clear what the translation is. From a simple word-frequency analysis of the surrounding sentences, the algorithm will find that one translation of "bank" comes with words like building, money, door, deposits and so on. The other one will be surrounded by river, water, sand, etc. So if it now encounters a text without a translation, it can look at the words surrounding "bank" and then choose the word with which to translate it. The algorithm would not know what a bank actually is, but for a translation this is not necessary. Note that I did not put the words "river" or "money" into the algorithm; I just instructed it to look at the surrounding words. The same procedure can be applied to any other word, including words I do not know myself.
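A stripped-down sketch of that word-frequency idea, using only the Python standard library. The labelled sentences below stand in for the texts whose translation is already known, and the sense names are placeholders for the two words in the target language:

```python
# Toy co-occurrence disambiguation for "bank". All sentences are invented.
from collections import Counter

# Sentences where the correct translation of "bank" is already known.
labelled = [
    ("i deposited money at the bank near the door", "bank_institution"),
    ("the bank charges a fee for every withdrawal", "bank_institution"),
    ("we sat on the bank and watched the river flow", "bank_riverside"),
    ("the sand on the bank was still wet from the water", "bank_riverside"),
]

# Count which words surround each sense of "bank".
context_counts = {"bank_institution": Counter(), "bank_riverside": Counter()}
for sentence, sense in labelled:
    for word in sentence.split():
        if word != "bank":
            context_counts[sense][word] += 1

def disambiguate(sentence):
    """Pick the sense whose learned context words overlap most."""
    words = [w for w in sentence.split() if w != "bank"]
    scores = {sense: sum(counts[w] for w in words)
              for sense, counts in context_counts.items()}
    return max(scores, key=scores.get)

print(disambiguate("he cast his rod from the bank into the river"))
# -> bank_riverside (because "river" was seen near that sense)
```

A real system would of course down-weight ubiquitous words like "the" and learn from far more text, but the principle stands: "river" and "money" were never put into the program; they were counted out of the data.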
You can program an algorithm that 'understands' a + b = c, because that is not intelligence. It's logic. Oddly, logic derives from language. But language isn't logical. It has rules entirely of its own, completely unguided by logic. Unlike mathematics, the basis of all programming.
The program does not really understand a + b = c (and to be fair, not many humans do). It just follows instructions. But you can write instructions to learn things like language rules. These rules do not have to be logical in any way; there just have to be rules that can be learned.