The phonological voice is the voice you hear when you 'talk to yourself'. It's distinct from the sound you hear when you 'remember' a sound. You can imagine a song, but you have a much easier time singing a song to yourself.
A lot of our thinking is arranged around either language or visual stimuli, and our short-term memory(*) can hold both visual and verbal components at once. The verbal component (the phonological voice) can easily 'remember' a string of words only a few seconds long (about 2 seconds of speech is the commonly cited estimate). The limit is on speaking time rather than word count, so if the words are short, you can remember more of them.
You'd think that a series of numbers would be equally easy to remember regardless of what those numbers were, but cross-language studies have shown that it's easier to remember numbers which are 'quick' to say. English has more 'quick' number words than some other languages, and so English speakers can remember strings of numbers more easily than speakers of those languages. In principle, you could replace certain number words in your own language with shorter ones to get even more short-term memory benefits. (Replacing 'seven' with 'sept' and 'eleven' with 'onze', for example, might have long-term benefits.)
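To make the 'quick to say' idea concrete, here is a rough sketch that estimates how many spoken digits fit in the verbal store. It uses syllable count as a crude stand-in for speaking time, and the function name, the loop length and the speaking rate are all illustrative assumptions, not measured values.

```python
# Syllable counts for English digit names (a crude proxy for speaking time).
SYLLABLES = {
    "zero": 2, "one": 1, "two": 1, "three": 1, "four": 1,
    "five": 1, "six": 1, "seven": 2, "eight": 1, "nine": 1,
}


def items_that_fit(words, syllables, loop_seconds, syllables_per_second):
    """Count how many leading words can be 'said' within loop_seconds.

    loop_seconds and syllables_per_second are assumed parameters;
    plug in whatever estimates you trust.
    """
    budget = loop_seconds * syllables_per_second  # syllables available
    used = 0
    count = 0
    for word in words:
        used += syllables[word]
        if used > budget:
            break
        count += 1
    return count


# Illustrative values only: a 2-second loop at 3 syllables per second.
mostly_short = ["one", "two", "six", "four", "nine", "three", "five", "eight"]
many_sevens = ["seven", "zero", "seven", "seven", "zero", "seven", "zero", "seven"]

print(items_that_fit(mostly_short, SYLLABLES, 2.0, 3.0))  # 6
print(items_that_fit(many_sevens, SYLLABLES, 2.0, 3.0))   # 3
```

Under these assumptions, swapping in a one-syllable word for 'seven' (say, adding "sept": 1 to the table) would let more of the second list fit. Real speaking time also depends on vowel length and speaker, so syllables are only a first approximation.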
*Short-term memory is what you 'keep in mind' for a few seconds and would otherwise soon forget. Forgetting a phone number on the way from the phonebook to the phone would be a failure of short-term memory.