Answer: We may never know. But there are some intriguing clues, thanks to archaeological linguists and forensic anthropologists. From an article dated March 15, 2019, in Archaeology Magazine:
"...the spread of agriculture and consumption of easier-to-chew foods may have led to changes in human jaws and their arrangement of teeth, which in turn allowed people to make new sounds and create new words.... Chewing tough, gritty food would have put force on hunter-gatherers' lower jaws, making the bone grow larger so that the upper and lower teeth aligned in an 'edge-to-edge' bite. Such a bite would have made it hard to push the upper jaw forward to make the sounds 'f' and 'v'..."
The linguists speculated that "f" and "v" sounds were first made accidentally by wealthy people who ate soft foods and therefore had weaker lower jaws. (By contrast, speech produced with an "edge-to-edge" bite resembles the clenched-teeth speech I have heard from people who seem to belong to the upper classes of the US Northeast. So perhaps these "non-wealthy" ancient hunter-gatherers sounded more like Katharine Hepburn or Thurston Howell III.)
/s/ webmaster [Image via sciencemag.org]