The other day I was talking with one of the interns who works with me, and he mentioned that he is interested in speech-to-text technologies and is looking to work on projects in that space. A couple of days later I suggested a project which may be interesting:
“Hearing a dead language”
- We can convert speech to text
- We can then record all aspects of that speech: about the speaker, geographic origin of the speaker, etc. I am guessing that the way people enunciate, pronounce and emphasize sounds and words depends on the kind of environment they are in; languages spoken in desert areas may sound different from those spoken in the rainforest.
- We then identify patterns, based on speech samples, which correlate geo-origin with speech patterns and sounds.
- We then figure out, with some degree of certainty, what a language may sound like if spoken, based on its text representation and its geo-origin.
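To make the last two steps a little more concrete, here is a rough sketch in Python of what "correlate, then estimate" could look like in its simplest form. Everything in it is an assumption on my part: the `humidity_index` and `vowel_ratio` features are hypothetical stand-ins for "geo-origin" and "how the speech sounds", the sample values are invented placeholders rather than measurements, and a straight-line fit is almost certainly too crude for real speech data.

```python
from statistics import mean

# Hypothetical placeholder samples: (humidity_index, vowel_ratio).
# These numbers are invented for illustration; they are NOT measurements.
SAMPLES = [
    (0.1, 0.30),
    (0.3, 0.35),
    (0.5, 0.40),
    (0.7, 0.45),
    (0.9, 0.50),
]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Estimate the acoustic feature for a given geo feature value."""
    slope, intercept = model
    return slope * x + intercept


# Step 3: learn the pattern from living-language samples.
model = fit_line(SAMPLES)

# Step 4: estimate the feature for a language known only from text,
# given an assumed humidity index for its place of origin.
estimate = predict(model, 0.2)
```

A real attempt would need many acoustic features, many geographic ones, and some honest measure of how uncertain each estimate is, but the shape of the problem is the same: fit on languages we can hear, then extrapolate to ones we cannot.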
I know that this approach is simplistic, and there are quite a few “dots that need to be connected”, but this recent article in NatGeo seemed encouraging: “Does Geography Influence How a Language Sounds?”. Time to figure out, via a web search, if some of these dots are connected.
Update 19 June 2013: Came across this interesting article on “India becoming a graveyard for languages”
Update 25 June 2013: “Audio Recordings of human languages”
Update 26 June 2013: “Preserving endangered languages before they disappear”