
Silence is Subjective

Akadake: summiting in a white-out

How long before a natural lull in the conversation starts to feel awkward?

There are of course personal, contextual and cultural differences in attitudes to silence. In Japan, for example, a period of silence at the end of a meeting is more likely to be a sign that everybody is in agreement, and speaking into this silence-space resets the is-everyone-in-agreement timer. Compare this to the UK, where the same length of time with nothing spoken is likely to be considered an awkward silence. As you might imagine, the end of business meetings between less-travelled British and Japanese people, with limited awareness of each other’s nuanced attitudes towards silence, can leave a bitter aftertaste. (A related issue is that meetings in Japan are more about confirming what was agreed in private, consensus-building conversations prior to the meeting than about discussing anything of substance, whereas the assumption of non-Japanese visitors is that work will actually be done in the meeting.)

As more of our conversations pass through personally carried devices and the services they connect to, our ability to transform, filter and analyse what is being spoken evolves. Simple examples exist today – mobile phones designed for the elderly in Japan include features that can slow down speech by up to ~30% to make it easier to follow, with the overall conversation length kept the same by shortening the pauses. The iPod shuffle supports playing podcasts at increased speed.
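The trade-off behind that slow-down feature can be sketched with a little arithmetic: stretching the speech eats into the pause time, so the pauses must be compressed by a matching factor. The function below is a hypothetical illustration of that budget, not the phones’ actual implementation.

```python
def required_pause_scale(speech_fraction: float, slowdown: float = 1.3) -> float:
    """Given the fraction of a recording that is speech (the rest being
    pauses) and a slowdown factor applied to the speech, return the factor
    by which pauses must be shrunk to keep the total duration unchanged.
    """
    pause_fraction = 1.0 - speech_fraction
    if pause_fraction <= 0.0:
        raise ValueError("recording must contain some pauses")
    stretched_speech = speech_fraction * slowdown
    remaining_for_pauses = 1.0 - stretched_speech
    if remaining_for_pauses <= 0.0:
        raise ValueError("not enough pause time to absorb the slowdown")
    return remaining_for_pauses / pause_fraction

# If 60% of a call is speech and it is slowed by 30%, pauses must shrink
# to (1 - 0.6 * 1.3) / 0.4 = 0.55 of their original length.
print(round(required_pause_scale(0.6), 2))
```

Note the limit this implies: once the speech fraction exceeds 1/1.3 (about 77%), there is no pause time left to absorb a 30% slowdown without lengthening the call.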

In our sonically enhanced future perfect we’ll (naturally) be able to draw on culturally aware, realtime babel-fish-style translation to enable those Japanese and British businesspeople to fill or zone out the audio as per their cultural norms. What would you do with a realtime analysis of stress levels as an indicator of lying? Technology is merely an arms race: once you know you’re being analysed you can take steps to disguise certain intentions and enhance others. May the best ~voice profiles win.

Akadake: summiting in a white-out

Which in a very roundabout way brings me to the photos above – taken last Dec 25th on Mt Akadake, close to the summit, just before a particularly brutal storm hit the mountain. The boot + crampons belong to that day’s climbing partner, a gent who was promised a light Christmas Day stroll away from the hubbub of Tokyo, with little more than a light dusting of snow to contend with (sorry Burt, or perhaps more to the point, sorry Hiroko, but he did return, eventually). The night before we’d done a modest hike to the Akadake Kosen mountain hut, which takes just enough of a dent out of the trail to make a dawn ascent on the summit possible and still be back sipping beers in Tokyo by nightfall. Unless the weather closes in. Which it did. Just after we hit the summit. Three hours ahead of schedule. For about 30–40 minutes we were exposed, struggling to find the route down the mountain, every attempt to descend thwarted by blinding snow being driven up the mountain face. With cold + windchill + exhaustion + worsening weather kicking in, it was probably the closest I’ve been to the edge where you really don’t know how it’s all going to turn out and the clock is ticking. There was a point where I’d trekked off to scout what was to be another failed route, turned around to shout a change of direction to Burt, and realised that my voice was being carried off the mountain, such were the acoustic dynamics of the storm.

An equation for this fine Los Angeles evening: what happens when you can generate directional sound, i.e. a pocket of audio around the ears of the hearer; have access to personal voice profiles, i.e. start with using voice profiles to authenticate, then move on to text-to-sounds-like-you speech; and can draw on enough processing power and smart algorithms to generate spoken conversations in real time, sufficiently lip-synced so that what is seen matches what is heard? That last bit is important – understanding is as much about seeing what is heard as hearing it. At what point can we ‘hack’ what is spoken/heard in real time so that the speaker hears their own voice, but the listener hears a hacked version of it?

Some talk of controlling conversations. At what point is your voice, for all intents and purposes, no longer yours?