Musicians at cocktail parties

Hello Dear Reader,

If you want to follow a conversation at a cocktail party then ask a musician to listen in. If you want to know why, keep reading…

Just before I get into that…

I must tell you that I am excited about ESCOM, the big music psychology conference, which this year is taking place at the Royal Northern College of Music in Manchester. It starts tomorrow (Monday 17th August) though most of the talks begin Tuesday (18th). Lots of my friends and colleagues will be making the trip including my research team mates from Luzern, Switzerland (HSLU).

ESCOM is looking great this year with a huge variety of interesting talks to choose from over the 5 days of the conference. There are four parallel sessions and no doubt I will be running in between many of them!

I am a proud supervisor nowadays so I am looking forward especially to talks by my students Jessica Crich (live music in dementia care – Tuesday at 5pm), Georgina Floridou (A new earworm measurement scale – Wednesday 5.30pm), Tabi Trahan (How does music help us sleep? – Thursday 10.30am) and Elena Alessandri (Beethoven recordings reviewed – Friday 4.30pm). It has been my pleasure and honour to be involved in all their fascinating research studies and I can’t wait to see these women all take to the stage and spread the word about their findings!

I hope to see you there if possible, Dear Reader. If not, I will be blogging as much as possible for you.

So…back to the question of cocktail parties and why you need a musician to overhear the best gossip!

A new paper appeared in June in Nature Scientific Reports that concerned this very topic. I was alerted to it by the senior author, the lovely Aniruddh Patel. The question at the heart of the paper is whether musicians are better able to focus their auditory attention.

Many papers over the years have suggested that musicians have more finely tuned auditory attention skills. However, apparently things in this research field are not quite as straightforward as I thought. Recent studies have failed to replicate a musician advantage in some tasks that require auditory attention (Ruggles et al., 2014).

The present paper looked into this growing controversy using an important everyday auditory attention challenge.

Our need to listen for a specific voice in a barrage of talker noise has long been termed the ‘cocktail party’ problem (CPP) – of course this problem exists in many walks of life, not just those involving mixed drinks! Think of following a line of discussion in a noisy classroom or trying to keep up with a conversation in a busy street. Essentially, there are multiple sources of speech all coming from different locations and you need to home in on one in particular.

The task in a CPP experiment is to try to listen out for a certain voice or message in artificially created background noise or chatter. Patel and his team opted for a multiple-talker masking approach, which I like as it more closely matches the challenges we face in real life than some of the artificial noise masks I have read about in the past.

Wisely the authors shy away from the issue of causality, which requires a longitudinal approach rather than a between-groups comparison of this kind. The aim of their paper was to determine if musicians are better able to track a message when it is buried in a series of sentences that vary in their potential to confuse.

In general masking can happen at a sensory level (energy mask – EM) or at a cognitive level (informational mask – IM).

1) An EM overlaps to some degree with the time and frequency information of the message, therefore causing confusion at the auditory periphery (within the ear & auditory cortices).

2) An IM arises when a characteristic of the mask (such as its spatial location or intelligibility) causes confusion beyond the auditory periphery, presumably somewhere within cognition.

The interest in the present study was in performance differences as IM was lowered.

METHOD
In each condition the participant was asked to listen for a target short sentence such as “Jane saw two red shoes”.

They were told the target sentence would come from a speaker directly in front of them.

There were also two speakers either side of the target speaker. These were used to play the masking sentences in half of the conditions. There were four conditions, which all had similar EM:

1. Masks were intelligible (High IM) & co-located with the target (High IM) – HARDEST!

2. Masks were intelligible (High IM) & differed spatially from the target (Low IM) – INTERMEDIATE

3. Masks were un-intelligible reversed speech (Low IM) & co-located with the target (High IM) – INTERMEDIATE

4. Masks were un-intelligible reversed speech (Low IM) & differed spatially from the target (Low IM) – EASIEST!
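For anyone who likes to see a design laid out, the four conditions above are simply a 2 x 2 crossing of two IM factors. Here is a minimal sketch of that crossing. The names (INTELLIGIBILITY, LOCATION, difficulty) are my own hypothetical labels, not anything from the paper, and the difficulty labels just count how many factors are at 'High IM'.

```python
from itertools import product

# Hypothetical labels for the two informational-masking (IM) factors.
# The first level of each factor is the "High IM" level.
INTELLIGIBILITY = ("intelligible", "reversed")   # intelligible masks = High IM
LOCATION = ("co-located", "separated")           # co-located masks = High IM


def difficulty(mask_speech, mask_location):
    """Label a condition by counting how many factors sit at High IM."""
    high_im = (mask_speech == "intelligible") + (mask_location == "co-located")
    return {2: "hardest", 1: "intermediate", 0: "easiest"}[high_im]


# Crossing the two factors reproduces conditions 1 to 4 in the order listed.
conditions = [
    (number, speech, location, difficulty(speech, location))
    for number, (speech, location) in enumerate(
        product(INTELLIGIBILITY, LOCATION), start=1
    )
]

for number, speech, location, label in conditions:
    print(f"Condition {number}: {speech} masks, {location} -> {label}")
```

Laying it out this way makes it easy to see that conditions 2 and 3 are 'intermediate' for different reasons, which matters when we get to the results.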

RESULTS

In condition 1 (IM maximum) there was no difference in the performance of musicians and non-musicians. It was hard for everyone to get the message in these circumstances.

However, when the maskers were spatially separated (condition 2) the musicians performed significantly better.

The musicians’ advantage varied by individual, but on average was equivalent to a release of around 6 dB (decibels) for the musicians in condition 2.

That might not sound like a lot of dBs, but it is a great deal larger than numbers previously reported as significant in the literature (around 1 dB) and represents a substantial advantage when trying to pick out a message in a multiple talker environment.
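To get a feel for why those numbers are not small, remember that decibels are a logarithmic scale. Here is a quick sketch of the standard conversion from a dB difference to a plain ratio. The function names are my own, and I am assuming the dB figures describe a difference in sound level between target and masks, which is my reading rather than something spelled out above.

```python
def db_to_power_ratio(db):
    """Convert a level difference in dB to a ratio of sound power."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a ratio of sound pressure (amplitude)."""
    return 10 ** (db / 20)


# The three figures mentioned in this post: 1 dB (earlier literature),
# 3 dB (condition 3) and 6 dB (condition 2).
for db in (1, 3, 6):
    print(
        f"{db} dB -> power x{db_to_power_ratio(db):.2f}, "
        f"amplitude x{db_to_amplitude_ratio(db):.2f}"
    )
```

On that reading, a 6 dB difference corresponds to roughly four times the sound power, whereas 1 dB is only about a quarter more.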

How about when the mask was senseless?

When the mask contained un-intelligible reversed speech the musicians showed a significant advantage when the stray sentences came from the same speaker as the message (3 dB in condition 3).

However, everyone was pretty good at the task where the nonsense speech came from the separate speakers.

CONCLUSIONS

How can we interpret these findings in the real world?

This is just a bit of fun, but here we go…

Imagine you are at a rather noisy cocktail party trying to engage in a conversation with a friend.

If the crowd is reasonably spaced out and all speaking another language then there is no advantage to either of you being a musician: anyone can follow that message pretty well.

If everyone is speaking the same language and (weirdly) the crowd are all located inches/cms away from you then musician/non-musician makes no difference: that task is tough for everyone!

However, if everyone at the cocktail party is standing a reasonable distance apart and speaking the same language as you, or they are weirdly close and speaking another language, then a musician is better able to follow the message. The advantage is slightly bigger for the first of those scenarios.

The key point is that intermediate levels of IM, especially when they come from different spatial locations, do not challenge musicians as much as non-musicians. This suggests musicians have a cognitive advantage in being able to focus their auditory attention.

How do we square this result with previous reports of much smaller or non-existent musician advantages in auditory attention paradigms? Many of those experiments used EM manipulations, or IM manipulations that would have made the task easier for everyone, and so found no group difference (like condition 4 in the present experiment).

Combined with the present result, all these data suggest that musicians’ advantage is likely to emanate from feedback mechanisms that run from the brain down to the ear.

In effect, they have better brain-based ‘ear tuning’ mechanisms that allow them to suppress irrelevant background sounds.

But musicians are unlikely to have any advantage in processing sounds at the basic ear level where everyone is vulnerable to energy masking.

NEXT STEPS

An interesting next step would be to see how much this ear-tuning advantage is limited to speech. Carey et al. (2015) found that musicians and non-musicians performed similarly on tests that required attention to environmental sounds in noise – so perhaps musicians’ advantage is limited to verbal/musical messages rather than all sound?

If so, this might add support to theories of a neurobiological overlap between music and language processing, such as Patel’s SSIRH theory (2003).

Right – I wish you a good week Dear Reader, and keep tuned for updates from ESCOM and tweets from @drvickyw!