I don’t mean Nielsen. I mean the method.
Nielsen is a superb research company. Luckily for the industry, there are quite a number of superb research companies in the world. They have survived and thrived on the rigorous training they have received from the industry. What has not killed them has made them better. If the rest of industry and government were at the same average level as our research companies, the world would be in wonderful shape.
However, even superb research companies have to live with dollars-and-cents realities. And it costs a fortune (and risks even greater fortunes) to change the methodology of a currency. The industry in general will not pay for this. Even CIMM, an idea that changes the equation – how much is yet to be seen – merely spends six to low seven figures to help along good ideas that could become better currencies because of their changed methodologies. And this process is just getting into gear, building its foundations first.
One question is how the industry can do a better job of helping currencies to improve themselves. How urgent is the need to change the methodology? This is the driving meta-question.
Because of TV Everywhere – the proliferation of other screens on which TV is now being viewed and which must therefore be included in the ratings – the common sense answer for the past few years has been, “Of course it is urgent to change the methodology – so as to capture all screen viewing.” The sellers have a more urgent stake in this than the buyers because the more of the audience that can be measured, the more impressions they can sell. But advertisers and agencies have also been fascinated by the new tech phenomenon to the point that technology has emotively reached mythical proportions in everyone’s mind.
Nielsen’s predictable reaction – which is simply good business sense given the money factors described above – is to adapt what it has to measure the new screens. Silo samples serve as currency within each screen medium, and a small-sample (affordable) overlap panel then allows fusion to put together a simulacrum of a huge singlesource sample. In classical Nielsen style, we can be sure there will be validation proofs that the grand fusion approach duplicates the real singlesource results, and so the added cost of singlesource to replace grand fusion will not be something the industry is willing to pay. This appears to be what Nielsen is planning. The question is, how good is it? Is it good enough?
Let’s leave aside the question of how good the new measurements of the small screens are. They will never in our lifetimes garner more than half the audience. Displacing the large TV screen means getting the entire human race up off its tired derriere at night. This implies a new economic order, and mental/emotional changes. Forecasting the small screens to be over half the audience in our lifetimes would be Pollyannaish given the slow rate of change – the larger the impact, the slower the change can occur – simple math really but let’s leave that for another blog. (We do not preclude the possibility of a disruption that could obliterate the logic in this paragraph. The probability of small screen dominance in our lifetimes may be small, but not zero.)
Let’s instead focus on the big TV screen. How good are the measurements we use as currency today? These methodologies are not slated to change in our lifetimes given the present equilibrium of forces. The core method is to stay the same. Around that core, the 33 other Nielsen panels will cluster, with some small-sample cross-screen singlesource, and major emphasis on fusion, which will combine all Nielsen samples into one output file. It will be as if each panelist in any one panel had given all of the information asked/measured across all silos.
So how good is the core method – the in-home static placement peoplemeter? Let’s review the evidence – some of it for set tuning meters and some of it for pushbutton peoplemeters. Both types of TV meter have Nonresponse Bias, expected to be worse in the P-Meter because work is involved so response will be lower. Of the two, only the P-Meter has Response Bias, i.e. people cannot be expected to be 100% compliant with button pushing. So for two good reasons, the set tuning meter – as Gale Metzger has taught the industry – is expected to be more accurate than the P-Meter. Anything wrong with the set tuning meter method then is a signal that the same thing is probably true to a greater degree in the P-Meter.
In the first half of the 60s, driven by Congress (the Harris Committee hearings), a large Nonresponse study was done by CONTAM (the broadcast networks and NAB). The only released finding was that Nielsen’s set tuning meter panel produced inflated estimates of TV tuning – but the level of inflation was an average of only 10%, so it was deemed acceptable.
The inflation was correctly attributed to Nonresponse Bias. This means that the homes refusing the meter were different from those accepting the meter. One characteristic of the refusers was that they watched less TV; thus they didn’t think they were needed in the panel and/or didn’t care enough about TV to want to keep their favorite shows on the air. Even though there would be no human effort on their part – this was before pushbutton peoplemeters – a significant chunk of the original predesignated sample said no even then. Nielsen’s response rate at the time was higher than it is today – perhaps twice as high – so the degree of Nonresponse Bias has probably gone up.
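To see how nonresponse alone inflates a tuning estimate, here is a back-of-envelope sketch. Every number in it is invented for illustration – none comes from the CONTAM study: if refusers watch less TV and are simply absent from the panel, the panel average overstates the true level.

```python
# Hypothetical illustration of Nonresponse Bias (all numbers invented,
# not taken from the CONTAM study).
accept_rate = 0.50     # fraction of the predesignated sample accepting the meter
hut_accepters = 0.60   # average HUT among homes that accept
hut_refusers = 0.45    # average HUT among homes that refuse (they watch less)

# True HUT covers everyone; the panel sees only the accepters.
true_hut = accept_rate * hut_accepters + (1 - accept_rate) * hut_refusers
measured_hut = hut_accepters

inflation = measured_hut / true_hut - 1
print(f"true HUT {true_hut:.3f}, measured {measured_hut:.3f}, "
      f"inflation {inflation:.1%}")
```

The point generalizes: the lower the response rate and the wider the viewing gap between accepters and refusers, the larger the inflation.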
Today there are even more reasons to not want to be in the NTI panel. You have to push buttons every time you go in and out of the room (theoretically). This is an enormous difference. It changes the nature of the panel dramatically from that of a set tuning meter panel. A set tuning meter panel is far better than a pushbutton peoplemeter panel – as Gale Metzger has observed – because it has lower Nonresponse Bias, and zero Response Bias.
The pushbutton peoplemeter has Response Bias, meaning that people do not always have the buttons in the position that reflects reality. Sometimes a second viewer wanders in without pushing his/her button. There are screen prompts that force the first viewer to log in or have their viewing interrupted. This does not apply to additional viewers. BBM, the industry nonprofit in Canada that measures electronic media as currency there, did an ARF award-winning study a few years ago using the passive Arbitron PPM and comparing its results to pushbutton peoplemeters in the same homes. There was close agreement on the first viewer but seriously lower reporting of secondary viewers in the pushbutton method.
Nielsen’s own validation study shows only about a 10% overall error rate in the position of buttons on the pushbutton peoplemeter. However, this is based on telephoning the foxes who are guarding the henhouse and asking them to tattle on themselves. If the panelists in effect find themselves holding a smoking gun when called, and what they admit to in button noncompliance at that moment calculates as about 10% error, one can be sure the true rate is higher.
The finding of 10% Nonresponse Error in the 60s when the overall response rate was higher, and at least 10% Response Error, gives one an unsettling feeling about what the size of the total error might really be.
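To make that unsettling feeling concrete, here is a hedged arithmetic sketch of how two roughly 10% errors might combine. Both combination rules below are assumptions for illustration only – the real answer depends on the sign and correlation of the two errors, which we do not know.

```python
import math

# Two ~10% errors from the text: CONTAM's Nonresponse inflation and
# Nielsen's button-position Response error. How they combine is unknown;
# these are two common what-if framings, not measured facts.
nonresponse_error = 0.10
response_error = 0.10

# If both push in the same direction, they compound multiplicatively:
combined_same_direction = (1 + nonresponse_error) * (1 + response_error) - 1

# If independent with unknown sign, root-sum-square gives a rough magnitude:
combined_rss = math.sqrt(nonresponse_error**2 + response_error**2)

print(f"same direction: {combined_same_direction:.1%}")  # 21.0%
print(f"RSS magnitude: {combined_rss:.1%}")              # 14.1%
```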
In 1968, C.E. Hooper (inventors of the Nielsen meter, who licensed it to Nielsen in 1935 for $1,000,000) did a study in New York using a 10,000-home coincidental. Back then, the telephone coincidental was considered the standard of truth for TV measurements. There were no answering machines, fewer dual earners, and different attitudes toward research. The response rate to coincidentals was 90%. There could be no memory error because what was asked was the TV viewing situation at the instant the phone rang seconds ago. The whole interview took only about a minute, less if the TV was off. There were some second sets and those were also measured. The legitimacy of NTI itself rested on validation against the telephone coincidental.
At the end of this telephone coincidental, however, another question was asked: Would you be part of our panel? All we need to do is a quick hookup of a silent meter to your TV set – your identity will always be protected and we will pay you a thousand times more than the extra electricity cost.
About half of those in the coincidental audience agreed. They were then called back and told that the study was called off. The coincidental – remember it was the most accurate method at the time, more accurate than any of today’s methods, because of people’s better attitude toward research – found stark differences by station in terms of the group that said it would take the meter versus the group that said no. One station enjoyed 40% inflated ratings as a result. Some stations were helped and others hurt if you based decisions on just the part of the sample that would have taken a set tuning meter.
Evidence and judgment both suggest it is time to start thinking again about the core of the TV currency, and how we can make it better.
Following below are details of how the coincidental meter agreer ratings compared with the two actual meter panels in the market at the time, along with the key results of the New York Coincidental Study into set tuning meter Nonresponse.
The Future of the Core TV Currency – The Big TV Screen
Here are the key results (click here for the full study):
As you can see, Station B benefits greatly from the set tuning meter Nonresponse Bias. Among the total coincidental sample its prime time rating was 11.5 (ah, the good old days). But among that part of the coincidental sample that would have accepted a set tuning meter, it went up to 18.3.
Does that mean that a real set tuning meter panel would also inflate Station B? You betcha. Look at the last two columns. There were two set tuning meter panels in New York in those days, and both of them provided inflated ratings for Station B, one of them 26% above the truth standard total sample coincidental, the other meter panel a whopping 40% inflated over the truth.
This was about five years after CONTAM found about a 10% overall inflation in NTI (using the coincidental as its truth standard). This time the inflation was the 62.8 HUT shown in the table above for the Meter Agreers, over the 55.7 shown for the total coincidental sample – now 12.7% overall inflation, five years later. Possibly a trend? If so, extending that 27% relative increase per five-year period to today would forecast an inflation rate I don’t even want to think about.
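For the record, the what-if arithmetic behind that last sentence can be spelled out. This is purely illustrative: it assumes the 27% relative growth per five-year period continued unchanged, and it picks 2013 as “today” only to have a number to plug in.

```python
# What-if extrapolation of the inflation trend (illustrative only):
# ~10% inflation circa 1963 grew to 12.7% by 1968, a ~1.27x relative
# increase per five-year period. Assume that growth rate never changed.
base_year, base_inflation = 1968, 0.127
growth_per_period = 0.127 / 0.10          # ~1.27x per five years

today = 2013                              # assumed "today" for the arithmetic
periods = (today - base_year) / 5         # 9 five-year periods

projected = base_inflation * growth_per_period ** periods
print(f"projected inflation in {today}: {projected:.0%}")  # 109%
```

Nobody believes the bias actually compounded that smoothly – the exercise only shows why an unexamined trend is worth examining.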
Talking to classical media researchers around the market, one hears the suspicion that in fact all of the media measurements are slightly inflated, but that they are probably all inflated by about the same percentage, so we can ignore it. It would be nice to schedule some date in the future by which we want to actually know the facts of the case.
Millions of dollars are at stake. “Millions” is small change on the scale of the discussion. For a small network, millions might be gained or lost in moving to a currency that more accurately reflects what is going on with the big TV screens in the home. For a large network group, billions might be involved.
A more accurate TV currency might lower the in-home component – the largest component – of the ratings. Some would say that this disadvantages television because ratings for the other media – even the small TV screen mobile measures – are all inflated, so they will get higher allocation than they deserve, and in-home TV will get less than it deserves.
Others like me might argue that audience size is becoming less important than ROI, and that ROI will drive allocation by media type, not audience size. Plus, everyone benefits from the greatest accuracy. More dollars flow by Reverse Gresham’s Law (smart money drives out ignorant money) to the medium whose metrics inspire more confidence. And in an ROI-based business – which this is fast becoming – the seller and agency will increasingly have aligned interests with the buyer, as the advertiser moves toward performance-based payments.
In the earlier era, the buyer side had the preponderance of brainiac media researchers. There are never more than a hundred such people in the business at any one time. When they were on the buyer side they pushed for quality to a degree that is now a distant memory.
Today the push for quality has no bite in it. The words are empty because the business people on all sides do not see its importance and are busy making money for today – for this quarter’s head-patting/bonusing or flogging/firing. And most of the brainiac media researchers are now on the seller side, where in today’s environment their inherent and never-ending push for quality must be exercised with the greatest care and delicacy.
Education is clearly the answer. It’s time for the media research rock stars (I am already tired of “brainiacs”) to create their own or combined initiatives, within or across their organizations, making the case for the monetary benefits that will derive from improving the quality of the base measurement.
Best to all,