On this week's Vergecast, Verge editor-in-chief Nilay Patel talks to Jeremy Singer-Vine, the data editor for the BuzzFeed News research unit, about his recently published story on the fake comments submitted to the Federal Communications Commission during its online debate over net neutrality.
If you have not read the piece, you should. The investigation revealed where the fake comments in the FCC net neutrality proceeding came from, including comments left in the names of dead people and the obscure political operatives involved in the scheme.
It is not really a story about net neutrality. Instead, it is about how easily systems designed for public participation in government can be gamed, and how hard it is to prevent that.
Nilay and Jeremy discuss why it happened, how it happened and what happens next if we want to use the Internet to encourage open access to government without corruption.
Below is a lightly edited excerpt of the conversation.
Nilay Patel: So a few weeks ago, you wrote a story with Kevin Collier for BuzzFeed called "The Impersonators." This story revealed that millions of comments in the net neutrality proceeding were fake, and you identified the companies that did that kind of faking.
Jeremy Singer-Vine: Right. So when the FCC opened comments on its proposal to repeal the Obama-era net neutrality provisions, there were 22 million comments over the course of many months. And there was a lot of good reporting, by The Verge among other outlets, showing that many of these comments were very fishy. Some were clearly fake, because they didn't come from anyone. Some of them seemed to impersonate other people. People said there were comments under their names that they certainly hadn't left.
So the general knowledge that there were millions of problematic comments had been around for a while, and we tried to identify as many of them as possible. In the end we focused on a particular group of nearly 2 million comments that, our reporting found, were clear examples of impersonation and were ultimately funded by the broadband industry.
So to give some perspective on this, it has been going on for a long time. The FCC recently won its appeal, which said it could change the rules, but then Trump is elected. He appoints Ajit Pai as chairman of the FCC. Pai is kind of a known character to the audience of this show, and they race through this proceeding to change the rules. And in the course of that very fast process a lot of things happen. In particular, the FCC comment servers crashed. Pai claimed that they had been attacked, but he never really released any evidence of this attack, and then everyone noticed that all these fake comments were bubbling up.
What struck me was that Pai said time and time again, "It's not the quantity of comments we get; it's the quality of them," which I read as: telecom companies such as Verizon have lawyers writing these comments. They are of higher quality. We are going to pay attention to those, not to ordinary people. But it seemed like they were getting astroturfed, so they had to ignore the whole thing, and they just went ahead with it. That was kind of the way I read it. And then I read your story and it's like, "Oh, it's actually much more sophisticated than that underneath. There are actually companies, and it is their business to flood this type of open comment system." Tell me who these companies are.
Sure. Pai's comments and the general framing of quality over quantity are interesting, because it is a kind of rule of law that federal agencies are supposed to accept all comments on a newly proposed rule. And they are not supposed to treat it as a vote, although many agencies report the percentage of comments that hold a given opinion. But they should really say: "These are the perspectives that we have been given. We will take them into account in our rulemaking, but we do not have to treat them as a sort of binding vote."
That is, I think, where Pai's perspective comes from. It is based on the rules that agencies must follow. But political actors know that even if the public version of how these things work is quality over quantity, people pay attention to quantity. So political consultancies have emerged over the years that help organizations, regardless of political persuasion, gather comments for public comment periods.
So in net neutrality, of those 22 million comments, a large proportion, nearly half, were submitted through the FCC's bulk upload system, and these were comments collected on behalf of organizations, some pro-net neutrality, some anti. The idea is that the FCC system, which crashed, as you noted, is not the most user-friendly system, so you can go out and collect comments on behalf of a position or an organization, what have you, and then submit them all at once to the FCC. Through [the Freedom of Information Act] we were able to obtain records of who had sent those bulk submissions. And very quickly when we looked through that data, this specific group of 2 million comments jumped out at us because they had a huge overlap with a data breach known as the Modern Business Solutions data breach that happened a little earlier than that.
Those comments were submitted by a political consulting firm known as Media Bridge. They do a number of things, including, very vocally, as they have written on their website, flooding agencies with comments on essentially whatever topic the client asks for.
You have this quote in the story: "Spend $1 million with Media Bridge and you will most likely get more than a million people to argue for your position." That seems to be the cheapest political advocacy ever conceived.
I mean, sure, the idea of mass commenting is not something that Media Bridge invented. It is something that people of all political persuasions have been doing for a long time, and there is a legitimate use for it: many people feel strongly about something, but they don't feel like they are good writers, or they don't have the time to sit down and work out detailed thoughts.
So it is generally accepted as politically legitimate to sign on to someone else's statement. A political organization or interest group can say, "Do you agree with this statement that we have written in advance? Enter your information and we will sign it on your behalf and send it to the FCC or another agency."
But that is not what is happening here.
You wrote that this is one of the biggest examples of identity misuse that has ever occurred in politics.
Exactly, and these comments were made to look as if they came from ordinary people submitting these kinds of typical mass comments. At first glance, they seem no different from mass comments from other organizations. But as we dug deeper, it turned out that for more than 94 percent of the comments submitted through Media Bridge, the personal information on them (we are talking about the name, physical address, and email address) exactly matched the data contained in that breach database, the Modern Business Solutions data breach.
And as we dug deeper still, we found a fairly clear explanation for the remaining 6 percent. So while we were reporting and talking to people whose names were on these comments, it became clear that they had not submitted them, and that the most likely explanation is that the data was simply taken directly from this breach and attached to comments that were generated in a kind of Mad Libs style so that they all looked a little different, different enough that they seemed unique, and then submitted to the FCC.
How did you figure out during the reporting process which breach they had used? Unless you just have encyclopedic knowledge, which would be great.
There is a great service available to everyone online called Have I Been Pwned. It is run by Troy Hunt, a security researcher, and what he does is collect these breached databases as they float around the internet, figure out which email addresses have been exposed in each of those individual incidents, and then provide a service where people can check: "Have I been breached? Has this email address been breached?" It gives you an idea of how secure your personal information is, but it also allows researchers to find out whether a large set of email addresses (for example, we took a random sample of 10,000 email addresses from these comments) overlaps heavily with a particular breach. I think he has collected more than 200 breaches at this point. And we didn't come in with a bias about which breach would be relevant, or even which of the sets of comments submitted to the FCC would be relevant. But while we were doing our analysis, this set of comments and this specific breach rose right to the top. Nothing else came close.
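The sampling approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BuzzFeed's actual analysis and not the Have I Been Pwned API: the function name `rank_breaches_by_overlap`, the in-memory breach sets, and the example addresses are all hypothetical. It draws a random sample of comment email addresses and ranks each breach by the share of the sample it contains.

```python
import random


def rank_breaches_by_overlap(comment_emails, breaches, sample_size=10_000, seed=0):
    """Rank breaches by the share of sampled comment emails they contain.

    comment_emails: iterable of email strings taken from the comments.
    breaches: dict mapping a breach name to a set of breached emails
              (hypothetical local data; a real lookup service works differently).
    Returns a list of (breach_name, share_of_sample) pairs, highest share first.
    """
    # Normalize and de-duplicate so case or whitespace does not hide a match.
    unique = sorted({e.strip().lower() for e in comment_emails})
    rng = random.Random(seed)
    sample = unique if len(unique) <= sample_size else rng.sample(unique, sample_size)
    if not sample:
        return []

    ranked = []
    for name, breached in breaches.items():
        breached_norm = {e.strip().lower() for e in breached}
        hits = sum(1 for e in sample if e in breached_norm)
        ranked.append((name, hits / len(sample)))

    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Toy data: three of the four comment emails appear in "breach_one".
emails = ["a@example.com", "B@example.com", "c@example.com", "d@example.com"]
toy_breaches = {
    "breach_one": {"a@example.com", "b@example.com", "c@example.com"},
    "breach_two": {"d@example.com"},
}
print(rank_breaches_by_overlap(emails, toy_breaches))
```

Starting from no assumption about which breach matters, a single breach with a far higher share than all the others is the kind of signal Singer-Vine describes.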
So Media Bridge is harvesting names and email addresses from a data breach?
So Media Bridge is ultimately the organization that submitted the comments. They worked with another company, LCX Digital Advertising Company, which, according to our reporting, had been entangled in a number of other impersonation episodes. They have a troubling history. They are run by someone who has repeatedly lied about his personal history and resume. We do not know exactly what the relationship between LCX and Media Bridge was, or who exactly wanted what, but given what we know about LCX and what we know about Media Bridge, it seems that LCX provided the names to Media Bridge, which then submitted them to the FCC.
And they did that by taking this kind of Mad Lib generator and making emails that were just different enough to evade detection?
Good question. Whether it was different enough to evade detection is hard to say, but evading it does seem to have been part of the motivation. Most federal agencies use some kind of de-duplication when they try to read comments, especially if they are dealing with millions and millions of them. I don't know exactly what can get through it and what can't, but it seemed that the purpose of these comments, the purpose of that kind of randomized text, was to make it harder to say, "Oh, these are all the same comment" and treat them all as one, and instead to require someone to read them all.
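To see why varied wording matters, here is a minimal Python sketch of one simple kind of de-duplication: grouping comments whose text is identical after normalization. The conversation does not describe how any agency actually de-duplicates, so this approach, the function names, and the sample comments are assumptions for illustration only.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse punctuation and whitespace runs into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def group_exact_duplicates(comments):
    """Group comment indexes whose normalized text is identical."""
    groups = defaultdict(list)
    for index, text in enumerate(comments):
        groups[normalize(text)].append(index)
    return list(groups.values())


# Invented sample comments: the first two differ only in case and punctuation.
comments = [
    "Repeal the rules!",
    "repeal the   RULES",
    "I urge you to repeal the regulations.",
]
print(group_exact_duplicates(comments))
```

The first two comments collapse into one group, but the third, which says the same thing in different words, stays separate. A scheme this simple is defeated by swapping in synonymous phrases; catching such variants would need some form of near-duplicate detection.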
You have the text of a few of them there, and they are almost impossible to read.
Yes. It is not the most advanced text generation in the world. It really is a kind of Mad Libs-like generator. If you go to the article online, you can play with the generator. We believe that we have successfully reverse-engineered the algorithm or process for generating the comments, to get an idea of how it works. But it basically swaps synonymous phrases in and out, and sometimes, if those phrases line up, it reads like a reasonable letter. Other times there are clear grammatical problems or non sequiturs that don't quite make sense.
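The phrase-swapping idea can be illustrated with a short Python sketch. It is not the reverse-engineered generator from the article: the slot structure, the function names, and every placeholder phrase below are invented for illustration. Each slot holds interchangeable phrases, and one is picked at random per slot.

```python
import random

# Hypothetical slots; each inner list holds interchangeable phrases.
SLOTS = [
    ["Dear Commissioners,", "To the FCC:"],
    ["I am writing about", "I want to comment on"],
    ["the proposed rule.", "the rule under consideration."],
    ["Please consider my view.", "Thank you for weighing my opinion."],
]


def generate_comment(rng, slots=SLOTS):
    """Build one comment by choosing a single phrase from each slot in order."""
    return " ".join(rng.choice(options) for options in slots)


def count_variants(slots=SLOTS):
    """Number of distinct comments the slots can produce."""
    total = 1
    for options in slots:
        total *= len(options)
    return total


rng = random.Random(42)  # seeded so runs are repeatable
print(generate_comment(rng))
print(count_variants())
```

Even four slots with two options each yield 16 distinct texts, and the count multiplies with every added slot or option. Because each phrase is chosen independently of its neighbors, nothing guarantees that adjacent choices agree grammatically, which fits the awkward output described above.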
The FCC is under no obligation to verify that these are real people. I mean, in your opening vignette, one of the names belongs to a woman who had died, and her granddaughter is very unhappy about it.
So the FCC doesn't have to make sure they are real people at all?
No, and in fact, when people who say they have been impersonated go to the FCC, the FCC not only says, "It was not our duty to prevent that," but also, "We are not going to remove it. This is part of the permanent public record. If you disagree with something submitted in your name, you are welcome to submit a follow-up comment that corrects the record, or what have you." And the FCC not only does not verify, it does not even attempt to verify. There is no step in the process that, for example, flags a large submission that seems to impersonate many people. There is nothing in their process that would detect that.