Exploiting the “big data” monopoly: YouTube, Google and Facebook are vulnerable in ways small startups are not

“Big data” is a relatively new domain in software. Programming languages such as Python and R have risen sharply in popularity, in large part because they are well suited to statistical analysis of large data sets. Fluency in these languages is commonly expected of aspiring employees of Alphabet (YouTube/Google) and Facebook.

“Big data” only exists because widely used products such as Google, Facebook and YouTube have gotten away with mining terabytes of data from their users. Every browsing habit, search term, private message, e-mail, contact list, posted image, and even voice conversation is being recorded and analyzed. These companies know more about you than you know about yourself.

The most obvious uses of this large data set are targeted ads, suggested videos, and the construction of your social media feed. A less obvious use is feeding the data into “crowd-sourced” AI. Examples include calculating the fastest route (based on recent travel times of other users), the best suggestion (based on data of users with similar browsing habits and interests), the best translation (based on feedback from bilingual users), and optimal image recognition (based on users identifying objects via reCAPTCHA challenges).

More sinister uses of the data include colluding with corrupt government agencies such as the NSA to enable spying, and in the most recent case of misuse, colluding with postmodern organizations to censor and demonetize content and ideas that don’t align with Marxist ideology.

Startups obviously don’t have the luxury of an initial user base of a billion people on which to apply these same data mining techniques. Once the big tech firms accrued a critical mass of users, they monopolized the “big data” market and made it difficult for competitors to enter. Their collected data will never be released to the public domain, since the ensuing backlash over privacy would devastate these companies. An API is granted to outside software developers, not as an act of charity but to allow the big companies to maintain their dominance and dissuade competition. The API exposes a tiny percentage of the data, but it grows the monopoly further by deepening dependency on the network and by adding more sources of input data to mine.

Smaller companies can build AI from first principles to solve their problems without having to resort to “big data”. Graph theory can be used to calculate the fastest route, parsers and grammars to translate, and so on. AI programmed from first principles is not susceptible to the same exploits as “crowd-sourced” AI.
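As a rough illustration of the graph theory approach, here is a minimal sketch of Dijkstra's shortest-path algorithm in Python. The road map, the place names and the `fastest_route` function are all made up for this example; the point is only that the answer is computed from the map itself and not from other users' travel data.

```python
import heapq

def fastest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_minutes, path), or (inf, []) if unreachable."""
    queue = [(0, start, [start])]   # (minutes so far, current node, path taken)
    best = {start: 0}               # cheapest known cost to reach each node
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue                # stale queue entry, a cheaper way was found
        for neighbor, minutes in graph.get(node, {}).items():
            new_cost = cost + minutes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return float("inf"), []

# Hypothetical road map: travel times in minutes between made-up places.
ROADS = {
    "home": {"highway": 5, "main_st": 2},
    "main_st": {"highway": 1, "office": 9},
    "highway": {"office": 4},
    "office": {},
}

print(fastest_route(ROADS, "home", "office"))
# (7, ['home', 'main_st', 'highway', 'office'])
```

No amount of false reports from other users can change this result, because no user reports are part of the input.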

If you don’t use first principles to teach a neural net basic math, but instead feed it nothing but example inputs and observed outputs, then it can only learn what the examples say. Tell the neural net repeatedly that 2 + 2 equals 5, and it will eventually conclude that 2 + 2 really does equal 5.
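To make the 2 + 2 = 5 point concrete, here is a toy sketch: a single linear neuron trained by gradient descent, first on honest addition examples and then on the same false example over and over. This is a deliberately tiny stand-in for a neural net, and the learning rate and epoch counts are arbitrary choices of mine, not anything taken from a real system.

```python
def train(weights, examples, epochs, lr=0.01):
    """Gradient descent on squared error for the model w1*a + w2*b + bias."""
    w1, w2, bias = weights
    for _ in range(epochs):
        for a, b, target in examples:
            error = (w1 * a + w2 * b + bias) - target
            w1 -= lr * error * a
            w2 -= lr * error * b
            bias -= lr * error
    return w1, w2, bias

def predict(weights, a, b):
    w1, w2, bias = weights
    return w1 * a + w2 * b + bias

# Honest training data: every pair of numbers from 0 to 4 with its true sum.
honest = [(a, b, a + b) for a in range(5) for b in range(5)]
clean_model = train((0.0, 0.0, 0.0), honest, epochs=2000)

# Now "tell" the trained model, again and again, that 2 + 2 equals 5.
poisoned_model = train(clean_model, [(2, 2, 5)], epochs=200)

print(round(predict(clean_model, 2, 2), 3))     # close to 4
print(round(predict(poisoned_model, 2, 2), 3))  # close to 5
```

The model has no concept of addition to fall back on. It only has the examples, so whoever controls the examples controls the answer.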

Now let’s apply that shortcoming to the recent rash of demonetization and censoring happening on YouTube. In yesterday’s post, I posted a video of James Damore’s comments on algorithm manipulation and his allusions to the censorious ideology that has infiltrated the company. To what degree is the algorithm targeting innocent videos because what it has been taught comes from a heavily biased source? To what degree is the algorithm programmed to have built-in bias?

Anecdotally, I have been seeing large mobs of postmodern ideologues flagging videos en masse, which skews the input observed by the demonetization algorithm’s neural net. When YouTube Heroes was introduced, the amount of snitching from these mobs was disproportionate to what would come from individuals acting independently.

One example of this: a group called Sleeping Giants made it a priority to flag all Rebel Media videos. They contacted the advertisers whose ads were displayed on Rebel Media’s videos, asking them to flag the videos by pulling their advertisements. What would normally be one report and one data point for the demonetization algorithm ends up being multiple data points or one heavily weighted data point.
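The difference between one data point and many can be sketched in a few lines of Python. Everything here is hypothetical: I have no knowledge of how YouTube actually counts flags, and the `campaign` label assumes the system could somehow tell that a batch of flags came from one coordinated effort, which is a big assumption.

```python
def naive_score(flags):
    """Every flag counts as its own data point."""
    return len(flags)

def campaign_aware_score(flags):
    """Count each coordinated campaign once; independent flags count individually."""
    independent = sum(1 for _, campaign in flags if campaign is None)
    campaigns = {campaign for _, campaign in flags if campaign is not None}
    return independent + len(campaigns)

# Hypothetical flags as (reporter, campaign) pairs: 50 reporters acting as one
# organized campaign, plus a single viewer who flagged on their own.
flags = [("user_%d" % i, "campaign_a") for i in range(50)] + [("user_x", None)]

print(naive_score(flags))           # 51
print(campaign_aware_score(flags))  # 2
```

A system that behaves like `naive_score` hands an organized mob fifty votes where an ordinary viewer gets one.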

Because of this disproportion, there is definitely some impact on how the current demonetization algorithm flags similar videos. It is very likely that the demonetization algorithm reuses logic from the video suggestion algorithm. In this particular example, the AI would identify videos that viewers of Rebel Media are also likely to watch. Because those videos are assumed to have similar content by association, they fit the pattern of videos likely to be flagged in the future, and so they are automatically flagged too.
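Here is a minimal sketch of that guilt-by-association logic, assuming (and this is purely my assumption) that flags spread to any video whose audience overlaps heavily with the audience of already flagged videos. The viewers, the video names and the 50% threshold are all invented for illustration.

```python
def propagate_flags(watch_history, flagged, threshold=0.5):
    """Return unflagged videos whose audience overlaps heavily with viewers of flagged videos."""
    # Users who watched at least one flagged video.
    flagged_viewers = {user for user, videos in watch_history.items() if videos & flagged}

    # Invert the history: which users watched each video?
    audience = {}
    for user, videos in watch_history.items():
        for video in videos:
            audience.setdefault(video, set()).add(user)

    guilty_by_association = set()
    for video, viewers in audience.items():
        if video in flagged:
            continue
        overlap = len(viewers & flagged_viewers) / len(viewers)
        if overlap >= threshold:
            guilty_by_association.add(video)
    return guilty_by_association

# Hypothetical watch histories for four made-up viewers.
HISTORY = {
    "ann": {"rebel_news_1", "cat_video"},
    "bob": {"rebel_news_1", "cat_video"},
    "cat": {"cat_video", "cooking_show"},
    "dan": {"cooking_show"},
}

print(propagate_flags(HISTORY, flagged={"rebel_news_1"}))  # {'cat_video'}
```

Nothing about the content of the second video was ever examined. It was flagged only because of who watches it.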

Just as the suggestion algorithm sometimes suggests videos you have no interest in watching, the demonetization algorithm occasionally flags videos that are about as “advertiser friendly” as a video can be. If you tell a neural net repeatedly that 2 + 2 equals 5, it will believe that to be the case. If you tell a neural net repeatedly that Rebel Media viewers are the most frequent cat video viewers, then some cat videos will become demonetized too.

A lot of YouTubers with uncontroversial content, like PewDiePie, are trying to understand this algorithm because many of their videos are being demonetized seemingly for no reason. It should be more understandable now: the exploitation of the AI by postmodernist mobs is causing collateral damage.

The degree to which internal politics has crept into the algorithm itself is more difficult to estimate, but as James Damore attests, there has been nefarious influence within the company.  Following the politically correct dogma results in praise, but impartiality or opposition to the dogma results in punishment.  It is reasonable to say that there is a greater than zero probability that the suggestion software is programmed to have built-in bias itself as a reflection of the internal biases within the company.

Is there any way to stop the rampant corruption? The “big data” monopoly needs to be broken up. Spreading awareness of these exploits and supporting alternative technology is a start.

*     *     *

Dr. Jordan Peterson reminds us that we are in a time of chaos, and when the state of order is invariably out of control, the best reaction is to remain truthful and speak the truth. Don’t allow yourself to be censored. Share this post to help keep the dialogue going.

If you would like to see more content like this, please bookmark, leave a comment, consider a donation or support my work.
