How LinkedIn uses Artificial Intelligence to keep NSFW content out

Jayadevan PK June 20, 2018 18 min

When you post something on LinkedIn, chances are that an algorithm made by Rushi Bhatt’s team in Bengaluru has checked if it’s kosher to be on the professional network.

It sounds easy but consider the complexity: LinkedIn has over 560 million members, 20 million companies, millions of job postings and it works in 24 different languages.

If all its millions of users seamlessly post on the platform every day, it is because LinkedIn’s algorithms, with a lot of help from humans, green-light them before the user can blink an eye.

“We have to walk this fine line between freedom of expression and not letting poor content live on the site. That makes it really complicated for everybody, including humans,” says Bhatt, an alum of Amazon and Yahoo with a Ph.D. in cognitive and neural systems from Boston University and degrees from the Tata Institute of Fundamental Research and what is today NIT, Surat.

At its worst, a poor newsfeed can drive away users. On the other hand, a good one can keep you hooked on a platform for hours. At LinkedIn, it is the job of the “Feed AI” team to maintain fidelity. Bhatt’s job is to literally keep the NSFW stuff away. It’s a problem almost all major platforms with user-generated content, be it YouTube or Twitter, struggle with.

Bhatt, 43, was the first engineer to be hired on to LinkedIn’s ‘relevance team’ in Bengaluru, where the company opened one of its largest development centers outside of the US in 2011. He reports to Deepak Agarwal, who heads the artificial intelligence efforts at LinkedIn.

LinkedIn’s move to let more user-generated content on the platform has not gone unnoticed by critics, who argue that it leads to a lot of “garbage content” on the site.

We caught up with Bhatt earlier this month to bring you a deep dive into how the professional network uses machine learning to make sure the content you see on your feed is relevant. Edited excerpts from the interview.

How are the teams structured at LinkedIn and how does the relevance team fit in?

We have the product team and engineering, which includes apps and platform that work closely with business. The core charter of the relevance team, led by Deepak Agarwal, is to build and deploy machine learning and data-driven applications at scale. Our engineers are capable of doing production engineering as well as machine learning. The relevance team mostly sits out of Sunnyvale, San Francisco, Bengaluru, Dublin, and New York — a global team.

I was the first relevance engineer in Bengaluru. The idea was to start a relevance organization in Bengaluru. At LinkedIn, we have this four-in-a-box situation where you need apps engineering, product management, relevance, and site reliability to be part of a team for it to be complete. In Bengaluru, relevance was missing. Now, we have a team of over 40 people who have picked up a lot of challenging machine learning problems.

The focus has been on how we merge machine learning with human intelligence. How do we really solve this complicated problem of identifying content that is more relevant to our members so they advance their professional information consumption more than anything else? The universal content filtering team, which takes action on content on the platform, was already present in Bengaluru. The entire LinkedIn site’s content filtering is managed from Bengaluru. They build the platform that can label content and take action. So, it made sense for us to be in Bengaluru.

How complex is this problem of content filtering?

LinkedIn is a very complex site. There’s a feed where all the user-generated content goes in, along with some recommended content, and there are also jobs, company pages, member profile pages and so on. So there are many entry points and exit points. User-generated content can also be short-form content, comments, pictures that you post, first-party videos, etc. You really have to make sure that the machine learning classification applies to all the input points and that all the filtering and ranking happens at all the consumption points. The UCF (short for universal content filtering) team builds and runs all those systems that process this data. That system has to be top class. Machine learning sits on top of that. We built the team to do the machine classification.

The other thing we had to do was the whole man-plus-machine thinking. Since the machine learning task is so difficult, it can’t be done just by training classifiers on their own, generating data whichever way they want. We need highly trained professionals who can label and understand appropriate and inappropriate content as per our terms of service and policies, vetted by our legal staff to comply with all the laws of all the lands.

Sometimes, you will see something on the LinkedIn feed that’s inappropriate for you. We can’t just remove it from the feed for everyone. We pass it through this human evaluation pool and really scrutinize whether it complies with our terms of service. If it doesn’t comply, we take action. If it’s really a personal preference, then that gets taken into account in a different way because it may still be useful to someone else. All this data now starts feeding the machine. We don’t burden our human judges with boring tasks. They do brain-intensive classification, only the stuff that machine learning can’t. Then there is a feedback loop where we take all these judgments, train machines, and continuously update the classifiers.

We also keep track of how our classifiers are doing, because sometimes a new kind of content comes to a classifier that it was not trained on. We keep a very close eye on our classifiers by sampling what they are doing and continuously sending those samples for classification. If a classifier is doing really poorly in certain areas, then we take some action on the classifier itself, like retraining it or tweaking it or whatever is necessary to make sure that we walk this fine line between freedom of expression and not letting poor content live on the site. That makes it really complicated for everybody, including humans.

Humans can sometimes put their hand up and say, “I have no idea boss, tell me what to do.” But for classifiers, it’s very difficult. So, we keep an eye on the classifier on a daily basis and patch up the classification if we see some red flags.

Can you give us the layperson version of machine learning?

If you go just with computer science, right, what you talk about is the algorithm and the behavior of the algorithm that does not change with time, right? But machine learning is a system which is a combination of algorithms and data. These are algorithms that can predict new data based on observed data. They can extract hidden patterns from the data, summarize data into concise descriptions, and optimize an action. But the main point here is that everything hinges upon data.

So, when you, say, predict new data based on observed data, let’s say you have been observing Bangalore traffic for the last 15 days, right? And now you want to say, okay, what’s the traffic going to be tomorrow? Tomorrow is going to be a very tricky thing because tomorrow’s a holiday. So, you can’t just say, on Fridays, on average, my traffic looks like this, because the Friday tomorrow is very different from other Fridays. So, we really have to build a classifier that is aware of holidays, where the hotspots in the traffic are, and so on. These machine classifiers are really capable of making multivariate decisions by looking at all these different aspects of nature. Like you observe nature and then you try to predict things.

These are all predictive mechanisms. And we do it all the time. When we are playing cricket, for instance. Where’s the ball coming in? Where is it pitching? And before the ball comes to you, your bat is ready. So, that’s the kind of thing that’s also happening with machine learning, where you don’t have to wait until you see the data point that comes up to you. You want to build a general concept, by and large. So the concept that the classifier would have learned for predicting tomorrow’s traffic should not really be like what happens on Fridays, right? It would be like: on weekdays, here’s the pattern that I see, and on holidays, here is the pattern that I see. And then as an input vector, it will get something like: okay, tomorrow’s Friday, tomorrow is a holiday, tomorrow at 10 am there is a political rally, whatever. So, that is machine learning where you really extract hidden structure from data and don’t just memorize that it is going to take me 45 minutes.
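To make the traffic example concrete, here is a minimal sketch in Python. It is an illustration only, not anything LinkedIn runs: the commute times are invented, and the feature names (`is_holiday`, `has_rally`) are assumptions chosen to mirror the interview. It contrasts memorizing "what Fridays look like" with learning from what kind of day it is.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (day name, is_holiday, has_rally, commute minutes).
HISTORY = [
    ("Mon", False, False, 45),
    ("Tue", False, False, 47),
    ("Wed", False, True, 70),
    ("Thu", True, False, 20),
    ("Fri", False, False, 50),
    ("Sat", True, False, 22),
    ("Mon", False, True, 68),
    ("Fri", False, False, 52),
]


def memorize_by_day(history):
    """Naive approach: remember the average commute for each day name."""
    by_day = defaultdict(list)
    for day, _holiday, _rally, minutes in history:
        by_day[day].append(minutes)
    return {day: mean(values) for day, values in by_day.items()}


def learn_by_features(history):
    """Generalizing approach: average over the kind of day, not its name."""
    by_kind = defaultdict(list)
    for _day, holiday, rally, minutes in history:
        by_kind[(holiday, rally)].append(minutes)
    return {kind: mean(values) for kind, values in by_kind.items()}


day_model = memorize_by_day(HISTORY)
feature_model = learn_by_features(HISTORY)

# Tomorrow is a Friday, but it is also a holiday with no rally.
naive_guess = day_model["Fri"]                # based on ordinary Fridays only
feature_guess = feature_model[(True, False)]  # based on past holidays
print(naive_guess, feature_guess)
```

The day-name lookup predicts an ordinary Friday commute, while the feature-based lookup predicts something close to the past holidays. A real model would learn how the features combine rather than average over exact matches.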

A textbook example is how you learn whether you should go out and play ball. On day one, you observed that the outlook was sunny and the temperature was hot, and so on and so forth. Would you want to go out and play ball or not? And the answer is no, probably because it’s too hot, or whatever. Some other days, it’s overcast and hot, but it’s not very windy. And you would go ahead and play ball. Now, one way is to say, “Okay, I’m going to find out every possible weather condition and I’m going to remember whether it is going to be good to play ball or not.” Now, for four variables, maybe you can create a very large Excel sheet, that’s fine. With hundreds of variables, like, is it a holiday? And what time is it? And is it Eid or not? Then it becomes really complicated. You cannot even observe all possible combinations. So then you have to extract that pattern.

But internally, what happens is, you’re really trying to partition data points into positive outcomes and negative outcomes. We represent ‘sunny or not’ as a binary variable, 0 or 1, and ‘overcast or not’ as a binary variable, and you come up with a 3, 4 or n-dimensional space. Then really, the task becomes: how do I draw some curve or some line that separates the play-ball instances from the non-play-ball instances? And that is machine learning. Now, you can write all kinds of different optimizations to do the best job you can.

Your task is to draw that line that separates negative outcomes from the positive outcomes. You will never do a perfect job, because these are all statistical systems, but you have to achieve as much perfection as you can. And that’s the action. So when you hear about these deep learning models, internally they are doing just that. But they have much more processing capability, much more representational power.
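The "draw a line" idea can be sketched with a perceptron, one of the simplest learners that fits a separating line (a hyperplane, in more dimensions). This is a toy illustration under stated assumptions: the six days below are invented, the encoding of each day as four 0/1 flags follows the interview, and the perceptron is a textbook stand-in, not a technique the interview names.

```python
# Each day: (sunny, overcast, hot, windy) as 0/1 flags, and whether we played.
DAYS = [
    ((1, 0, 1, 0), 0),  # sunny and hot: stayed in
    ((0, 1, 1, 0), 1),  # overcast, hot, not windy: played
    ((1, 0, 0, 0), 1),
    ((0, 1, 0, 1), 0),
    ((1, 0, 1, 1), 0),
    ((0, 1, 0, 0), 1),
]


def predict_play(weights, bias, x):
    """Return 1 if the point falls on the 'play' side of the line."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def train_perceptron(data, max_epochs=50):
    """Nudge the line toward every misclassified point until none remain."""
    weights = [0] * len(data[0][0])
    bias = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            error = y - predict_play(weights, bias, x)
            if error:
                mistakes += 1
                weights = [w + error * v for w, v in zip(weights, x)]
                bias += error
        if mistakes == 0:
            break  # the line now separates the two classes
    return weights, bias


weights, bias = train_perceptron(DAYS)
print(weights, bias)
print([predict_play(weights, bias, x) for x, _ in DAYS])
```

The learned weights play the role of the spreadsheet, but they also give an answer for weather combinations that were never observed. This toy data happens to be perfectly separable; real data rarely is, which is why the result is statistical rather than perfect.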

In LinkedIn’s context, where do you use ML to solve these problems?

So, one that you’re familiar with is identifying the quality of the content. In Bangalore, we do several other things. One is the entire learning course search. When you go to LinkedIn Learning and search for Python programming or whatever, chances are that somebody in the Bangalore team built an algorithm that is running on the platform and serves the results to you. And there we have been applying a lot of deep learning neural networks.

We do quite a bit of jobs recommendation from Bangalore as well. We work in collaboration with the US teams. A bulk of our work is also content classification for assessing the quality of content.


Then there are several projects on the multimedia side as well. All this image classification requires a tremendous amount of pre-processing. Now, we are at a stage where pre-processing is also becoming machine learning. Before, you would come up with some kind of human-rated features and then feed those to machine learning. Now, more and more, we are talking about end-to-end machine learning where the only thing that goes into the classifier is the raw bitmap. Deep neural networks are so powerful because of those multiple layers.

How so?

That is the biggest advancement in machine learning in the last few years, since 2012, where the whole line of thinking has changed from really hand tuning models to building these deep learning classifiers. What has triggered that is just the availability of high-quality data.

For example, the first successful neural networks from 2012 onwards were trained on millions of images. But they can distinguish between a thousand different types of things.

So it’s really the mixing of high compute power, algorithmic advancements and a ton of very clean data. Wherever we see successes, high-quality data is the primary driver. That naturally means human intelligence. So again, it’s man plus machines.

Companies like Google already have a ton of data to act on. What happens when someone posts on LinkedIn?

That’s my favorite slide. We built this slide in 2016. This helps us stay the course. And this is what we have been doing. So when we talk about machine learning, it’s all about iteration. Just learn and iterate, and iterate. And that’s all you do. So this is one of those slides, which is the overall flow that we envisioned when we started out in April 2015.

The individual boxes have become more and more sophisticated as techniques improve and we also learn about our users. The columns are really different modules. So, there is all this content that is being created and uploaded to the site. There is some capability that we give to our members to flag something that they don’t like. And then there is the UCF platform and the online classifiers… so, the whole infrastructure for monitoring, classifying content throughout the LinkedIn site. And then there are the human editors and the intelligence.

(Let’s consider) you as a user came to the site, you created a piece of content. In less than a second, we have applied all the classifiers that we have at our disposal for that particular content type. If it’s cleared, then that’s the green light: you continue to display that content on the site, your connections get to see it on the feed, if you have sent out a message, then your connections get the message and so on, right?

If a classifier says this is suspicious, then we trigger the human-labeling loop. We say, hey, human, tell us: is my classifier making the right decision or not? The humans will tell us if this is good or bad. We take an immediate decision based on human judgments so that we affect the content on the site, whether to continue to show it or not. And if the content is good, then you go back and say, “Okay, classifier made a mistake. We’re going to register this mistake.” Next time, we are going to get much better either through technique improvement, or more data, usually a combination of both.
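The flow described above (apply the classifiers, show the content if it clears, otherwise ask a human and record any classifier mistake) can be sketched roughly as follows. Everything here is hypothetical: the `moderate` function, the `ReviewLog` class, the keyword-based `spam_score` stand-in and the 0.5 threshold are illustrations, not LinkedIn’s actual system.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewLog:
    """Keeps human verdicts so future models can be trained on them."""
    records: list = field(default_factory=list)


def spam_score(text):
    """Toy classifier (hypothetical): a keyword check standing in for a model."""
    return 0.9 if "free money" in text.lower() else 0.1


def moderate(text, classifiers, human_review, log, threshold=0.5):
    """Apply every classifier; ask a human only when one is suspicious."""
    flagged_by = [name for name, clf in classifiers.items()
                  if clf(text) >= threshold]
    if not flagged_by:
        return "show"  # green light: nothing looked suspicious
    acceptable = human_review(text)  # the human verdict decides right away
    # If a human says the content is fine, the classifier made a mistake;
    # register it so the next round of training can learn from it.
    log.records.append({"text": text, "flagged_by": flagged_by,
                        "classifier_mistake": acceptable})
    return "show" if acceptable else "hide"


log = ReviewLog()
classifiers = {"spam": spam_score}
print(moderate("We are hiring a data engineer", classifiers,
               lambda text: True, log))
print(moderate("Free money is a myth, says economist", classifiers,
               lambda text: True, log))
print(log.records[-1]["classifier_mistake"])
```

In the second call the toy classifier fires on a harmless headline, the stand-in human reviewer clears it, and the mistake is logged as training data for a better classifier.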

We have, like, millions and billions of posts, right? Now we cannot look at every possible triggered piece of content through human evaluation, because that becomes very expensive. So, for some of our classifiers where we have enough confidence, we also automatically classify that content instead of going through human judgment. For those classifiers, to make sure they continue to operate at high precision (we strive for perfection), we periodically take samples from the classifiers and send them for human labeling anyway. If the precision deviates too much, then that triggers some alarms. And then we will investigate further.
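The periodic spot check can be sketched as a small audit routine: sample the items a classifier handled on its own, have humans label the sample, and raise an alarm if agreement drops. The function names, the sample size and the 0.95 target below are assumptions for illustration; the interview does not give actual thresholds.

```python
import random


def audit_precision(auto_flagged, human_says_bad, sample_size, rng):
    """Estimate precision: the share of auto-flagged items humans confirm."""
    if not auto_flagged:
        return None  # nothing to audit
    sample = rng.sample(auto_flagged, min(sample_size, len(auto_flagged)))
    confirmed = sum(1 for item in sample if human_says_bad(item))
    return confirmed / len(sample)


def needs_investigation(precision, target=0.95):
    """Raise the alarm when measured precision drifts below the target."""
    return precision is not None and precision < target


# Hypothetical data: items the classifier filtered without human review.
auto_flagged = ["spam 1", "spam 2", "spam 3", "harmless joke"]


def human_says_bad(item):
    """Stand-in for a human labeler."""
    return item.startswith("spam")


precision = audit_precision(auto_flagged, human_says_bad,
                            sample_size=4, rng=random.Random(0))
alarm = needs_investigation(precision)
print(precision, alarm)
```

Here the sample covers all four items, humans confirm three of them, and the measured precision falls below the assumed target, so the classifier would be investigated.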

All of this is stored away so that there’s a paper trail and also there’s scope to train future generations of models.

We continuously monitor likes, views, whether many people are hiding the content, or whether many people are flagging the content. If we see that something is off with a piece of content across all those variables, then again, we go and trigger another set of classifiers which are sensitive to those features. So those are the virality and content quality classifiers. And if that classifier says, “Yeah, something is not right here”, then that also goes for human review, or we start taking that content to a lower ranking in the feed pending human evaluation. So there is a spectrum of actions that we can take, just to protect our members while we are investigating. Because what happens is, sometimes if something is really viral, then you don’t want to wait for a human to make that decision.

We want to do something about it right away. So we said, okay, pending human judgment, let’s really limit distribution, so that we don’t do a lot of damage.
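As a rough sketch of that interim step, the rule below looks at engagement signals and limits distribution while human review is pending. The function name, the signal names and every threshold are invented for illustration; the interview describes dedicated classifiers for this, not a fixed rule.

```python
def distribution_action(views, hides, flags, min_views=100,
                        hide_rate_limit=0.05, flag_rate_limit=0.01):
    """Decide how widely to show a post while human review is pending."""
    if views < min_views:
        return "normal"  # too little data to judge the rates reliably
    hide_rate = hides / views
    flag_rate = flags / views
    if hide_rate > hide_rate_limit or flag_rate > flag_rate_limit:
        return "limit_pending_review"  # demote now, let a human decide later
    return "normal"


print(distribution_action(views=1000, hides=10, flags=2))
print(distribution_action(views=1000, hides=80, flags=2))
```

The point of the design is speed: a post that is spreading quickly gets a reversible, low-cost action immediately, and the final decision is still left to a human.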

What is the scale at which you do this?

Every piece of content on LinkedIn. We’re talking millions of items per day. That’s pretty much all the decisions at the time of creation in less than a second. And that’s why it takes a serious engineering platform to classify all this content at this volume within the stipulated latency.

So you have text, images, and videos…

Yeah. Within videos, we have video frames and text channel.

You’ll extract those frames and run it through a neural network?

Yeah, so we use all of that information. The standard way is to take some samples depending on how much compute you have. You take some frames, pass them through some image classification and then take a combined vote of what the outcome of each frame is. And you put all those things together and take a final decision on the entire video.
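The sample-and-vote approach can be sketched in a few lines. This is a simplified illustration: the frame budget, the evenly spaced sampling and the simple majority rule are assumptions, and the frame classifier is a stand-in for a real image model.

```python
def sample_frame_indices(total_frames, budget):
    """Pick up to `budget` evenly spaced frame indices."""
    if total_frames <= 0 or budget <= 0:
        return []
    if budget >= total_frames:
        return list(range(total_frames))
    step = total_frames / budget
    return [int(i * step) for i in range(budget)]


def classify_video(frames, frame_is_unsafe, budget=8):
    """Classify sampled frames and take a majority vote for the whole video."""
    indices = sample_frame_indices(len(frames), budget)
    if not indices:
        return False
    votes = [frame_is_unsafe(frames[i]) for i in indices]
    return sum(votes) > len(votes) / 2


# Hypothetical stand-in: each "frame" is the label a real model would infer.
frames = ["ok", "ok", "unsafe", "unsafe", "unsafe",
          "ok", "unsafe", "unsafe", "ok", "unsafe"]
verdict = classify_video(frames, lambda frame: frame == "unsafe", budget=5)
print(sample_frame_indices(len(frames), 5), verdict)
```

A larger budget looks at more frames and costs more compute, which is the trade-off mentioned in the answer. A production system might weight the votes by classifier confidence, or combine them with the text channel, rather than count them equally.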

Does content still escape? Let’s say, for example: if I were to upload something, which is not safe for work, does it still pass through?

Sometimes. When it does pass, we rely on that second box (refer to the picture above). Let’s say a classifier mistakenly cleared something, but when members complain, of course, we go ahead and investigate. That’s when it gets caught. For something that is highly viral, we proactively keep an eye on things. Whenever a member complains, we take it extremely seriously. Sometimes it may be a false positive. Sometimes it may be genuine, but we take it seriously.

You mentioned that there was an older way of doing things. And there’s a new way…

All the old and new ways have remained about the same. It’s really the sophistication that has improved. There was always classification, there was always human reviews. But we’re just getting better.

A very important technique in the machine learning world is what is called transfer learning. What you do is not train the entire network. You don’t learn all those layers, all the coefficients and all the neurons from scratch, because that is very compute-intensive and data-hungry. What people have noticed is that it works even if you train neural networks on a very general use case. For example, ImageNet is one of the very standard data sets that everybody utilizes. They pre-train neural networks on ImageNet and then adapt only the last few layers with application-specific data, so that you can train your model with very little data and you don’t have to relearn every layer. You really only need the last few layers, which are the ones making a material impact on the network.
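The principle can be shown with a deliberately tiny sketch in plain Python. The assumptions are heavy and worth stating: `pretrained_features` is a hand-written stand-in for the frozen early layers of a network pre-trained elsewhere (a real system would load an actual pre-trained model), and the "application-specific data" is just four labelled points. Only the final layer, a logistic regression head, is trained.

```python
import math


def pretrained_features(x):
    """Stand-in for frozen early layers: fixed, never updated in training."""
    a, b = x
    return [a + b, a - b, a * b]


def train_head(inputs, labels, epochs=2000, lr=0.3):
    """Fit only the final layer (logistic regression) on frozen features."""
    features = [pretrained_features(x) for x in inputs]
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = sum(w * v for w, v in zip(weights, f)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * grad * v for w, v in zip(weights, f)]
            bias -= lr * grad
    return weights, bias


def predict_with_head(x, weights, bias):
    """Run the frozen features through the trained final layer."""
    z = sum(w * v for w, v in zip(weights, pretrained_features(x))) + bias
    return 1 if z > 0 else 0


# Tiny application-specific data set: four labelled examples (an XOR pattern).
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]
head_weights, head_bias = train_head(inputs, labels)
predictions = [predict_with_head(x, head_weights, head_bias) for x in inputs]
print(predictions)
```

A straight line on the raw inputs cannot separate this XOR pattern, but a simple final layer can once it sits on top of useful fixed features. That is the appeal of transfer learning: most of the representation is reused, so very little new data is needed.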

Besides ImageNet, what data sets do you use?

We use Inception, which is the model, and that is industry standard. On the text side, we use similar things, but we use something called fastText, which is shallower than this but learns internal representations of words. We have had tremendous success with something called long short-term memory (LSTM) models.


               

Disclosure: FactorDaily is owned by SourceCode Media, which counts Accel Partners, Blume Ventures and Vijay Shekhar Sharma among its investors. Accel Partners is an early investor in Flipkart. Vijay Shekhar Sharma is the founder of Paytm. None of FactorDaily’s investors have any influence on its reporting about India’s technology and startup ecosystem.