The Flipkart data view: Q&A with Mayur Datar, AI team

This time last year, when FactorDaily interviewed Flipkart’s chief data scientist Mayur Datar, the Indian e-commerce firm had fewer than two dozen data scientists on its rolls. Today, that number has more than doubled, and it is expected to grow at a similar rate over the next 12 months.

And even that is not enough, given the scale of challenges that Flipkart deals with on a daily basis. Over the past year, Flipkart’s focus on artificial intelligence and machine learning has gone up several notches, as the online retailer looks to solve new problems for customers, while simultaneously fending off new cyber security threats.

In an interview, Datar, along with two principal architects at Flipkart, Regunath B and Utkarsh B, speaks to FactorDaily about the new initiatives Flipkart is working on, potentially including a new virtual assistant and new voice-based features aimed at tier-2 and tier-3 Internet users; how Flipkart regularly battles malicious bots; and how it is developing tools such as demand forecasters to better manage its own inventory. Edited excerpts:

What are some of the new things Flipkart is doing with AI?

Utkarsh: The AI revolution is going through phases. Right now the focus is on building a lot of business intelligence, backed by data centricity. Obviously, Flipkart has been in e-commerce for a long time and we have an abundance of data; it’s really a goldmine for us… Many of the constructs that come into play when we are scaling (repeatability, sustainability and predictability) are not possible if you put humans in the loop, because humans differ from each other. For example, if I take packages out of a warehouse, there are guys who pick products, there are guys who pack products, and then maybe hand over to the transport guys. So, in all of these, if I rely only on how competent these guys are, it will not fly, because every year there is additional scale. Instead, can I codify what they do, the kind of decisions they make, etc.? If you start codifying all of these best practices, this is where AI comes into play… That’s business intelligence, and it is one of our biggest focus areas right now.

Tell us about the complexity of the data that Flipkart deals with? Also, the size and scale of the problems that you deal with?

Datar: I think it’s fairly common for most consumer internet companies to capture most of the human interaction with products. If you look at our storefront, which means our app and our mobile and desktop sites, then any time you click on a particular search, click on a product, add to cart, etc., all of these events get captured, along with the context on who the user is, what device is being used, etc. We have hundreds of events that get captured and, of course, the volume of each event varies… And it’s a funnel view: people come to the product page, add to cart and finally check out.

The funnel narrows down as you go on, so there’s one complexity that way. There are also a lot of business processes that we deal with. Besides being a simple consumer internet company, we also look at the inwarding and retailing aspects of it. All of those need to be captured as well. And then there’s the fulfilment side of it, which means that when an order is placed it goes through the motherhub to the fulfilment centre to your delivery hub to your last mile. So, you can imagine that there are tens of hundreds of these events in each of these aspects. And then there are other things, such as affordability and fintech.

So, long story short, e-commerce as a platform is like having 5 or 6 big industries under one roof, and each of these industries has a fair amount of complexity in terms of the amount of data it generates. So, there is a lot of length and breadth in the data that we capture. In terms of volumes, every day we capture several terabytes of data, and during sale events that number gets multiplied by a factor of 5 to 10.

In terms of your AI roadmap from last year to now, what kind of progress have you made? Looking into the future, where do you see things going?

Regunath: Flipkart, as we know it, is made of two parts: one is the customer funnel, and the other is the post-order path. For us, AI and machine learning are applied across both of these. I’ll give you specific examples. Right from the time a person signs up with a new address, we have to detect whether this guy is a potential reseller or a potential fraud, because the moment they have an address, they have a delivery destination and can move to the next stage of committing a fraud: they can probably abuse returns and so on. So, we have our models go through it from the start to determine whether it’s a fraudulent address. Now for genuine users, we are trying to help them discover products; discovery is all about being able to innately judge what the intent is…

Then on the pricing side, we need to have the ability to protect against abuse from sellers. For example, to show high discounts, sellers can bump up the MRPs (maximum retail prices). Over the last year, we have also seen a huge jump in bot traffic. People are trying to use bots to buy products so that they can sell the products offline… So, we have interventions that are machine learnt. For example, when it comes to post order, right from pricing, today almost 70% of our fashion apparel is priced automatically by a machine.

How did you solve the bot problem?

Regunath: Bots are bad for various reasons. For instance, they bring artificial, synthetic traffic to our systems and they continue to load our systems even after the sale period is over and done. It was starting to hurt us. After the flash sale was over, these bots were consuming critical resources in a way that regular shoppers were getting affected. Then we implemented algorithms and mechanisms to distinguish between what is a bot and what is human. And now we run that as BAU (business as usual). Today, in a steady state, we have bot detection in force, and reseller detection as well. Resellers buying on our platform is not only an opportunity lost for our customers; the brand also takes a hit.

What are some of the newer problems you are trying to solve using AI and machine learning? Could you share some examples?

Datar: I’ll give specific examples. Search is one of them. There are two aspects to understanding search queries. One is figuring out, at a token level, whether the user has made a mistake or has used a synonym of something else. So you might have misspelled “jeans” or typed in “joota” instead of shoes. Figuring that out is tough, because as far as we are concerned, semantically it’s the same thing. That’s what we are supposed to show. We are using deep learning and word embedding techniques to solve that problem. The other aspect is figuring out what attribute or feature a search query corresponds to.

For example, if you search for “mustard black Kanchivaram sarees”, mustard and black are colours, Kanchivaram is a category and saree is a store. We get a lot of these queries that are easy for humans to figure out. In the case of Red Tape shoes, red is not actually a colour. It’s a brand. So, using past click data, we’ve built some models. In the last year, we also took the catalog information into account. Even before a product launches, it is in our catalog. For example, until the Motorola One was launched, no one was making that query, so I’m not able to learn it until I see queries and clicks. But now, because it’s part of my catalog, I can anticipate such queries. This kind of semi-supervised learning is an additional improvement that we’ve made to some of our intent models.
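The two steps Datar describes, token-level normalisation and attribute tagging against the catalog, can be illustrated with a minimal sketch. This is not Flipkart's method: the interview mentions deep learning, word embeddings and click data, whereas the sketch swaps in plain string similarity (`difflib`) and a dictionary lookup. The `CATALOG` and `SYNONYMS` tables and both function names are hypothetical, built only from the examples quoted above.

```python
from difflib import get_close_matches

# Hypothetical toy catalog vocabulary: phrase -> attribute type.
CATALOG = {
    "red tape": "brand",
    "mustard": "colour",
    "black": "colour",
    "red": "colour",
    "kanchivaram": "category",
    "sarees": "store",
    "shoes": "store",
    "jeans": "store",
}

# Hypothetical synonym table mapping vernacular terms to catalog terms.
SYNONYMS = {"joota": "shoes"}

# Single words known to the catalog, used for spelling correction.
VOCAB = sorted({word for phrase in CATALOG for word in phrase.split()})


def normalize_token(token: str) -> str:
    """Map a raw query token to a catalog word via synonyms or fuzzy matching."""
    token = token.lower()
    if token in SYNONYMS:
        return SYNONYMS[token]
    if token in VOCAB:
        return token
    # Fall back to the closest known word, if any is similar enough.
    matches = get_close_matches(token, VOCAB, n=1, cutoff=0.75)
    return matches[0] if matches else token


def tag_query(query: str, max_phrase_len: int = 2):
    """Label each part of a query, preferring the longest catalog phrase."""
    tokens = [normalize_token(t) for t in query.split()]
    tags = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_phrase_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in CATALOG:
                tags.append((phrase, CATALOG[phrase]))
                i += n
                break
        else:
            # No catalog phrase starts here; keep the token unlabelled.
            tags.append((tokens[i], "unknown"))
            i += 1
    return tags


print(tag_query("Red Tape shoes"))
# [('red tape', 'brand'), ('shoes', 'store')]
```

Matching the longest phrase first is what keeps "red" in "Red Tape" from being tagged as a colour. Because the lookup is driven by the catalog rather than by past clicks, a newly listed product can be recognised before anyone has searched for it, which is the point Datar makes about the Motorola One.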

On the advertising side, we are predicting the click-through rate or conversion rate when content is shown to you. Last year it was only on product listing ads. Now we have expanded that, both from a scale perspective and to many other products within advertising, whether it is brand advertising, merchandising, etc. For every part we have a score which tells us how likely you are to engage with a particular piece of content and, ultimately, how likely you are to buy that product. That has given us huge improvements in terms of both topline and bottomline. The amount of money we make from our ads has increased because we have gotten better at targeting the right content to the right user.

On the demand forecasting side, we have made fairly significant wins in terms of predicting the demand for a product at a national level, zonal level and so forth. That has led to reduced errors, better inventory and fewer out-of-stocks. As an e-commerce company, our goal is to make sure that we don’t hold our products in inventory for too long, because that locks in capital, takes up space in our warehouses, etc.

What about Flipkart’s other outside data sources?

Regunath: I’m not sure of anything else outside of Flipkart we interface with. That could be a work in progress because we do experiment. There is no sure shot way of getting there. Some of those efforts can possibly be happening but right now nothing is in production. None of our reseller detection algorithms today that are there in production use external data.

Are you also working on voice related technologies?

Datar: Yes. You must have heard that we acquired a company called

Can you go deeper into how you plan to use them?

Datar: We won’t be able to comment much at this point. Clearly that’s a very important area of investment for us. You’ll see announcements from Flipkart as we launch new features.

Anything around virtual assistants?

Datar: Yes. Virtual assistants and chatbots are a fairly common theme across customer service, discovery, seller support and various other touch points where we are exploring them. And, in fact, we have launched some versions of them as well.

So virtual assistants or bots don’t always have to be in the form of a chat. Sometimes it could be auto responders to your emails and so on.

Utkarsh: This is very relevant to the customer segment in tier 2 (cities) and beyond. These guys might never have visited a shopping mall, and they don’t understand an add-to-cart or shopping-cart kind of concept. And traditionally discovery, search, browse, etc. have all been pivoted around English and a certain notion of filters and navigation concepts. That doesn’t fly with that customer segment. So, this is where all of these techniques will have to be leveraged.

Can you tell us about the work you are doing with the Israeli company?

Regunath: So we acquired this company called UpStream. We use them primarily right now for a couple of inputs. One is price insights, and the other is selection improvement. UpStream already had a product which they were selling to e-commerce companies worldwide, where they could look primarily at the market and the competition and come up with specific intents, saying what you should be pricing your product at with specific goals in mind, or what kinds of selection you want to bring on board so that you can reach a larger customer base. A few things that you can always start off with are: what are the search queries where you’re not able to show products, and what is the kind of GMV that’s lost because price elasticity is not understood. Systems like this can help us discover that elasticity in price, because otherwise a lot of these pricing decisions today are made manually. And as a human, it’s known that you can probably keep seven data points in mind when you make a decision, whereas systems can keep a lot more.

Can you dive a little deeper into the demand forecasting aspects? Can you talk about any number? By how much has it improved your business?

Datar: I cannot give you specific numbers, but what I can say is that we have seen double-digit improvements in the reduction of errors. There are two different aspects. One is making sure that we get the right kind of features. So, what are the predictor variables for how much we are going to sell next week? Next month? Engineering those from a data perspective, the right kind of feature engineering, and discounting outliers from past behaviour are some of the things we have done. From a technique perspective, we have tried three or four different techniques. There are the usual ARIMA (auto-regressive integrated moving average) kind of models, and then there are deep learning models. So trying three or four different types and using a combination of these models, what is typically known in data science as “bagging and boosting”, on top of it has led to the improvements in error reduction.
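To make the idea of discounting outliers and then combining several forecasters concrete, here is a minimal sketch. It is an illustration under stated assumptions, not Flipkart's pipeline: three very simple forecasters (last value, moving average, linear trend) stand in for the ARIMA and deep learning models Datar mentions, and a plain average stands in for the bagging and boosting step. All function names and parameters are hypothetical.

```python
from statistics import mean, median


def clip_outliers(series, k=3.0):
    """Cap values lying more than k median absolute deviations from the median."""
    med = median(series)
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        return list(series)
    lo, hi = med - k * mad, med + k * mad
    return [min(max(x, lo), hi) for x in series]


def naive_forecast(series):
    """Next period equals the last observed value."""
    return series[-1]


def moving_average_forecast(series, window=4):
    """Next period equals the mean of the last `window` observations."""
    return mean(series[-window:])


def linear_trend_forecast(series):
    """Fit a least-squares line over time and extrapolate one step ahead."""
    n = len(series)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(series)
    denom = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series)) / denom
    return y_bar + slope * (n - x_bar)


def ensemble_forecast(series):
    """Average the individual forecasts after discounting outliers."""
    cleaned = clip_outliers(series)
    forecasts = [
        naive_forecast(cleaned),
        moving_average_forecast(cleaned),
        linear_trend_forecast(cleaned),
    ]
    return mean(forecasts)


weekly_units = [10, 12, 14, 16, 18, 20]  # made-up weekly sales for one product
print(round(ensemble_forecast(weekly_units), 2))
```

The individual models err in different directions on a trending series (the moving average lags, the trend line extrapolates), so averaging them tends to be steadier than any one alone. A production system would weight the models by their past accuracy rather than equally.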

Utkarsh: This is a very fundamental and key investment for us. The entire supply chain network is getting morphed as we speak. We are investing in a multi-tiered kind of architecture where (we are not) trying to have all warehouses at, let’s say, the same tier, all homogeneous or identical. Stocking all products at all warehouses doesn’t fly. What a multi-tier architecture means is that you might have certain warehouses (in certain parts) which are pretty deep. These are large warehouses with a national flavour. Then, as you get closer to the customer, you might start creating tiers and start forward-placing certain inventory in those tiers. The rest stays at the national tier, to be used wherever you feel that the regional tier is not meeting the demand.

Any other seller side tools?

Regunath: One is on pricing: we offer insights. We tell our sellers, hey, this is a price that customers would be willing to pay. Second is demand forecast.

For sellers, we also suggest specific product lines. For instance, we asked Peter England, a brand known for low-cost formal shirts, to list more t-shirts to target a younger population. Now, the t-shirt line is selling more than formal shirts. This is the insight we are able to give our brands and vendors.

Today as part of seller portals we have these insights.

How has your total base of users grown from last year?

Datar: We currently have 150 million customers.

Just to zoom out a little bit — your top AI priorities for this year?

Datar: It is difficult for us to say that, because today intelligence is part of every single flow. From the time a person registers, we have models checking if he’s even a genuine customer. From the time he’s placing an order, to what he’s searching for, the recommendations that we show him and the way we price the product, models are embedded in every engineering function at Flipkart. So you would have a data scientist in pretty much every engineering team that Flipkart has.

How many data scientists do you have on board?

Datar: Roughly 45.

How has the team grown over the last year?

Datar: More than doubled in the last year.

So, what’s the roadmap ahead?

Datar: Continue to grow aggressively. While we are augmenting data scientists, we also understand that data scientists are going to be a fundamental play in almost every part of the org. So, all of us who are not traditional data scientists, we also go through data science and engineering kind of courses so that we appreciate data scientists and enable them as much as possible. That is a big investment that we make. All of our engineers are asked to go through this skilling.

There is a lot of upskilling of our engineers, and data science is just one flavour of engineering. If you look at some of the biggest successes that have come in the industry, Google Brain etc., they have come when data scientists and system engineers or architects have come together, because when they understand each other, they can unlock huge potential. Google Brain, which is deep learning, happened when a traditional technology, neural networks, could actually be run at scale. To do the scaling part of it, the underlying technology is distributed systems: how do you scale across different dimensions, what is known as data parallelism versus model parallelism. These are very much distributed systems concepts. So we also think there is a lot of magic at the cusp of things. The view that we have only 45 data scientists, and how much that is going to grow, is just one part of the story. I think it’s important that the two fields, core engineering and data science, come together. There’s a lot of upskilling, and I think there will be a huge unlock of potential that way.
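The distinction between data parallelism and model parallelism mentioned above can be shown with a toy example. The sketch below is only an illustration: it uses a small linear model in place of a neural network, and the "workers" are simulated one after another in a single process rather than on separate machines. All names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def gradient(weights, rows, targets):
    """Gradient of the mean squared error of a linear model on one batch."""
    n = len(rows)
    grad = [0.0] * len(weights)
    for x, y in zip(rows, targets):
        err = dot(weights, x) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / n
    return grad


def data_parallel_gradient(weights, rows, targets, num_workers):
    """Data parallelism: every worker holds all weights but only a shard of the batch."""
    total = len(rows)
    combined = [0.0] * len(weights)
    for w in range(num_workers):
        shard_rows = rows[w::num_workers]
        shard_targets = targets[w::num_workers]
        if not shard_rows:
            continue
        local = gradient(weights, shard_rows, shard_targets)
        # Weight each local gradient by its shard's share of the batch.
        share = len(shard_rows) / total
        combined = [c + share * g for c, g in zip(combined, local)]
    return combined


def model_parallel_predict(weights, x, num_workers):
    """Model parallelism: every worker holds a slice of the weights and sees
    the matching slice of the input; partial results are summed."""
    partials = [
        dot(weights[w::num_workers], x[w::num_workers])
        for w in range(num_workers)
    ]
    return sum(partials)


weights = [0.5, -1.0, 2.0]
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1]]
targets = [1, 2, 3, 4]

full = gradient(weights, rows, targets)
sharded = data_parallel_gradient(weights, rows, targets, num_workers=2)
print(all(abs(a - b) < 1e-9 for a, b in zip(full, sharded)))  # True
```

In both cases the combined result matches what a single machine would compute. What changes is what gets split: the training examples in data parallelism, the parameters in model parallelism. Deciding how to split, and how to exchange the partial results efficiently, is the distributed systems work the speaker refers to.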

Have you been publishing any papers? Any notable ones?

Datar: Yeah, last year. I think there’s a web page where we list all of the publications. We have been publishing in all the important conferences, CIKM, WWW, etc. Last year, we had around 10 or so publications in such international conferences. On publications and open source, outside of even data science, I think we are one of the top companies in India when it comes to open sourcing as well.


Updated at 12:57 pm on April 8, 2019 to correct the headline. The earlier headline read "The Flipkart data view: Q&A with Mayur Datar, deputies". Principal architects Regunath Balasubramanian and Utkarsh Bhriguvanshi are not Datar's subordinates; they are part of the larger Flipkart tech team, with roles in the company's AI efforts. Also, the quote card of Utkarsh Bhriguvanshi earlier incorrectly identified him as Regunath Bhriguvanshi; that has been corrected.

Disclosure: FactorDaily is owned by SourceCode Media, which counts Accel Partners, Blume Ventures, Vijay Shekhar Sharma, Jay Vijayan and Girish Mathrubootham among its investors. Accel Partners and Blume Ventures are venture capital firms with investments in several companies. Vijay Shekhar Sharma is the founder of Paytm. Jay Vijayan and Girish Mathrubootham are entrepreneurs and angel investors. None of FactorDaily’s investors has any influence on its reporting about India’s technology and startup ecosystem.