At almost every step of the user journey, from the moment you log into Flipkart till you receive your package, there are dozens of machine learning algorithms at work: personalised recommendations that figure out what products you are likely to buy, customer service bots that can tell you when your shipment is arriving, anti-fraud algorithms that determine whether the order is genuine, routing systems that move the package from a warehouse to your city, and assignment systems that hand the shipment to a field agent to deliver it home.
“The ultimate goal will be when you can take almost every decision as data-driven and model-driven. We’re all working towards that,” says Ravindra Babu, principal data scientist at Flipkart, at slash n, Flipkart’s annual developer conference, held in Bengaluru on the 5th and 6th of April. Slash n loosely translates to new line in coder speak, and the theme of the event this year was “AI for India”, a deviation from previous years, when it went with the tagline “intelligence at scale”.
It’s a packed auditorium at the Radisson Blu venue in Marathalli, a neighbourhood renowned for its data science training institutes, in a city that’s acknowledged as a hotspot in India’s fledgling AI ecosystem. There are at least 700 people in attendance, mostly software engineers from Flipkart and its tech partners. There are no self-driving trucks, drones, robots, personal assistants or any of the sexy stuff at the frontiers of AI. The event is more focused on problems such as personalisation, product discovery, address understanding, search, and fraud detection. This may not grab eyeballs, but talk to the data scientists and engineers here, and they will tell you how impactful their projects have been in terms of reducing costs and reducing friction in the delivery supply chain. There seems to be a genuine culture of knowledge sharing here, with many presentations ending with a link to Flipkart’s contributions to the open source technologies it leverages, via libraries on GitHub.
The event was held just before Flipkart made news headlines celebrating its move to a new office complex. The e-commerce player is reportedly being courted by both Walmart and Amazon, and an event such as this serves as an opportunity to showcase some of the business value unlocked by leveraging machine learning.
Data-Driven from Day One
Babu is one of the AI veterans at the e-commerce major, with a 24-year run as a scientist at the Indian Space Research Organisation and six years at Infosys behind him. He’s been with Flipkart for more than four years and has worked on a range of data science problems to help with address classification, catalogue management, and fraud modelling.
Flipkart “had an analytics team from almost day one and the importance of data science was understood as early as 2013-14,” says Babu. “Some of the analysts were doing machine learning work as well and now there is definitely an emphasis from the top itself.” He is referring to the company’s ‘AI for India’ programme led by co-founder and chairman Sachin Bansal. While announcing it in December, Bansal outlined the company’s intent to “harness AI to solve complex problems unique to India”.
Babu has worthy peers in house. Mayur Datar, chief data scientist at Flipkart, is a Google veteran, holds a Ph.D from Stanford University, and has an h-index of 19 on Scopus. The h-index denotes the productivity and impact of the published work of a scientist or scholar. A score of 19 puts him among the top 20 data scientists in India. Another AI star at Flipkart is Jatin Chhugani, head of Flipkart’s AI Labs in Palo Alto, California, who joined through its 2015 acquisition of F7 Labs.
“There’s a realisation in the company from the top down from Kalyan, Sachin and Binny that we cannot afford to continue doing business as usual. We really need to kick it up a notch or two and we cannot ignore AI and machine learning. It has to be up front and centre. We have to be AI first,” says Datar. Kalyan is Flipkart’s CEO Kalyan Krishnamurthy, and Binny is Binny Bansal, co-founder of the company and currently its Group CEO.
“E-commerce is a very thin margin business. At the scale at which we are operating, any reliance on manual operations or labour intensive processes really hurts our bottom line,” adds Datar. “Predicting future demand is also a huge potential money saver. It can save us a lot of money by giving that guidance to our sellers.”
A data moat
If AI and machine learning are mission critical at Flipkart, how many data scientists does it have pegging away at them? Depending on how you frame that question, the number varies. By the most conservative count, there are around 20 at Flipkart, six or seven at its fashion portal Myntra, and seven in Palo Alto, says Datar, detailing his filters: “People who can conduct research in machine learning, who have formal education in one of these areas, and have had provable impact in industry, with good experience.”
Counting engineers and product managers, some 50-odd people work as data scientists at Flipkart. The company plans to double that headcount by hiring 50 more.
To compare, Amazon has over 5,000 people working on just the Alexa platform. In India, based on LinkedIn data, there are an estimated 50-60 data scientists and a thousand-odd machine learning engineers, though such a comparison is directional at best.
Datar acknowledges that Flipkart is limited by talent supply. In a talk earlier at the slash n event, Luo Si, principal engineer/senior director at Alibaba Group Inc, casually mentioned that the Chinese giant’s natural language processing (NLP) platform team has 90 engineers and scientists. This disparity was not lost on Datar. “Clearly, we have a long way to go if you aspire to be anything like the Alibabas of the world, but we need to start this journey,” he says.
Flipkart has encouraged engineers to take the Machine Learning Crash Course that Google recently launched. Datar says the goal is to upskill everyone. Of the Flipkart engineering team of about 1,300, more than a hundred have been through courses from Google, Udacity and INSOFE.
“I think it’s important to clarify here that it’s not like we suddenly woke up one day. It’s more about putting more wood behind the arrow,” Datar says. “Aspirationally, we want us to become kind of the flag bearer and the bellwether for the tech industry and pave the way for how we use AI and ML here and also democratise some of the technologies that we are building in the context of India, so that they can then be used by other businesses in India and in the public sector as well,” he says.
Flipkart is looking to rope in the data science community at large through data challenges, Datar says. “It might be similar to a Netflix challenge,” he says, referring to the internet video company’s $1 million challenge to create an algorithm that predicts user ratings for films. The competition ran from 2006 to 2009, and the prize-winning algorithm delivered a 10.06% improvement over Netflix’s own Cinematch system.
“They had given a subset of the data and the idea was to give recommendations for users. So how do you build a recommendation engine? Similarly, we might put out data from search, from recommendations, from our catalog, from our question answers, and put out a challenge out there to utilise this data, build your models,” says Datar.
Utkarsh Bhriguvanshi is known to be Flipkart’s go-to person for any new and critical initiatives. “I take care of certain complex categories like grocery but I try to put a lot of my brain power into the supply chain these days,” says Bhriguvanshi, a principal architect at Flipkart. He gave us a background on Flipkart’s investments in building the infrastructure for big data and machine learning experiments. Flipkart generates over 10 terabytes of data every day, a volume that grows five-fold on days during events like its annual Big Billion Days sale.
“It’s a goldmine for us,” says Bhriguvanshi of the data. “So we have invested in a lot of infrastructure on keeping this data intact, which can be mined at any point of time.”
Flipkart Data Platform lets its business and product teams glean insights from data in a self-serve fashion. An ML Platform built around a year ago makes it easier for the company’s data scientists to host a model, feed data to it, check the results, and tweak the models. “The beauty of it is that people who are not data scientists (and are) traditional technologists can in a self-serve way, play around with certain models,” Bhriguvanshi says.
Key AI/ML use cases
Flipkart has productionised more than 20 projects that employ machine learning, says Datar. Decision trees, logistic regression, support vector machines, and deep learning are some of the most-used machine learning algorithms internally, he says.
Its AI use cases include preventing transaction fraud, customer support, logistics and warehousing, estimating product popularity, image, speech and text processing, intent modelling, conversational search, personalisation, discovery, forecasting, pricing, and address understanding.
Search itself is a huge research area for machine learning, says Babu. “The normal example that we take in a presentation is the ‘iron table’. Is the user asking for a table made of iron legs or a table for ironing the clothes?” For that query, Flipkart’s first search result links to the ironing boards category, and a link to foldable iron tables appears in the third result.
Babu also highlighted the “address understanding” problem, one that is unique to India. The problem is that users often type addresses like ‘Khan House, X mohalla, Y village’. Flipkart uses machine learning algorithms to group addresses belonging to a sub-area, separate compound words, and eliminate bogus orders when users type gibberish in the address field. Such gibberish is called ‘monkey type’ in the industry, and Babu published a research paper in 2017 articulating the problem and two solutions for it.
Flipkart is working on a pilot project in metro cities to optimise the routes of its delivery agents and is creating systems to automatically sort packages on their final routes by area. “A few hours of activity has been reduced to a few minutes,” Babu says, quantifying its impact.
PIN code prediction is another feature that is in production. It helps Flipkart’s systems detect whether a particular address has an incorrect PIN code, something users often get wrong and which can result in delayed deliveries and increased costs.
Lucky Dhakkad, data scientist at Flipkart, showcased its Geocoder project in partnership with maps company MapMyIndia. The project uses deep learning text classification techniques to identify the lat-long (latitude-longitude) coordinates for a given address, and is used at the delivery hub to plan more efficient routes for the field executive. “We construct a polygon from the address, try to draw the most granular polygon for it,” she explains. Flipkart has seen a 20% to 25% improvement over the previous model, she says. The feature sounds quite similar to what logistics player Delhivery has done in this space.
Predicting or forecasting the demand for products at an individual product level, category level, and figuring out what the right pricing should be is another big area where machine learning is applied. “We’ve used a combination of deep learning-based models and more classic techniques that do time series forecasting,” Datar says. “These are your ARIMA models, well known in statistics, which are effectively a fancier way of doing moving averages with some attributes mixed into it,” he explains.
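To make the moving-average idea Datar alludes to concrete, here is a minimal sketch of a plain moving-average forecast. It is not ARIMA and not Flipkart's model; the function name, the window size and the sales figures are all hypothetical, chosen only for illustration.

```python
from statistics import mean


def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    return mean(history[-window:])


# Hypothetical daily unit sales for one product (illustrative numbers only).
daily_units = [120, 135, 128, 142, 150, 147]

# The forecast for the next day is the average of the three most recent days.
forecast = moving_average_forecast(daily_units, window=3)
print(f"Forecast for next day: {forecast:.1f} units")
```

An ARIMA model builds on this by also modelling trends and the dependence of each value on earlier values and earlier forecast errors, which is why Datar calls it a fancier relative of the moving average.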
Flipkart uses computer vision to understand and extract product attributes from images uploaded by its sellers, and to check whether they are accurate. “We literally get millions of new products every day, if I were to put human resources into that, it would be too expensive and untenable,” Datar says. Its deep learning models check to ensure that there’s no vulgar or objectionable content being uploaded. “Like an Indian flag being shown on a doormat, an example that happened with Amazon, or showing the usage of drugs, or something in the images,” he says.
It might seem like a trivial use of machine learning, but considering the controversy stoked by the listing on Amazon Canada, which escalated all the way to Jeff Bezos, it’s a mistake Flipkart would not want to repeat.
Cutting fraud, boosting revenues
Preventing transaction fraud, whether it’s from resellers or users, is another key area of application of machine learning for Flipkart. It has fraud models in production that prevent reseller fraud, in which resellers exploit the discounts on Flipkart and sell the items offline.
A recent application of machine learning that went into production in December 2017 calculated RPI, short for revenue per impression, for any new product on Flipkart. It’s a measure that helps Flipkart figure out how much revenue it would make if it showed a particular product to you.
“What machine learning models do is they look at the features of the product –they determine that this product was in the shoe category, white colour, brand was Nike, it had stripes. It was this material, it is in this price range, it’s meant for women, so on and so forth,” Datar says. “We’re regressing on these numbers, based on past data, and trying to generalise from it, using techniques like logistic regression and decision trees.” This application helps address the cold start problem, and has given a 1.5% to 2% improvement on net conversion, he says.
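The cold start idea, estimating a number for a product with no history from the attributes it shares with past products, can be sketched with a much simpler stand-in than the logistic regression and decision trees Datar mentions. The example below just averages the RPI of past products that share each attribute value. The products, attribute names and RPI figures are hypothetical, and this is an illustration of the general idea, not Flipkart's model.

```python
from statistics import mean

# Hypothetical past products: attributes plus observed revenue per impression (RPI).
PAST_PRODUCTS = [
    ({"category": "shoe", "brand": "A", "colour": "white"}, 0.50),
    ({"category": "shoe", "brand": "A", "colour": "black"}, 0.40),
    ({"category": "shoe", "brand": "B", "colour": "white"}, 0.30),
    ({"category": "bag", "brand": "B", "colour": "black"}, 0.10),
]


def estimate_rpi(new_attrs, past=PAST_PRODUCTS):
    """Estimate RPI for a product with no history of its own.

    For each attribute of the new product, take the mean RPI of past
    products with the same value, then average those per-attribute means.
    If nothing matches, fall back to the mean RPI of all past products.
    """
    per_attribute = []
    for key, value in new_attrs.items():
        matches = [rpi for attrs, rpi in past if attrs.get(key) == value]
        if matches:
            per_attribute.append(mean(matches))
    if not per_attribute:
        return mean(rpi for _, rpi in past)
    return mean(per_attribute)


# A brand-new listing that has never been shown to any user.
new_shoe = {"category": "shoe", "brand": "A", "colour": "white"}
estimate = estimate_rpi(new_shoe)
print(f"Estimated RPI for the new listing: {estimate:.3f}")
```

A trained model would learn how much weight each attribute deserves instead of treating them equally, but the input and output have the same shape: product features in, an expected-revenue estimate out.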
Shoury Bharadwaj, a senior engineering manager on the team that is building Flipkart’s Billion private label, demonstrated how the company uses insights from millions of reviews, gathered with a tool called Review Analyser, to drive product design and create differentiated products for Indian customers. The Billion brand includes smartphones, air-conditioners, mixer grinders, backpacks, power banks, and diapers, among others.
For instance, the Billion-branded power bank was optimised for looks based on machine learning insights. “We arrived at a rose gold power bank which was well received by customers,” says Bharadwaj. “Power backup, fast charging were other key features we integrated using review insights.”
Speech recognition and support for Indian languages are seen as another critical area for Flipkart. The idea is to build an interface that accommodates the next wave of internet users who are illiterate or semi-literate. Flipkart currently supports voice input for search through its web app, and only in English and a smattering of Hindi.
The speech recognition feature, powered by Microsoft Cortana, is still at an experimental stage. “When Satya (Nadella) was here four-five months back, we demoed it to him and integrated it on our mobile site,” says Datar, adding it’s too early to give data on it. “We didn’t just take a model that Microsoft had but actually gave them data to improve their speech recognition understanding. The word error rate really came down, it’s close to 10-11% right now,” he says.
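For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the recogniser's output into the reference transcript, divided by the number of words in the reference. The sketch below uses that standard definition; the function name and the sample query are hypothetical and are not drawn from Flipkart's or Microsoft's systems.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical voice query: ten words spoken, one misrecognised ("two" heard as "to").
spoken = "show me red running shoes for women under two thousand"
recognised = "show me red running shoes for women under to thousand"
wer = word_error_rate(spoken, recognised)
print(f"Word error rate: {wer:.0%}")
```

On this definition, a rate of 10-11% means roughly one word in ten is recognised wrongly, dropped, or added.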
In the areas of drones or robotics, says Datar, Flipkart is interested in partnering or acquiring certain companies. “Robotics, NLP, image understanding, pricing, all of these would be potential areas where we would be looking at partners. It’s not always acquisition, it’s about us using their services,” he says.
Flipkart’s data leaders were reluctant to quantify the impact of its data science experiments in financial terms. “The number is not always in terms of revenue, orders and so on. It could also just be purely from an engagement perspective,” Datar insists.
That said, they do have the systems in place to measure the performance of new machine learning models put in production. “Any new model that we have, we take it through our A/B platform. For a subset of the users, we will still show the old system that is currently running, for another subset, say 20%, we show this new system. We look at metrics… the average search position, the click through rate, the conversion rate, and so on, and make sure that these KPIs (key performance indicators) are improving in the right direction,” he says.
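The split Datar describes can be sketched as follows: each user is deterministically assigned to the old or the new system, with about 20% seeing the new one, and a KPI such as conversion rate is then compared between the two groups. This is a generic illustration under stated assumptions, not Flipkart's A/B platform; the salt string, function names and traffic counts are all hypothetical.

```python
import hashlib


def assign_variant(user_id, treatment_share=0.20, salt="search-ranker-test"):
    """Deterministically place a user in the 'old' or 'new' group.

    Hashing the user id with an experiment-specific salt gives a stable,
    roughly uniform value in [0, 1), so a user always sees the same variant.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 16**8
    return "new" if bucket < treatment_share else "old"


def conversion_rate(orders, sessions):
    """Orders per session; zero when there were no sessions."""
    return orders / sessions if sessions else 0.0


def relative_lift(old_rate, new_rate):
    """Relative change of the new variant's KPI over the old one."""
    if old_rate == 0:
        raise ValueError("old_rate must be non-zero")
    return (new_rate - old_rate) / old_rate


# Hypothetical traffic and order counts, for illustration only.
old_rate = conversion_rate(orders=400, sessions=8000)
new_rate = conversion_rate(orders=110, sessions=2000)
lift = relative_lift(old_rate, new_rate)
print(f"old={old_rate:.3%} new={new_rate:.3%} lift={lift:+.1%}")
```

In practice a lift like this would also be checked for statistical significance before the new model replaces the old one; that step is omitted here.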
He gives search as an example of quality of engagement. “If our search gets better, users become more happy. They keep coming more to our platform. The NPS on our platform improves. So, there’s a long term effect as well, which is why I don’t want to talk about the x million dollars (in savings or revenue upside),” he says. NPS is short for net promoter score, a measure of how satisfied a customer is with a product or service and how likely they are to recommend it to others.
If Datar and his team can get it right, AI and machine learning initiatives at Flipkart may accelerate the company along its current trajectory and help it recover some of its lost mojo in the Indian ecommerce sweepstakes.
Updated the infographic on Data, Tech and AI at Flipkart at 11:30 AM, 20 April 2018 to correct a typo in weekly transacting users. The infographic previously read as '1.7k'.
Disclosure: FactorDaily is owned by SourceCode Media, which counts Accel Partners, Blume Ventures and Vijay Shekhar Sharma among its investors. Accel Partners is an early investor in Flipkart. Vijay Shekhar Sharma is the founder of Paytm. None of FactorDaily’s investors have any influence on its reporting about India’s technology and startup ecosystem.