The great rush to data sciences in India

Sriram Sharma, February 26, 2018

Story Highlights

  • Indian software developers rush to data sciences courses – both offline and online – as they worry that their core skills may be becoming obsolete.
  • Hundreds of training shops in Bengaluru, Hyderabad, and other Indian cities offer courses and overseas certification partnerships to meet the demand.
  • Yet, much of the training that students absorb is the use of tools and frameworks rather than the underlying maths, programming, and problem-solving skills.

It’s 9 am on a February morning and the mercury is just inching past 20 degrees Celsius in Bengaluru. The workday is already two hours old in the metropolis’s densely laid-out eastern suburb of Marathahalli. A student batch of both unemployed and working software professionals at Robotek Minds, a tech training institute, has just finished its data science class.

Data science is the new buzzword in the tech industry and the code jocks in the Marathahalli class have a singular focus: a job or a leg-up at one of the shiny information technology campuses dotting the city and housing the world’s leading tech corporations. Which, they hope, will be a passport to a comfortable salary that will grow in long strides in the years ahead as the use of data in the world economy explodes.

“The very first point is on the salary… we get a good pay,” says Azmat Ali, who is paying Rs 25,000 for a 50-hour data science training course at Robotek Minds, explaining his interest in the field. “The world is completely dependent on data and processing it. Other tools may expire but data will not expire.”

There are hundreds and thousands in India who have ambitions similar to Ali’s. Look around these bylanes of Marathahalli and you are greeted by posters covering almost every wall advertising data science or related courses in local training shops that jostle for space with paying guest digs for young men and women.

Also see: Former US chief data scientist D J Patil on data science’s relevance

Prasad Reddy, the operations head at Robotek Minds, counts his employer among half a dozen institutes providing data science courses in the immediate vicinity, out of the 50-60 in the larger neighbourhood. Marathahalli’s nearest competition is Ameerpet “where around ten institutes provide this kind of training,” he says, speaking of the IT training hub in Hyderabad.

Advertisements in Bengaluru for data science courses
(Clockwise from top left) Ads for data science courses on a hoarding, inside a bus, and wrapped around a city bus in Bengaluru. | Photos: Sriram Sharma, Josey Puliyenthuruthel, Jayadevan P K

Robotek Minds also offers courses in machine learning, deep learning, and artificial intelligence (AI) with “100% job support”. “Our faculty will explain how to crack the interview, help with resume preparation, discuss real-time scenarios,” says Reddy. Some of the larger, certified institutes charge more. “We are not charging that much since we get a lot of unemployed students, who are not willing to pay that much,” he says.

Another training institute, Eminent IT Info, has done its bit to make sure anyone interested in data science knows of it: its posters are splashed all over the neighbourhood, advertising weekday and weekend batches for training in Python, a high-level programming language; R, a statistical software framework and language; machine learning and deep learning; natural language processing (NLP); and AI. A receptionist says its 90-day data science course starts at Rs 30,000 and quickly forwards PDFs of the courseware over WhatsApp.

“Three to four years back, everyone was crazy for Hadoop (an open-source big data processing framework). Now, it’s data science. From the inquiries which we get, almost 50% of the crowd wants to do data science,” says Sourabh Sharma, marketing manager at Realtime Signal Technologies, a training institute with branches in Marathahalli and BTM Layout in south Bengaluru. The institute charges Rs 40,000 for a data science course that covers machine learning, Python, and R and takes three to four months to complete.

Data science classes jostle for space with paying guest accommodation for young men and women in Marathahalli, Bengaluru. | Photo: Rajesh Subramanian

The Marathahalli institute managers we interviewed say they are holding up against the rapid spread of massive open online courses (MOOCs), which let learners work through a course of their choice at their own pace. Akhil Teja, data science trainer at Robotek Minds, insists that the key to learning data science concepts is finding the right mentors, which MOOCs can’t provide. “It’s not a tough subject to learn if you are able to focus on the concepts,” he says.

Warning: MOOCs ahead

Teja’s edge over MOOCs may be getting dull, as a recent visit to Ameerpet, a locality that presents a shabby facade of decades-old buildings, showed. A sewage canal nearby is undergoing a major engineering overhaul, throwing up a stench, and the adjacent road has been barricaded into a narrow lane where people elbow for space with vehicles. In the middle of this north-central Hyderabad chaos, we found at least a dozen IT institutes advertising machine learning and data science courses.

The locality has seen better days and is now at the receiving end of tech disruption caused by the proliferation of MOOCs, both global ones such as Udacity, Coursera and edX, and NPTEL, an Indian government-run MOOC platform. “There were around 10,000 to 12,000 institutes earlier in 2009-10, but now there are hardly 3,000,” says Suchitha Rudragani, administrator at Sathya Technologies, an Ameerpet IT training institute. It charges Rs 20,000 for its 40-hour course in data science and offers a range of options: online, offline, fast-track, and weekend batches. “Over 60-70% candidates are interested in data science now. This technology is booming in the US and UK,” she says.

Also see: MIT’s Alan Edelman on Machine Learning

Some of them are working professionals. Australia-returned Manikiran Reddy, a mid-career Oracle developer, has enrolled for a morning data science course in a nearby training shop, Kelly Technologies. His aim: add data science skills to his talent stack to be more versatile in his work and target job opportunities better.

A data science class at Robotek Minds, Marathahalli, Bengaluru
Students in a learning session at Robotek Minds’s spartan Marathahalli classroom, Bengaluru. | Photo: Rajesh Subramanian

Others, to be sure, are less enthusiastic about getting an education from here. Dilip Kumar Reddy, a third-year computer science student, finds more value in Coursera, which offers a subscription at Rs 3,103 per month. Apart from certifications from a number of international universities, the all-you-can-eat package and the quality of teachers attract him.

“It’s a lot better; it’s taught by professors who are teaching courses at universities like University of Michigan, people who teach Masters students,” says Reddy, who studies at B.V. Raju Institute of Technology in Medak district about 60 km north of Hyderabad. “Your assignments are verified by international professors directly. They give you personal feedback on your assignment, areas you should concentrate on.”

The demand is reflected in Indian MOOCs as well. IIT-Kharagpur professor Sudeshna Sarkar’s ‘Introduction to Machine Learning’ course on NPTEL saw 4X year-on-year growth in enrolments in its July-September 2017 batch. Data for later courses are not immediately available.

Nirant Kasliwal, an undergrad from BITS Pilani, traces his passion for data science to an edX course he took some five years ago. For him, today “Udacity is the best, even their free courses are pretty good”. A Medium post that takes a data-driven approach to ranking the top-reviewed introductory data science courses rates Kirill Eremenko’s Data Science A-Z course on Udemy and Intro to Data Analysis by Udacity as the best online resources.

Partha Pratim Talukdar, assistant professor at the Department of Computational and Data Sciences at Indian Institute of Science, Bangalore.

Six out of ten developers are looking to acquire, or are currently learning, machine learning and deep learning skills, says competitive coding platform HackerRank in its 2018 developer skills report released last month. AI beats blockchain, augmented reality/virtual reality, internet of things, and quantum computing as the most sought-after tech skill.

Two of the top three languages Indian developers plan to learn next are Python (43%), and R (36%), which are the two most used languages by data scientists and statisticians respectively, according to Kaggle’s 2017 State of Data Science and Machine Learning survey, which polled its global online community of data scientists.

Python is also the most loved language in HackerRank’s language preference graph, with more than a 30% lead over the next closest language, C. “Python has a number of libraries (NumPy, SciPy, Pandas) geared towards machine learning, which are used by a lot of companies, and it’s an easy language to learn, the closest programming language to English,” says HackerRank co-founder Vivek Ravisankar, explaining its growing popularity.

Also see: Here’s why India is likely to lose the AI race

While high-quality resource materials are available and access to information has been democratised through MOOCs, learning by doing matters most, says Partha Pratim Talukdar, assistant professor at the Department of Computational and Data Sciences at Indian Institute of Science, Bangalore. “You need to get your hands dirty, really do stuff,” he says. “It’s not just doing coursework, doing real projects is key to learning data science.”

Keeping obsolescence at bay

For developers, the race to acquire AI skills is an existential challenge. There’s the fear of job loss to automation and being left with outdated skills. While data science boasts a higher take-home package and scope for growth, it requires interdisciplinary skills: a combination of maths, statistics, programming, and some domain expertise.

“There is a genuine concern among software developers and testers that the requirements for their kin are on the way down. These guys are a bit uncertain about their own future, and one of the popular avenues to up-skill towards a more rewarding future is getting into data science and analytics,” says Charanpreet Singh, co-founder and director of Praxis Business School, a business school that offers analytics programs at its Kolkata and Bengaluru campuses.

A data analytics classroom at Praxis Business School.

Its one-year program in business analytics, priced a little above Rs 5 lakh, started off with eight students in 2011. It now has about 150 students, Singh says, adding he keeps the class size limited to focus on quality. “All the big names have since started their analytics program, so obviously the market demand has increased tremendously,” he says.

Praxis’s business analytics course has 80-85% engineers and a growing number of science graduates majoring in economics, statistics, and maths. It teaches techniques (statistics, machine learning, deep learning, visualisation), tools and technologies (Python, R, Tableau, Spark and Hadoop), and how they can be applied to business. “Data science requires a problem solver’s mindset – understanding business problems, converting a business problem into a data problem, and going deep into data. Some of these people struggle at the beginning to get into that mould,” Singh says.

Also see: If data is the new oil, Google wants to sell you the drilling tools

The push by tech giants like Google, Amazon, Microsoft, and IBM to ensure that their respective ecosystems come out on top doesn’t help, says Gunnvant Singh Saini, a data science trainer at Bengaluru’s Jigsaw Academy. “They have their own libraries that they want to promote. So, for example, Google has a big stake there and somehow TensorFlow becomes the de facto deep learning library and everyone just uses that,” he points out. TensorFlow is an open-source machine learning framework from Google.

Easy as they make data science onboarding, such tools and crash courses come with the risks of shortcuts. “Being able to use that toolkit and get the best advantage out of it still requires you to understand the basics of maths and stats. If you do not understand what is the statistically valid sample size for your models, it will always fail,” says Santanu Bhattacharya, a prominent Indian data scientist who has led teams at Facebook and is now at MIT’s Emerging Market Group. “It becomes a scenario of a monkey with a machine gun.”

Shinu Abhi, director, corporate training at REVA Academy for Corporate Excellence, which runs an advanced analytics program for working professionals, agrees. Its training course stresses what she calls the “two primal KPIs” for companies – increasing revenue and reducing cost. “The creamy layer of smart data scientists bring business impact,” she says, stressing how skills need to go beyond just use of tools.

Credentials-job mismatch?

At the higher end of the training value chain, international certification is one of the big draws. International School of Engineering (INSOFE), with a presence in Bengaluru and Hyderabad, for instance, provides certification recognised by the Language Technologies Institute of Carnegie Mellon University (CMU) for a post-graduate program in data analytics and optimisation. Jigsaw Academy’s data science post-graduate program gets you certified from the University of Chicago Graham School.

A building crowded with training centres at IT training hub Ameerpet in Hyderabad. | Photo: Sriram Sharma

Piyush Mishra, an engineer and aspiring data scientist currently undergoing a course at INSOFE in Bengaluru, says that its CMU affiliation and access to labs and a faculty of data scientists motivated him to spend over Rs 3.5 lakh on the 23-weekend course. The program includes a paid internship as well. Despite layoffs in the IT industry, AI, deep learning, and automation are performing well, he says. “That’s why I planned for this course.”

Vaibhav Gokhale, who did a Master’s in mechanical engineering from Purdue University and recently completed his internship at INSOFE, will soon join a newly created analytics team at a Pune company at a salary of Rs 4 lakh a year. The relatively low salary for his kind of education doesn’t deter him. “Initially, it’s fine, the field is growing. Once I have some experience I can demand more,” he says, attributing the low take-home to his lack of programming experience. “In our class, most have an average of ten years’ experience in some domain. If someone has experience in software development and a business background, they are more likely to get higher pay.”

Also see: Why India’s data scientists make a fraction of their US counterparts

Core AI job roles related to deep learning, machine learning, and NLP, are areas where talent supply is lower than market demand in India, says Rishabh Kaul, co-founder, Belong, a Bengaluru HR startup, citing its talent supply index study from March 2017.

The bulk of the demand for data scientists and machine learning engineers is being generated by R&D centres of large global corporations, says Kaul. “India is still maturing as an AI/ML ecosystem. For a lot of companies, their expectation from AI talent is knowledge of tools, languages, software packages, and libraries.”

According to Belong’s research, fewer than 2% of professionals who call themselves data scientists or data engineers have a PhD in AI-related technologies, and just 4% of AI professionals in India have worked on areas such as deep learning and neural networks. Kaggle’s research also seems to indicate that India’s talent pool is much younger than elsewhere and more bottom heavy, with fewer Master’s- and PhD-level professionals.

“Starting salary for basic analytics folks can be anywhere from Rs 4 lakh to Rs 8 lakh per annum, while for data scientists with 4-5 years of experience it can be Rs 15-30 lakh per annum. 10-plus years of experience in AI can pay anywhere between Rs 60 lakh and Rs 1.5 crore depending upon the company,” says Kaul. In the US, AI professionals with 10-plus years of experience earn upwards of $300,000 and, with top hirers, this can go upwards of half a million dollars plus stock, he says.

The market demand for data science in India is much smaller than the developer job market, as expected. As of Feb 15, LinkedIn listed 2,657 jobs related to data science in India in the past month, while there were 31,378 results for other developer jobs. Kaul believes this is an underestimation and that some 4,000-5,000 data science jobs are added every month in India.

Also see: How data is making Delhivery India’s first e-logistics unicorn

Kasliwal, the BITS Pilani alum who works at a stealth startup in Bengaluru, strikes a note of caution about the kind of data science work done in India. “Even if you get a data science job, you will be doing a lot more development work, rather than data science. People looking to get into this field in India need to temper their expectations, and learn to take the ones and twos, rather than go for the sixes,” he says, taking a cricket batsman’s mindset. The students in Marathahalli could benefit from that advice.


               

Update: A link to the HackerRank research (http://research.hackerrank.com/developer-skills/2018/) was added at 3:07 PM on February 26.

Lead photo and graphics: Rajesh Subramanian

Disclosure: FactorDaily is owned by SourceCode Media, which counts Accel Partners, Blume Ventures and Vijay Shekhar Sharma among its investors. Accel Partners is an early investor in Flipkart. Vijay Shekhar Sharma is the founder of Paytm. None of FactorDaily’s investors have any influence on its reporting about India’s technology and startup ecosystem.