Three Myths About Data Science Debunked

Despite the lack of consensus on the definition of data science, many organizations already have a data science team. And even in companies without data scientists, sooner or later business analysts will join a software or process improvement initiative with a machine learning or AI component. When that happens, a good understanding of what data science is (and isn’t) can make a big difference in a BA’s ability to create value.

Based on follow-up questions I receive to posts like “How do I transition from business analyst to data scientist?” and in interviews with candidates interested in joining my data science team, I’ve identified three misconceptions still circulating within the BA community:


1) Data science is just a buzzword for data analytics

While there is nothing wrong with traditional data analytics, data science is not the same thing.

The first difference has to do with the variety, velocity, and volume of data. Modern data science leverages machine learning and other advanced analytics techniques to process not only structured (e.g., tabular) data, but also unstructured data such as images, sounds, and free text. And often we’re talking about terabytes or petabytes of data being generated and processed continuously from multiple sources. In business environments, those sources may include IoT sensors, retail checkouts, credit card transactions, call center logs, phone call recordings, etc.

The second difference is the emphasis on prediction and prescription. Traditional analytics tend to focus on backward-looking impact (e.g., quarterly sales growth, percentage of sales from new products, unit costs, cycle time, etc.), with some predictive modeling and time series forecasting thrown into the mix. Data science, on the other hand, is much more focused on forward-looking learning, with sophisticated machine learning models and other AI tools being primarily used to specify optimal future behavior and actions.
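To make the contrast concrete, here is a minimal sketch in Python (made-up numbers, hypothetical column names, and the pandas and scikit-learn libraries assumed available) that places a backward-looking summary next to a forward-looking prediction:

# Illustrative sketch only: hypothetical data and column names.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Backward-looking analytics: summarize what already happened.
sales = pd.DataFrame({
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "revenue": [120_000, 95_000, 130_000, 101_000],
})
print(sales.groupby("quarter")["revenue"].sum())  # descriptive metric

# Forward-looking data science: learn from history to predict an outcome.
customers = pd.DataFrame({
    "tenure_months": [3, 24, 1, 36, 6, 48],
    "support_tickets": [5, 0, 7, 1, 4, 0],
    "churned": [1, 0, 1, 0, 1, 0],  # historical labels
})
model = LogisticRegression().fit(
    customers[["tenure_months", "support_tickets"]], customers["churned"]
)
new_customer = pd.DataFrame({"tenure_months": [2], "support_tickets": [6]})
print(model.predict_proba(new_customer)[0, 1])  # predicted churn probability

The first block answers “what happened?”; the second estimates what is likely to happen next, which is the kind of question data science is built to answer.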

This is why we see data analysts who are great at generating and organizing structured data struggle in a data scientist role. Skills typical of traditional data analytics, such as writing SQL queries and creating data visualizations, are undoubtedly useful for a data scientist. However, data science roles also require a solid foundation in statistics and programming, comfort with ambiguity and uncertainty, and the curiosity to experiment and explore possibilities involving prediction or optimization.

Business analysts who see data science as just a new name for traditional analytics risk ignoring a wide range of options when framing business problems, and consequently missing valuable opportunities to deliver greater ROI for their organization.

2) Data science is only applicable to large organizations in specific industries

It’s true that predictive analytics has traditionally received more attention in certain industries such as credit, insurance, marketing, and advertising. It’s also true that data science is more resource intensive than traditional analytics, requiring advanced skills and tools, historical data, and, depending on the application, operationalization mechanisms like MLOps. Clearly this can create barriers to adoption in organizations with limited resources, but it doesn’t mean that businesses of any size, in any industry, can’t exploit data science solutions to create value.

Recent advances in technology are rapidly shrinking the investment in time and resources required to build predictive models using machine learning algorithms. In my role as a data scientist, I’ve been able to develop accurate ML models to support better decision-making in less than two weeks, using only free tools and NLP (natural language processing) models already pre-trained with big data. Ten years ago, the same kind of model would have taken months to produce and, due to cost and time constraints, would likely have been avoided in favor of less accurate but cheaper models based on correlation analysis.
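As a simplified illustration of what “free tools plus pre-trained models” can look like (this is not the model from my project; it assumes the open-source Hugging Face transformers library and its default pre-trained sentiment model):

# Illustrative sketch: score call-center notes with a pre-trained NLP model,
# no custom training required (assumes the open-source transformers library).
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default pre-trained model

notes = [
    "Customer was delighted with the quick refund.",
    "Caller threatened to cancel the contract after a third outage.",
]
for note, result in zip(notes, classifier(notes)):
    print(result["label"], round(result["score"], 3), "-", note)

A few lines like these can turn unstructured text into signals a decision-maker can act on, which is one reason the cost of entry keeps falling.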

Business analysts may also find opportunities to design solutions that take advantage of machine learning and AI technology embedded in commercially available software, without the involvement of data scientists. For instance, the NLP and speech recognition technology available in voice control platforms like Alexa and Google Assistant is being used in hospitals to replace the old-fashioned nurse call button that patients press when they need assistance. With a system configured to interpret, prioritize, and route voice requests, nurses can ensure that a patient who is having chest pain is assisted ahead of a patient who simply needs their pillow adjusted.
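The routing logic behind such a system can start out surprisingly simple. Here is a purely hypothetical, rule-based sketch (the voice platform would handle the speech-to-text step; the keywords and priority labels are invented for illustration):

# Hypothetical prioritization rules for transcribed patient requests.
URGENT_KEYWORDS = {"chest pain", "can't breathe", "bleeding", "fell down"}

def prioritize(transcript: str) -> str:
    text = transcript.lower()
    if any(keyword in text for keyword in URGENT_KEYWORDS):
        return "URGENT"   # route immediately to the nearest available nurse
    return "ROUTINE"      # add to the standard assistance queue

print(prioritize("I'm having chest pain"))            # URGENT
print(prioritize("Could someone adjust my pillow?"))  # ROUTINE

In a real deployment the interpretation step would rely on an NLP intent model rather than keywords, but the business value comes from the same idea: translating free-form requests into prioritized, routable work items.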

3) Data science projects must follow the same development lifecycle adopted in conventional software projects

Companies starting to adopt advanced analytics often expect their data science initiatives to follow the same approaches used in conventional software projects. If you’re introducing single sign-on to protect internal applications or building an online checkout process, the effort can be easily broken down into units of work and added to a sprint backlog with a clear estimate of time to complete. And after the solution goes live, the project ends, with another team put in charge of post-production maintenance. That makes sense for systems whose behavior is consistent over time: if you click “Buy” and the item doesn’t end up in your shopping cart, that’s a bug.

When it comes to using advanced analytics in business applications, though, organizations need a different approach and mindset. Most data science projects look more like scientific research than a conventional IT initiative, and successfully deploying a predictive or optimization model will require different tools and processes.

First, the definition of problem, solution, and success can be significantly fuzzier in a machine learning project. In some cases, the problem itself may need to change as the project progresses. In one of my projects, we started with the goal of fixing erroneous GPS readings to ensure accurate location estimates for moving assets, and ended up building a model that simply discarded the wrong readings. Because of the variability of the environmental conditions the sensors operated in, it turned out to be impossible to train a model that could reliably correct the readings in real life. The goal then changed to accurately detecting incorrect readings so they could be excluded from view (a valuable outcome: instead of receiving wrong information, customers would see a message asking them to wait for the next valid sensor reading, typically available within minutes).
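For readers curious what “detecting incorrect readings” can look like, here is a deliberately simplified sketch, not the production model: it flags a reading when the speed implied by the previous point is physically implausible (the threshold and coordinates are made up):

# Simplified illustration: flag GPS readings whose implied speed is implausible
# so they can be discarded instead of shown to customers.
from math import radians, sin, cos, asin, sqrt

MAX_PLAUSIBLE_SPEED_KMH = 150  # hypothetical limit for this type of asset

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def is_suspect(prev, curr):
    """prev and curr are (latitude, longitude, timestamp_in_seconds)."""
    hours = (curr[2] - prev[2]) / 3600
    if hours <= 0:
        return True  # out-of-order or duplicate timestamp
    speed_kmh = haversine_km(prev[0], prev[1], curr[0], curr[1]) / hours
    return speed_kmh > MAX_PLAUSIBLE_SPEED_KMH

previous_reading = (30.2672, -97.7431, 0)
current_reading = (31.0000, -97.7431, 60)   # ~80 km apart, one minute later
print(is_suspect(previous_reading, current_reading))  # True -> discard

The actual solution used a trained model rather than a fixed rule; the point is that the goal shifted from correcting bad readings to detecting them.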

Secondly, the deployment of machine learning models in a production environment has specific requirements and challenges that aren’t present in conventional software projects. Whether machine learning models are being used to identify faces, understand speech, or predict customer churn, errors will always occur, and even for a model that starts out highly accurate, performance can degrade dramatically over time because of changes in the input data. For that reason, the deployment of predictive and optimization models requires tools that go beyond what’s used in conventional software, including mechanisms to version not only code but also training data; continuously monitor inputs and outputs to detect emerging performance issues; establish data lineage and provenance; debug predictions; and more.
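As one small example of the kind of monitoring this implies, the sketch below (illustrative only, with synthetic numbers; it assumes NumPy and SciPy) compares the live distribution of a single model input against its training distribution and raises an alert when they diverge, one of many checks a real MLOps pipeline would run:

# Minimal drift-monitoring sketch: compare a feature's live distribution
# against its training distribution (synthetic data for illustration).
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
training_ages = rng.normal(40, 10, 5_000)  # feature values seen at training time
live_ages = rng.normal(48, 10, 1_000)      # values arriving in production

result = ks_2samp(training_ages, live_ages)
if result.pvalue < 0.01:
    print(f"Input drift detected (KS statistic = {result.statistic:.3f}); "
          "review the model before trusting its predictions.")

Versioning data, tracking lineage, and debugging individual predictions all require additional tooling; the point is simply that “done” looks very different for a model than for a checkout page.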

# # #

Today, we find ourselves surrounded by technology that exceeds human performance in tasks ranging from parking cars to designing dental crowns that fit individual patients. In this new reality, companies are having to rethink old models for how analytics support their operations.

Any company can generate simple descriptive statistics about aspects of its business, but the ones radically improving their performance are using advanced analytics to predict and optimize outcomes. Data-driven organizations are currently using predictive modeling to identify their most profitable customers, establish prices in real time for maximum yield, understand the impact of unexpected constraints in a supply chain, trigger maintenance to reduce the number of unplanned equipment breakdowns, and so forth.

Business analysts who can help their organizations exploit data science capabilities to enhance performance and create competitive advantage are sought after by recruiters, and highly valued by their colleagues and managers. From expanding the range of solutions to be considered for solving business problems, to ensuring that each data science initiative is tightly focused on improving an important dimension of business performance, those BAs play a vital role in organizations seeking algorithmic approaches to optimize performance.


Author: Adriana Beal worked for more than a decade in business analysis and product management, helping U.S. Fortune 500 companies and high tech startups make better software decisions. Prior to that she obtained graduate degrees in Electrical Engineering and Strategic Management of Information in her native country, Brazil. In 2016 she earned a certificate in Big Data and Data Analytics from the University of Texas, and since then she has been working on machine learning and data science projects in healthcare, mobility, IoT, customer science, and human services. Adriana has two IT strategy books published in Brazil and work published internationally by IEEE and IGI Global. You can find more of her useful advice for business analysts at bealprojects.com.

 


