Healthcare and Machine Learning – a Practical Approach

Machine learning is arguably the next most exciting thing to happen in healthcare. However, there is a gap between the promising developments by tech giants and startups and the practical steps that healthcare organizations can take today. The goal of this post is to bridge this gap and suggest a plan of action. To do this, we need to address three questions:

  1. What is machine learning in general?
  2. What can machine learning do for healthcare?
  3. What steps can you take today?

What is machine learning?

Simply stated, machine learning is learning by machines, as illustrated by the cartoon. On a more serious note, machine learning (ML) is all around us and includes the following practical and useful applications:

  • Credit card fraud detection, where you deal with thousands of features and billions of transactions
  • Customer recommendations, where you recommend millions of products to millions of users
  • Genome data manipulation with thousands of human genomes, for example, to detect genetic associations with disease

An important branch of machine learning called “deep learning” produces results such as language translation that is accurate enough to use in practice, and face recognition that approaches human precision.

Machine learning refers to algorithms that learn from data. They use statistical and mathematical techniques to build a model from observed data and then use this model to predict future results. Usually, the performance of the model gets better with more data; in fact, Anand Rajaraman was one of the first to say that “More data usually beats better algorithms.” By the way, Anand is an engaging personality responsible for WalmartLab machine learning and Amazon Retail Platform (which handles 25% of the US transactions). You can read more about him on his site Datawocky, where we got this picture of him.
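To make the “build a model from observed data, then predict” idea concrete, here is a minimal sketch in Python. It fits a straight line by ordinary least squares and uses it for a prediction. The data (weekly activity hours versus resting heart rate) is made up purely for illustration, and the function names are our own, not part of any library.

```python
# Toy illustration: learn a model from observed data, then predict.
# All numbers below are invented for the example.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical observations: hours of activity per week, resting heart rate
hours = [0, 1, 2, 3, 4, 5]
heart_rate = [80, 78, 76, 74, 72, 70]

model = fit_line(hours, heart_rate)   # "training"
print(predict(model, 6))              # "prediction" for an unseen input
```

Real projects would use one of the libraries mentioned in this post rather than hand-written formulas, but the two steps, fit and predict, stay the same.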

To use today’s machine learning, you don’t need to be able to solve the equations or explain the statistical calculations. Instead, multiple machine learning tools and libraries are available (Spark ML, Amazon Machine Learning, H2O, TensorFlow, etc.) that can bring machine learning into the hands of a regular developer. In fact, we at Elephant Scale teach these; our recent course on Machine Learning in Spark is one of our most popular.

The moral is that “Data is the new oil,” and that machine learning is the engine that runs on this oil and propels you beyond your competitors. Now let us turn to healthcare.

What can machine learning do for healthcare?

Here we will survey promising use cases and outline the ML technologies likely to prove useful. Before we go on, however, we need to make a distinction between traditional machine learning and a new branch quickly gaining the public’s attention: deep learning.

Traditional machine learning has been around since the 1950s. It tries to find the best model that will fit your data. For example, Naive Bayes Classifiers (NBC) accept various features of your data (results of medical tests, prior history converted into numbers, and so on). The features may number in the tens, hundreds, or more. Based on the known diagnosis outcomes, NBC will estimate probabilities connecting diagnoses to features. Later, when faced with a new set of tests and clinical history for a new patient, NBC will be able to assign probabilities to the various diagnoses.
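The following is a minimal sketch of a Naive Bayes classifier, written in plain Python so that every step is visible. It is a toy under stated assumptions: the features (fever, cough), the diagnoses (flu, cold), and the six “patients” are invented, and add-one smoothing is our choice for handling unseen values. It is not a clinical tool.

```python
from collections import Counter, defaultdict

class NaiveBayes:
    """Categorical Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.label_counts = Counter(labels)
        self.total = len(labels)
        # feature_counts[label][feature_index][value] -> count
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values seen per feature
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.feature_counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def predict_proba(self, row):
        """Return a probability for each diagnosis given the features."""
        scores = {}
        for label, count in self.label_counts.items():
            p = count / self.total  # prior P(diagnosis)
            for i, value in enumerate(row):
                seen = self.feature_counts[label][i][value]
                # smoothed P(feature value | diagnosis)
                p *= (seen + 1) / (count + len(self.values[i]))
            scores[label] = p
        norm = sum(scores.values())
        return {label: p / norm for label, p in scores.items()}

# Hypothetical training data: (fever, cough) -> known diagnosis
rows = [("high", "yes"), ("high", "yes"), ("high", "no"),
        ("normal", "yes"), ("normal", "no"), ("normal", "no")]
labels = ["flu", "flu", "flu", "cold", "cold", "cold"]

model = NaiveBayes().fit(rows, labels)
probs = model.predict_proba(("high", "yes"))  # a new patient
print(probs)
```

For the new patient with high fever and a cough, this toy model assigns roughly 0.86 to flu and 0.14 to cold, which is exactly the kind of probability-per-diagnosis output described above.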

The problem with traditional machine learning is that it creates a model that tries to simplify reality. If the model is given too much flexibility, it reflects the quirks of the training data too closely (this is called overfitting), resulting in poor prediction on new data. If the model is made smoother, it will only approximate the training data to a certain extent. The result is that the precision of standard machine learning faces a natural limit. This is often enough for applications such as marketing emails. However, the 70-85% correct prediction rate, typical for this approach, is too low for healthcare.
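The trade-off between a flexible model and a smoother one can be seen in a small sketch. We use a k-nearest-neighbors classifier (our choice for illustration, not a method named above): with k=1 the model memorizes every training point, noise included, while k=3 smooths over single noisy labels. The data is hand-constructed so that the effect is visible; real data will be messier.

```python
# Hand-made 1-D data: the "true" rule is label 1 when x > 5.
# Two training labels (x=3 and x=8) are deliberately wrong to mimic noise.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0),
         (6, 1), (7, 1), (8, 0), (9, 1), (10, 1)]
# Held-out points labeled by the true rule, never shown during training.
held_out = [(1.2, 0), (2.9, 0), (3.2, 0), (4.1, 0), (5.8, 1),
            (7.2, 1), (7.9, 1), (8.2, 1), (9.1, 1), (9.8, 1)]

def knn_predict(x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(data, k):
    """Fraction of points in data that the k-NN model labels correctly."""
    correct = sum(1 for x, label in data if knn_predict(x, k) == label)
    return correct / len(data)

for k in (1, 3):
    print(f"k={k}: train accuracy {accuracy(train, k):.1f}, "
          f"held-out accuracy {accuracy(held_out, k):.1f}")
```

On this toy data the flexible model (k=1) is perfect on the training set but only 60% correct on held-out points, while the smoother model (k=3) gives up some training accuracy (80%) and gets every held-out point right.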

Enter neural networks and deep learning, which is based on them. Neural networks are inspired by how the human brain works and imitate signal passing between interconnected neurons. Given enough complexity, neural networks can produce models that fit the data better than traditional machine learning, since the data itself is used to influence the creation of the model. Neural networks are not new either, having appeared by the 1960s. Their success was initially limited by the enormous computational resources they would require for training. These resources became available at companies such as Facebook, Google, IBM, and Microsoft from about 2010 on. The year 2016 saw these resources being offered to the public, together with prepackaged machine learning and deep learning technologies.
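To show what “signal passing between neurons” means, here is a tiny sketch of a two-layer network in plain Python. Note the assumptions: the weights are chosen by hand rather than learned, and the activation is a simple step function. Real deep learning systems learn millions of weights from data using the libraries mentioned earlier. The point here is only that stacking neurons lets a network compute XOR, something no single neuron of this kind can do.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of incoming signals followed by a step activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

def xor_net(a, b):
    """Two hidden neurons feed one output neuron (hand-set weights)."""
    h_or = neuron([a, b], [1.0, 1.0], -0.5)    # fires if a OR b
    h_and = neuron([a, b], [1.0, 1.0], -1.5)   # fires if a AND b
    # Output fires when OR is on but AND is off, i.e. exactly one input is 1
    return neuron([h_or, h_and], [1.0, -1.0], -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```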

Now we are ready to delve into specific use cases.

  • Diagnosis. Dozens of companies are developing ML diagnostic tools based on recognizing patterns.

The classical machine learning techniques are Naive Bayes Classifiers, described above, and also Decision Trees and Clustering. These achieve reasonable performance and can be used as a starting point.

The real promise, however, comes from deep learning. Here we can mention IBM’s Watson, Google’s DeepMind Health, and Apple’s ResearchKit. These models fulfill to a high degree the following success requirements:

  • Good performance
  • Dealing with missing data
  • Dealing with noisy data

Three more characteristics are also desired, and it is their presence that makes the problem so much more complex.

  • Transparency of diagnostic knowledge
  • Explanation ability
  • Reduction of the number of tests

Thus, the above models are based on all of the available data and on a combination of various techniques, with continuous feedback, so that the application learns from its mistakes and improves its performance with time.

  • Medical tests. For example, IBM is working together with Pathway Genomics on multiple healthcare applications that incorporate Watson, such as
    • A simple blood test for early cancer detection
    • A genomic wellness app, announced in January 2016 at the Digital Health Summit
  • Sepsis prediction, based on the real-time data from the emergency room.

This is a vital area, so here is a little background. Sepsis is one of the leading causes of mortality in hospitalized patients, and yet a reliable means of predicting sepsis onset remains elusive. One effort used a machine learning classification system with multivariable combinations of easily obtained patient data (vitals, peripheral capillary oxygen saturation, Glasgow Coma Score, and age) and produced better results than previous ones. Here you can see machine learning knowledge combined with healthcare expertise.

Another success is by a startup called Sentient, which achieved a sepsis prediction rate.

  • Follow-up care, to detect whether patients are taking the right medications, and to predict which patients are likely to take or not take their medications.

This is important because one of the biggest hurdles in healthcare is hospital readmission. Patients should follow their treatment recommendations when they go home, but they often do not. A startup called AiCure is using mobile and facial recognition technologies to determine whether a patient is taking the right medications. Facial recognition is a well-known task in machine learning; recently Facebook told the world how they achieve near-human accuracy at this with the project called DeepFace.

Other promising use cases (sorry, we cannot give enough attention to all of them) include

  • Medical resources allocation
  • Alerting and diagnostics based on real-time patient data from embedded devices
  • Physician attrition (hospitals love physicians with multiple hospital admission privileges, but so do other hospital systems)
  • Survival analysis
  • Readmission risk

Throughout this analysis, we mentioned wearables and embedded devices. Big Data by now has good ways of dealing with the requirements posed by wearables, solving the processing problems and applying machine learning along the way. Incidentally, we at Elephant Scale teach these building blocks, such as Kafka and Cassandra, and many others. We also teach Machine Learning.

This cartoon is a play on the Hadoop logo, which is an elephant, just of a different color. Hadoop is the foundation of many Big Data technologies.

This is all very exciting, you might say, but as a healthcare executive or an IT director, what am I to do today?

What steps can you take today?

There are things you can do today that will prepare you for the revolution that is to happen in a few years.

Implement Big Data lakes

A “data lake” is a central repository for large volumes of raw data, commonly built on one or more Hadoop clusters. Hadoop is a tried and proven building block for Big Data. Many of the newer tools, such as Spark, still commonly rely on the Hadoop storage, called HDFS, and often you need to use multiple Hadoop tools. The new offerings will need to get their data from somewhere, and if you prepare for this now, you will be many steps ahead of the game. When the providers of machine learning solutions come with their applications, they will find your data ready for them.

Implement Big Data education

This is based on our customer experience. We teach for one of the major financial institutions. Initially, they were a leader in innovative offerings, but these were later copied by competitors. The company then implemented a Big Data learning initiative that gave the power of Big Data to all of their financial analysts. These experts then came up with new approaches, such as new ways of customer segmentation, leading to improved effectiveness of perks and promotions.

The main idea echoes the advice of one business guru, who tells us that “if you want people to be more fit, put bikes within the easy reach of the people.”

Start using your existing resources

If you are, say, a hospital system, your unique advantages are the data that you have and the researchers who are already working with that data. However, to implement machine learning, you need the knowledge of how to bring these together. Once your team is brought together and equipped with this knowledge, you can embark on machine learning today.


Keep in mind that in Machine Learning, you cannot know in advance what the data will tell you. Rather, you need to start playing with the data. Then you will find trends, clusters, dependencies, anomalies, etc. It is hard to “invent” a data science project. It needs to come as a result of the work of your researchers, their newly acquired machine learning skills, and their experiments with the data.
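As a flavor of what “playing with the data” can look like on day one, here is a small sketch using only the Python standard library. It checks one dependency (a correlation) and flags one kind of anomaly (values far from the mean). The column names and numbers are hypothetical, and the threshold of two standard deviations is an arbitrary choice for illustration.

```python
from math import sqrt
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation: close to 1 or -1 suggests a linear dependency."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def find_anomalies(values, threshold=2.0):
    """Indices of values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical extracts from a hospital data set
ages = [30, 40, 50, 60, 70]
visits_per_year = [1, 2, 2, 4, 5]
length_of_stay_days = [3, 4, 3, 5, 4, 3, 4, 21, 4, 3]

print("age vs. visits correlation:", round(pearson(ages, visits_per_year), 2))
print("unusual stays at rows:", find_anomalies(length_of_stay_days))
```

Findings like a strong correlation or a 21-day stay among 3 to 5 day stays are not conclusions in themselves, but they are the kind of observations from which a real data science project can grow.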

We sincerely wish you success on this journey. We at Elephant Scale teach courses in the technologies needed to put this advice into practice, and we will be glad to help.



Written by:

Mark Kerzner

Mark Kerzner is the co-founder of Elephantscale. He is a trainer and author (AI, Machine Learning, Spark, Hadoop, NoSQL, Blockchain).
