Hadoop Foundation II: What type of data can Hadoop help me with?
Part I of the series “Laying the foundation of a data-driven enterprise with Hadoop” is accessible here.
Hadoop can work with any type of data, whether it is batch, interactive, or real-time. It can be applied to data from traditional systems or from the internet of things (IoT), and it can be deployed anywhere: on-premise, in the cloud, on appliances, on Linux, or on Windows. Just as important, it provides a consistent experience for bringing that data together in a way that interoperates with the tools you already have.
At the center of the platform is the technology we call YARN. We view it as a data operating system. Just as Windows multitasks the applications that run on top of it, such as Microsoft Office or Adobe Photoshop, YARN is the operating platform for Hadoop. It enables a wide range of data processing engines, both open source and from partners such as Microsoft’s HDInsight, Talend, and others, to run natively on the platform and benefit from its scalability.
Hadoop: A modern platform
A modern data platform needs operations, security, and governance, so these capabilities are built into the platform. This makes it easy to manage, monitor, and provision, on-premise or in the cloud, and to manage high availability. The platform should also help manage its own lifecycle as well as the workloads running on it, and send active alerts when workloads need care and feeding.
Data governance is important for managing data across its life cycle and for understanding the lineage of the data. Hadoop is no different from any other data system in your enterprise: most of them participate in data governance.
Top use case: The Single View of X
In the top middle of the use case diagram, probably the top two use cases we see are the single view use cases: a single view of the customer, a single view of the product, a single view of the supply chain, and a single view of patients.
Being able to collect disparate data from siloed data sets and bring it together, so that you can join it in ways you could not before, is a very big use case for driving additional revenue or better care.
The world of fast data, or data in motion, together with rich historical data, deep historical machine learning, and data modeling, underpins predictive analytics. In many cases, you will see businesses transforming themselves with predictive analytics applications. That is the landscape of the journey: folks pick one use case and move on to others over time.
Build on top of simple use cases OR use them as reference points
Single view use case
To give you an example, let’s have a look at Mercy Corps. Most of us are patients at some point in our lives, starting at birth, and Mercy Corps is really about delivering transformational outcomes at scale for patients. They have been on a journey toward 1 Patient/1 record, clearly a single view use case. They deal with a million patients across hospitals and clinics, and they bring in data from an electronic health record (EHR) system called EPIC. They bring it into Hadoop, where they can begin to join and aggregate it with other data sources around the patient or the clinical care lab. They can also onboard real-time patient sensor data so they can perform a better analysis of the patient and deliver better care. They have internal systems data and third-party data sets, so they bring all of that data into a central location to provide that single view of the patient.
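As a rough illustration of what joining data around the patient means, here is a minimal sketch in plain Python rather than in Hadoop tooling. The three sources, the field names, and the `patient_id` join key are all hypothetical; nothing here reflects the actual data model or pipeline described in the webinar.

```python
# Hypothetical records from three siloed sources, all keyed by patient_id.
ehr_records = [
    {"patient_id": "p1", "name": "Alex Doe", "birth_year": 1980},
    {"patient_id": "p2", "name": "Sam Roe", "birth_year": 1975},
]
lab_results = [
    {"patient_id": "p1", "test": "glucose", "value": 5.4},
    {"patient_id": "p1", "test": "hdl", "value": 1.3},
]
sensor_readings = [
    {"patient_id": "p2", "metric": "heart_rate", "value": 72},
]


def build_single_view(ehr, labs, sensors):
    """Join the three sources on patient_id into one record per patient."""
    view = {}
    # The EHR record is treated as the master record for each patient.
    for row in ehr:
        view[row["patient_id"]] = {
            "name": row["name"],
            "birth_year": row["birth_year"],
            "labs": [],
            "sensors": [],
        }
    # Attach lab results to the matching patient; unknown ids are skipped.
    for row in labs:
        patient = view.get(row["patient_id"])
        if patient is not None:
            patient["labs"].append({"test": row["test"], "value": row["value"]})
    # Attach sensor readings the same way.
    for row in sensors:
        patient = view.get(row["patient_id"])
        if patient is not None:
            patient["sensors"].append(
                {"metric": row["metric"], "value": row["value"]}
            )
    return view


single_view = build_single_view(ehr_records, lab_results, sensor_readings)
print(single_view["p1"])
```

On a Hadoop cluster the same idea would typically be expressed as a distributed join over much larger data sets, but the shape of the result is the same: one record per patient that gathers everything known about them.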
Text notes
Beyond a single view of the patient, another use case is bringing online the free-text lab notes that Mercy Corps historically was never able to search across. They went from never being able to discover those insights to finding them in a matter of seconds. If we look at Mercy Corps’ journey toward becoming data-driven, it started in the lower hexagon, with the cost savings use cases, and then moved above the line into the transformational business outcome use cases.
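To make the idea of searching across free-text notes concrete, the following is a small sketch of an inverted index in plain Python. It is an illustrative assumption, not the search technology used in the case study; the note ids, note text, and function names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical free-text lab notes keyed by a note id.
lab_notes = {
    "note-1": "Patient reports mild chest pain after exercise.",
    "note-2": "No chest pain. Glucose level elevated.",
    "note-3": "Follow-up visit, glucose level back to normal.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(notes):
    """Map each term to the set of note ids that contain it."""
    index = defaultdict(set)
    for note_id, text in notes.items():
        for term in tokenize(text):
            index[term].add(note_id)
    return index


def search(index, query):
    """Return the sorted ids of notes containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


index = build_index(lab_notes)
print(search(index, "chest pain"))
```

Building the index once up front is what turns a scan over every note into a quick lookup per query term, which is the general reason indexed search can answer in seconds.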
They are continuing their journey with things like vital signs monitoring, preventive care, medical decision support, and device data ingest, and they are working on a lab notes archive, operational efficiencies, and a single view from a doctor’s perspective.
This series is based on the Hortonworks webinar titled “Laying the foundation for a Data-Driven Enterprise with Hadoop”. It is accessible here.