This is the latest version of our course on big data. Watch the full course on Coursera at …
The following diagram shows a typical big data infrastructure design. It is taken from one of Allied Consultant’s big data projects.
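Cluster sizing (the Cores, RAM, # nodes and Disk columns are the project hardware requirements):

| Cluster | Expected volume | Benchmark hardware | Cores | RAM | # nodes | Disk |
|---|---|---|---|---|---|---|
| Source | 6 million records/month (~3 records per second) | | | | | |
| HDFS | 6 million/month | 1 namenode, 20 datanodes, 2 CPU/node, 64 GB RAM/node | 1 | 6 GB | 1 master, 3 slaves | 120% of 6 GB = 7.2 GB/month |
| Kafka | 4 topics, 6 million/month per topic | 1 node […] | | | | |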
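The sizing arithmetic behind the table can be checked with a short Python sketch. It assumes a 30-day month (the table does not say which month length was used), which puts the source rate at roughly 2.3 records per second, so the table's "~3" reads as a rounded-up figure. The 120% disk headroom and the 4-topic Kafka volume are taken from the table; the function names are illustrative only.

```python
import math

# Assumption: a 30-day month; the original table does not state one.
SECONDS_PER_MONTH = 30 * 24 * 3600


def records_per_second(records_per_month: int) -> float:
    """Average ingest rate implied by a monthly record count."""
    return records_per_month / SECONDS_PER_MONTH


def disk_with_headroom(base_gb: float, headroom_pct: int = 120) -> float:
    """Provision headroom_pct percent of the expected monthly volume (120% in the table)."""
    return base_gb * headroom_pct / 100


source_rate = records_per_second(6_000_000)
print(f"Source: {source_rate:.2f} rec/s, rounded up to {math.ceil(source_rate)}")
print(f"HDFS disk: {disk_with_headroom(6):.1f} GB/month")

kafka_monthly = 4 * 6_000_000  # 4 topics x 6 million records/month per topic
print(f"Kafka: {kafka_monthly:,} records/month, ~{records_per_second(kafka_monthly):.1f} rec/s")
```

Run as a script, it reports 2.31 rec/s for the source (rounded up to 3), 7.2 GB/month of HDFS disk, and about 9.3 rec/s across the four Kafka topics. Peak rates will be higher than these averages, which is one reason to keep headroom in the disk and node counts.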
With the help of big data, small businesses can gain the competitive edge they need to stay ahead of the curve. For those new to the term, big data refers to large data sets that can reveal insights about your customers and help you make valuable business decisions. Data can help you compose an overall strategy for […]
Data scientists are known for having a knack for statistics and data analysis, which they use to understand and extract insights from a given dataset, often an enormous one. If you don’t think you have this skill set but the career appeals to you, you can always take a course such as a Springboard data science course, which will […]
Resource Management in Information Technology
There is a whole host of technology available nowadays to ensure that your IT hardware resources are managed efficiently. You may have an in-house data center and a few cloud nodes, services, or apps, which together constitute your investment in hardware. That investment translates into resource capacity: memory, disk, and processor. In the […]
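To make "investment translates into resource capacity" concrete, here is a minimal Python sketch that rolls a mixed in-house and cloud inventory up into processor, memory, and disk totals. The `Node` record, the location labels, and every node name and figure below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical inventory record; real inventories carry far more detail.
    name: str
    location: str  # illustrative labels: "in-house" or "cloud"
    cores: int
    memory_gb: int
    disk_gb: int


def total_capacity(nodes):
    """Sum processor, memory, and disk capacity across the given nodes."""
    return {
        "cores": sum(n.cores for n in nodes),
        "memory_gb": sum(n.memory_gb for n in nodes),
        "disk_gb": sum(n.disk_gb for n in nodes),
    }


def capacity_by_location(nodes):
    """Group nodes by location and report capacity for each group."""
    groups = {}
    for n in nodes:
        groups.setdefault(n.location, []).append(n)
    return {loc: total_capacity(group) for loc, group in groups.items()}


# Hypothetical example inventory: two data-center servers and one cloud VM.
inventory = [
    Node("dc-01", "in-house", 16, 64, 2000),
    Node("dc-02", "in-house", 16, 64, 2000),
    Node("vm-a", "cloud", 4, 16, 250),
]

if __name__ == "__main__":
    print(total_capacity(inventory))  # {'cores': 36, 'memory_gb': 144, 'disk_gb': 4250}
    print(capacity_by_location(inventory))
```

Keeping the per-location breakdown separate from the grand total makes it easy to see how much of the capacity sits in the data center versus the cloud when deciding where new workloads should run.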
Synchronous vs Async pipelines
Synchronous big data pipelines are a series of data processing components that are triggered when a user invokes an action on a screen, e.g. clicking a button. The user typically waits until a response arrives that informs them of the results. In contrast, in an asynchronous implementation, the user initiates the […]
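A minimal Python sketch of the difference, assuming a hypothetical three-stage `run_pipeline` function and using a thread pool as a stand-in for an asynchronous job runner; a production system would more likely hand the job to a queue or scheduler. The synchronous handler returns only when the pipeline has finished, while the asynchronous one returns a `Future` immediately and notifies the user through a callback.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def run_pipeline(records):
    """Hypothetical pipeline: clean -> transform -> aggregate."""
    cleaned = [r.strip() for r in records if r.strip()]  # drop blank records
    transformed = [r.upper() for r in cleaned]           # stand-in transformation
    return {"count": len(transformed), "items": transformed}


def handle_click_sync(records):
    # Synchronous: the screen blocks until every stage has finished.
    return run_pipeline(records)


def handle_click_async(executor, records) -> Future:
    # Asynchronous: the job is submitted and control returns at once.
    # The caller can poll the Future or attach a callback for notification.
    return executor.submit(run_pipeline, records)


if __name__ == "__main__":
    data = [" a ", "", "b"]
    print(handle_click_sync(data))  # {'count': 2, 'items': ['A', 'B']}

    with ThreadPoolExecutor(max_workers=2) as pool:
        job = handle_click_async(pool, data)
        job.add_done_callback(lambda f: print("notified:", f.result()))
        print("user can keep working while the job runs")
```

The order of the last two lines of output can vary from run to run, which is exactly the point: in the asynchronous case the user is no longer waiting on the pipeline.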
There is a lot of hype about “Big Data” solutions among most of our customers. When I first looked at the area a few years ago, I found most offerings to be at a very early stage, with little genuine intent from customers to implement them. However, in the recent past, I have seen an increase in the number of […]
You can also access Part I and Part II of this series, “Laying the foundation of a data-driven enterprise with Hadoop”. The Hadoop platform has performed well in batch, interactive, and real-time data processing when Apache Hadoop is at its core. Recently, Hortonworks introduced a new technology called Apache NiFi. It was created at […]