What is Big Data?
Before answering that, let's first understand what data is. Data is the quantities, characters, or symbols on which a computer performs operations. Big Data is also data, but at a huge scale. The term describes a collection of data that is already huge in size and still growing exponentially with time. In short, such data is so large and complex that traditional data management tools cannot store or process it efficiently.
The Three V's of Big Data
Big Data is commonly described along three dimensions, often called the three V's. They are characteristics of the data rather than separate kinds of data:
Volume: The quantity of data being generated and stored. Big Data typically arrives in high volumes and is often unstructured. Examples include Facebook data feeds or Netflix streams on a web page or in a mobile application.
Velocity: The speed at which data is generated and processed. High-velocity data often streams directly into memory instead of first being written to a database, frequently in real time, which allows it to be analyzed as it is being generated.
Variety: The many types of data that are available. By common estimates, about 80% of the world’s data is now unstructured. With Big Data, text, images, audio, and video can all be processed and analyzed to derive meaning and insight.
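To make the velocity idea concrete, here is a minimal sketch of analyzing values as they arrive instead of storing them first. The function name running_mean and the sensor readings are made up for illustration; real streaming systems are far more involved.

```python
from typing import Iterable, Iterator


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far, one value at a time."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: earlier values never need to be kept in memory.
        mean += (value - mean) / count
        yield mean


# Hypothetical readings arriving one by one (illustrative data only).
readings = [10.0, 20.0, 30.0, 40.0]
means = list(running_mean(readings))
print(means)
```

Each new reading updates the result immediately, so the analysis keeps pace with the data without a database write in between.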
Why Big Data?
Fact: Every day, more and more data is generated around the world, and the amount keeps growing. According to IBM, 2.5 exabytes (2.5×10^18 bytes) of data are generated every day, and this number continues to grow. IDC estimates that between 2013 and 2020, the amount of data we generate will increase from 4.4 zettabytes to 44 zettabytes, and that by 2025 there will be over 163 zettabytes of data. That’s a lot of data!
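As a quick back-of-the-envelope check, the IDC figures above imply a tenfold increase in seven years. The sketch below works out the average yearly growth rate this corresponds to, assuming steady compound growth (an assumption made here for illustration, not something IDC states).

```python
# Figures quoted above from IDC, in zettabytes.
start_zb = 4.4   # 2013
end_zb = 44.0    # 2020
years = 2020 - 2013

# Overall growth factor across the whole period (about 10x).
growth_factor = end_zb / start_zb

# Average yearly rate, assuming the same compound growth every year.
annual_rate = growth_factor ** (1 / years) - 1

print(f"Growth factor: {growth_factor:.1f}x")
print(f"Implied annual growth: {annual_rate:.1%}")
```

Under that assumption, the data volume grows by roughly 39% each year.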
Utilizing Big Data
Predictive analytics: Uses a variety of statistical techniques, such as data mining (discovering patterns in large data sets), predictive modeling (using statistics to predict outcomes), and machine learning (using statistical techniques to give computers the ability to “learn”), to analyze data and predict future events.
User behavior analytics (UBA): Defined by Gartner as a cybersecurity process in which Big Data is used to analyze human behavior patterns, and algorithms and statistical analysis are then applied to detect and flag potential threats. Instead of tracking devices or security events, UBA tracks a system’s users for any suspicious activity.
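As a toy illustration of the statistical side of UBA, the sketch below flags users whose activity is far from the group average. The user names, login counts, threshold, and the flag_unusual_users function are all hypothetical, and a simple z-score stands in for the much richer models a real product would use.

```python
from statistics import mean, stdev
from typing import Dict, List


def flag_unusual_users(daily_logins: Dict[str, int], threshold: float = 2.0) -> List[str]:
    """Return users whose login count is more than `threshold`
    standard deviations away from the average."""
    values = list(daily_logins.values())
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # Everyone behaves identically, so nothing stands out.
    return sorted(
        user
        for user, count in daily_logins.items()
        if abs(count - mu) / sigma > threshold
    )


# Hypothetical login counts for one day (illustrative data only).
logins = {
    "alice": 4, "bob": 5, "carol": 6, "dave": 5,
    "erin": 4, "frank": 6, "mallory": 60,
}
flagged = flag_unusual_users(logins)
print(flagged)
```

The point is the shift in focus: the check looks at how each user behaves relative to the others, not at any single device or event.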
How to Process and Analyze the Data?
One of the most widely used frameworks for analyzing big data is Apache Hadoop, which was first released in 2006, with version 1.0 following in 2011. It is built around the Hadoop Distributed File System (HDFS) and MapReduce. Hadoop splits files into large blocks and distributes them across the nodes of a cluster. Each node then processes the data stored on it, in parallel with the other nodes. This allows for faster and more efficient data processing.
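The block-splitting idea can be sketched in a few lines. This is only a toy model: the function names, node names, tiny block size, and round-robin placement are assumptions for illustration. Real HDFS uses much larger blocks (128 MB by default in recent versions), replicates each block, and decides placement itself.

```python
from collections import defaultdict
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut the data into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(blocks: List[bytes], nodes: List[str]) -> Dict[str, List[int]]:
    """Assign block indexes to nodes in round-robin order (a simplification)."""
    placement: Dict[str, List[int]] = defaultdict(list)
    for index in range(len(blocks)):
        placement[nodes[index % len(nodes)]].append(index)
    return dict(placement)


# A hypothetical 1000-byte "file" and three made-up node names.
data = b"x" * 1000
blocks = split_into_blocks(data, block_size=256)
placement = assign_blocks(blocks, ["node-a", "node-b", "node-c"])

print(len(blocks))
print(placement)
```

Once the blocks are spread out like this, each node can work on its own blocks at the same time as the others.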
MapReduce refers to the two separate tasks, Map and Reduce, that Hadoop uses to process the data. The Map task takes a set of data and converts it into another set of data in which individual elements are broken down into key-value pairs. The Reduce task then takes the output of Map and combines those key-value pairs into a smaller set of key-value pairs.
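The classic way to see Map and Reduce in action is a word count. The sketch below imitates the two tasks in plain Python on a single machine. It does not use Hadoop's actual API, and the function names (map_phase, shuffle, reduce_phase) and input lines are chosen here for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: turn one line of text into (word, 1) key-value pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values that share a key, as happens between Map and Reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single pair."""
    return key, sum(values)


# Hypothetical input; on a cluster each line could sit on a different node.
lines = ["big data is big", "data keeps growing"]

mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle(mapped)
word_counts = dict(reduce_phase(key, values) for key, values in grouped.items())

print(word_counts)
```

Map produces seven key-value pairs from the two lines, and Reduce combines them into five, one per distinct word.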



