Successful data mining
The hype about Big Data has made us forget that the search engine has long been an indispensable part of our daily work. We research facts, people, and innovations, or we shop online. Search and data analysis have become a regular part of our everyday lives, and our expectations of the quality of the results are high.
The technical and editorial effort that the major providers invest to achieve these results is huge. Inside a company, it is often not possible to achieve comparable results. The reasons for this are often not technical. Data gathered without structure is a major barrier to making information usable in search engines. This kind of information is often worthless for automated decision support.
Systems that are to be used for decision support need high-quality data. Without it, the promised benefits of Big Data and Artificial Intelligence cannot be achieved.
If it is already difficult to establish a consistent data schema for technical data, this becomes an almost impossible undertaking for unstructured external data.
To recognize patterns, the data must first be given structure. Unstructured data, such as the content of emails or photos, is often of little value for pattern recognition. Read more about this in our blog.
Recognizing patterns has great advantages in a complex world. If the number of influencing variables increases explosively, the lifetime of a human being is no longer sufficient to analyze these data sets meaningfully. However, in order to be able to analyze really large data sets, important technical requirements must be met.
It is not surprising that the fundamental technologies for Big Data come from a company that handles some of the largest amounts of data in the world. MapReduce is a programming model introduced by Google for parallel computations over very large data sets (several petabytes) on computer clusters. Among other things, the Hadoop framework, maintained by the Apache Software Foundation, implements the MapReduce model. Other products such as HBase (a simple database for managing extremely large amounts of data) and Hive (a project originally developed by Facebook that provides data warehouse functionality) build on Hadoop.
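To make the MapReduce idea more concrete, here is a minimal sketch of its three steps (map, shuffle, reduce), using a word count as the example. It runs in a single Python process and is only an illustration of the programming model, not of how Hadoop or Google implement it. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample documents are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents.

    On a real cluster the map and reduce calls would be spread across
    many machines; here they simply run one after another.
    """
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    docs = ["big data needs structure", "data quality beats data volume"]
    print(word_count(docs))
```

Because each map call looks at one document only and each reduce call looks at one key only, the work can in principle be distributed across many machines. This independence is what makes the model suitable for very large data sets.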