BIG DATA

Datasets, or combinations of datasets, whose size (volume), complexity (variability), and speed of growth (velocity) make them difficult to capture, manage, process, or analyze with conventional technologies and tools, such as relational databases and standard statistics or visualization packages, within the time necessary to make them useful.

CHARACTERISTICS

Veracity

Veracity refers to the quality of the data being analyzed. High-veracity data contains many records that are valuable to analyze and that contribute in a meaningful way to the overall results. Low-veracity data, on the other hand, contains a high percentage of meaningless records. The non-valuable portion of such data sets is referred to as noise.

Example

Data from a medical experiment
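
As a rough illustration, the short Python sketch below separates "signal" records from noise in a small data set. The record layout and validity rules are invented for the example, not taken from any real experiment.

# Hypothetical sketch: separating signal from noise in a low-veracity data set.
# The fields and plausibility ranges below are invented for illustration.

records = [
    {"patient_id": "P1", "heart_rate": 72},
    {"patient_id": "P2", "heart_rate": None},  # missing measurement: noise
    {"patient_id": "",   "heart_rate": 68},    # unidentifiable record: noise
    {"patient_id": "P3", "heart_rate": 400},   # physiologically implausible: noise
]

def is_valid(record):
    """A record counts as 'signal' only if it is complete and plausible."""
    return (
        bool(record["patient_id"])
        and record["heart_rate"] is not None
        and 30 <= record["heart_rate"] <= 220
    )

signal = [r for r in records if is_valid(r)]
noise_ratio = 1 - len(signal) / len(records)
print(f"kept {len(signal)} of {len(records)} records; noise ratio = {noise_ratio:.0%}")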

Variety

Variety is what makes Big Data really big. Big Data comes from a great variety of sources and generally falls into one of three types: structured, semi-structured, and unstructured data. This variety in data types frequently requires distinct processing capabilities and specialized algorithms.

Example

CCTV audio and video files that are generated at various locations in a city.
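
To make the three types concrete, here is a minimal Python sketch. The sample values are hypothetical, but they show why each type needs its own parsing step before it can be analyzed.

# Hypothetical sketch: each data type requires a different processing step.
import csv, json, io

structured = "sensor_id,temperature\nS1,21.5\nS2,22.1"          # structured (tabular)
semi_structured = '{"sensor": "S3", "readings": [20.9, 21.2]}'  # semi-structured (JSON)
unstructured = "Operator note: camera 4 flickered around noon." # unstructured (free text)

rows = list(csv.DictReader(io.StringIO(structured)))  # schema known up front
doc = json.loads(semi_structured)                     # schema embedded in the data
tokens = unstructured.lower().split()                 # no schema; needs NLP, vision, etc.

print(rows[0]["temperature"], doc["sensor"], len(tokens))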

Velocity

Velocity refers to the speed at which data is generated. High-velocity data is generated at such a pace that it requires distinct (distributed) processing techniques.

Example

Twitter messages or Facebook posts.
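
The sketch below hints at what such processing can look like on a single machine: records are handled one by one as they arrive, with only a bounded window kept in memory instead of storing everything first. The message stream and field names are invented for the example.

# Hypothetical sketch: stream processing with bounded memory.
from collections import deque
import random

def message_stream(n):
    """Stand-in for an endless feed of posts arriving over time."""
    for i in range(n):
        yield {"id": i, "length": random.randint(1, 280)}

window = deque(maxlen=1000)  # keep only the most recent messages
for msg in message_stream(10_000):
    window.append(msg["length"])
    if msg["id"] % 2_000 == 0:  # periodic, incremental statistic
        print(f"msg {msg['id']}: avg length over window = {sum(window)/len(window):.1f}")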

Volume

The volume of data refers to the size of the data sets that need to be analyzed and processed, which now frequently exceed terabytes and petabytes. The sheer volume of the data requires processing technologies distinct from traditional storage and processing capabilities. In other words, the data sets in Big Data are too large to process with a regular laptop or desktop processor.

Example

Credit card transactions made within Europe on a single day.
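
As a minimal sketch of the underlying idea, the Python function below aggregates a transactions file far larger than memory by streaming it line by line; distributed frameworks apply the same split-then-combine principle across many machines. The file name and column layout are assumptions for the example.

# Hypothetical sketch: aggregating a file too large to load into RAM.
# Assumed layout per line: id,timestamp,amount (invented for illustration).
def total_amount(path):
    total, count = 0.0, 0
    with open(path) as f:
        next(f)  # skip the header row
        for line in f:  # constant memory, regardless of file size
            total += float(line.rstrip().split(",")[2])
            count += 1
    return total, count

# total, n = total_amount("transactions.csv")  # file name is hypothetical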