BIG DATA

Data sets, or combinations of data sets, whose size (volume), complexity (variability), and rate of growth (velocity) make them difficult to capture, manage, process, or analyze with conventional technologies and tools.

CHARACTERISTICS

These are known as the four V's: volume, veracity, variety, and velocity.

VOLUME

The size of the data sets that must be analyzed and processed, which now frequently reaches terabytes or petabytes. Data at this scale requires processing technologies that are distinct from traditional storage and processing capabilities.

EXAMPLE

A high-volume data set would be all credit card transactions made in Europe in a single day.
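
As a rough illustration of why volume changes the processing approach, the sketch below aggregates records in fixed-size chunks instead of loading the whole data set into memory. The `transactions` generator, the `chunked` helper, and the amounts are hypothetical stand-ins, not real transaction data, and a real system would likely spread the chunks across machines.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def transactions(n: int) -> Iterator[float]:
    """Hypothetical source: yields n synthetic transaction amounts one at a time."""
    for i in range(n):
        yield float(i % 100)  # made-up amounts, for illustration only


def chunked(items: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most `size` items from `items`."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def total_amount(items: Iterable[float], chunk_size: int = 1000) -> float:
    """Sum all amounts while holding only one chunk in memory at a time."""
    total = 0.0
    for chunk in chunked(items, chunk_size):
        total += sum(chunk)
    return total
```

Because `transactions` is a generator, `total_amount(transactions(n))` never materializes all `n` records at once; only the current chunk is held in memory.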

VERACITY

The quality of the data being analyzed.

HIGH VERACITY DATA

High veracity data has many records that are valuable to analyze and that contribute significantly to the overall results.

LOW VERACITY DATA

Low veracity data contains a high percentage of meaningless data. The portion of these data sets that has no value is called noise.

EXAMPLE

A high veracity data set would be the data from a medical experiment or trial.
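
One way to make the idea of noise concrete is to measure the share of records that fail a validity check. The sketch below is a minimal illustration only: the `noise_ratio` function, the `has_plausible_reading` rule, the `temperature_c` field, and the sample records are all hypothetical, and what counts as "valid" depends entirely on the data set.

```python
from typing import Callable, Iterable, Optional


def noise_ratio(records: Iterable[dict], is_valid: Callable[[dict], bool]) -> Optional[float]:
    """Return the fraction of records that fail `is_valid`, or None if there are no records."""
    total = 0
    noisy = 0
    for record in records:
        total += 1
        if not is_valid(record):
            noisy += 1
    return noisy / total if total else None


def has_plausible_reading(record: dict) -> bool:
    """Hypothetical rule: a reading is valid if it is a number between 30 and 45."""
    value = record.get("temperature_c")
    return isinstance(value, (int, float)) and 30 <= value <= 45


# Made-up sample records, for illustration only.
sample = [
    {"temperature_c": 36.6},
    {"temperature_c": 37.2},
    {"temperature_c": None},   # missing value
    {"temperature_c": 980.0},  # implausible value
]
print(noise_ratio(sample, has_plausible_reading))
```

A ratio close to 0 suggests high veracity under the chosen rule; a ratio close to 1 suggests the data set is mostly noise.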

VARIETY

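The range of data types and sources involved. Variety is what makes big data really big: the data arrives from many sources and in structured, semi-structured, and unstructured forms.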

EXAMPLE

A high-variety data set would be the CCTV video and audio files generated at various locations in a city.

TYPE

The variety of data types often requires specialized algorithms and processing capabilities.

STRUCTURED

Data organized in a fixed schema, such as tables in a relational database.

UNSTRUCTURED

Documents, videos, audio files, etc.

SEMI-STRUCTURED

Data with some organizing structure but no rigid schema, such as spreadsheets, reports, and software-generated files.
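
The difference between these types shows up in how much work it takes to pull out a single value. The sketch below extracts the same amount from three representations of one made-up record. CSV, JSON, and free text are used here as commonly cited stand-ins for structured, semi-structured, and unstructured data; that mapping, the field names, and the sample values are assumptions for illustration and are not taken from the list above.

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields.
structured = "id,city,amount\n1,Madrid,25.50\n"
row = next(csv.DictReader(io.StringIO(structured)))
amount_structured = float(row["amount"])

# Semi-structured: self-describing keys, but fields may be nested or optional.
semi_structured = '{"id": 1, "city": "Madrid", "payment": {"amount": 25.5}}'
doc = json.loads(semi_structured)
amount_semi = doc.get("payment", {}).get("amount")

# Unstructured: free text; the value has to be extracted heuristically.
unstructured = "Customer 1 paid 25.50 euros at a shop in Madrid."
match = re.search(r"(\d+\.\d+)\s+euros", unstructured)
amount_unstructured = float(match.group(1)) if match else None
```

The structured case is a direct column lookup, the semi-structured case needs defensive access to optional keys, and the unstructured case relies on a pattern that can easily miss or misread values, which is why it tends to call for specialized algorithms.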

VELOCITY

The speed at which the data is generated. High-velocity data is generated at such a rate that it requires different (distributed) processing techniques.

EXAMPLE

Examples of data generated at high speed are Twitter messages and Facebook posts.
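
Velocity can be quantified as an event rate over a recent time window. The sketch below is one minimal way to do that on a single machine, assuming events arrive with non-decreasing timestamps; the `RateMeter` class and its method names are hypothetical, and a genuinely high-velocity stream would typically need this kind of counting to be distributed.

```python
from collections import deque
from typing import Deque


class RateMeter:
    """Counts events seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def record(self, timestamp: float) -> None:
        """Register one event; timestamps are assumed to be non-decreasing."""
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        """Drop events that are `window_seconds` or more older than `now`."""
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()

    def events_per_second(self, now: float) -> float:
        """Average event rate over the window ending at `now`."""
        self._evict(now)
        return len(self._timestamps) / self.window_seconds
```

Calling `record` for each incoming message and `events_per_second` periodically gives a running estimate of how fast the data is arriving, which is the quantity that decides whether a single process can keep up.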