Terms associated with Big Data.

Apollo teams
Teams of highly capable individuals who can, collectively, perform badly due to excessive and destructive debate, lack of coherence, and over-zealous attention to weaknesses in others’ arguments.

Big Data
Any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information.
Data sets that are so large or complex that traditional data processing applications are inadequate.
A large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.
3Vs (volume, variety and velocity) are three defining properties or dimensions of Big Data. Volume refers to the amount of data; Variety refers to the number of types of data; Velocity refers to the speed of arrival of that data.

The Chief Executives: CEO, CFO, CMO, CTO, COO, CDO, CIO, CLO, CCO . . .

Data analytics
DA is the science of examining raw data with the purpose of drawing conclusions about that information.

Data artisan
Data artisans are employees who possess a blend of technical skills and business acumen that enables them to extract actionable insight from the huge volumes of data that exist.

Data blending
Data blending is the ability to bring data from multiple data sources into one place, without the need for any special coding.

Data insight
A thought, fact, combination of facts, data and/or analysis of data that provides meaning and furthers understanding of a situation or issue. It has the potential to benefit the business, either directly or by redirecting the thinking about that situation or issue.

Data mining
The practice of examining large pre-existing databases in order to generate new information.

Data scientist
An employee or business intelligence (BI) consultant who excels at analysing data, particularly large amounts of data.

Data scraping
This is a technique in which a computer program extracts data from human-readable output coming from another program – often a page on the internet.

These are companies or organisations that make sophisticated use of data to drive business decisions.

Data wrangler
The person performing the wrangling. In the context of scientific research, the term often refers to a person responsible for gathering and organising disparate data sets collected from many different sources.

Data velocity
Big Data Velocity deals with the pace at which data flows in from sources such as business processes, machines, networks and human interaction with social media, mobile devices, etc.

Domain
An area of human endeavour or other specialised discipline. Specialists and experts develop and use their own domain knowledge.

External data
Data from elsewhere which may be linked to internal data to provide extra insight. Examples include traffic information for real estate site evaluation and population demographic information for retail network optimisation. To add value, it must be possible to link external data to internal data, either by some form of shared ID or by another common attribute such as location.
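As a minimal sketch of linking by a shared ID, the example below attaches external population figures to internal site records. The site records, the postcode key and the population figures are all hypothetical and are used only for illustration.

```python
# Hypothetical internal records: store sites keyed by a postcode district.
internal_sites = [
    {"site_id": "S1", "postcode": "AB1", "annual_sales": 120_000},
    {"site_id": "S2", "postcode": "CD2", "annual_sales": 95_000},
    {"site_id": "S3", "postcode": "EF3", "annual_sales": 143_000},
]

# Hypothetical external data: population per postcode district.
external_population = {"AB1": 15_000, "CD2": 8_200}


def link_external(sites, population_by_postcode):
    """Attach external population figures to internal site records.

    Sites with no matching postcode get population=None rather than
    being dropped, so gaps in the external data stay visible.
    """
    linked = []
    for site in sites:
        enriched = dict(site)  # copy so the internal record is not mutated
        enriched["population"] = population_by_postcode.get(site["postcode"])
        linked.append(enriched)
    return linked


linked_sites = link_external(internal_sites, external_population)
for row in linked_sites:
    print(row["site_id"], row["postcode"], row["population"])
```

Site S3 has no match in the external data, which shows why a reliable shared key matters: without one, the external data cannot add value to that record.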

Geeks & Nerds
Geek: An enthusiast for a particular topic or field. Geeks are collection-oriented, gathering facts and mementos related to their subject of interest. They are obsessed with the newest, coolest, trendiest things that their subject has to offer.
Nerd: A studious intellectual, although again of a particular topic or field. Nerds are achievement-oriented, and focus their efforts on acquiring knowledge and skills over trivia and memorabilia.

Both are dedicated to their subjects, and sometimes socially awkward. The distinction is that geeks are fans of their subjects, and nerds are practitioners of them.

Internal data
Internal data is self-evident: it may consist of spreadsheets and databases of assets, transactions, personnel records, etc. But it may also consist of unstructured data (see below) in the form of text files, images, diagrams, etc.

Structured data
Information with a high degree of organisation, such that it can be included seamlessly in a relational database and is readily searchable by simple, straightforward algorithms. It includes spreadsheets, databases and delimited text files.

Unstructured data
The opposite of structured data: it includes text and multimedia content. Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents.


Domain-specific terms

Catchment area
Most retail companies and service organisations (e.g. schools and hospitals) have defined catchment areas. Sometimes these are imposed, e.g. for an ambulance service, but at other times they derive from a number of factors – drive-times, service levels and similar. The catchment area is often taken as that geographic region from which 80% of the customers/clients/patients are drawn.
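The 80% rule can be sketched as follows, under the simplifying assumption that the catchment is a circle around the site, so that only each customer's distance matters. The function name and the distances are hypothetical.

```python
import math


def catchment_radius(distances_km, share=0.8):
    """Return the smallest distance that covers `share` of the customers."""
    if not distances_km:
        raise ValueError("at least one customer distance is required")
    if not 0 < share <= 1:
        raise ValueError("share must be greater than 0 and at most 1")
    ordered = sorted(distances_km)
    # Number of customers that must fall inside the catchment.
    needed = math.ceil(share * len(ordered))
    return ordered[needed - 1]


# Hypothetical distances (km) from ten customers' homes to a store.
customer_distances = [0.5, 1.2, 2.0, 2.4, 3.1, 3.8, 4.5, 6.0, 9.5, 22.0]
radius = catchment_radius(customer_distances)
print(f"80% of customers live within {radius} km")
```

In practice drive-times are often used in place of straight-line distances, which gives an irregular rather than circular catchment.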

Churn rate
A measure of the number of individuals moving out of a collective group over a specific period of time. It is most widely applied in business with respect to a contractual customer base. For instance, it is an important factor for any business with a subscriber-based service model, including mobile telephone networks and pay TV operators.
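The measure above, commonly called the churn rate, is usually expressed as a fraction of the customer base at the start of the period. The sketch below assumes that convention; the function name and the subscriber figures are hypothetical.

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of the starting customer base lost during the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start


# Hypothetical pay TV operator: 2,000 subscribers at the start of the
# month, of whom 50 cancelled during the month.
monthly_churn = churn_rate(2000, 50)
print(f"Monthly churn: {monthly_churn:.1%}")
```

Other conventions exist, for example dividing by the average customer base over the period, so the definition in use should always be stated alongside the figure.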

Gravity modelling
The term used in international trade and elsewhere for modelling (or estimating) the flow of people, goods and/or services between a source of supply and a point of consumption. It is widely used in banking, insurance and retail, and by schools, for example, to calculate footfall (or pupil numbers). It is based on a variant of Newton’s law of gravitation (1687) in which the mass terms are replaced by availability (of the resource – mainly people) and attractiveness of the destination. This is as applicable to IKEA as it is to the optimisation of a bank network or to the recent humanitarian exodus from Syria.
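A minimal sketch of the Newton-style form is given below: the estimated flow is proportional to availability multiplied by attractiveness, divided by distance raised to a power. The constant k, the exponent beta (set to 2 to mirror Newton's law) and the store figures are assumptions for illustration; in real use they would be calibrated against observed data.

```python
def gravity_flow(availability, attractiveness, distance, k=1.0, beta=2.0):
    """Estimate the flow from a source to a destination.

    flow = k * availability * attractiveness / distance**beta
    k and beta are assumed calibration constants; beta=2 mirrors Newton's law.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k * availability * attractiveness / distance ** beta


# Hypothetical example: one residential area and two competing stores.
population = 10_000  # availability of the resource (people)
stores = {
    "Store A": {"attractiveness": 50.0, "distance_km": 5.0},
    "Store B": {"attractiveness": 20.0, "distance_km": 2.0},
}

flows = {
    name: gravity_flow(population, s["attractiveness"], s["distance_km"])
    for name, s in stores.items()
}
total = sum(flows.values())
# Share of the area's custom that each store would be expected to attract.
shares = {name: flow / total for name, flow in flows.items()}

for name in stores:
    print(f"{name}: flow={flows[name]:.0f}, share={shares[name]:.1%}")
```

Here the smaller but nearer Store B attracts the larger share, which illustrates how distance can outweigh attractiveness in this kind of model.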



The terms used in the area of Big Data and Analytics are constantly evolving. If anybody has extra terms, or comments on the above definitions, then please let us know.