Business intelligence tools
Business intelligence tools are a type of application software designed to retrieve, analyze, transform and report data
for business intelligence. The tools generally read data that have been previously
stored, often, though not necessarily, in a data warehouse or data mart.
Types of business intelligence tools
The key general categories of business intelligence tools are:
· Spreadsheets
· Reporting and querying software: tools that extract, sort, summarize, and present selected data (see the sketch below)
· OLAP: Online analytical processing
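As a rough sketch of what such reporting and querying tools automate, the following Python/pandas example extracts, summarizes, sorts, and prints a small sales table; the column names and figures are invented purely for illustration.

# Minimal sketch of "extract, sort, summarize, present" using pandas.
# The sales data and column names are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "West", "East", "South", "West"],
    "product": ["A", "A", "B", "B", "A"],
    "revenue": [1200, 950, 700, 400, 1100],
})

# Extract: keep only the rows of interest.
east_west = sales[sales["region"].isin(["East", "West"])]

# Summarize: total revenue per region.
summary = east_west.groupby("region")["revenue"].sum()

# Sort and present the result.
print(summary.sort_values(ascending=False))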
Except for spreadsheets, these tools are
sold as standalone tools, suites of tools, components of ERP systems, or as components of software targeted to a
specific industry. The tools are sometimes packaged into data
warehouse appliances.
Open source free products
· Eclipse BIRT (The Business Intelligence and Reporting Tools) Project
· SpagoBI
· R
o R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software[2][3] and data analysis.[3] Polls, surveys of data miners, and studies of scholarly literature databases show that R's popularity has increased substantially in recent years.[4][5][6][7]
o R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme.[8] S was created by John Chambers while at Bell Labs. There are some important differences, but much of the code written for S runs unaltered.
o R was created by Ross Ihaka and Robert Gentleman[9] at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors and partly as a play on the name of S.[10]
o R is a GNU project.[11][12] The source code for the R software environment is written primarily in C, Fortran, and R.[13] R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. R uses a command line interface; there are also several graphical front-ends for it.
· KNIME
· TACTIC
Open source commercial products
· Jaspersoft: Reporting, Dashboards, Data Analysis, and Data Integration
· Palo (OLAP database): OLAP Server, Worksheet Server and ETL Server
· Pentaho: Reporting, analysis, dashboard, data mining and workflow capabilities
· TACTIC: Reporting, management, dashboard, data mining and integration, workflow capabilities
Proprietary free products
· Biml - Business Intelligence Markup Language
· Birst
· icCube
· InetSoft
· Tableau Public and Tableau Reader
o Tableau Software (/tæbˈloʊ/ tab-loh) is an American computer software company headquartered in Seattle, Washington. It produces a family of interactive data visualization products focused on business intelligence.[2]
o Tableau offers five main products: Tableau Desktop, Tableau Server, Tableau Online, Tableau Reader and Tableau Public. Tableau Public and Tableau Reader are free to use, while both Tableau Server and Tableau Desktop come with a 14-day fully functional free trial period, after which the user must pay for the software. Tableau Desktop comes in both a Professional and a lower-cost Personal edition. Tableau Online is available with an annual subscription for a single user, and scales to support thousands of users.[14]
Proprietary products
· ApeSoft
· Birst
· BOARD
· ComArch
· Dundas Data Visualization, Inc.
· Grapheur, implementing the reactive business intelligence (RBI) approach
· GoodData - Cloud Based
· icCube
· IDV Solutions Visual Fusion
· InetSoft
· InfoZoom
· Jackbe
· Jaspersoft (now TIBCO; iReport, Jasper Studio, Jasper Analysis, Jasper ETL, Jasper Library)
· Jedox
· JReport (from Jinfonet Software)
· Looker
· myDIALS
· Oracle
o Hyperion Solutions Corporation
o Business Intelligence Suite Enterprise Edition (OBIEE – Oracle BI Enterprise Edition)
· Pentaho (now Hitachi Data Systems)
· PRELYTIS
· Qlik
· Quantrix
· Roambi
· SAP NetWeaver Business Intelligence
· SiSense
· SAS
· TARGIT Business Intelligence
· Teradata
· XLCubed
· Yellowfin Business Intelligence
· Zoho Reports (as part of the Zoho Office Suite)
BI market
Vendors in the business intelligence space are often categorized into:
· The consolidated big four "mega vendors":
o Oracle Hyperion
o SAP BusinessObjects
o IBM Cognos
o Microsoft BI[14]
· The independent "pure-play" vendors:
o MicroStrategy
o Tableau
o QlikView
§ QlikView – a self-service BI tool or business discovery platform from Qlik
§ Qlik positions the QlikView Business Discovery platform as self-service BI that empowers business users and supports decision-making.
o SAS
· Oracle Business Intelligence Foundation Suite
o Oracle Business Intelligence Enterprise Edition (OBIEE)
o Oracle Scorecard and Strategy Management
o Oracle Essbase
o Oracle Essbase Analytics Link for Hyperion Financial Management
o Oracle Business Intelligence Publisher
o Oracle Business Intelligence Mobile
Oracle Business Intelligence Foundation Suite includes the following capabilities:
o Enterprise BI Platform (OBIEE)
OBIEE (Oracle Business Intelligence Enterprise Edition), also packaged as OBIEE Plus (Oracle Business Intelligence Enterprise Edition Plus), is Oracle Corporation's set of business intelligence tools consisting of the former Siebel Systems and Hyperion Solutions business intelligence offerings. The industry counterparts and main competitors of OBIEE are Microsoft BI, IBM Cognos, SAP AG Business Objects and SAS Institute Inc. The products currently leverage a common BI Server.
§ Siebel Systems
Siebel CRM Systems, Inc. was a software company principally engaged in the design, development, marketing, and support of customer relationship management (CRM) applications.
Principal competitors included Oracle Corporation (now Siebel's owner); SAP America Inc.; Vantive Corporation (subsidiary of PeopleSoft Inc., now owned by Oracle); Sage Group; Clarify Corporation (subsidiary of Amdocs); SAS Institute Inc.; Epiphany, Inc.; Broadbase Software Inc.; Salesforce.com; Microsoft Dynamics; and SugarCRM.
Siebel Systems competed directly with Oracle and SAP. These competing software suites gradually developed HR, financial and ERP packages that were readily integrated and thus did not require specialists to deploy,[citation needed] enabling them to steadily erode Siebel's market share.
§ Hyperion
In 2007, Oracle acquired Hyperion, a leading provider of performance management software. The transaction extended Oracle's business intelligence capabilities with what Oracle describes as a comprehensive system for enterprise performance management.
Over the preceding years, Oracle had significantly reoriented its business intelligence product strategy, shifting its focus from offering a solution that works only in Oracle environments towards offering a best-of-breed business intelligence and performance management product family that works with heterogeneous information sources in an enterprise, both Oracle and non-Oracle.
The acquisition of Hyperion extended this strategy. Customers increasingly use performance management and business intelligence together. Hyperion added complementary products to Oracle's business intelligence offerings, including an enterprise planning solution, financial close and reporting products, and a multi-source OLAP server. Coupled with Oracle's BI tools and pre-packaged analytic applications, Oracle positions the combination as an integrated, end-to-end enterprise performance management system that spans planning, financial close, operational analytic applications, BI tools, reporting, and data integration, all on a unified BI platform.
After the acquisition, Oracle introduced a new product family called Oracle Business Intelligence Enterprise Edition Plus. This integrated suite includes the Oracle and Hyperion reporting and analysis tools. Since 2007, Oracle has released new versions of the Oracle Enterprise Performance Management System, with continued enhancements aimed at improving business insights and decision-making. In 2010, Oracle released Oracle Business Intelligence 11g, which it markets as a complete and integrated suite of BI tools.
Hyperion
Solutions Corporation was an Enterprise Performance Management software company, located in Santa Clara, California, USA, which was acquired by Oracle Corporation in 2007. Many of its products were targeted at
the business intelligence (BI) and business performance management markets, and as of 2013 are still actively developed and sold
by Oracle as Oracle Hyperion products.
Hyperion software products include:
· Essbase
· Hyperion Intelligence and SQR Production Reporting (products acquired in the 2003 takeover of Brio Technology)
· Hyperion Enterprise
· Hyperion Strategic Finance
· Hyperion Financial Data Management
· Hyperion Enterprise Performance Management Architect
· Hyperion Financial Close Management
· Hyperion Disclosure Management
· Hyperion Performance Scorecard
· Hyperion Business Modelling
· Hyperion Financial Management
· Hyperion Master Data Management/Oracle Data Relationship Management
· Hyperion Financial Reporting
· Hyperion Web Analysis
· Hyperion SmartView
· Hyperion EPM Workspace
· Hyperion Profitability and Cost Management
· Hyperion System 9 BI+ (a combination of Interactive Reporting, SQR, Web Analysis, Financial Reporting, EPM Workspace and SmartView)
· Hyperion Financial Data Quality Management (also referred to as FDM EE)
· Hyperion Tax Provision
· Planning Budgeting Cloud Service
The remaining Oracle Business Intelligence Foundation Suite capabilities are:
o OLAP Analytics
o Scorecard and Strategy Management
o Mobile BI
o Enterprise Reporting
· Teradata – for data warehousing
Teradata
Corporation is a
publicly-held international computer company that sells analytic data platforms,
marketing applications and related services. Its analytics products are meant
to consolidate data from different sources and make the data available for
analysis. Teradata marketing applications are meant to support marketing teams
that use data analytics to inform and develop programs. In early 2015, the
company formed two divisions: Data & Analytics for its data analytics
platforms and related services and Marketing Applications for its marketing
software and related services.[4] The corporate headquarters are in Miamisburg, Ohio.
Teradata is an enterprise software company that develops and sells a relational database
management system (RDBMS) with the
same name. Teradata is publicly traded on the New York Stock Exchange (NYSE)
under the stock symbol TDC.[5]
The Teradata product is referred to as a "data warehouse
system" and stores and manages data. The data warehouses use a
"shared nothing" architecture, which means that each server node has
its own memory and processing power.[6] Adding more servers and nodes increases the amount of data that can be stored.
The database software sits on top of the servers and spreads the workload among
them.[7] Teradata sells applications and software
to process different types of data. In 2010, Teradata added text
analytics to track unstructured
data, such as word
processor documents, and semi-structured data, such as spreadsheets.[8]
Teradata's product can be used for business analysis. Data
warehouses can track company data, such as sales, customer preferences, product
placement, etc.[7]
Teradata is a massively parallel processing system running a shared-nothing architecture.[29] Its technology consists of hardware, software,
database, and consulting. The system moves data to a data
warehouse where it
can be recalled and analyzed.[13]
The systems can be used as
back-up for one another during downtime, and in normal operation balance the
work load across themselves.[30]
In 2009, Forrester Research
issued a report, "The Forrester Wave: Enterprise Data Warehouse
Platform," by James Kobielus,[31] rating Teradata the industry's number one enterprise
data warehouse platform in the "Current Offering" category.
Research and advisory firm Gartner placed Teradata in the "leaders quadrant" in its 2009, 2010, and 2012 reports, "Magic Quadrant for Data Warehouse Database Management Systems".[32][33]
Teradata is the most popular data
warehouse DBMS in the DB-Engines database ranking.[34]
In 2010, Teradata was listed in Fortune’s annual list of Most Admired Companies.[35]
Active enterprise data warehouse
Teradata Active Enterprise Data
Warehouse is the platform that runs the Teradata Database, with added data
management tools and data
mining software.
The data warehouse differentiates
between “hot and cold” data – meaning that the warehouse puts data that is not
often used in a slower storage section.[36] As of October 2010, Teradata uses Xeon 5600
processors for the server nodes.[37]
Teradata Database 13.10 was announced
in 2010 as the company’s database software for storing and processing data.[38][39]
Teradata Database 14 was sold as
the upgrade to 13.10 in 2011 and runs multiple data warehouse workloads at the
same time.[40] It includes column-store analyses.[41]
Teradata Integrated Analytics is
a set of tools for data analysis that resides inside the data warehouse.[42]
Backup, archive, and restore
BAR is Teradata’s backup and
recovery system.[43]
The Teradata Disaster Recovery
Solution is automation and tools for data recovery and archiving. Customer data
can be stored in an offsite recovery center.[44]
Platform family
Teradata Platform Family is a set
of products that include the Teradata Data Warehouse, Database, and a set of
analytic tools. The platform family is marketed as smaller and less expensive
than the other Teradata solutions.[45]
Teradata's main competitors are similar products from vendors such as Oracle, IBM, Microsoft and Sybase (Sybase IQ). Competitors also include data warehouse appliance vendors such as Netezza[69] (acquired in November 2010 by IBM), DATAllegro (acquired in August 2008 by Microsoft), ParAccel, Pivotal Greenplum Database, and Vertica Systems (acquired in February 2011 by HP), as well as packaged data warehouse applications such as SAP BW and Kalido.
· Hadoop
Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed
processing of very
large data sets on computer clusters built from commodity hardware.
All the modules in Hadoop are designed with a fundamental assumption that
hardware failures (of individual machines, or racks of machines) are
commonplace and thus should be automatically handled in software by the
framework.
The core of Apache Hadoop
consists of a storage part (Hadoop Distributed File System (HDFS)) and a
processing part (MapReduce). Hadoop
splits files into large blocks and distributes them amongst the nodes in the
cluster. To process the data, Hadoop MapReduce transfers packaged code for nodes to process in parallel,
based on the data each node needs to process. This approach takes advantage of
data locality[3]—nodes manipulating the data that
they have on hand—to allow the data to be processed faster and more efficiently than it
would be in a more conventional supercomputer
architecture that
relies on a parallel file system where computation and data are
connected via high-speed networking.[4]
The base Apache Hadoop framework is composed of the following modules:
- Hadoop Common – contains libraries and utilities needed by other Hadoop modules;
- Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;
- Hadoop YARN – a resource-management platform responsible for managing computing resources in clusters and using them for scheduling of users' applications;[5][6] and
- Hadoop MapReduce – a programming model for large scale data processing.
The term "Hadoop" has
come to refer not just to the base modules above, but also to the
"ecosystem",[7] or collection of additional software packages
that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache Spark, and others.[8][9]
Apache Hadoop's MapReduce and
HDFS components were inspired by Google papers
on their MapReduce and Google File System.[10]
The Hadoop framework itself is
mostly written in the Java programming language, with some native code in C and command line utilities written as Shell script. For end-users, though MapReduce
Java code is common, any programming language can be used with "Hadoop
Streaming" to implement the "map" and "reduce" parts
of the user's program.[11] Other related projects expose other
higher-level user interfaces.
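As a sketch of how Hadoop Streaming lets a non-Java language supply the "map" and "reduce" parts, the two small Python scripts below implement a word count; the script names and the invocation at the end are illustrative and depend on the actual installation.

#!/usr/bin/env python
# mapper.py - emits "word<TAB>1" for every word read from stdin (the map step).
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")

#!/usr/bin/env python
# reducer.py - sums the counts per word; Hadoop Streaming sorts the mapper
# output by key, so identical words arrive on consecutive lines.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    word, count = line.rsplit("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, 0
    current_count += int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")

An illustrative invocation would pass both scripts to the streaming jar, e.g. hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input /data/in -output /data/out; the exact jar path and options vary by distribution.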
Prominent corporate users of
Hadoop include Facebook and Yahoo. It can be deployed in traditional on-site
datacenters but has also been implemented in public cloud spaces such as Microsoft Azure, Amazon Web Services, Google Compute Engine, and IBM Bluemix.
Apache Hadoop is a registered
trademark of the Apache Software
Foundation.
· Top 45 Big Data Tools for Developers
https://blog.profitbricks.com/top-45-big-data-tools-for-developers/
· 43 Bigdata Platforms and Bigdata Analytics Software
1. IBM Bigdata Analytics
IBM Bigdata Analytics solution portfolio includes InfoSphere Streams, InfoSphere BigInsights, IBM Watson Explorer, IBM PureData powered by Netezza technology, DB2 with BLU Acceleration, IBM Smart Analytics System, InfoSphere Information Server and InfoSphere Master Data Management.
2. HP Bigdata
HP’s Bigdata Analytics solution includes HP HAVEn and HP Vertica. HP HAVEn is a platform composed of software, services, and hardware. Big data of any type, either structured or unstructured, can be analyzed to produce strategic insights. HP Vertica Dragline lets organizations store their data in a cost-effective manner and provides capabilities to explore it quickly using SQL-based tools.
3. SAP Bigdata Analytics
SAP Bigdata Analytics platform includes the in-memory platform SAP HANA and SAP IQ, a column-oriented, grid-based, massively parallel processing database. The SAP HANA platform is also available together with an Apache Hadoop solution. The Bigdata Analytics solutions include the Predictive Analytics and Text Analytics solutions.
4. Microsoft Bigdata
Microsoft Azure is an open and flexible cloud platform which enables users to quickly build, deploy and manage applications across a global network of Microsoft-managed datacenters. The applications can be built using any language, tool or framework and can be integrated with other public cloud applications in the IT environment.
5. Oracle Bigdata Analytics
Oracle Bigdata Analytics solutions include Oracle Big Data Appliance, Oracle Exadata Database Machine and Oracle Exalytics In-Memory Machine. These are engineered systems which are pre-integrated to reduce the cost and complexity of IT infrastructures. The databases include Oracle Database, Oracle NoSQL Database, MySQL and MySQL Cluster, along with Oracle Event Processing, Oracle Coherence, Oracle Endeca Information Discovery and in-database analytics.
6. Talend Open Studio
Talend Open Studio is a versatile set of open source products for developing, testing, deploying and administering data management and application integration projects. Talend delivers a unified platform that makes data management and application integration easier by providing a unified environment for managing the entire lifecycle across enterprise boundaries.
Talend’s products dramatically lower the adoption barrier for businesses wanting powerful packaged solutions to operational challenges like data cleansing, master data management, and enterprise service bus deployment.
Leveraging and extending leading Apache technologies, Talend’s open source ESB and open source SOA solutions help organizations to build flexible, high-performance enterprise architectures that integrate and service-enable distributed applications.
7. Teradata Bigdata Analytics
Teradata has built a simple architecture called the Unified Data Architecture for Bigdata Analytics. The Teradata Aster Discovery Platform eases the discovery of crucial business insights from all data types. Its analytic applications are intended to deliver those discovery insights with minimal time and effort.
8. SAS Bigdata Analytics
SAS Bigdata Analytics solution portfolio includes Credit Scoring for SAS
Enterprise Miner, SAS High-Performance Data Mining, SAS Model Manager, SAS Scoring Accelerator, SAS Text Miner and SAS Visual Statistics.
9. Dell Bigdata Analytics
Dell Bigdata Analytics includes Kitenga Analytics Suite, Boomi AtomSphere and SharePlex Connector for Hadoop. Kitenga Analytics Suite provides you with integrated information modeling and visualization capabilities in a big data search and business analytics platform.
10. HPCC Systems Big data
HPCC Systems is an open-source platform for Big Data analysis. The data refinery engine, called Thor, cleans, links, transforms and analyzes Big Data. Thor supports ETL (Extraction, Transformation and Loading) functions such as ingesting unstructured/structured data, data profiling, data hygiene, and data linking out of the box. The data delivery engine (Roxie) provides highly concurrent, low-latency real-time query capability. Data processed by Thor can be accessed concurrently by a large number of users in real time through Roxie.
The programming language, Enterprise Control Language (ECL), is used to program both the data processing jobs on Thor and the queries on Roxie.
11. Palantir Bigdata
Palantir Bigdata solution includes Palantir Gotham to integrate, manage, secure, and analyze all of the enterprise data and Palantir Metropolis to integrate, enrich, model, and analyze any kind of quantitative data.
12. Pivotal Bigdata
Pivotal Big Data solutions help organizations discover insight from all their data and build applications that serve customers in context. They store, manage, and deliver value from fast, massive data sets using enterprise data products such as MPP and column-store databases, in-memory data processing, and Hadoop.
13. Google BigQuery
Google BigQuery is a web service that enables companies to analyze massive datasets using Google’s infrastructure. This can analyze up to billions of rows in seconds. It is scalable and easy to use with the familiar SQL query language. BigQuery lets developers and businesses tap into powerful data analytics on demand against multi-terabyte datasets in seconds.
14. Pentaho Big Data Analytics – A Hitachi Data Systems Company
Pentaho Big Data Analytics offers a comprehensive and unified solution that supports the entire big data lifecycle. Regardless of the data source, the solution provides, within a single platform, visual big data analytics tools to extract and prepare the data plus the visualizations and analytics. The open, standards-based architecture makes it easy to integrate with or extend existing infrastructure.
15. Amazon Web Service
Amazon Web Services provides cloud based analytics services to help you process and analyze any volume of data, whether your need is for managed Hadoop clusters, real-time streaming data, petabyte scale data warehousing, or orchestration.
16. Cloudera Enterprise Bigdata
Cloudera Enterprise includes CDH (Cloudera's distribution of Apache Hadoop), the open source Hadoop-based platform, as well as advanced system management and data management tools plus dedicated support and community advocacy.
17. Hortonworks Data Platform
HDP is a platform for multi-workload data processing across an array of processing methods – from batch through interactive to real-time – all supported with solutions for governance, integration, security and operations.
18. FICO Bigdata Analytics
FICO offers comprehensive Big Data Analytics software solutions, Predictive Analytics and Business Intelligence tools including FICO Data Orchestrator, FICO Decision Management Platform, FICO Decision Optimizer, FICO Model Builder, FICO Model Central Solution, FICO Predictive Analytics and FICO Solution Stack.
19. Cisco Bigdata
Cisco UCS Common Platform Architecture (CPA) for big data includes computing, storage, connectivity, and unified management capabilities. Unique to this architecture is transparent, simplified data and management integration with an enterprise application ecosystem.
20. Splunk Bigdata Analytics
Splunk offers a portfolio of Bigdata Analytics software such as Hunk: Splunk Analytics for Hadoop, NoSQL Data Stores, Splunk Hadoop Connect, Hadoop Management and Splunk DB Connect.
21. Fusion-io Bigdata
Fusion-io solutions eliminate the random workload performance deficiencies common to MongoDB, Cassandra and other NoSQL databases, such as HBase, while reducing the operational overhead of their conventional scale-out architectures. Fusion-based solutions deliver predictable and consistently high performance across the entire database, resulting in a more efficient overall system that can require fewer nodes, less DRAM, and use less energy for power and cooling.
22. Intel Bigdata
Intel's portfolio includes technology products such as Intel Xeon processors, 10 Gigabit server adapters, SSDs, and the Intel Distribution, which improve performance for big data projects.
23. Mu Sigma Bigdata
Mu Sigma’s platforms for Data Sciences include muXo, muHPC and muText. muXo is an advanced decision optimization engine designed to solve complex business problems; it provides a suite of constantly evolving meta-heuristic algorithms. muHPC is a suite of popular statistical algorithms, integrated in the form of R packages, for Big Data analysis; written in MapReduce, muHPC algorithms leverage the power of parallel computation. Mu Sigma’s text mining engine, muText, enables knowledge discovery from unstructured and semi-structured data.
24. MicroStrategy Bigdata
MicroStrategy’s Bigdata solution, called PRIME, is deployed in the cloud and provides a visualization and dashboarding engine with an innovative massively parallel in-memory data store. This architecture allows companies to rapidly build and deploy information-driven apps that deliver analytics to hundreds of thousands of users in a fraction of the time and cost of other approaches.
25. Opera Solutions Bigdata
Opera Solutions’ Bigdata solution, the Vektor Big Data analytics and signal-processing platform, integrates Big Data flows from both inside and outside the enterprise, provides the technology to identify, extract, and store Signals, and supports deployment of all Signal Apps.
26. Redhat Bigdata
The majority of big data implementations run on Linux, and Red Hat Enterprise Linux is a leading platform for big data deployments. Red Hat Enterprise Linux excels in distributed architectures and includes features that address critical big data needs. Managing tremendous data volumes and intensive analytic processing requires an infrastructure designed for high performance, reliability, fine-grained resource management, and scale-out storage.
27. Informatica Bigdata
Informatica PowerCenter Big Data Edition provides a safe, efficient way to integrate all types of data on Hadoop at any scale without having to learn Hadoop.
28. MarkLogic Bigdata
MarkLogic’s Bigdata solution, the Enterprise NoSQL database, brings all the features into one unified system: a document-centric, schema-agnostic, structure-aware, clustered, transactional, secure database server with built-in search and a full suite of application services.
29. Vmware Bigdata
vSphere is a robust, high-performance virtualization layer that abstracts server hardware resources and makes them shareable by multiple virtual machines. Running Hadoop workloads on vSphere can achieve higher utilization, reliability and agility.
30. Syncsort Bigdata
Syncsort Hadoop Solutions help with the challenges of collecting, processing and integrating data in Hadoop. They remove barriers to wider Hadoop adoption: connect, develop, deploy, re-use, and accelerate. No programming or tuning is required.
31. SGI Bigdata
SGI InfiniteData Cluster offers the compute platform for Hadoop solutions, with cluster installations now reaching tens of thousands of nodes. SGI UV offers a powerful shared-memory platform to find hidden data relationships or perform real-time analysis.
32. MongoDB
MongoDB is a leading NoSQL database, empowering businesses to be more agile and scalable. Fortune 500 companies and startups alike are using MongoDB to create new types of applications, improve customer experience, accelerate time to market and reduce costs.
33. Guavus Bigdata
The Guavus Reflex platform is capable of creating actionable information from widely distributed, high volume data streams in near real-time. Reflex uses highly optimized computational algorithms and machine learning to distill actionable insights from very large datasets.
34. Alteryx Bigdata
The Alteryx Bigdata solution provides access to, integration of, and cleaning of data sources as varied as Hadoop (including Cloudera and MapR), NoSQL (MongoDB), Excel and Teradata, with predictive and spatial tools combined in a simple workflow design environment.
35. 1010data Advanced Analytics
The 1010data analytics platform includes advanced, built-in analytic functions such as statistics (distribution analysis, correlation, and variance), predictive modeling and forecasting (linear and multivariate regression, logistic regression), and machine learning (clustering analysis, Markov chains for Monte Carlo simulations, principal component analysis). These functions are integrated directly into the system, so they run quickly on large volumes of data.
36. Actian Analytics Platform
Actian Analytics Platform delivers next-generation analytics in three editions: Extreme Performance Edition, Hadoop SQL Edition, and Cloud Edition. Extreme Performance Edition accelerates the analytics value chain from connecting to massive amounts of raw big data all the way to delivering actionable business value from sophisticated analytics. Hadoop SQL Edition accelerates Hadoop and makes it enterprise-grade by providing high-performance data enrichment, visual design and SQL analytics on Hadoop without the need for MapReduce skills. Cloud Edition integrates cloud and on-premises applications while providing robust data quality and other data services.
37. MapR
The MapR Distribution for Apache Hadoop provides organizations with an enterprise grade distributed data platform to reliably store and process big data. MapR packages a broad set of Apache open source ecosystem projects enabling batch, interactive, or real time applications.
38. Tableau Software bigdata
Tableau Software bigdata solutions connect to any data, anytime and anywhere, regardless of its size and complexity or mix of unstructured and structured data, using technologies like Google BigQuery and a variety of Hadoop flavors.
39. QlikView Bigdata
QlikView offers two approaches to handling Big Data, both of which deliver the same user experience: QlikView’s 100% in-memory architecture, or QlikView Direct Discovery, a hybrid approach that leverages both in-memory data and data that is dynamically queried from an external source.
40. Attivio’s Bigdata
Attivio’s Active Intelligence Engine combines Big Data and Big Content, including Hadoop. It provides universal indexing and automatic ad hoc joining of all information matching a given query, without costly data modeling and with full security. It also includes advanced text analytics that add context and signals from human-generated information sources, and support for business intelligence/data visualization tools.
41. DataStax Bigdata
DataStax Enterprise (DSE), which is built on Apache Cassandra, delivers what Internet enterprises need to compete today. With in-memory computing capabilities, enterprise-level security, fast and powerful integrated analytics and enterprise search, visual management, and expert support, DataStax Enterprise is a leading distributed database choice for online applications that require fast performance with no downtime.
42. Gooddata
The GoodData Platform is a portfolio of tools, APIs and frameworks that provides the key components of a BI solution: collecting, storing, combining, analyzing, and visualizing data. These were built to exist in the cloud and be delivered as an end-to-end service.
43. GE Bigdata
The Industrial Internet coordinates multiple industrial applications to work intelligently in order to optimize entire operational environments.
· NoSQL
A NoSQL (originally referring to "non SQL"[1]) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed since the late 1960s, but did not obtain the "NoSQL" moniker until a surge in popularity in the early twenty-first century,[2] triggered by the storage needs of Web 2.0 companies such as Facebook, Google and Amazon.com.[3][4][5]
Motivations for this approach include simplicity of design; simpler "horizontal" scaling to clusters of machines, which is a problem for relational databases;[2] and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, graph, or document) differ from those used by default in relational databases, making some operations faster in NoSQL and others faster in relational databases. The particular suitability of a given NoSQL database depends on the problem it must solve. Sometimes the data structures used by NoSQL databases are also viewed as "more flexible" than relational database tables.[6]
NoSQL databases are increasingly
used in big data and real-time web applications.[7] NoSQL systems are also sometimes called
"Not only SQL" to emphasize that they may support SQL-like
query languages.[8][9]
Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability, partition tolerance, and speed. Barriers to the greater adoption of NoSQL stores include the use of low-level query languages (instead of SQL, for instance the lack of ability to perform ad-hoc joins across tables), lack of standardized interfaces, and huge previous investments in existing relational databases.[10] Most NoSQL stores lack true ACID transactions, although a few recent systems, such as FairCom c-treeACE, Google Spanner (though technically a NewSQL database), FoundationDB, Symas LMDB and OrientDB, have made them central to their designs. (See ACID and JOIN Support.) Instead they offer a concept of "eventual consistency", in which database changes are propagated to all nodes "eventually" (typically within milliseconds), so queries for data might not return updated data immediately.
Not all NoSQL systems live up to the promised "eventual consistency" and partition tolerance; in experiments with network partitioning, some have exhibited lost writes and other forms of data loss.[11] Some NoSQL systems provide concepts such as write-ahead logging to avoid data loss.[12] Current relational databases also "do not allow referential integrity constraints to span databases".[13]
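The read-staleness that eventual consistency permits can be sketched with a toy Python simulation; the two in-memory "replicas" and the manual propagation step below are invented purely for illustration.

# Toy simulation of eventual consistency: a write lands on one replica and
# only reaches the second replica after a later propagation step.
primary, secondary = {}, {}
pending = []  # changes not yet replicated

def write(key, value):
    primary[key] = value
    pending.append((key, value))   # replication happens later

def propagate():
    while pending:
        key, value = pending.pop(0)
        secondary[key] = value

write("user:1:name", "Alice")
print(secondary.get("user:1:name"))  # None - a read from the lagging replica is stale
propagate()
print(secondary.get("user:1:name"))  # "Alice" - the replicas converge "eventually"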
History
The term NoSQL was used by Carlo Strozzi
in 1998 to name his lightweight, Strozzi NoSQL open-source relational database that did not expose the standard SQL
interface, but was still relational.[14] His NoSQL RDBMS is distinct from the
circa-2009 general concept of NoSQL databases. Strozzi
suggests that, as the current NoSQL movement "departs from the relational model
altogether; it should therefore have been called more appropriately 'NoREL'",[15] referring
to 'No Relational'.
Eric Evans reintroduced the term NoSQL in early 2009 when Johan Oskarsson of Last.fm organized
an event to discuss open-source distributed databases.[16] The name
attempted to label the emergence of an increasing number of non-relational,
distributed data stores. Most of the early NoSQL systems did not attempt to
provide atomicity,
consistency, isolation and durability guarantees, contrary to the prevailing
practice among relational database systems.[17]
As of July 2015, the most popular
NoSQL databases are MongoDB, Apache Cassandra, Redis, Solr, ElasticSearch, HBase, Splunk, Memcached, and Neo4j.[18]
Types and examples of NoSQL databases
There have been various approaches to classify NoSQL databases, each with different categories and subcategories, some of which overlap. A basic classification based on data model, with examples:
· Column: Accumulo, Cassandra, Druid, HBase, Vertica
· Document: Clusterpoint, Apache CouchDB, Couchbase, DocumentDB, HyperDex, Lotus Notes, MarkLogic, MongoDB, OrientDB, Qizx
· Key-value: CouchDB, Oracle NoSQL Database, Dynamo, FoundationDB, HyperDex, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB, MUMPS
· Graph: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog
· Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB
A more detailed classification is the following, based on one from Stephen Yen:[19]
Type | Examples of this type
Key-Value Cache | Coherence, eXtreme Scale, GigaSpaces, GemFire, Hazelcast, Infinispan, JBoss Cache, Memcached, Repcached, Terracotta, Velocity
Key-Value Store |
Key-Value Store (Eventually-Consistent) | DovetailDB, Oracle NoSQL Database, Dynamo, Riak, Dynomite, MotionDb, Voldemort, SubRecord
Key-Value Store (Ordered) | Actord, FoundationDB, Lightcloud, LMDB, Luxio, MemcacheDB, NMDB, Scalaris, TokyoTyrant
Data-Structures Server |
Tuple Store | Apache River, Coord, GigaSpaces
Object Database | DB4O, Objectivity/DB, Perst, Shoal, ZopeDB
Document Store | Clusterpoint, Couchbase, CouchDB, DocumentDB, Lotus Notes, MarkLogic, MongoDB, Qizx, XML databases
Wide Column Store | BigTable, Cassandra, Druid, HBase, Hypertable, KAI, KDI, OpenNeptune, Qbase
Correlation databases are model-independent, and instead of
row-based or column-based storage, use value-based storage.
Key-value stores
Main article: Key-value database
Key-value (KV) stores use the associative array (also known as a map or dictionary) as their fundamental data model. In this model, data is represented as a collection of key-value pairs, such that each possible key appears at most once in the collection.[20][21]
The key-value model is one of the simplest non-trivial data models, and richer data models are often implemented on top of it. The key-value model can be extended to an ordered model that maintains keys in lexicographic order. This extension is powerful, in that it can efficiently process key ranges.[22]
Key-value stores can use consistency models ranging from eventual consistency to serializability. Some support ordering of keys. Some maintain data in memory (RAM), while others employ solid-state drives or rotating disks.
Examples include Oracle NoSQL Database, Redis, and dbm.
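A short sketch of the key-value access pattern using the Python client for Redis (one of the examples above); the host, port, and key names are assumptions for illustration, and the same set/get pattern applies to any key-value store.

# Key-value model: each key maps to at most one value.
# Assumes a Redis server on localhost:6379 and the redis-py client.
import redis

kv = redis.Redis(host="localhost", port=6379, decode_responses=True)

kv.set("session:42", "alice")      # store a value under a key
print(kv.get("session:42"))        # -> "alice"
print(kv.get("session:99"))        # -> None (an absent key simply misses)

# Ordered stores can process key ranges efficiently; a rough Redis
# equivalent is pattern-based key iteration.
for key in kv.scan_iter("session:*"):
    print(key, kv.get(key))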
Document store
Main articles: Document-oriented database and XML database
The central concept of a document store is the notion of a "document". While each document-oriented database implementation differs on the details of this definition, in general, they all assume that documents encapsulate and encode data (or information) in some standard formats or encodings. Encodings in use include XML, YAML, and JSON as well as binary forms like BSON. Documents are addressed in the database via a unique key that represents that document. One of the other defining characteristics of a document-oriented database is that, in addition to the key lookup performed by a key-value store, the database offers an API or query language that retrieves documents based on their contents.
Different implementations offer different ways of organizing and/or grouping documents:
· Collections
· Tags
· Non-visible metadata
· Directory hierarchies
Compared to relational databases, for example, collections could be considered analogous to tables and documents analogous to records. But they are different: every record in a table has the same sequence of fields, while documents in a collection may have fields that are completely different.
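The content-based querying described above can be sketched with MongoDB's Python driver; the collection name and document fields are hypothetical, and a local MongoDB server is assumed.

# Document-store pattern: schema-free, JSON-like documents addressed by a
# unique key but also queryable by their contents (requires pymongo).
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["blog"]

db.posts.insert_one({
    "_id": "post-1",                  # unique key for the document
    "title": "Hello",
    "author": "alice",
    "tags": ["intro", "bi"],
})

print(db.posts.find_one({"_id": "post-1"}))    # key lookup
for post in db.posts.find({"tags": "bi"}):     # query by content
    print(post["title"])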
Graph
Main article: Graph database
This kind of database is designed for data whose relations are well represented as a graph (elements interconnected with an undetermined number of relations between them). The kind of data could be social relations, public transport links, road maps or network topologies, for example.
Object database
· db4o
· JADE
· ObjectDB
· ODABA
· Perst
· ZODB
Tabular
· BigTable
· Mnesia
Tuple store
· TIBCO ActiveSpaces
Triple/quad store (RDF) database
· Apache JENA (a framework, not a database)
· Stardog
Hosted
· Datastore on Google Appengine
· Cloudant Data Layer (CouchDB)
· Freebase
Multivalue databases
· D3 Pick database
· Extensible Storage Engine (ESE/NT)
· Northgate Information Solutions Reality, the original Pick/MV Database
· OpenQM
· Revelation Software's OpenInsight
Multimodel database
· OrientDB
Performance
Ben Scofield rated different categories of NoSQL databases as follows:[23]
Data Model | Performance | Scalability | Flexibility | Complexity | Functionality
Key–Value Store | high | high | high | none | variable (none)
Column-Oriented Store | high | high | moderate | low | minimal
Document-Oriented Store | high | variable (high) | high | low | variable (low)
Graph Database | variable | variable | high | high | graph theory
Relational Database | variable | variable | low | moderate | relational algebra
Performance and scalability
comparisons are sometimes done with the YCSB benchmark.
Handling relational data
Since most NoSQL databases lack the ability to perform joins in queries, the database schema generally needs to be designed differently. There are three main techniques for handling relational data in a NoSQL database. (See table Join and ACID Support for NoSQL databases that support joins.)
Multiple queries
Instead of retrieving all the data
with one query, it's common to do several queries to get the desired data.
NoSQL queries are often faster than traditional SQL queries so the cost of
having to do additional queries may be acceptable. If an excessive number of
queries would be necessary, one of the other two approaches is more
appropriate.
Caching/replication/non-normalized data
Instead of only storing foreign
keys, it's common to store actual foreign values along with the model's data.
For example, each blog comment might include the username in addition to a user
id, thus providing easy access to the username without requiring another
lookup. When a username changes however, this will now need to be changed in
many places in the database. Thus this approach works better when reads are
much more common than writes.[24]
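A sketch of this denormalization with plain Python dictionaries (the field names are made up): the comment duplicates the username so a read needs no second lookup, at the cost of updating many records when the name changes.

# Non-normalized comment: stores both the foreign key (user_id) and the
# duplicated value (username).
comment = {
    "post_id": "post-1",
    "user_id": "u42",
    "username": "alice",   # duplicated from the user record
    "text": "Nice article!",
}
print(f'{comment["username"]} wrote: {comment["text"]}')  # no extra user lookup

# The trade-off: renaming the user means touching every stored comment.
def rename_user(comments, user_id, new_name):
    for c in comments:
        if c["user_id"] == user_id:
            c["username"] = new_name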
Nesting data
With document databases like MongoDB
it's common to put more data in a smaller number of collections. For example in
a blogging application, one might choose to store comments within the blog post
document so that with a single retrieval one gets all the comments. Thus in
this approach a single document contains all the data you need for a specific
task.
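The same blogging example with nesting, sketched as a single MongoDB-style document (field names again illustrative): one retrieval returns the post together with all of its comments.

# Nested data: comments are embedded in the blog-post document, so a single
# read of the post returns everything needed to render the page.
post = {
    "_id": "post-1",
    "title": "Hello",
    "comments": [
        {"username": "alice", "text": "Nice article!"},
        {"username": "bob", "text": "+1"},
    ],
}
for c in post["comments"]:
    print(f'{c["username"]}: {c["text"]}')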
ACID and JOIN Support
If a database is marked as supporting ACID or joins, then the documentation for the database
makes that claim. The degree to which the capability is fully supported in a
manner similar to most SQL databases or the degree to which it meets the needs
of a specific application is left up to the reader to assess.