An Overview of Business Intelligence and BigData Tools

Business intelligence tools

Business intelligence tools are a type of application software designed to retrieve, analyze, transform and report data for business intelligence. The tools generally read data that have been previously stored, often, though not necessarily, in a data warehouse or data mart.

1.      Types of business intelligence tools

2.      Open source free products

3.      Open source commercial products

4.      Proprietary free products

5.      Proprietary products

Types of business intelligence tools

The key general categories of business intelligence tools are:

·        Spreadsheets

·        Reporting and querying software: tools that extract, sort, summarize, and present selected data

·        OLAP: Online analytical processing

·        Digital dashboards

·        Data mining

·        Data warehousing

·        Local information systems

Except for spreadsheets, these tools are sold as standalone tools, suites of tools, components of ERP systems, or as components of software targeted to a specific industry. The tools are sometimes packaged into data warehouse appliances.
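Reporting/querying tools and OLAP tools share one core operation: select rows, group them along business dimensions, and summarize a measure. The Python sketch below shows that operation on a small in-memory list. The records and the field names `region`, `product` and `amount` are hypothetical, and real tools would read the data from a warehouse or data mart rather than from a list.

```python
from collections import defaultdict

# Hypothetical sales records; a BI tool would normally read these from a warehouse or data mart.
SALES = [
    {"region": "North", "product": "A", "amount": 120.0},
    {"region": "North", "product": "B", "amount": 80.0},
    {"region": "South", "product": "A", "amount": 200.0},
    {"region": "South", "product": "A", "amount": 50.0},
]

def roll_up(rows, dims, measure, where=None):
    """Sum `measure` over rows grouped by the values of `dims` (an OLAP-style roll-up).

    `where` is an optional predicate used to select rows first, as a query tool would.
    """
    totals = defaultdict(float)
    for row in rows:
        if where is not None and not where(row):
            continue
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(sorted(totals.items()))  # sorted so the report order is deterministic

def report(rows, dims, measure, where=None):
    """Format the roll-up as text lines, the way a simple reporting tool might present it."""
    return [" | ".join(key) + f" | {total:.2f}"
            for key, total in roll_up(rows, dims, measure, where).items()]

if __name__ == "__main__":
    for line in report(SALES, ["region", "product"], "amount"):
        print(line)
```

Drilling down corresponds to calling `roll_up` with more dimensions; rolling up corresponds to calling it with fewer.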

Open source free products

·        Eclipse BIRT (The Business Intelligence and Reporting Tools) Project

·        SpagoBI

·        R

o   R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software[2][3] and data analysis.[3] Polls, surveys of data miners, and studies of scholarly literature databases show that R's popularity has increased substantially in recent years.[4][5][6][7]

o   R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme.[8] S was created by John Chambers while at Bell Labs. There are some important differences, but much of the code written for S runs unaltered.

o   R was created by Ross Ihaka and Robert Gentleman[9] at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors and partly as a play on the name of S.[10]

o   R is a GNU project.[11][12] The source code for the R software environment is written primarily in C, Fortran, and R.[13] R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. R uses a command line interface; there are also several graphical front-ends for it.

·        KNIME

·        TACTIC

·        JasperReports

Open source commercial products

·        Jaspersoft: Reporting, Dashboards, Data Analysis, and Data Integration

·        Palo (OLAP database): OLAP Server, Worksheet Server and ETL Server

·        Pentaho: Reporting, analysis, dashboard, data mining and workflow capabilities

·        TACTIC: Reporting, management, dashboard, data mining and integration, workflow capabilities

Proprietary free products

·        Biml - Business Intelligence Markup Language

·        Birst

·        Datacopia

·        icCube

·        InetSoft

·        Tableau Software

o     Tableau Software (/tæbˈloʊ/ tab-loh) is an American computer software company headquartered in Seattle, Washington. It produces a family of interactive data visualization products focused on business intelligence.[2]

o     Tableau offers five main products: Tableau Desktop, Tableau Server, Tableau Online, Tableau Reader and Tableau Public. Tableau Public and Tableau Reader are free to use, while Tableau Server and Tableau Desktop both come with a 14-day fully functional free trial period, after which the user must pay for the software. Tableau Desktop comes in both a Professional and a lower-cost Personal edition. Tableau Online is available with an annual subscription for a single user and scales to support thousands of users.[14]

Proprietary products

·        ActiveReports

·        Actuate Corporation

·        ApeSoft

·        Birst

·        BOARD

·        ComArch

·        Data Applied

·        Decision Support Panel

·        Domo[1]

·        Dundas Data Visualization, Inc.

·        Dimensional Insight

·        Dynamic AI[2]

·        Entalysis

·        Grapheur, implementing the reactive business intelligence (RBI) approach

·        GoodData - Cloud Based

·        IBM Cognos

·        icCube

·        IDV Solutions Visual Fusion

·        InetSoft

·        Information Builders

·        InfoZoom

·        Jackbe

·        Jaspersoft (now TIBCO, iReport, Jasper Studio, Jasper Analysis, Jasper ETL, Jasper Library)

·        Jedox

·        JReport (from Jinfonet Software)

·        Klipfolio Dashboard

·        Lavastorm

·        LIONsolver

·        List & Label

·        Logi Analytics

·        Looker

·        Microsoft

·        MicroStrategy

·        myDIALS

·        Numetric[3]

·        Oracle

o   Hyperion Solutions Corporation

o   Business Intelligence Suite Enterprise Edition (OBIEE – Oracle BI Enterprise Edition)

·        Panorama Software

·        Pentaho (now Hitachi Data Systems)

·        Pervasive DataRush

·        PRELYTIS

·        Qlik

·        Quantrix

·        RapidMiner

·        Roambi

·        SAP NetWeaver Business Intelligence

o   Business Objects

·        SiSense

·        SAS

·        Siebel Systems

·        Spotfire (now Tibco)

·        Sybase IQ

·        Tableau Software

·        TARGIT Business Intelligence

·        Teradata

·        Lighthouse

·        XLCubed

·        Yellowfin Business Intelligence

·        Zoho Reports (as part of the Zoho Office Suite)

BI market

Vendors in the business intelligence space are often categorized into:

·        The consolidated big four "mega vendors"

o   Oracle Hyperion

o   SAP BusinessObjects

o   IBM Cognos

o   Microsoft BI[14]

·        The independent "pure-play" vendors

o   MicroStrategy

o   Tableau

o   QlikView

§  QlikView is a self-service BI tool, or business discovery platform, from Qlik. Qlik markets it as self-service BI that lets business users drive their own decision-making.

o    SAS

·        Oracle Business Intelligence Foundation Suite

o   Oracle Business Intelligence Enterprise Edition (OBIEE)

o   Oracle Scorecard and Strategy Management

o   Oracle Essbase

o   Oracle Essbase Analytics Link for Hyperion Financial Management

o   Oracle Business Intelligence Publisher

o   Oracle Business Intelligence Mobile

Oracle Business Intelligence Foundation Suite includes the following capabilities:

o   Enterprise BI Platform (OBIEE)

OBIEE (Oracle Business Intelligence Enterprise Edition) and OBIEE Plus (Oracle Business Intelligence Enterprise Edition Plus) are Oracle Corporation's set of business intelligence tools, consisting of the former Siebel Systems and Hyperion Solutions business intelligence offerings.

OBIEE's main competitors are Microsoft BI, IBM Cognos, SAP AG BusinessObjects and SAS Institute Inc. The Oracle products currently share a common BI Server.

§  Siebel Systems

Siebel CRM Systems, Inc. was a software company principally engaged in the design, development, marketing, and support of customer relationship management (CRM) applications. 

Principal competitors included Oracle Corporation (now Siebel's owner); SAP America Inc.; Vantive Corporation (subsidiary of PeopleSoft Inc., now owned by Oracle); Sage Group; Clarify Corporation (subsidiary of Amdocs); SAS Institute Inc.; Epiphany, Inc.; Broadbase Software Inc; Salesforce.com; Microsoft Dynamics; SugarCRM.

Siebel Systems competed directly with Oracle and SAP. These competitors gradually developed HR, financial and ERP packages that were readily integrated and therefore did not require specialists to deploy, which enabled them to steadily erode Siebel's market share.

§  Hyperion

In 2007, Oracle acquired Hyperion, a leading provider of performance management software, extending Oracle's business intelligence capabilities into enterprise performance management.

In the years before the acquisition, Oracle had significantly reoriented its business intelligence product strategy, shifting from a solution that worked only in Oracle environments towards a best-of-breed business intelligence and performance management product family designed to work with heterogeneous information sources in an enterprise, both Oracle and non-Oracle.

The acquisition extended that strategy, since customers were increasingly using performance management and business intelligence together. Hyperion added complementary products to Oracle's business intelligence offerings, including an enterprise planning solution, financial close and reporting products, and a multi-source OLAP server. Oracle described the combination, together with its BI tools and pre-packaged analytic applications, as the first integrated, end-to-end Enterprise Performance Management System, spanning planning, financial close, operational analytic applications, BI tools, reporting and data integration on a unified BI platform.

After the acquisition, Oracle introduced a new product family, Oracle Business Intelligence Enterprise Edition Plus, an integrated suite that includes all of the Oracle and Hyperion reporting and analysis tools. Since 2007, Oracle has released new versions of the Oracle Enterprise Performance Management System with further capabilities for business insight and decision-making. In 2010, Oracle released Oracle Business Intelligence 11g, which it described as the most complete and integrated suite of BI tools on the market.

Hyperion Solutions Corporation was an Enterprise Performance Management software company, located in Santa Clara, California, USA, which was acquired by Oracle Corporation in 2007. Many of its products were targeted at the business intelligence (BI) and business performance management markets, and as of 2013 are still actively developed and sold by Oracle as Oracle Hyperion products.

Hyperion software products include:

·        Essbase

·        Hyperion Intelligence and SQR Production Reporting (products acquired in the 2003 takeover of Brio Technology)

·        Hyperion Enterprise

·        Hyperion Planning

·        Hyperion Strategic Finance

·        Hyperion Financial Data Management

·        Hyperion Enterprise Performance Management Architect

·        Hyperion Financial Close Management

·        Hyperion Disclosure Management

·        Hyperion Performance Scorecard

·        Hyperion Business Modelling

·        Hyperion Financial Management

·        Hyperion Master Data Management/Oracle Data Relationship Management

·        Hyperion Financial Reporting

·        Hyperion Web Analysis

·        Hyperion SmartView

·        Hyperion EPM Workspace

·        Hyperion Profitability and Cost Management

·        Hyperion System 9 BI+ (a combination of Interactive Reporting, SQR, Web Analysis, Financial Reporting, EPM Workspace and SmartView)

·        Hyperion Financial Data Quality Management (also referred to as FDM EE)

·        Hyperion Tax Provision

·        Planning Budgeting Cloud Service

o   OLAP Analytics

o   Scorecard and Strategy Management

o   Mobile BI

o   Enterprise Reporting

·      Teradata – for data warehousing

Teradata Corporation is a publicly held international computer company that sells analytic data platforms, marketing applications and related services. Its analytics products are meant to consolidate data from different sources and make the data available for analysis. Teradata marketing applications are meant to support marketing teams that use data analytics to inform and develop programs. In early 2015, the company formed two divisions: Data & Analytics for its data analytics platforms and related services, and Marketing Applications for its marketing software and related services.[4] The corporate headquarters are in Miamisburg, Ohio.

Teradata is an enterprise software company that develops and sells a relational database management system (RDBMS) with the same name. Teradata is publicly traded on the New York Stock Exchange (NYSE) under the stock symbol TDC.[5]

The Teradata product is referred to as a "data warehouse system" and stores and manages data. The data warehouses use a "shared nothing" architecture, which means that each server node has its own memory and processing power, which it does not share with other nodes.[6] Adding more servers and nodes increases the amount of data that can be stored. The database software sits on top of the servers and spreads the workload among them.[7] Teradata sells applications and software to process different types of data. In 2010, Teradata added text analytics to track unstructured data, such as word processor documents, and semi-structured data, such as spreadsheets.[8]

Teradata's product can be used for business analysis. Data warehouses can track company data, such as sales, customer preferences, product placement, etc.[7]

Teradata is a massively parallel processing system running a shared-nothing architecture.[29] Its technology consists of hardware, software, database, and consulting. The system moves data to a data warehouse where it can be recalled and analyzed.[13]
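The shared-nothing, massively parallel idea can be sketched in a few lines: rows are hash-distributed across nodes, each node aggregates only the rows it owns, and a final step merges the partial results. This is an illustration under assumptions, not Teradata's implementation. Teradata hashes each row's primary index to a unit of parallelism (an AMP); this sketch instead uses an MD5 hash modulo a hypothetical node count, and simulates nodes with threads.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical cluster size

def node_for(key, num_nodes=NUM_NODES):
    """Deterministically map a distribution key to a node via a hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

def distribute(rows, key_field, num_nodes=NUM_NODES):
    """Partition rows so that each node owns a disjoint subset (nothing is shared)."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[node_for(row[key_field], num_nodes)].append(row)
    return partitions

def local_sum(partition, group_field, measure):
    """Work done by one node, using only its own data."""
    totals = defaultdict(float)
    for row in partition:
        totals[row[group_field]] += row[measure]
    return totals

def parallel_sum(rows, key_field, group_field, measure, num_nodes=NUM_NODES):
    """Distribute rows, aggregate each partition in parallel, then merge the partials."""
    partitions = distribute(rows, key_field, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(lambda p: local_sum(p, group_field, measure), partitions))
    merged = defaultdict(float)
    for partial in partials:  # final merge step
        for group, value in partial.items():
            merged[group] += value
    return dict(merged)
```

Scaling out corresponds to raising `num_nodes`. A hash spreads rows evenly only when the distribution key has many well-spread values; a skewed key would overload some nodes.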

The systems can be used as back-up for one another during downtime, and in normal operation balance the work load across themselves.[30]

In 2009, Forrester Research issued a report, "The Forrester Wave: Enterprise Data Warehouse Platform," by James Kobielus,[31] rating Teradata the industry's number one enterprise data warehouse platform in the "Current Offering" category.

Marketing research company Gartner Group placed Teradata in the "leaders quadrant" in its 2009, 2010, and 2012 reports, "Magic Quadrant for Data Warehouse Database Management Systems".[32][33]

Teradata is the most popular data warehouse DBMS in the DB-Engines database ranking.[34]

In 2010, Teradata was listed in Fortune’s annual list of Most Admired Companies.[35]

Active enterprise data warehouse

Teradata Active Enterprise Data Warehouse is the platform that runs the Teradata Database, with added data management tools and data mining software.

The data warehouse differentiates between “hot” and “cold” data: frequently used (hot) data is kept on faster storage, while data that is not often used (cold) is put in a slower storage section.[36] As of October 2010, Teradata uses Xeon 5600 processors for the server nodes.[37]
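The hot/cold distinction can be illustrated with a simple placement rule that fills fast storage with the most frequently accessed data first. This sketch is hypothetical: the table names, sizes, access counts and the greedy capacity rule are invented for illustration and are not Teradata's actual placement mechanism.

```python
from dataclasses import dataclass

@dataclass
class TableStats:
    name: str
    size_gb: float  # hypothetical table size
    accesses: int   # hypothetical access count over some recent window

def assign_tiers(stats, fast_capacity_gb):
    """Greedily place the most frequently accessed (hot) tables on fast storage
    until it is full; everything else (cold) goes to slower storage."""
    placement = {}
    used_gb = 0.0
    for table in sorted(stats, key=lambda t: t.accesses, reverse=True):
        if used_gb + table.size_gb <= fast_capacity_gb:
            placement[table.name] = "fast"
            used_gb += table.size_gb
        else:
            placement[table.name] = "slow"
    return placement
```

Re-running the rule as access counts change would move data between tiers over time, which is the behavior the paragraph above describes.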

Teradata Database 13.10 was announced in 2010 as the company’s database software for storing and processing data.[38][39]

Teradata Database 14 was sold as the upgrade to 13.10 in 2011 and runs multiple data warehouse workloads at the same time.[40] It includes column-store analyses.[41]
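Column-store analyses benefit from storing each column contiguously, so an aggregate over one column reads only that column instead of every full row. The Python sketch below contrasts the two layouts. The table and column names are hypothetical, and real column stores use on-disk formats and compression schemes far beyond the toy run-length encoding shown here.

```python
# Row layout: each record is stored together.
ROWS = [
    (1, "North", 120.0),
    (2, "South", 200.0),
    (3, "North", 80.0),
]
COLUMNS = ("order_id", "region", "amount")

def to_columnar(rows, columns):
    """Pivot row tuples into one list per column (column layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}

def sum_row_store(rows, col_index):
    # Touches every full row even though only one field is needed.
    return sum(row[col_index] for row in rows)

def sum_column_store(table, column):
    # Reads a single contiguous column.
    return sum(table[column])

def run_length_encode(values):
    """Columns often hold runs of repeated values, which compress well."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return [tuple(pair) for pair in encoded]
```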

Teradata Integrated Analytics is a set of tools for data analysis that resides inside the data warehouse.[42]

Backup, archive, and restore

BAR is Teradata’s backup and recovery system.[43]

The Teradata Disaster Recovery Solution is a set of automated processes and tools for data recovery and archiving. Customer data can be stored in an offsite recovery center.[44]

Platform family

Teradata Platform Family is a set of products that include the Teradata Data Warehouse, Database, and a set of analytic tools. The platform family is marketed as smaller and less expensive than the other Teradata solutions.[45]

Teradata's main competitors are similar products from vendors such as Oracle, IBM, Microsoft and Sybase IQ. Other competitors include data warehouse appliance vendors such as Netezza[69] (acquired in November 2010 by IBM), DATAllegro (acquired in August 2008 by Microsoft), ParAccel, Pivotal Greenplum Database and Vertica Systems (acquired in February 2011 by HP), as well as packaged data warehouse applications such as SAP BW and Kalido.

·      Hadoop

Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures (of individual machines, or racks of machines) are commonplace and thus should be automatically handled in software by the framework.

The core of Apache Hadoop consists of a storage part (Hadoop Distributed File System (HDFS)) and a processing part (MapReduce). Hadoop splits files into large blocks and distributes them amongst the nodes in the cluster. To process the data, Hadoop MapReduce transfers packaged code for nodes to process in parallel, based on the data each node needs to process. This approach takes advantage of data locality[3]—nodes manipulating the data that they have on hand—to allow the data to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are connected via high-speed networking.[4]
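The block-splitting and data-locality idea described above can be sketched as follows. The 128 MB block size and replication factor of 3 mirror common HDFS defaults, but the round-robin placement is a hypothetical simplification of HDFS's rack-aware placement policy, and the node names are invented.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # bytes; a common HDFS default block size
REPLICATION = 3                 # a common HDFS default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of `file_size` bytes."""
    return [(off, min(block_size, file_size - off)) for off in range(0, file_size, block_size)]

def place_blocks(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin (simplified, not rack-aware)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(i + r) % len(nodes)] for r in range(replication)]
        for i, block in enumerate(blocks)
    }

def schedule_local(placement):
    """Data locality: run each block's task on a node that already holds a replica."""
    return {block: holders[0] for block, holders in placement.items()}
```

For a 300 MB file this yields two full 128 MB blocks and one 44 MB block, each stored on three nodes; the scheduler then ships the processing code to a node holding the block instead of moving the data.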

The base Apache Hadoop framework is composed of the following modules:

  • Hadoop Common – contains libraries and utilities needed by other Hadoop modules;
  • Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;
  • Hadoop YARN – a resource-management platform responsible for managing computing resources in clusters and using them for scheduling of users' applications;[5][6] and
  • Hadoop MapReduce – a programming model for large scale data processing.
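The MapReduce programming model listed above is usually illustrated with word counting. The sketch below runs the map, shuffle and reduce phases in a single Python process. It shows only the data flow; it does not use Hadoop's Java API, and in a real cluster the map tasks would run in parallel on the nodes holding each block.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reduce: combine all counts for one word."""
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

if __name__ == "__main__":
    print(word_count(["the quick brown fox", "The lazy dog", "the end"]))
```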

The term "Hadoop" has come to refer not just to the base modules above, but also to the "ecosystem",[7] or collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache Spark, and others.[8][9]

Apache Hadoop's MapReduce and HDFS components were inspired by Google papers on their MapReduce and Google File System.[10]

The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command line utilities written as Shell script. For end-users, though MapReduce Java code is common, any programming language can be used with "Hadoop Streaming" to implement the "map" and "reduce" parts of the user's program.[11] Other related projects expose other higher-level user interfaces.
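With Hadoop Streaming, the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output; the framework sorts the mapper output by key before it reaches the reducer. Below is a minimal Python sketch of such a pair for word counting. The script name `wordcount.py` and its `map`/`reduce` command-line argument are hypothetical conventions chosen for this example.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Emit 'word<TAB>1' for each word, as a Streaming mapper writes to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    """Sum counts per word; input must already be sorted by key (Hadoop does this)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: `python wordcount.py map` or `python wordcount.py reduce`.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

The pair can be tested locally without a cluster, with `sort` standing in for Hadoop's shuffle: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`.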

Prominent corporate users of Hadoop include Facebook and Yahoo. It can be deployed in traditional on-site datacenters but has also been implemented in public cloud spaces such as Microsoft Azure, Amazon Web Services, Google Compute Engine, and IBM Bluemix.

Apache Hadoop is a registered trademark of the Apache Software Foundation.

·      Top 45 Big Data Tools for Developers

https://blog.profitbricks.com/top-45-big-data-tools-for-developers/

·      43 Bigdata Platforms and Bigdata Analytics Software

1. IBM Bigdata Analytics

IBM's big data analytics solution portfolio includes InfoSphere Streams, InfoSphere BigInsights, IBM Watson Explorer, IBM PureData powered by Netezza technology, DB2 with BLU Acceleration, IBM Smart Analytics System, InfoSphere Information Server and InfoSphere Master Data Management.

2. HP Bigdata

HP’s big data analytics solution includes HP HAVEn and HP Vertica. HP HAVEn is a platform comprising software, services and hardware, on which big data of any type, structured or unstructured, can be analyzed for strategic insights. HP Vertica Dragline lets organizations store their data cost-effectively and explore it quickly using SQL-based tools.

3. SAP Bigdata Analytics

SAP's big data analytics platform includes the in-memory platform SAP HANA and SAP IQ, a column-oriented, grid-based, massively parallel processing database. SAP HANA is also available combined with an Apache Hadoop solution. The big data analytics solutions include Predictive Analytics and Text Analytics.

4. Microsoft Bigdata

Microsoft Azure is an open and flexible cloud platform that lets users quickly build, deploy and manage applications across a global network of Microsoft-managed datacenters. Applications can be built using any language, tool or framework and can be integrated with other public cloud applications in the IT environment.

5. Oracle Bigdata Analytics

Oracle's big data analytics solutions include Oracle Big Data Appliance, Oracle Exadata Database Machine and Oracle Exalytics In-Memory Machine. These are engineered systems, pre-integrated to reduce the cost and complexity of IT infrastructure. The database offerings include Oracle Database, Oracle NoSQL Database, MySQL and MySQL Cluster, Oracle Event Processing, Oracle Coherence, Oracle Endeca Information Discovery and in-database analytics.

6. Talend Open Studio

Talend Open Studio is a versatile set of open source products for developing, testing, deploying and administering data management and application integration projects. Talend describes it as a unified platform that makes data management and application integration easier by providing a single environment for managing the entire lifecycle across enterprise boundaries.

Talend’s products aim to lower the adoption barrier for businesses that want packaged solutions to operational challenges such as data cleansing, master data management and enterprise service bus deployment.

Leveraging and extending leading Apache technologies, Talend’s open source ESB and open source SOA solutions help organizations to build flexible, high-performance enterprise architectures that integrate and service-enable distributed applications.

7. Teradata Bigdata Analytics

For big data analytics, Teradata has built an architecture called the Unified Data Architecture. The Teradata Aster Discovery Platform is designed to ease the discovery of business insights from all data types, providing analytic applications that require minimal time and effort.

8. SAS Bigdata Analytics

SAS's big data analytics solution portfolio includes Credit Scoring for SAS Enterprise Miner, SAS High-Performance Data Mining, SAS Model Manager, SAS Scoring Accelerator, SAS Text Miner and SAS Visual Statistics.

9. Dell Bigdata Analytics

Dell Bigdata Analytics includes Kitenga Analytics Suite, Boomi AtomSphere and SharePlex Connector for Hadoop. Kitenga Analytics Suite provides you with integrated information modeling and visualization capabilities in a big data search and business analytics platform.

10. HPCC Systems Big data

HPCC Systems is an open source platform for big data analysis. Its data refinery engine, Thor, cleans, links, transforms and analyzes big data. Thor supports ETL (extraction, transformation and loading) functions out of the box, such as ingesting structured and unstructured data, data profiling, data hygiene and data linking. The data delivery engine, Roxie, provides highly concurrent, low-latency, real-time queries, so that data processed by Thor can be accessed by a large number of users concurrently.

The programming language, Enterprise Control Language (ECL), is used to program both the data processing jobs on Thor and the queries on Roxie.
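The division of labor between Thor (batch refinement) and Roxie (low-latency delivery) follows a common pattern: an offline job cleans and links raw records and builds an index, and an online service answers point queries from that index. The Python sketch below illustrates only that pattern. It is not ECL, it uses no HPCC Systems API, and the record fields `name` and `email` are hypothetical.

```python
def refine(raw_records):
    """Batch step (Thor-like): clean fields, drop incomplete rows, de-duplicate, build an index."""
    index = {}
    for rec in raw_records:
        name = rec.get("name", "").strip().title()
        email = rec.get("email", "").strip().lower()
        if not name or not email:
            continue  # data hygiene: skip incomplete records
        index.setdefault(email, name)  # simple linking: keep the first record per key
    return index

class QueryService:
    """Delivery step (Roxie-like): answer many point lookups from a prebuilt, read-only index."""

    def __init__(self, index):
        self._index = dict(index)  # private copy that is never mutated, so readers can share it

    def lookup(self, email):
        return self._index.get(email.strip().lower())
```

Separating the two steps keeps expensive cleaning out of the query path: lookups only read the finished index.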

11. Palantir Bigdata

Palantir's big data solution includes Palantir Gotham, for integrating, managing, securing and analyzing enterprise data, and Palantir Metropolis, for integrating, enriching, modeling and analyzing any kind of quantitative data.

12. Pivotal Bigdata

Pivotal's big data solutions are meant to help organizations store, manage and deliver value from fast, massive data sets, and to build customer-facing applications on the insights found in them, using enterprise data products such as MPP and column-store databases, in-memory data processing, and Hadoop.

13. Google BigQuery

Google BigQuery is a web service that enables companies to analyze massive datasets using Google’s infrastructure. It is queried with familiar SQL, scales on demand, and is designed to scan billions of rows, or multi-terabyte datasets, in seconds.

14. Pentaho Big Data Analytics – A Hitachi Data Systems Company

Pentaho Big Data Analytics offers a unified solution that supports the entire big data lifecycle. Within a single platform, it provides visual tools to extract and prepare data from any source, along with visualization and analytics. Its open, standards-based architecture makes it easy to integrate with or extend existing infrastructure.

15. Amazon Web Services

Amazon Web Services provides cloud based analytics services to help you process and analyze any volume of data, whether your need is for managed Hadoop clusters, real-time streaming data, petabyte scale data warehousing, or orchestration.

16. Cloudera Enterprise Bigdata

Cloudera Enterprise includes CDH (Cloudera's distribution of Apache Hadoop), the open source Hadoop-based platform, as well as advanced system management and data management tools plus dedicated support and community advocacy.

17. Hortonworks Data Platform

Hortonworks Data Platform (HDP) is a platform for multi-workload data processing across an array of processing methods, from batch through interactive to real-time, all supported with solutions for governance, integration, security and operations.

18. FICO Bigdata Analytics

FICO offers comprehensive Big Data Analytics software solutions, Predictive Analytics and Business Intelligence tools including FICO Data Orchestrator, FICO Decision Management Platform, FICO Decision Optimizer, FICO Model Builder, FICO Model Central Solution, FICO Predictive Analytics and FICO Solution Stack.

19. Cisco Bigdata

Cisco UCS Common Platform Architecture (CPA) for big data includes computing, storage, connectivity and unified management capabilities. Unique to this architecture is transparent, simplified data and management integration with an enterprise application ecosystem.

20. Splunk Bigdata Analytics

Splunk offers a portfolio of Bigdata Analytics software such as Hunk: Splunk Analytics for Hadoop, NoSQL Data Stores, Splunk Hadoop Connect, Hadoop Management and Splunk DB Connect.

21. Fusion-io Bigdata

Fusion-io solutions are designed to eliminate the random-workload performance deficiencies common to MongoDB, Cassandra and other NoSQL databases such as HBase, while reducing the operational overhead of their conventional scale-out architectures. Fusion-based solutions deliver predictable, consistently high performance across the entire database, resulting in a more efficient overall system that can require fewer nodes and less DRAM, and use less energy for power and cooling.

22. Intel Bigdata

Intel's portfolio includes technology products such as Intel Xeon processors, 10 Gigabit server adapters, SSDs and the Intel Distribution, which aim to improve performance for big data projects.

23. Mu Sigma Bigdata

Mu Sigma’s platforms for data science include muXo, muHPC and muText. muXo is a decision optimization engine designed to solve complex business problems; it provides a suite of continually evolving meta-heuristic algorithms. muHPC is a suite of popular statistical algorithms for big data analysis, packaged as R packages; implemented with MapReduce, its algorithms exploit parallel computation. muText, Mu Sigma’s text mining engine, enables knowledge discovery from unstructured and semi-structured data.

24. MicroStrategy Bigdata

MicroStrategy's big data solution, PRIME, is deployed in the cloud and combines a visualization and dashboarding engine with a massively parallel in-memory data store. According to MicroStrategy, this architecture lets companies rapidly build and deploy information-driven apps that deliver analytics to hundreds of thousands of users in a fraction of the time and cost of other approaches.

25. Opera Solutions Bigdata

Opera Solutions' big data offering, the Vektor big data analytics and signal-processing platform, integrates big data flows from both inside and outside the enterprise, provides the technology to identify, extract and store Signals, and supports deployment of all Signal Apps.

26. Redhat Bigdata

The majority of big data implementations run on Linux, and Red Hat Enterprise Linux is a leading platform for big data deployments. It is well suited to distributed architectures and includes features that address critical big data needs: managing tremendous data volumes and intensive analytic processing requires an infrastructure designed for high performance, reliability, fine-grained resource management and scale-out storage.

27. Informatica Bigdata

Informatica PowerCenter Big Data Edition provides a safe, efficient way to integrate all types of data on Hadoop at any scale without having to learn Hadoop.

28. MarkLogic Bigdata

MarkLogic's big data solution, an Enterprise NoSQL database, brings these features into one unified system: a document-centric, schema-agnostic, structure-aware, clustered, transactional, secure database server with built-in search and a full suite of application services.

29. Vmware Bigdata

vSphere is a robust, high-performance virtualization layer that abstracts server hardware resources and makes them shareable by multiple virtual machines. Running Hadoop workloads on vSphere is intended to achieve higher utilization, reliability and agility.

30. Syncsort Bigdata

Syncsort Hadoop Solutions address the challenges of collecting, processing and integrating data in Hadoop. They aim to remove barriers to wider Hadoop adoption across the connect, develop, deploy, re-use and accelerate stages, with no programming or tuning required.

31. SGI Bigdata

SGI InfiniteData Cluster offers a compute platform for Hadoop solutions, with cluster installations now reaching tens of thousands of nodes. SGI UV, which SGI describes as the industry’s most powerful shared-memory platform, is aimed at finding hidden data relationships and performing real-time analysis.

32. MongoDB

MongoDB is a leading NoSQL database, intended to make businesses more agile and scalable. Fortune 500 companies and startups alike use MongoDB to create new types of applications, improve customer experience, accelerate time to market and reduce costs.

33. Guavus Bigdata

The Guavus Reflex platform is capable of creating actionable information from widely distributed, high volume data streams in near real-time. Reflex uses highly optimized computational algorithms and machine learning to distill actionable insights from very large datasets.

34. Alteryx Bigdata

Alteryx's big data solution combines access to, integration of and cleansing of data sources as varied as Hadoop (including Cloudera and MapR), NoSQL (MongoDB), Excel and Teradata with predictive and spatial tools, all in a simple workflow design environment.

35. 1010data Advanced Analytics

The 1010data analytics platform includes built-in analytic functions for statistics (distribution analysis, correlation and variance), predictive modeling and forecasting (linear and multivariate regression, logistic regression) and machine learning (clustering analysis, Markov chains for Monte Carlo simulations, principal component analysis). These functions are integrated directly into the system, so they run quickly on large volumes of data.

36. Actian Analytics Platform

Actian Analytics Platform delivers next-generation analytics in three editions: Extreme Performance Edition, Hadoop SQL Edition and Cloud Edition. Extreme Performance Edition accelerates the analytics value chain, from connecting to massive amounts of raw big data all the way to delivering actionable business value from sophisticated analytics. Hadoop SQL Edition accelerates Hadoop and makes it enterprise-grade by providing high-performance data enrichment, visual design and SQL analytics on Hadoop without the need for MapReduce skills. Cloud Edition integrates cloud and on-premises applications while providing data quality and other data services.

37. MapR

The MapR Distribution for Apache Hadoop provides organizations with an enterprise-grade distributed data platform to reliably store and process big data. MapR packages a broad set of Apache open source ecosystem projects, enabling batch, interactive, or real-time applications.

38. Tableau Software bigdata

Tableau Software's big data solutions connect to data regardless of its size, its complexity, or its mix of structured and unstructured content, through technologies such as Google BigQuery and a variety of Hadoop distributions.

39. QlikView Bigdata

QlikView offers two approaches to handling Big Data, both of which deliver the same user experience: QlikView's 100% in-memory architecture, and QlikView Direct Discovery, a hybrid approach that combines in-memory data with data queried dynamically from an external source.

40. Attivio’s Bigdata

Attivio's Active Intelligence Engine combines Big Data and Big Content, including Hadoop. It offers universal indexing and automatic ad hoc JOINs of all information matching a given query, without costly data modeling and with full security. It also provides advanced text analytics that add context and signals from human-generated information sources, along with support for business intelligence and data visualization tools.

41. DataStax Bigdata

DataStax Enterprise (DSE), which is built on Apache Cassandra, delivers what Internet enterprises need to compete today. With in-memory computing capabilities, enterprise-level security, fast and powerful integrated analytics and enterprise search, visual management, and expert support, DataStax positions DSE as the leading distributed database for online applications that require fast performance with no downtime.

42. GoodData

The GoodData Platform is a portfolio of tools, APIs, and frameworks that provides the key components of a BI solution: collecting, storing, combining, analyzing, and visualizing data. These components were built to run in the cloud and to be delivered as an end-to-end service.

43. GE Bigdata

GE's Industrial Internet coordinates multiple industrial applications so that they work together intelligently to optimize entire operational environments.

·      NoSQL

A NoSQL (originally referring to "non SQL"[1]) database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. Such databases have existed since the late 1960s, but did not obtain the "NoSQL" moniker until a surge in popularity in the early twenty-first century,[2] triggered by the storage needs of Web 2.0 companies such as Facebook, Google and Amazon.com.[3][4][5]

Motivations for this approach include simplicity of design; simpler "horizontal" scaling to clusters of machines, which is a problem for relational databases;[2] and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, graph, or document) differ slightly from those used by default in relational databases, making some operations faster in NoSQL and others faster in relational databases. The particular suitability of a given NoSQL database depends on the problem it must solve. The data structures used by NoSQL databases are also sometimes viewed as "more flexible" than relational database tables.[6]
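"Horizontal" scaling usually rests on partitioning (sharding) keys across machines, so that each node holds only part of the data. The following is a minimal Python sketch, assuming a fixed list of hypothetical node names and simple hash-modulo placement; it illustrates the idea and is not how any particular product assigns data.

```python
import hashlib

# Hypothetical cluster: these node names are placeholders, not a real product's API.
NODES = ["node-a", "node-b", "node-c"]


def node_for_key(key: str, nodes=NODES) -> str:
    """Pick the node responsible for a key by hashing it (modulo placement)."""
    # Use a stable hash rather than Python's per-process salted hash(),
    # so every client maps the same key to the same node.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


class ShardedStore:
    """A toy key-value store spread across several in-memory 'nodes'."""

    def __init__(self, nodes=NODES):
        self.nodes = list(nodes)
        self.shards = {name: {} for name in self.nodes}

    def put(self, key, value):
        self.shards[node_for_key(key, self.nodes)][key] = value

    def get(self, key):
        return self.shards[node_for_key(key, self.nodes)].get(key)


store = ShardedStore()
for i in range(100):
    store.put(f"user:{i}", i)
print({name: len(shard) for name, shard in store.shards.items()})
```

Adding capacity means adding a node to the list, with no single machine holding everything. The weakness of modulo placement is that changing the node count remaps most keys, which is why many distributed stores, Cassandra and Riak among them, use consistent hashing instead.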

NoSQL databases are increasingly used in big data and real-time web applications.[7] NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages.[8][9]

Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability, partition tolerance, and speed. Barriers to the greater adoption of NoSQL stores include the use of low-level query languages instead of SQL (for instance, the inability to perform ad hoc JOINs across tables), the lack of standardized interfaces, and large existing investments in relational databases.[10] Most NoSQL stores lack true ACID transactions, although a few recent systems, such as FairCom c-treeACE, Google Spanner (though technically a NewSQL database), FoundationDB, Symas LMDB and OrientDB, have made them central to their designs. (See ACID and JOIN Support.) Instead, they offer a concept of "eventual consistency" in which database changes are propagated to all nodes "eventually" (typically within milliseconds), so queries for data might not return updated data immediately.
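To make "eventual consistency" concrete, here is a toy Python model, not any product's replication protocol: a write is acknowledged after updating a single replica, and the other replicas catch up only when a hypothetical propagate() step runs. Until then, reads routed to those replicas return stale data.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on one replica and reach the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica index, key, value) updates not yet applied

    def write(self, key, value, via=0):
        # Acknowledge after updating one replica, favoring availability and speed.
        self.replicas[via][key] = value
        for i in range(len(self.replicas)):
            if i != via:
                self.pending.append((i, key, value))

    def read(self, key, via):
        return self.replicas[via].get(key)

    def propagate(self):
        # Stand-in for background replication; real systems do this asynchronously.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("cart:7", ["book"], via=0)
print(store.read("cart:7", via=1))  # None: replica 1 has not seen the write yet
store.propagate()
print(store.read("cart:7", via=1))  # ['book']
```

The model ignores concurrent conflicting writes; real systems must resolve those, for example with last-write-wins timestamps or vector clocks.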

Unfortunately, not all NoSQL systems live up to the promised "eventual consistency" and partition tolerance: in experiments with network partitioning, some have exhibited lost writes and other forms of data loss.[11] Fortunately, some NoSQL systems provide mechanisms such as write-ahead logging to avoid data loss.[12] Current relational databases, for their part, "do not allow referential integrity constraints to span databases".[13]
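Write-ahead logging means that a change is appended durably to a log before it is applied, so state can be rebuilt by replaying the log after a crash. The following is a minimal Python sketch of that idea, assuming a hypothetical JSON-lines log file; it is not the log format of any specific database.

```python
import json
import os
import tempfile


class WALStore:
    """Toy key-value store that appends each write to a log before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # On startup, rebuild the in-memory state by replaying the log in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # a torn final record from a crash mid-write is ignored
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value})
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the log entry to disk first
        self.data[key] = value  # only then update the in-memory state


# Demo: write, then simulate a restart by building a new store from the same log.
log_path = os.path.join(tempfile.mkdtemp(), "store.wal")
store = WALStore(log_path)
store.put("a", 1)
store.put("a", 2)
store.put("b", "x")
recovered = WALStore(log_path)
print(recovered.data)  # {'a': 2, 'b': 'x'}
```

Calling fsync on every write is slow; production systems typically batch several writes per flush (group commit) and periodically compact the log.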

History

The term NoSQL was used by Carlo Strozzi in 1998 to name his lightweight, Strozzi NoSQL open-source relational database that did not expose the standard SQL interface, but was still relational.[14] His NoSQL RDBMS is distinct from the circa-2009 general concept of NoSQL databases. Strozzi suggests that, as the current NoSQL movement "departs from the relational model altogether; it should therefore have been called more appropriately 'NoREL'",[15] referring to 'No Relational'.

Eric Evans reintroduced the term NoSQL in early 2009 when Johan Oskarsson of Last.fm organized an event to discuss open-source distributed databases.[16] The name attempted to label the emergence of an increasing number of non-relational, distributed data stores. Most of the early NoSQL systems did not attempt to provide atomicity, consistency, isolation and durability guarantees, contrary to the prevailing practice among relational database systems.[17]

As of July 2015, the most popular NoSQL databases were MongoDB, Apache Cassandra, Redis, Solr, ElasticSearch, HBase, Splunk, Memcached, and Neo4j.[18]

Types and examples of NoSQL databases

There have been various approaches to classify NoSQL databases, each with different categories and subcategories, some of which overlap. A basic classification based on data model, with examples:

·        Column: Accumulo, Cassandra, Druid, HBase, Vertica

·        Document: Clusterpoint, Apache CouchDB, Couchbase, DocumentDB, HyperDex, Lotus Notes, MarkLogic, MongoDB, OrientDB, Qizx

·        Key-value: CouchDB, Oracle NoSQL Database, Dynamo, FoundationDB, HyperDex, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB, MUMPS

·        Graph: Allegro, Neo4j, InfiniteGraph, OrientDB, Virtuoso, Stardog

·        Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB

A more detailed classification is the following, based on one from Stephen Yen:[19]

·        Key-Value Cache: Coherence, eXtreme Scale, GigaSpaces, GemFire, Hazelcast, Infinispan, JBoss Cache, Memcached, Repcached, Terracotta, Velocity

·        Key-Value Store: Flare, Keyspace, RAMCloud, SchemaFree, Hyperdex, Aerospike

·        Key-Value Store (Eventually-Consistent): DovetailDB, Oracle NoSQL Database, Dynamo, Riak, Dynomite, MotionDb, Voldemort, SubRecord

·        Key-Value Store (Ordered): Actord, FoundationDB, Lightcloud, LMDB, Luxio, MemcacheDB, NMDB, Scalaris, TokyoTyrant

·        Data-Structures Server: Redis

·        Tuple Store: Apache River, Coord, GigaSpaces

·        Object Database: DB4O, Objectivity/DB, Perst, Shoal, ZopeDB

·        Document Store: Clusterpoint, Couchbase, CouchDB, DocumentDB, Lotus Notes, MarkLogic, MongoDB, Qizx, XML-databases

·        Wide Columnar Store: BigTable, Cassandra, Druid, HBase, Hypertable, KAI, KDI, OpenNeptune, Qbase

Correlation databases are model-independent, and instead of row-based or column-based storage, use value-based storage.

Key-value stores

Main article: Key-value database

Key-value (KV) stores use the associative array (also known as a map or dictionary) as their fundamental data model. In this model, data is represented as a collection of key-value pairs, such that each possible key appears at most once in the collection.[20][21]

The key-value model is one of the simplest non-trivial data models, and richer data models are often implemented on top of it. The key-value model can be extended to an ordered model that maintains keys in lexicographic order. This extension is powerful because it allows key ranges to be processed efficiently.[22]
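The ordered key-value model can be sketched in a few lines of Python: a dictionary provides the associative array, and a sorted list of keys adds lexicographic ordering so that a range scan is a pair of binary searches. This is an illustrative in-memory sketch, not the design of any store listed here; the class and method names are hypothetical.

```python
import bisect


class OrderedKV:
    """Key-value store that keeps keys in lexicographic order for range scans."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value; each key appears at most once

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def delete(self, key):
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def range(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


kv = OrderedKV()
kv.put("user:003", "carol")
kv.put("user:001", "alice")
kv.put("order:9", {"total": 5})
kv.put("user:002", "bob")
# ";" sorts immediately after ":", so this range covers exactly the "user:" prefix.
print(list(kv.range("user:", "user;")))
# [('user:001', 'alice'), ('user:002', 'bob'), ('user:003', 'carol')]
```

Inserting into a Python list is O(n); real ordered stores keep keys in structures such as B-trees (LMDB) or log-structured merge trees so that inserts stay cheap at scale.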

Key-value stores can use consistency models ranging from eventual consistency to serializability. Some support ordering of keys. Some maintain data in memory (RAM), while others employ solid-state drives or rotating disks.

Examples include Oracle NoSQL Database, Redis, and dbm.

Document store

Main articles: Document-oriented database and XML database

The central concept of a document store is the notion of a "document". While each document-oriented database implementation differs on the details of this definition, in general, they all assume that documents encapsulate and encode data (or information) in some standard formats or encodings. Encodings in use include XML, YAML, and JSON, as well as binary forms such as BSON. Documents are addressed in the database via a unique key that represents that document. Another defining characteristic of a document-oriented database is that, in addition to the key lookup performed by a key-value store, the database offers an API or query language that retrieves documents based on their contents.

Different implementations offer different ways of organizing and/or grouping documents:

·        Collections

·        Tags

·        Non-visible metadata

·        Directory hierarchies

Compared to relational databases, for example, collections could be considered analogous to tables and documents analogous to records. But they are different: every record in a table has the same sequence of fields, while documents in a collection may have fields that are completely different.
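The following Python sketch illustrates a collection of JSON documents, addressed by unique keys, whose fields differ from document to document, plus a lookup by contents rather than by key. The find and find_with_tag helpers are hypothetical and only loosely resemble the query-by-example style of stores such as MongoDB; they are not any product's API.

```python
import json

# Hypothetical "collection": each document is JSON text decoded into a dict
# and addressed by a unique key. The documents do not share a fixed schema.
collection = {
    "post:1": json.loads('{"title": "Hello", "author": "ann", "tags": ["intro"]}'),
    "post:2": json.loads('{"title": "Scaling out", "author": "bob", "draft": true}'),
    "post:3": json.loads('{"title": "Graphs", "author": "ann", "tags": ["graph", "nosql"]}'),
}


def find(coll, **criteria):
    """Return keys of documents whose fields equal every given criterion."""
    return [
        key
        for key, doc in coll.items()
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]


def find_with_tag(coll, tag):
    """Return keys of documents whose optional "tags" list contains tag."""
    return [key for key, doc in coll.items() if tag in doc.get("tags", [])]


print(collection["post:2"]["title"])       # Scaling out (plain key lookup)
print(find(collection, author="ann"))      # ['post:1', 'post:3']
print(find_with_tag(collection, "nosql"))  # ['post:3']
```

Note that only post:2 has a "draft" field and only posts 1 and 3 have "tags"; in a relational table every row would need all three columns. A real document store would index fields instead of scanning every document.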

Graph

Main article: Graph database

This kind of database is designed for data whose relations are well represented as a graph (elements interconnected with an undetermined number of relations between them). The kind of data could be social relations, public transport links, road maps or network topologies, for example.
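As an illustration of the kind of question graph databases are built to answer, the following Python sketch stores a hypothetical public-transport network as an adjacency list and finds the route with the fewest links using breadth-first search. It is an in-memory sketch, not the storage model or query language (such as SPARQL) of any product listed here.

```python
from collections import deque

# Hypothetical transport network: each station maps to its directly linked stations.
links = {
    "Central": ["Museum", "Harbor"],
    "Museum": ["Central", "University"],
    "Harbor": ["Central", "Airport"],
    "University": ["Museum", "Airport"],
    "Airport": ["Harbor", "University"],
    "Depot": [],
}


def fewest_hops(graph, start, goal):
    """Breadth-first search: return a path with the fewest links, or None."""
    previous = {start: None}  # remembers how each visited node was reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


print(fewest_hops(links, "Central", "Airport"))  # ['Central', 'Harbor', 'Airport']
print(fewest_hops(links, "Central", "Depot"))    # None (no connecting link)
```

Following relationships this way is a series of direct neighbor lookups; in a relational schema each hop would typically require another self-join on a links table.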

Graph databases and their query languages

·        AllegroGraph (SPARQL): RDF GraphStore

·        DEX/Sparksee (C++, Java, .NET, Python): high-performance graph database

·        FlockDB (Scala)

·        IBM DB2 (SPARQL): RDF GraphStore added in DB2 10

·        InfiniteGraph (Java): high-performance, scalable, distributed graph database

·        Neo4j (Java)

·        OWLIM (Java, SPARQL 1.1): RDF graph store with reasoning

·        OrientDB (Java)

·        Sones GraphDB (C#)

·        Sqrrl Enterprise (Java): distributed, real-time graph database featuring cell-level security

·        OpenLink Virtuoso (C++, C#, Java, SPARQL): middleware and database engine hybrid

·        Stardog (Java, SPARQL): semantic graph database

Object database

·        db4o

·        GemStone/S

·        InterSystems Caché

·        JADE

·        NeoDatis ODB

·        ObjectDatabase++

·        ObjectDB

·        Objectivity/DB

·        ObjectStore

·        ODABA

·        Perst

·        OpenLink Virtuoso

·        Versant Object Database

·        ZODB

Tabular

·        Apache Accumulo

·        BigTable

·        Apache HBase

·        Hypertable

·        Mnesia

·        OpenLink Virtuoso

Tuple store

·        Apache River

·        GigaSpaces

·        Tarantool

·        TIBCO ActiveSpaces

·        OpenLink Virtuoso

Triple/quad store (RDF) database

·        Apache Jena (a framework rather than a database)

·        MarkLogic

·        Ontotext-OWLIM

·        Oracle NoSQL database

·        SparkleDB

·        Virtuoso Universal Server

·        Stardog

Hosted

·        Amazon DynamoDB

·        Amazon SimpleDB

·        Datastore on Google App Engine

·        Clusterpoint database

·        Cloudant Data Layer (CouchDB)

·        Freebase

·        OpenLink Virtuoso

Multivalue databases

·        D3 Pick database

·        Extensible Storage Engine (ESE/NT)

·        InfinityDB

·        InterSystems Caché

·        Northgate Information Solutions Reality, the original Pick/MV Database

·        OpenQM

·        Revelation Software's OpenInsight

·        Rocket U2

Multimodel database

·        OrientDB

·        FoundationDB

Performance

Ben Scofield rated different categories of NoSQL databases as follows: [23]

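·        Key–Value Store: performance high; scalability high; flexibility high; complexity none; functionality variable (none)

·        Column-Oriented Store: performance high; scalability high; flexibility moderate; complexity low; functionality minimal

·        Document-Oriented Store: performance high; scalability variable (high); flexibility high; complexity low; functionality variable (low)

·        Graph Database: performance variable; scalability variable; flexibility high; complexity high; functionality graph theory

·        Relational Database: performance variable; scalability variable; flexibility low; complexity moderate; functionality relational algebra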

Performance and scalability comparisons are sometimes done with the YCSB benchmark.

Handling relational data

Since most NoSQL databases lack the ability to perform joins in queries, the database schema generally needs to be designed differently. There are three main techniques for handling relational data in a NoSQL database. (See the table under ACID and JOIN Support for NoSQL databases that do support joins.)

Multiple queries

Instead of retrieving all the data with one query, it's common to issue several queries to get the desired data. NoSQL queries are often faster than traditional SQL queries, so the cost of the additional queries may be acceptable. If an excessive number of queries would be necessary, one of the other two approaches is more appropriate.
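A minimal Python sketch of the multiple-queries technique, using hypothetical in-memory "collections" in place of a real database: one query fetches a post's comments, a second batched query fetches the referenced users, and the join happens in application code.

```python
# Hypothetical collections; no server-side join is assumed to be available.
users = {"u1": {"name": "ann"}, "u2": {"name": "bob"}}
comments = [
    {"id": "c1", "post": "p1", "user_id": "u1", "text": "Nice post"},
    {"id": "c2", "post": "p1", "user_id": "u2", "text": "Agreed"},
    {"id": "c3", "post": "p2", "user_id": "u1", "text": "Thanks"},
]


def query_comments(post_id):
    # Query 1: all comments for one post.
    return [c for c in comments if c["post"] == post_id]


def query_users(user_ids):
    # Query 2: fetch every referenced user in a single batched lookup.
    return {uid: users[uid] for uid in user_ids if uid in users}


def comments_with_authors(post_id):
    found = query_comments(post_id)
    authors = query_users({c["user_id"] for c in found})
    # The "join" is done here, in application code.
    return [(authors.get(c["user_id"], {}).get("name"), c["text"]) for c in found]


print(comments_with_authors("p1"))  # [('ann', 'Nice post'), ('bob', 'Agreed')]
```

Batching the second query keeps the total at two round trips regardless of how many comments the post has, instead of one extra query per comment.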

Caching/replication/non-normalized data

Instead of only storing foreign keys, it's common to store actual foreign values along with the model's data. For example, each blog comment might include the username in addition to a user id, thus providing easy access to the username without requiring another lookup. When a username changes however, this will now need to be changed in many places in the database. Thus this approach works better when reads are much more common than writes.[24]
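The blog-comment example above can be sketched in Python with hypothetical in-memory data: each comment carries a copy of the username, so rendering a comment needs no second lookup, but renaming a user must update every copy.

```python
# Hypothetical data: comments store a copy of the username next to the user id.
users = {"u1": {"username": "ann"}}
comments = [
    {"id": "c1", "user_id": "u1", "username": "ann", "text": "First!"},
    {"id": "c2", "user_id": "u1", "username": "ann", "text": "Follow-up"},
]


def render(comment):
    # Cheap read: the username travels with the comment.
    return f'{comment["username"]}: {comment["text"]}'


def rename_user(user_id, new_name):
    # Expensive write: the source record and every denormalized copy change.
    users[user_id]["username"] = new_name
    updated = 0
    for comment in comments:
        if comment["user_id"] == user_id:
            comment["username"] = new_name
            updated += 1
    return updated


print(render(comments[0]))        # ann: First!
print(rename_user("u1", "annie"))  # 2 copies updated
print(render(comments[1]))        # annie: Follow-up
```

The trade-off is visible in the two functions: every read benefits, while each rename touches as many records as the user has comments.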

Nesting data

With document databases like MongoDB, it's common to put more data in a smaller number of collections. For example, in a blogging application, one might choose to store comments within the blog post document so that a single retrieval returns all the comments. In this approach, a single document contains all the data needed for a specific task.
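A minimal Python sketch of nesting, assuming a hypothetical store that keeps each post as one serialized JSON document keyed by its id: reading the post and all its comments is a single lookup, while adding a comment rewrites the whole document.

```python
import json

# Hypothetical blog post document with its comments embedded.
post = {
    "_id": "p1",
    "title": "Why NoSQL?",
    "comments": [
        {"user": "ann", "text": "Nice post"},
        {"user": "bob", "text": "Agreed"},
    ],
}

posts = {post["_id"]: json.dumps(post)}  # one serialized document per post


def load_post(post_id):
    # One key lookup returns the post together with all of its comments.
    return json.loads(posts[post_id])


def add_comment(post_id, user, text):
    doc = load_post(post_id)
    doc["comments"].append({"user": user, "text": text})
    posts[post_id] = json.dumps(doc)  # the whole document is written back


add_comment("p1", "cho", "Thanks!")
print(len(load_post("p1")["comments"]))  # 3
```

Nesting suits data that is read together and grows modestly. Documents that grow without bound can hit size limits; MongoDB, for example, caps a single BSON document at 16 MB.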

ACID and JOIN Support

If a database is marked as supporting ACID or joins, then the documentation for the database makes that claim. The degree to which the capability is fully supported in a manner similar to most SQL databases or the degree to which it meets the needs of a specific application is left up to the reader to assess.

·        CouchDB: ACID yes, joins yes

·        OrientDB: ACID yes, joins yes

·        c-treeACE: ACID yes, joins yes

·        FoundationDB: ACID yes, joins yes

·        HyperDex: ACID yes, joins yes

·        InfinityDB: ACID yes, joins no

·        LMDB: ACID yes, joins no