THOR!

THOR
Hitachi's Object Relational Database Technology

Questions & Answers

Q. What is THOR?

A. THOR™, The Hitachi Object Relational data server, is a next-generation, large-scale data warehousing technology designed to accommodate very large databases (VLDBs) and data mining applications.

Q. Which Hitachi Product Group is responsible for this new database technology?

A. THOR is the responsibility of Hitachi Computer Products (America), Inc. (HICAM), part of the Hitachi family of computer and communications companies. Other companies in the computer and communications group include Hitachi America, Ltd., Hitachi Data Systems (HDS), and Hitachi Personal Computer (HiPC).

Q. Why is Hitachi Computer Products introducing a new database technology?

A. Fortune 1000 customers have always looked to Hitachi for leading-edge hardware and software solutions in other markets, such as large-scale enterprise computers and data storage. Hitachi recognized that these same customers are facing a new kind of challenge in their data warehousing strategies. These customers are awash in data. They have hundreds, maybe thousands, of independent data storage locations scattered throughout enterprise networks. What they need is help turning their raw data into useful information. Drawing on its experience, Hitachi created a next-generation, object-relational database system that allows customers to extract information from enterprise database systems.

Q. What can Hitachi bring to database warehousing technology that other vendors don't already supply to the market?

A. We had the advantage of a fresh start when we began this project. We were able to understand the ways in which the market was developing, and to respond with an entirely new combination of technologies in the DBMS market—data flow and object oriented implementation—to create a next generation system. While the SQL standard has reduced the burden of prior release compatibility for the other vendors, they still have their proprietary extensions to worry about. More importantly, they all have substantial investment in large code bases which limit their flexibility and extensibility. We believe this gives us a significant competitive advantage in technology, and each of them will have to respond with a significant re-implementation effort, with all that implies for them and their current customers.

Q. What kinds of data can Hitachi's THOR handle?

A. THOR can manage information from a wide variety of sources, and it is uniquely well-suited to handle new data types, such as audio, video, images, and other predefined data types. User defined data types and functions may be introduced in a totally non-disruptive, evolutionary manner.

Q. What differentiates THOR from other database solutions, like Sybase, Oracle, or Informix?

A. THOR was designed from the ground up to be the first data warehousing solution that is both massively parallel and intrinsically object oriented. THOR's unique application of the data flow model allows relational operations to be executed in parallel with unmatched efficiency. The THOR MPP Data Server™ is a unique combination of industry standard components connected using a toroidal mesh interconnect of communication processors.

Q. Is THOR hardware or software?

A. It's both. THOR combines Hitachi's THOR SQL objectManager™, a high-performance relational database management system, with the THOR MPP Data Server, a modular, high-performance, massively parallel processing architecture. The THOR MPP Data Server uses a modular architecture that adds processing power and capacity as more modules are configured in. The THOR SQL objectManager software has been optimized for the THOR MPP Data Server hardware, but it is an open solution: it can access data from other SQL-based platforms, and it can also be ported to run on other MPP platforms.

Q. Why is Hitachi introducing a proprietary DBMS system in this era of Open Systems?

A. THOR is not a proprietary system. While the THOR MPP has been designed as a platform for SQL objectManager, the latter is fully portable, and it is our intent to make it available on other platforms over time.

Q. What is it that makes the THOR SQL objectManager "open"?

A. THOR SQL objectManager is a fully functional RDBMS that can handle very large databases and very complex data queries. The SQL objectManager is based on the Structured Query Language (SQL). You can replicate disparate data from other RDBMSs into THOR and query it there. Using standard SQL and supporting gateways, such as ODBC and Sybase Open Server, SQL objectManager can work with any open SQL database.

The SQL objectManager supports the Sybase SQL language, Transact SQL, and the Sybase database network middleware, Open Server/Open Client, so all data types, data definition language, and operators fully support Sybase. In addition, application programmers who are using the latest application builder suites (e.g. PowerBuilder from Sybase, Visual Basic from Microsoft, etc.) can create plug-and-play applications that interface with SQL objectManager's SQL architecture. The product was designed using Object Oriented technology, so it can easily be ported to new platforms.

Q. Why is THOR called an Object/relational system? Why do I care?

A. THOR was designed and implemented as an object-oriented system. This means that a data object such as a table definition is treated as a class with particular attributes (its column definitions), and the instances of this class are rows. In contrast to other database designs, the operations which are performed on the rows and columns are “encapsulated” and insulated from one another, yielding several benefits:

To begin with, object design allows for maximum parallel execution and row-level concurrency among operations.
Secondly, all of the information necessary to operate upon a specific data type is localized within the encapsulation. This means that new data types and new operators upon them can be added easily, without concern for the effects these new types and their operators might have upon the previous ones.
For example, one might add a new data type called “FINGERPRINT” and a new operator called “SIMILAR” for “FINGERPRINT” values, by simply adding (1) an encapsulated object which checks that a submitted data element is indeed of type “FINGERPRINT” and (2) another encapsulated object which performs the operations to check whether two such “FINGERPRINT” objects are “SIMILAR”. The existing control structure would remain unchanged.
Third, the fundamental nature of object orientation within THOR separates data operations from control operations and produces smooth, elegant, and high-quality enhancements to either.
Finally, using OO design, THOR isolates the hardware dependent elements into a few, well defined areas of low-level system code. As a result, transporting THOR to new MPP platforms is straightforward.
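
The FINGERPRINT example above can be sketched in a few lines of Python. This is only an illustration of the encapsulation idea, not THOR's actual interface: the TypeRegistry class, its method names, the toy fingerprint representation (a set of coordinate pairs), and the overlap-based similarity test are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class TypeRegistry:
    """Hypothetical registry: each data type carries its own checker and operators."""
    validators: Dict[str, Callable[[Any], bool]] = field(default_factory=dict)
    operators: Dict[Tuple[str, str], Callable[[Any, Any], bool]] = field(default_factory=dict)

    def add_type(self, name: str, validator: Callable[[Any], bool]) -> None:
        # (1) The encapsulated object that checks a value is of this type.
        self.validators[name] = validator

    def add_operator(self, type_name: str, op_name: str,
                     func: Callable[[Any, Any], bool]) -> None:
        # (2) The encapsulated object that implements an operator for the type.
        if type_name not in self.validators:
            raise KeyError(f"unknown type {type_name}")
        self.operators[(type_name, op_name)] = func

    def apply(self, type_name: str, op_name: str, left: Any, right: Any) -> bool:
        # The control path is generic and does not change when types are added.
        check = self.validators[type_name]
        if not (check(left) and check(right)):
            raise TypeError(f"operands are not of type {type_name}")
        return self.operators[(type_name, op_name)](left, right)


registry = TypeRegistry()

# An existing type and operator, registered before FINGERPRINT exists.
registry.add_type("INTEGER", lambda v: isinstance(v, int))
registry.add_operator("INTEGER", "EQUAL", lambda a, b: a == b)


def is_fingerprint(value: Any) -> bool:
    # Assumption: a toy fingerprint is a frozenset of (x, y) points.
    return isinstance(value, frozenset) and all(
        isinstance(p, tuple) and len(p) == 2 for p in value
    )


def similar(a: frozenset, b: frozenset, threshold: float = 0.5) -> bool:
    # Assumption: "similar" means the shared points make up at least
    # `threshold` of all points seen in either fingerprint.
    union = a | b
    if not union:
        return True
    return len(a & b) / len(union) >= threshold


# Adding the new type touches nothing that was registered earlier.
registry.add_type("FINGERPRINT", is_fingerprint)
registry.add_operator("FINGERPRINT", "SIMILAR", similar)
```

A call such as registry.apply("FINGERPRINT", "SIMILAR", fp1, fp2) then goes through the same generic path as the INTEGER operator, which is the point of the design: the new type and operator are added without modifying existing ones.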

Because OO in THOR is fundamental and very powerful, application developer access to this capability must be carefully controlled in a way that protects data integrity. This will be done in stages. The user will initially have large object (BLOB) support for new data types such as video, audio, and bit maps. Next, THOR will introduce special predefined types (e.g., time series, spatial, etc.) and aggregations of existing types. Finally, THOR will introduce more general user-defined types and functions. The exact interface for this last capability is still being defined and will take into account emerging industry standards (SQL3), existing practice, and customer needs.

Q. Why use a Massively Parallel Processing (MPP) hardware architecture? Is there an advantage in MPP?

A. There are significant advantages in an MPP architecture for processing the complex queries which characterize the data warehouse application market. These queries are quite different from those in transaction processing. They often touch large amounts of data, are therefore normally quite I/O intensive, and they often run for very long times. They freely use SQL operators such as joins and aggregations which require processing large numbers of inter-row relationships, and thus have much poorer cache behavior than transactions. The result is that an architecture such as MPP, which isolates processor caches, scales much better than Symmetrical Multi-Processing (SMP) for this workload.

Secondly, a 32-bit processor can address at most 4GB of memory, so a tightly coupled SMP, whose processors share one memory, is limited in the same way. In an MPP, however, each processor can address its own 4GB, so much larger total memories can be configured.

Third, the intensive I/O activity in these applications must compete with the CPU for memory bandwidth. In an SMP, the more CPUs there are, the more intense the competition and the greater the performance impact. In an MPP, each node's memory is isolated from the other nodes' CPUs, and the competition remains essentially constant.

So-called ccNUMA designs (cache-coherent Non-Uniform Memory Access) from the SMP vendors are an imperfect response to these scaling limitations for SMP designs.

Q. How do you get more processing power from the THOR MPP Data Server architecture?

A. THOR MPP Data Server is made up of a series of nodes. Each node is a complete RISC-based computer system equipped with its own processor, disk, memory and I/O subsystem. The nodes are interconnected into a toroidal mesh, a doughnut-shaped surface that supports nearly linear scalability for data communications. The toroidal mesh approach is not only more cost-effective and homogeneous than other MPP system designs, it also provides built-in redundancy and enhances system availability.

There are four nodes in a module, and up to six modules, or 24 nodes, can be configured into a tower. Though there is no architectural limitation, in the current implementation up to twelve towers or 288 nodes can be interconnected to maximize MPP processing power.
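
The configuration arithmetic can be worked through in a short Python sketch. The constants come from the figures quoted in this document (four nodes per module, six modules per tower, twelve towers, and 4GB addressable per 32-bit processor); the function names are hypothetical, and the memory figure is an upper bound on what could be addressed, not a statement of what any shipped configuration contains.

```python
NODES_PER_MODULE = 4
MODULES_PER_TOWER = 6
MAX_TOWERS = 12                      # limit of the current implementation
ADDRESSABLE_BYTES_PER_NODE = 2 ** 32  # 32-bit addressing: 4GB per processor


def node_count(towers: int) -> int:
    """Number of nodes in a configuration of fully populated towers."""
    if not 1 <= towers <= MAX_TOWERS:
        raise ValueError(f"towers must be between 1 and {MAX_TOWERS}")
    return towers * MODULES_PER_TOWER * NODES_PER_MODULE


def aggregate_addressable_gb(nodes: int) -> int:
    """Upper bound on total addressable memory across all nodes, in GB."""
    return nodes * ADDRESSABLE_BYTES_PER_NODE // 2 ** 30


print(node_count(1))                              # 24 nodes in one tower
print(node_count(MAX_TOWERS))                     # 288 nodes in twelve towers
print(aggregate_addressable_gb(node_count(12)))   # 1152 GB addressable in total
```

By comparison, a single 32-bit address space stops at 4GB no matter how many processors share it.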

In this "shared-nothing" environment, data is distributed equally among the nodes according to a proprietary hashing scheme. Through our data-driven design, each database query is also distributed for processing among all of the nodes in the configuration; therefore, the more nodes are available, the more processing power is applied in parallel to each query. Thus, a complex query handled by N nodes in an MPP configuration can be processed nearly twice as fast by 2N nodes, and so on.
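
The idea of hash distribution with per-node query processing can be illustrated with a minimal Python sketch. THOR's hashing scheme is proprietary and is not described here, so the sketch substitutes CRC32 as a stand-in; the function names, the row layout, and the use of threads to play the role of nodes are likewise assumptions for illustration only.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Row = Dict[str, int]


def node_for_key(key: int, node_count: int) -> int:
    # Stand-in hash (CRC32); the real scheme is proprietary and unknown here.
    return zlib.crc32(str(key).encode("utf-8")) % node_count


def distribute(rows: List[Row], node_count: int, key_column: str) -> List[List[Row]]:
    """Assign each row to exactly one node's private partition."""
    partitions: List[List[Row]] = [[] for _ in range(node_count)]
    for row in rows:
        partitions[node_for_key(row[key_column], node_count)].append(row)
    return partitions


def local_sum(partition: List[Row], column: str) -> int:
    # Each node scans only its own rows.
    return sum(row[column] for row in partition)


def parallel_sum(partitions: List[List[Row]], column: str) -> int:
    # Every "node" computes a partial result; the partials are then combined.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: local_sum(p, column), partitions))
    return sum(partials)


rows = [{"order_id": i, "amount": i * 10} for i in range(1000)]
partitions = distribute(rows, 4, "order_id")
total = parallel_sum(partitions, "amount")
```

Because each partition is scanned independently, doubling the number of partitions roughly halves the rows each one must scan, which is the intuition behind the near-linear scaling claimed above.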

Q. What is the expected performance?

A. We do not have price/performance data available for distribution at this time. It is our intent to complete an audited TPC-D benchmark by early 1997. Our early internal measurements are right on track, however, for a very aggressive objective. That objective is to deliver a system which will give competitive indigestion to the SMP system vendors at the low end of our range, and to outperform the MPP competition.

Q. Where was THOR developed?

A. It was developed in HICAM development centers in Santa Clara, California, and Boise, Idaho, and manufactured in Norman, Oklahoma.

Q. Will customers be able to buy the THOR database software for other hardware platforms?

A. It is our intent to port THOR SQL objectManager to other platforms to meet customer needs.

Q. What are your plans for third-party development to support The Hitachi Object Relational data server?

A. We are working with a number of software and hardware vendors to ensure their products will be supported by THOR. It is expected that tools using the Sybase Open Server will run on THOR unchanged.

Q. Who are your competitors and how do you compare against them?

A. In software, our competitors are primarily the other Open Systems database vendors. All of them have realized both that (a) intra-query parallelism is essential to success in the DSS/Data Warehouse market and (b) customers need access to far more than mere traditional relational data in order to have competitive DSS applications. The problem is that all of them are burdened with implementations which were initially not parallel, and which did not allow for extension beyond the data types and operators envisioned by SQL 89 or SQL 92. They are struggling to extend their products in ways for which no provision has been made. The result is cumbersome and restrictive at the application interface, and inadequately parallel internally to give the performance required.

The SQL objectManager software, on the other hand, has been designed from the outset to carry out all data operations in parallel. Inherently serial operations, like the final merge pass of a sort, are handled as exceptions, rather than the converse. In addition, SQL objectManager has from the outset been designed, as the name implies, as a relational system to manage objects. Our major challenge is not how to extend it; it is inherently infinitely extensible to new data types. Our challenge is, rather, to give the application developer access to the greatest amount of the enormous power of the system while protecting the integrity of the data in the database.
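
The point about the final merge pass can be made concrete with a small Python sketch of a partitioned sort. It is a generic illustration, not SQL objectManager's implementation: the function name is hypothetical, and threads stand in for nodes. The per-partition sorts are independent and can run in parallel, while the last step, merging the sorted runs into one ordered stream, is the inherently serial part.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import List


def parallel_sort(partitions: List[List[int]]) -> List[int]:
    """Sort each partition independently, then merge the sorted runs."""
    # Parallel phase: every partition is sorted on its own worker.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        sorted_runs = list(pool.map(sorted, partitions))
    # Serial phase: a single k-way merge produces the final order.
    return list(heapq.merge(*sorted_runs))


result = parallel_sort([[5, 1, 9], [4, 8, 2], [7, 3, 6]])
print(result)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```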

In the hardware platform arena, the major competitors are the vendors of open systems MPPs. There are some tough competitors out there, but we believe that our database-oriented design, combined with our fast, totally symmetric interconnect and the outstanding price/performance and service for which Hitachi is known worldwide, gives us a significant edge.

More Information

Interested parties can contact Hitachi by telephone at 1-800-588-THOR (1-800-588-8467) or 408-588-3300, or by fax at 408-988-1279. Additional information is also available on the World Wide Web at [live URL used to be here].

Hitachi Computer Products (America), Inc. (HICAM), a subsidiary of Hitachi America, Ltd., develops hardware and software for high performance computing, internet and networking applications.

Hitachi America, Ltd., a wholly owned subsidiary of Hitachi, Ltd., Japan, markets and manufactures a broad range of electronics, computer systems and products, and provides industrial equipment and services throughout the United States.

Hitachi, Ltd. (NYSE:HIT), headquartered in Tokyo, Japan, is one of the world's largest electronics companies, with consolidated sales for fiscal 1995 (ending March 31, 1996) of $76.6 billion. The company markets and manufactures a wide range of products including computers, semiconductors, consumer products, and industrial equipment. [1]

Footnotes

[1] THOR, THOR MPP Data Server, and THOR SQL objectManager are trademarks of Hitachi Computer Products (America), Inc. All other trademarks are the property of their respective owners.
