Microsoft has focused on offering Apache Hadoop on Windows Server and Windows Azure. HDInsight is a big data solution built on the Hortonworks Data Platform (HDP). It is a significant benefit for a wide range of Windows users, making it easy to build big data solutions on their existing Windows platform. HDInsight is available in two variants:
- HDInsight Service on Windows Azure (an open and flexible cloud platform), which can be added as a component to any existing Azure account
- HDInsight Server for Windows Server, which is available for download at http://www.microsoft.com/en-us/sqlserver/solutions-technologies/business-intelligence/big-data.aspx
The HDInsight service is a cloud-based solution for implementing and running big data solutions. It targets the fundamental needs of distributed computing efficiently. The service works as a wrapper that manages and stores data and adds monitoring features. Monitoring helps to collect and analyze application and cluster performance details. HDInsight consists of two components: HDFS for computation on the data and Azure Blob (binary large object) storage for storing it. Azure Blob storage is the default file system. It is a service for storing large datasets that can be accessed from anywhere in the world via HTTP or HTTPS. Blob storage offers high scalability and availability at low cost, and is a long-term, shareable storage option within Azure. Common uses of Blob storage include:
- Serving images or documents directly to a browser
- Storing files for distributed access
- Streaming video and audio
- Secure backup and disaster recovery
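Because blobs are reachable over HTTP or HTTPS, each one has a plain URL. The following is a minimal sketch of how such a URL is put together, assuming the standard public endpoint pattern `<account>.blob.core.windows.net`. The account, container, and blob names are hypothetical, and the helper `blob_url` is only for illustration, not part of any Azure SDK.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str, secure: bool = True) -> str:
    """Build the public URL of a blob (illustrative helper, not an SDK call)."""
    scheme = "https" if secure else "http"
    # Percent-encode the blob name; "/" is kept so virtual folders stay readable.
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


if __name__ == "__main__":
    # Hypothetical storage account, container, and blob.
    print(blob_url("examplestore", "media", "reports/q1 summary.pdf"))
```

A browser can fetch such a URL directly only if the container allows public read access; otherwise the request has to be authorized.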
The HDInsight service helps to manage and analyze large data sets, and it leverages the parallel processing capabilities of the MapReduce programming model. Other Apache Hadoop technologies are also available with HDInsight, giving users more options for meeting their big data needs. It provides implementations of Hive and Pig to combine data processing and data warehousing capabilities. It also integrates with other tools, such as SQL Server Analysis Services and Excel. This tool integration is helpful for collecting test data or loading generated/golden data. For user-generated data, integration with these tools makes it possible to test with different datasets. Test data is a key factor in the testing cycles of big data applications, and it can be collected from different data management tools or any other BI tool.
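To make the MapReduce programming model concrete, here is a small single-process sketch of its three steps (map, shuffle, reduce) applied to word counting. It is an illustration in plain Python, not HDInsight or Hadoop code; on a real cluster the map and reduce steps run in parallel across nodes. All function names are made up for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data on Azure", "Big data testing"]))
```

A small golden dataset like the two lines above is also a convenient way to check a job's logic before running it against full-size test data.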
HDInsight supports other Hadoop ecosystem components such as Hive, HBase, and ZooKeeper, and it provides automated Hadoop cluster creation together with these components.
For processing, data is transferred to HDFS because it is highly optimized for computation, but keeping an HDFS cluster running after processing completes is expensive. HDInsight gives users the best of both by storing data in Azure Blob storage and processing it on HDFS. The HDInsight infrastructure runs on the compute nodes, and the data resides in Blob storage. For computation, data is transferred from storage to the compute nodes, and that transfer needs to be fast. To meet this need, Microsoft has deployed Azure Flat Network Storage, also known as the Quantum 10 or Q10 network. It is a mesh grid network that allows very high bandwidth connectivity. HDInsight streams data from the storage nodes (Azure Blob) to the compute nodes (HDFS nodes).
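From a job's point of view, data kept in Blob storage is usually addressed with a `wasb://` (or `wasbs://` over TLS) URI rather than an `hdfs://` path. The sketch below shows how such a URI is composed, assuming the cluster uses the Hadoop WASB driver. The account, container, and path are hypothetical, and `wasb_uri` is an illustrative helper, not an HDInsight API.

```python
def wasb_uri(account: str, container: str, path: str, secure: bool = False) -> str:
    """Compose a WASB URI pointing at data in Azure Blob storage (illustrative helper)."""
    scheme = "wasbs" if secure else "wasb"
    # The container comes before the "@", the storage account after it.
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical input location for a test run.
    print(wasb_uri("examplestore", "testdata", "/input/run1/events.csv"))
```

Because the data lives in Blob storage and not on the cluster's own disks, the same input location stays valid after the test cluster is deleted and recreated.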
HDInsight provides features such as monitoring and automated deployment of a Hadoop cluster with its ecosystem components. These features are very helpful for testing applications developed and deployed on HDInsight. Testers can use monitoring for benchmarking and other non-functional testing. In big data testing, the test infrastructure plays a very important role. It should be scalable enough to validate the functional and non-functional aspects of an application and to certify production-ready releases. HDInsight is a service on Microsoft Azure (cloud) and allows test engineers to add instances to or remove instances from an existing cluster.
The test environment should have enough capacity, in terms of configuration and memory, to process large amounts of data. HDInsight provides automated cluster deployment, so there is no need to manually configure Hadoop and its ecosystem components across a varying number of nodes.
Monitoring is a primary need for a cluster during data processing, to ensure efficient utilization of cluster resources. HDInsight provides monitoring features that help with performance monitoring and provide real-time details about cluster performance.
A tester can design different use cases based on application needs (in terms of Hadoop ecosystem components) and deploy a cluster in an automated manner. This serves as the test environment, creating a platform for big data testing on Azure.
Other benefits include:
- Provides Open Database Connectivity (ODBC) drivers to integrate Business Intelligence (BI) tools
- Full set of Hadoop ecosystem components such as Pig, Sqoop, and Hive
- Provides a Sqoop connector
- Simplified configuration and post-processing of Hadoop jobs
- Provides JavaScript and Hive interactive consoles to make it more usable
In conclusion, HDInsight is well suited for both development and testing of Hadoop-based applications. Given the 3 V's of big data and the testing challenges associated with them, HDInsight provides monitoring and performance tuning features together to test the scalability of applications. A test engineer can deploy a test environment on Azure based on the test scenarios, scale it to the required number of nodes, and finally terminate the instances as needed or after the test cycles have run.