
Hierarchical Cluster Engine Project

13 Jan 2014
An engine that builds hierarchical network infrastructure and remote distributed computation clusters, with native Sphinx search engine support, ZMQ sockets, stable connections, JSON message transport, a flexible structure and more...

Introduction

The HCE project is an engine that aims to solve many modern problems in the areas of distributed data mining, remote parallel data processing, text mining, cluster computing, mesh network infrastructure and hierarchical network structures, including the reduction of data that flows back from many nodes in a multi-node environment, and related areas.

To be continued...

Background

This project is the successor of the Associative Search Machine (ASM) full-text web search engine project, developed from 2006 to 2012 by IOIX Ukraine.

The main idea of the new project is to implement, in one product built in C++, a solution that can:

  • construct a custom network mesh or distributed network cluster structure with several types of relations between nodes;
  • formalize data processing flows that go from the upper-level central source node down to the lower nodes and back;
  • formalize the handling of management requests coming from multiple source points;
  • natively reduce the results of multiple nodes (aggregation, duplicate elimination, sorting and so on);
  • internally support a powerful full-text search engine and data storage;
  • process both transaction-less and transactional requests;
  • support flexible run-time changes of the cluster infrastructure;
  • offer bindings in many languages for client-side integration APIs...
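The downward request flow and upward reduction described above can be pictured as a recursive walk over a tree of nodes. This is a minimal in-process PHP 8 sketch, assuming a hypothetical `Node` class and plain substring matching; in HCE the nodes are separate processes connected by ZMQ sockets and search is done by Sphinx, so the sketch only illustrates the data flow, not the engine itself.

```php
<?php
declare(strict_types=1);

// Hypothetical in-process model of a node tree: leaves hold documents,
// inner nodes forward the query to their children and merge the answers.
final class Node
{
    /**
     * @param Node[]   $children  lower-level nodes
     * @param string[] $documents data held by this node (usually only leaves)
     */
    public function __construct(
        public string $name,
        public array $children = [],
        public array $documents = []
    ) {
    }

    /** @return string[] matching documents, de-duplicated and sorted */
    public function search(string $term): array
    {
        // Local matches (plain substring search, standing in for a real index).
        $found = array_values(array_filter(
            $this->documents,
            fn (string $doc): bool => str_contains($doc, $term)
        ));
        // Downward flow: ask every child; upward flow: merge their answers.
        foreach ($this->children as $child) {
            $found = array_merge($found, $child->search($term));
        }
        // Reduce at this level: drop duplicates and sort.
        $found = array_values(array_unique($found));
        sort($found);
        return $found;
    }
}

// Usage: a root with one leaf and one intermediate node above another leaf.
$root = new Node('root', [
    new Node('leaf-1', [], ['hce cluster', 'zmq sockets']),
    new Node('middle', [
        new Node('leaf-2', [], ['hce node', 'hce cluster', 'sphinx search']),
    ]),
]);
echo implode(', ', $root->search('hce')), PHP_EOL; // hce cluster, hce node
```

Each level reduces before answering its parent, so duplicates found on different branches are removed as early as possible.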

HCE Application Area

  • As a network infrastructure and message transport layer provider, HCE can be used in any big-data solution that needs a custom network structure to build a distributed, high-performance, vertically and horizontally scalable data processing or data-mining architecture.
  • As a provider of a natively supported full-text search engine interface, HCE can be used in web or corporate network solutions that need fast and powerful full-text search and NoSQL distributed data storage, smoothly integrated using the languages natural to the target project. Currently the Sphinx (c) search engine with an extended data model is supported internally.
  • As a Distributed Remote Command Execution (DRCE) service provider, HCE can be used to automate the administration of many host servers in ensemble mode for OS and service deployment, maintenance and support tasks.

Hierarchical Cluster as Engine

  • Provides a hierarchical cluster infrastructure: the node connection schema, relations between nodes, node roles, request typification and data processing sequence algorithms, data sharding modes, and so on.
  • Provides a network transport layer for client application data and administration management messages.
  • Manages the natively supported, integrated NoSQL data storage (the Sphinx (c) search index) and Distributed Remote Command Execution.
  • Collects, reduces and sorts the results of native and custom data processing.
  • Is ready to support transactional message processing.
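To make "collect, reduce and sort" concrete, here is a minimal PHP sketch of reducing result lists returned by several nodes: merging them, eliminating duplicates and sorting. The result shape (arrays with hypothetical `id` and `weight` keys) and the function name `reduce_node_results` are assumptions for illustration, not part of the HCE API.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical reducer: merges per-node result lists, keeps one entry per
 * document id (the one with the highest weight) and sorts by weight, descending.
 *
 * @param array<int, array<int, array{id: string, weight: float}>> $nodeResults
 * @return array<int, array{id: string, weight: float}>
 */
function reduce_node_results(array $nodeResults, int $limit): array
{
    $byId = [];
    foreach ($nodeResults as $results) {
        foreach ($results as $item) {
            $id = $item['id'];
            // Duplicate elimination: keep the best-weighted copy of each document.
            if (!isset($byId[$id]) || $item['weight'] > $byId[$id]['weight']) {
                $byId[$id] = $item;
            }
        }
    }
    $merged = array_values($byId);
    // Sorting: highest weight first; ties broken by id for a deterministic order.
    usort($merged, function (array $a, array $b): int {
        return [$b['weight'], $a['id']] <=> [$a['weight'], $b['id']];
    });
    // Keep at most $limit documents in the reduced result.
    return array_slice($merged, 0, $limit);
}

// Usage: two nodes return overlapping results.
$nodeA = [['id' => 'doc1', 'weight' => 0.9], ['id' => 'doc2', 'weight' => 0.4]];
$nodeB = [['id' => 'doc2', 'weight' => 0.7], ['id' => 'doc3', 'weight' => 0.5]];
$top = reduce_node_results([$nodeA, $nodeB], 3);
echo implode(', ', array_column($top, 'id')), PHP_EOL; // doc1, doc2, doc3
```

Keeping the highest-weighted duplicate is only one possible policy; a real reducer might instead sum or average the weights reported by different nodes.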

HCE Key Functional Principles

  • Free network cluster structure architecture. A target project can build its own schema of network relations, based on hardware, load-balancing, fail-over, fault-tolerance and other principles, from one simple Lego-like engine.
  • A stable reversed client-server networking protocol based on ZMQ sockets, with connection heart-beating, automatic restoration and message buffering.
  • Easy asynchronous connection handling with a NUMA-oriented architecture of message handlers.
  • Unified I/O messages based on the JSON format.
  • Ready for client API bindings in the many programming languages covered by the ZMQ library; easy to integrate and deploy.
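The unified JSON message idea can be illustrated with a small sketch. The `id` and `body` field names mirror the `$msg_fields` array used by the searcher example in this article; the helper names `make_message` and `parse_message`, the id prefix and the sample body content are assumptions, and the real wire format is defined by the HCE protocol documentation.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: wraps a request body into an {id, body} message envelope
// and serializes it as JSON. Throws JsonException if encoding fails.
function make_message(string $id, array $body): string
{
    return json_encode(['id' => $id, 'body' => $body], JSON_THROW_ON_ERROR);
}

// Hypothetical helper: decodes an envelope and checks that the required fields exist.
function parse_message(string $json): array
{
    $msg = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($msg) || !isset($msg['id'], $msg['body'])) {
        throw new InvalidArgumentException('message must contain "id" and "body"');
    }
    return $msg;
}

// Usage: a unique id makes it possible to match a response to its request.
$requestId = uniqid('ID-1-', true);
$wire = make_message($requestId, ['q' => 'distributed search', 'max_results' => 10]);
$decoded = parse_message($wire);
echo $decoded['id'] === $requestId ? 'round trip ok' : 'mismatch', PHP_EOL; // round trip ok
```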

API Bindings

HCE is intended to be used from several programming languages through API bindings. Currently the API is implemented only for PHP; Java and Python bindings will be ready soon...

The API bindings provide several client-side APIs for request/response interactions, plus examples of API usage in dedicated applications: applications for node management, a Sphinx searcher, a DRCE exec manager, and some tools for interactions and data transformations. All APIs are documented and the JSON message protocol is defined. Please see the documentation files and the doxygen-generated documentation for more details.

The PHP API is a structural (procedural) implementation. A set of global definitions makes it easy to build complex search or DRCE requests in a short way while keeping the options flexible. For example, processing a search request by the searcher takes two steps. The first step is preparing the request:

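<?php
// Step 1: prepare the search request.
// $query_string, $MAX_RESULTS, $filters, $order and $sphinx_timeout are assumed
// to be defined by the application (not shown in this fragment).

// Bit mask of the result sections to return; the trailing .'' converts it to a string.
$ret_arrays_mask=(HCE_SPHINX_SEARCH_RET_TYPE_MI_INFO | HCE_SPHINX_SEARCH_RET_TYPE_RI_INFO |
  HCE_SPHINX_SEARCH_RET_TYPE_AT_INFO | HCE_SPHINX_SEARCH_RET_TYPE_WI_INFO).'';
$parameters_array=hce_sphinx_search_prepare_parameters($query_string,
   array(HCE_SPHINX_SEARCH_FIELD_TYPE_MASK=>$ret_arrays_mask,
   HCE_SPHINX_SEARCH_FIELD_MAX_RESULTS=>$MAX_RESULTS,
   HCE_SPHINX_SEARCH_FIELD_FILTERS=>$filters,
   HCE_SPHINX_SEARCH_FIELD_ORDER=>$order,
   HCE_SPHINX_SEARCH_FIELD_TIMEOUT=>$sphinx_timeout));
// Serialize the parameters into the JSON request body.
$Request_body=hce_sphinx_search_create_json($parameters_array);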
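The `$ret_arrays_mask` expression in the first step combines return-type flags with bitwise OR. The sketch below shows the same technique in isolation; the constant names and values are hypothetical stand-ins, since the real `HCE_SPHINX_SEARCH_RET_TYPE_*` values are defined by the HCE PHP API, and `has_flag` is an illustrative helper, not an API function.

```php
<?php
declare(strict_types=1);

// Hypothetical flag values; each flag occupies its own bit.
const RET_TYPE_MI_INFO = 1; // 0b0001
const RET_TYPE_RI_INFO = 2; // 0b0010
const RET_TYPE_AT_INFO = 4; // 0b0100
const RET_TYPE_WI_INFO = 8; // 0b1000

// Combine flags with bitwise OR, then convert to a string as the original code does with .''
$mask = (RET_TYPE_MI_INFO | RET_TYPE_WI_INFO) . '';

// Hypothetical helper: test whether a given flag is set in a (string) mask.
function has_flag(string $mask, int $flag): bool
{
    return ((int) $mask & $flag) === $flag;
}

echo $mask, PHP_EOL;                                            // 9
echo has_flag($mask, RET_TYPE_WI_INFO) ? 'yes' : 'no', PHP_EOL; // yes
echo has_flag($mask, RET_TYPE_AT_INFO) ? 'yes' : 'no', PHP_EOL; // no
```

Because every flag is a distinct power of two, any combination of sections can be carried in a single integer field of the request.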

The next step is executing the request and waiting for the response, in a loop of a specified number of iterations:

<?php
require_once '../inc/hce_node_api.inc.php';
// Settings such as $Connection_host, $Connection_port, $Client_Identity, $LOG_MODE,
// $MAX_QUERIES, $RESPONSE_TIMEOUT and $REQUEST_DELAY are presumably defined here;
// $Request_body is the JSON request prepared in the first step.
require_once '../inc/search.ini.php';

// Create the client connection to the node.
$hce_connection=hce_connection_create(array('host'=>$Connection_host, 'port'=>$Connection_port,
  'type'=>HCE_CONNECTION_TYPE_ROUTER, 'identity'=>$Client_Identity));

if(!$hce_connection['error']){
  if($LOG_MODE!=3){
    echo 'Client ['.$Client_Identity.'] connected, start to send '.
      $MAX_QUERIES.' message requests...'.PHP_EOL.PHP_EOL;
  }

  $t=time();
  $Timedout=0;

  for($i=1; $i<=$MAX_QUERIES; $i++){
    // Send the request with a unique message id.
    $Request_Id=hce_unique_message_id(1, $i.'-'.date('H:i:s').'-');
    $msg_fields=array('id'=>$Request_Id, 'body'=>$Request_body);
    hce_message_send($hce_connection, $msg_fields);

    if($LOG_MODE!=3){
      echo 'request message '.$i.' ['.$Request_Id.'] sent...'.PHP_EOL;
    }

    // Wait for the response(s), at most $RESPONSE_TIMEOUT.
    $hce_responses=hce_message_receive($hce_connection, $RESPONSE_TIMEOUT);
    if($hce_responses['error']===0){
      foreach($hce_responses['messages'] as $hce_message){
        if($LOG_MODE==3){
          // Mode 3: output only the raw JSON response body.
          echo $hce_message['body'];
        }else{
          $rjson=hce_sphinx_search_parse_json($hce_message['body']);
          if($LOG_MODE!=0){
            echo 'Message: id=['.$hce_message['id'].'], body=['.$hce_message['body'].']'.PHP_EOL;
          }
          if($LOG_MODE==4){
            echo var_export($rjson, true).PHP_EOL;
          }
          echo 'Documents:'.count($rjson[HCE_SPHINX_SEARCH_RESULTS_MI]).PHP_EOL;
        }
      }
    }else{
      if($hce_responses['error']==HCE_PROTOCOL_ERROR_TIMEOUT){
        $Timedout++;
        if($LOG_MODE!=3){
          echo 'request timeout'.PHP_EOL;
        }
      }else{
        if($LOG_MODE!=3){
          echo 'request unknown error'.PHP_EOL;
        }
      }
    }

    // Optional pause between requests.
    if($REQUEST_DELAY>0){
      sleep($REQUEST_DELAY);
    }
  }

  hce_connection_delete($hce_connection);
  if($LOG_MODE!=3){
    // Summary: total time, requests per second and number of timeouts.
    $elapsed=time()-$t;
    echo PHP_EOL.'Finished '.$MAX_QUERIES.' queries, '.$elapsed.' sec, '.
      floor($MAX_QUERIES/($elapsed+0.00001)).' rps, '.$Timedout.' timedout'.PHP_EOL;
  }
}else{
  if($LOG_MODE!=3){
    echo 'Connection create error '.$hce_connection['error'].PHP_EOL;
  }else{
    // Mode 3: print an empty JSON result.
    echo json_encode(array('data'=>array('match_info'=>array(),
      'word_info'=>array(), 'doc_info'=>array())));
  }
}

As this fragment of the Searcher application shows (see the full code in the doxygen-generated documentation), the client-side API makes it easy to implement a custom client application that sends complex Sphinx index search requests, and to integrate it into the target project.
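The control flow of the Searcher loop above (send, wait with a timeout, count timeouts, report requests per second) can be separated from the transport so that it is testable without a running node. This is a minimal sketch under assumptions: `run_requests` is a hypothetical function, and the transport is a hypothetical callable that returns the response body or `null` on timeout, standing in for `hce_message_send()` and `hce_message_receive()`. It uses `microtime(true)` instead of `time()` for sub-second precision, an illustrative choice.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical driver: sends $count requests through $transport and collects statistics.
 * $transport(string $id): ?string returns a response body, or null on timeout.
 *
 * @return array{ok: int, timedout: int, seconds: float, rps: float}
 */
function run_requests(int $count, callable $transport): array
{
    $start = microtime(true);
    $ok = 0;
    $timedout = 0;
    for ($i = 1; $i <= $count; $i++) {
        $id = sprintf('ID-%d-%s', $i, date('H:i:s'));
        $response = $transport($id);
        if ($response === null) {
            $timedout++;
        } else {
            $ok++;
        }
    }
    $seconds = microtime(true) - $start;
    // Guard against division by zero when the loop finishes almost instantly.
    $rps = $seconds > 0 ? $count / $seconds : (float) $count;
    return ['ok' => $ok, 'timedout' => $timedout, 'seconds' => $seconds, 'rps' => $rps];
}

// Usage with a fake transport that "times out" on every third request.
$calls = 0;
$fake = function (string $id) use (&$calls): ?string {
    $calls++;
    return $calls % 3 === 0 ? null : '{"id":"' . $id . '"}';
};
$stats = run_requests(10, $fake);
printf("%d ok, %d timedout, %.0f rps\n", $stats['ok'], $stats['timedout'], $stats['rps']);
```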

The Demo Test Suite (DTS) archive for PHP, included in the Debian OS package and in the HCE tarball, contains the complete API, demo versions of the client-side applications, test scripts and utilities, as well as tests for two different cluster schemas, Sphinx search tests, and DRCE tests with executable task algorithms implemented in Bash, PHP, Python, Ruby, Perl and Java.

Points of Interest

The road to this system was long and indirect. ASM was a closed web search system with a complete in-house solution for distributed multi-threaded crawling, document indexing, a full-text search index architecture, document storage, a linguistic dictionary system, search machine conveyors, distributed index and search processing, and many related services such as the Relevant Words service, as well as its own networking engine with self-recovering connections, multi-threaded TCP socket handling and so on. Over seven years of developing it, the teams gathered extensive experience in several key areas related to distributed computing, text mining, computational linguistics and especially TCP-based networking engines.

The HCE project tries to look at the problem from a different angle. Unlike ASM, HCE is not a closed, complete, out-of-the-box solution for end users. It is a Lego-like system that can be used to construct very different systems for very different applications. The set of tools and example implementations shows one possible way, not a strictly dedicated implementation.

So HCE opens the door to many experimental projects and investigations in network clustering and distributed data processing, and it is also a basic foundation for building a big-data full-text engine, a flexible mesh-structured data processing network and so on...

For more details, please visit the main HCE project site and documentation, then download and install hce-node and experiment with the Demo Test Suite.

The main application, hce-node, has been updated to v1.1.1; see the changelog at:

http://hierarchical-cluster-engine.com/blog/2014/01/10/hce-node-application-updated-to-v-1-1-1/

Best wishes, and enjoy working with the HCE project.

Gennady Baranov and HCE project team.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Member 10492436
Team Leader IOIX Ukraine
Ukraine
http://hierarchical-cluster-engine.com/gennady-baranov-cv/

Last Updated 13 Jan 2014
Article Copyright 2013 by Member 10492436