Click here to Skip to main content
Click here to Skip to main content

Building on My SQL Storage Engines and API’s to conduct complex analysis using sql query language

, 29 Jan 2014
Rate this:
Please Sign up or sign in to vote.
This will provide opportunity to calculate aggregates, joins on the data retrieved from API’s which was never done.

Introduction 

MySQL provides support for many storage engines which manage storage and retrieval. MyISAM and InnoDB are common. On table creation engine can be specified. A basic storage engine allows read only tables. Most engines read data from files and feed to MYSQL. On the other hand, our idea is to retrieve data from API’s and feed it to MYSQL. This will allow us to conduct complex analysis using sql on data retrieved from API’s. The idea is to write an engine for each set of related API’s but this does not mean that we have to write each engine from scratch but we are proposing a generic component which can be used by all storage engine and it will simplify the task of writing engine very much.

Background   

The idea came into my mind when I was thinking about doing analysis on running processes in the system. I needed up to date list of running processes in the system on mysql console. Also I needed to conduct complex analysis like count of modules in the process. In order to provide simplistic view of data, we expose tables and write an engine which calls the API’s to retrieve data. Here API’s were already available. A conventional approach could be to create a table and insert into it row for new process and delete row for terminated process. 

  • One drawback of this approach is that you have to insert and delete rows into the table which are time consuming. 
  • Another drawback is that you need to trigger that logging application which inserts and delete rows from the table before querying the table
  • Another drawback is that even if you need the information about a single process, all processes information is maintained in table
  • Another drawback is that you can get conflict in process id’s in case one process was terminated but a new process with the same id came for insertion before the deletion of the terminated process.
  • Another drawback is the table may be locked during insertion or deletion which can create performance bottlenecks for the retrieval entity. 

Instead the proposed approach avoids these problems as follows:

  • No need to insert and delete rows from the table
  • No need to trigger the logging application before querying the table
  • No need to get info about all processes when single process info is required
  • No inconsistency in process id’s as it represents a state of the system
  • No bottlenecks, as there are no insertion and deletion. 

Why is this topic important to our audience and how will they benefit? 

 A specific problem which the proposed approach solves better than the conventional techniques is that we have a time series data in tables. When a table size reaches a certain limit a new table is created. The requirement was to do analysis on that data using Tableau Business Intelligence tool. The tableau generates sql queries. For convenience of the user a single table was required instead of the underlying many tables. Programmatically queries can be generated to retrieve data for certain duration and display in web pages but tableau cannot utilize those programs. One conventional technique is to replicate data in a single table but this will void the concept behind maintaining multiple tables. Another conventional technique is to use views but the query behind view need to be programmatically generated and this will void the use of view. The proposed technique exposes a single table and uses available API’s to retrieve data from multiple tables. The solution is to write an engine. The simplest engine is to feed all the data and let MySQL to filter but this is highly inefficient. The proposed storage engine builds on the power of mysql query parser. During MYSQL query parsing the where clause is converted to functions/API calls to retrieve only required data to some extent. The idea can be generalized to any other API’s without having to rewrite the core logic of generating API calls. 

What are the key points the reader should remember after reading this article? 

This idea will solve problems like joining data from completely disparate sources. This will provide opportunity to calculate aggregates, joins on the data retrieved from API’s which was never done. This will allow simplistic picture of data and hide the complex details of database design. This will allow us to avoid time consuming joins by combining two tables into one.  This will build on already written API’s to hide the complex queries behind and provide simple tables to user for even more complex analysis. 

How our Storage Engine is different from others

Now I will tell how our proposed engines are different from generally available storage engines. Blackhole engine as the name suggests is used when the applications want the information to disappear automatically. Federated storage engine provides a view of remote MYSQL table. Maria storage engine provides millisecond support. XtraDB storage engine is a replacement for innodb. Falcon aims as a better transactional storage engine. PrimeBase XT does not provide durability in ACID compliance. AWSS3 storage engine is used to access data from Amazon Web Services S3 and exposes the data as tables. What is different in our proposed engines as compared to AWSS3 is that we provide a generic component which can be used by all the engines and simplify writing a storage engine. Mdbtools storage engine can be used to move data from Microsoft Access to mysql. Ritmark FS storage allows to upload files on ftp. Q4M provides message queuing functionality.  

Proposed Storage Engines   

 Our proposed storage engine support tables that are populated on the fly and don’t retain any data. The proposed storage engine builds on the power of mysql query parser. During MYSQL query parsing the where clause is converted to functions/API calls. The functions/APIs which don’t use global or static variables and behave like mathematical functions are the candidate for complex analysis. Those functions/APIs return data structures, xml etc. when traversed can be converted into relations (tables). Different functions can be combined to feed data to a table .The functions input parameters (not necessarily all) are passed in the where-clause and functions result is shown as table. For example a storage engine named ProcessInfoEngine is created along with following tables:  

Create table RunningProcessList(
PROCESS NAME  varchar(2048),
Process ID int,
Thread count int,
Parent process ID  int,
Priority Class int,
)engine= ProcessInfoEngine;

Create table ProcessModulesList
(
Process ID int,
MODULE NAME varchar(2048),
Executable varchar(2048),
Baseaddress varchar (255),
Basesize int
) engine= ProcessInfoEngine;
When the user issues a query select * from RunningProcessList the control passes to ProcessInfoEngine which checks the table name and where clause.
int ha_example::rnd_init(bool scan)
{
	DBUG_ENTER("ha_example::rnd_init");

	if(_stricmp(table_share->table_name.str,"RunningProcessList")==0)
	{
		THD *thd=this->ha_thd();
		SELECT_LEX *select_lex=&thd->lex->select_lex;
		m_processListIndex=0;
		if(select_lex->where==0||select_lex->where->RunningProcessListSize==0)
		{
			GetAllProcessList(m_processList);
		}
		else
		{
			GetProcessListSpecifiedInWhereClause(m_processList,select_lex->where->RunningProcessList,select_lex->where->RunningProcessListSize);
		}
	}
	DBUG_RETURN(0);
}

In this case where clause is empty therefore the engine takes a snapshot of all running processes using following win32 api:

  • CreateToolhelp32Snapshot
  • Process32First
  • Process32Next

Note: Following code is adapted from [1]

BOOL GetAllProcessList(vector< runningprocess> &procList )
{
	HANDLE hProcessSnap;
	HANDLE hProcess;
	PROCESSENTRY32 pe32;
	DWORD dwPriorityClass;

	// Take a snapshot of all processes in the system.
	hProcessSnap = CreateToolhelp32Snapshot( TH32CS_SNAPPROCESS, 0 );
	if( hProcessSnap == INVALID_HANDLE_VALUE )
	{
		return( FALSE );
	}

	// Set the size of the structure before using it.
	pe32.dwSize = sizeof( PROCESSENTRY32 );

	// Retrieve information about the first process,
	// and exit if unsuccessful
	if( !Process32First( hProcessSnap, &pe32 ) )
	{
		CloseHandle( hProcessSnap );          // clean the snapshot object
		return( FALSE );
	}

	// Now walk the snapshot of processes
	do
	{
		// Retrieve the priority class.
		dwPriorityClass = 0;
		hProcess = OpenProcess( PROCESS_ALL_ACCESS, FALSE, pe32.th32ProcessID );
		if( hProcess == NULL )
		{
		}
		else
		{
			dwPriorityClass = GetPriorityClass( hProcess );
			CloseHandle( hProcess );
		}
		struct RunningProcess myRunningProcess;
		myRunningProcess.ProcessName=pe32.szExeFile;
		myRunningProcess.ParentProcessID=pe32.th32ParentProcessID;
		myRunningProcess.PriorityClass=dwPriorityClass;
		myRunningProcess.ProcessID=pe32.th32ProcessID;
		myRunningProcess.ThreadCount=pe32.cntThreads;
		procList.push_back(myRunningProcess);
	} while( Process32Next( hProcessSnap, &pe32 ) );

  CloseHandle( hProcessSnap );
  return( TRUE );
} 

The engine feeds the process information rows to mysql one by one. MySQL shows the table RunningProcessList data

int ha_example::rnd_next(uchar *buf)
{
	int rc;
	DBUG_ENTER("ha_example::rnd_next");
	MYSQL_READ_ROW_START(table_share->db.str, table_share->table_name.str,
					TRUE);
	if(m_processListIndex<m_processlist.size()) **field="table-">field;	  
		bitmap_set_bit(table->write_set, (*field)->field_index);
		(*field)->set_notnull();
		(*field)->store(m_processList[m_processListIndex].ProcessName.c_str(),strlen(m_processList[m_processListIndex].ProcessName.c_str()), system_charset_info);
		field++;

		bitmap_set_bit(table->write_set, (*field)->field_index);
		(*field)->set_notnull();
		(*field)->store(m_processList[m_processListIndex].ProcessID,true);
		field++;

		bitmap_set_bit(table->write_set, (*field)->field_index);
		if(m_processList[m_processListIndex].ThreadCount!=-1)
		{
			(*field)->set_notnull();
			(*field)->store(m_processList[m_processListIndex].ThreadCount,true);
		}
		else
		{
			(*field)->set_null();
		}
		field++;

		bitmap_set_bit(table->write_set, (*field)->field_index);
		(*field)->set_notnull();
		(*field)->store(m_processList[m_processListIndex].ParentProcessID,true);
		field++;

		bitmap_set_bit(table->write_set, (*field)->field_index);
		(*field)->set_notnull();
		(*field)->store(m_processList[m_processListIndex].PriorityClass,true);
		field++;

		m_processListIndex++;
		rc=0;
	}
	else
	{
		rc= HA_ERR_END_OF_FILE;
	}
	MYSQL_READ_ROW_DONE(rc);
	DBUG_RETURN(rc);
}    
 
PROCESS NAMEProcess IDThread countParent process IDPriority base
[System Process]0x0000000040x000000000
smss.exe0x000001240x0000000411
csrss.exe0x0000019490x0000018C13
wininit.exe0x0000027430x0000018C13
services.exe0x000002B470x000002749
lsass.exe0x000002CC70x00000274 9
explorer.exe0x0000062C300x000005CC8

When user issues a query like select * from RunningProcessList where processid=0x00000000. One option is to feed all running processes to mysql and let mysql filter the other process id’s. Another option is to call OpenProcess api and pass processid=0x00000000 as parameter and feed only one row. 

void GetProcessListSpecifiedInWhereClause(vector < RunningProcess > & procList,  int *RunningProcessList,int RunningProcessListSize)
{
	for(int i=0;i < RunningProcessListSize;i++)
	{
		HANDLE hProcess;
		hProcess = OpenProcess( PROCESS_ALL_ACCESS, FALSE,RunningProcessList[i]  );
		if( hProcess != NULL )
		{
			char lpExeName[MAX_PATH];
			DWORD dwSize=MAX_PATH;
			dwSize=QueryFullProcessImageName(hProcess,0,lpExeName,&dwSize);
			if(dwSize==0) continue;
			ULONG_PTR ppid=GetParentProcessId(hProcess);
			DWORD dwPriorityClass;
			dwPriorityClass = GetPriorityClass( hProcess );
			struct RunningProcess myRunningProcess;
			basic_string < char > str=lpExeName;
			basic_string < char   >::size_type index;
			static const basic_string < char > ::size_type npos = -1;
			index=str.find_last_of('\\');
			if ( index != npos )
			{
				myRunningProcess.ProcessName=str.substr(index+1,str.length()-index);
			}
			else
			{
				myRunningProcess.ProcessName=str;
			}
			myRunningProcess.ParentProcessID=ppid;
			myRunningProcess.PriorityClass=dwPriorityClass;
			myRunningProcess.ProcessID=RunningProcessList[i];
			myRunningProcess.ThreadCount=-1;
			procList.push_back(myRunningProcess);
			CloseHandle( hProcess );
		}
	}
}

Note: Following code is taken from [2] 

ULONG_PTR GetParentProcessId(HANDLE hProcess) // By Napalm @ NetCore2K
{

	ULONG_PTR pbi[6];
	ULONG ulSize = 0;
	LONG (WINAPI *NtQueryInformationProcess)(HANDLE ProcessHandle, ULONG ProcessInformationClass,
	PVOID ProcessInformation, ULONG ProcessInformationLength, PULONG ReturnLength); 

	*(FARPROC *)&NtQueryInformationProcess = 
	GetProcAddress(LoadLibraryA("NTDLL.DLL"), "NtQueryInformationProcess");

	if(NtQueryInformationProcess)
	{
		if(NtQueryInformationProcess(hProcess, 0,
		&pbi, sizeof(pbi), &ulSize) >= 0 && ulSize == sizeof(pbi))
		return pbi[5];
	}
	return (ULONG_PTR)-1;

}

The result of the query is: 

PROCESS NAMEProcess IDThread countParent process IDPriority base
[System Process]0x0000000040x000000000

Now the question is from where process list specified in where clause come from? The answer is that during SQL query parsing the where clause is converted to running process list. The SQL grammar is augmented with semantic rules. The changes in semantic rules are highlighted in bold below: 

expr:
          expr or expr %prec OR_SYM
          {
            /*
              Design notes:
              Do not use a manually maintained stack like thd->lex->xxx_list,
              but use the internal bison stack ($$, $1 and $3) instead.
              Using the bison stack is:
              - more robust to changes in the grammar,
              - guaranteed to be in sync with the parser state,
              - better for performances (no memory allocation).
            */
            Item_cond_or *item1;
            Item_cond_or *item3;
            if (is_cond_or($1))
            {
              item1= (Item_cond_or*) $1;
              if (is_cond_or($3))
              {
                item3= (Item_cond_or*) $3;
                /*
                  (X1 OR X2) OR (Y1 OR Y2) ==> OR (X1, X2, Y1, Y2)
                */
                SemanticRule($1,$3,$3,false); 
                item3->add_at_head(item1->argument_list());
                $$ = $3;
              }
              else
              {
                /*
                  (X1 OR X2) OR Y ==> OR (X1, X2, Y)
                */
                HandleCase($3);
                SemanticRule($1,$3,$1,false);
                item1->add($3);

                $$ = $1;
              }
            }
            else if (is_cond_or($3))
            {
              item3= (Item_cond_or*) $3;
              /*
                X OR (Y1 OR Y2) ==> OR (X, Y1, Y2)
              */
              HandleCase($1);
              SemanticRule($1,$3,$3,false);
              item3->add_at_head($1);
              $$ = $3;
            }
            else
            {
              /* X OR Y */
			  
              $$ = new (YYTHD->mem_root) Item_cond_or($1, $3);
              if ($$ == NULL)
                MYSQL_YYABORT;				
              HandleCase($1);
              HandleCase($3);
              SemanticRule($1,$3,$$,false);
			}
          }
        | expr XOR expr %prec XOR
          {
            /* XOR is a proprietary extension */
            $$ = new (YYTHD->mem_root) Item_func_xor($1, $3);
            if ($$ == NULL)
              MYSQL_YYABORT;
          }
        | expr and expr %prec AND_SYM
          {
            /* See comments in rule expr: expr or expr */
            Item_cond_and *item1;
            Item_cond_and *item3;
            if (is_cond_and($1))
            {
              item1= (Item_cond_and*) $1;
              if (is_cond_and($3))
              {
                item3= (Item_cond_and*) $3;
                /*
                  (X1 AND X2) AND (Y1 AND Y2) ==> AND (X1, X2, Y1, Y2)
                */
                SemanticRule($1,$3,$3,true);
                item3->add_at_head(item1->argument_list());
                $$ = $3;
              }
              else
              {
                /*
                  (X1 AND X2) AND Y ==> AND (X1, X2, Y)
                */
                HandleCase($3);
                SemanticRule($1,$3,$1,true);
                item1->add($3);
                $$ = $1;
              }
            }
            else if (is_cond_and($3))
            {
              item3= (Item_cond_and*) $3;
              /*
                X AND (Y1 AND Y2) ==> AND (X, Y1, Y2)
              */
              HandleCase($1);
              SemanticRule($1,$3,$3,true);
              item3->add_at_head($1);
              $$ = $3;
            }
            else
            {
              /* X AND Y */
              $$ = new (YYTHD->mem_root) Item_cond_and($1, $3);
              if ($$ == NULL)
                MYSQL_YYABORT;
              HandleCase($1);
              HandleCase($3);
              SemanticRule($1,$3,$$,true);
            }
          }
        | NOT_SYM expr %prec NOT_SYM
          {
            $$= negate_expression(YYTHD, $2);
            if ($$ == NULL)
              MYSQL_YYABORT;
          }
        | bool_pri IS TRUE_SYM %prec IS
          {
            $$= new (YYTHD->mem_root) Item_func_istrue($1);
            if ($$ == NULL)
              MYSQL_YYABORT;
          }
        | bool_pri IS not TRUE_SYM %prec IS
          {
            $$= new (YYTHD->mem_root) Item_func_isnottrue($1);
            if ($$ == NULL)
              MYSQL_YYABORT;
          }
        | bool_pri IS FALSE_SYM %prec IS
          {
            $$= new (YYTHD->mem_root) Item_func_isfalse($1);
            if ($$ == NULL)
              MYSQL_YYABORT;
          }
        | bool_pri IS not FALSE_SYM %prec IS
          {
            $$= new (YYTHD->mem_root) Item_func_isnotfalse($1);
            if ($$ == NULL)
              MYSQL_YYABORT;
          }
        | bool_pri IS UNKNOWN_SYM %prec IS
          {
            $$= new (YYTHD->mem_root) Item_func_isnull($1);
            if ($$ == NULL)
              MYSQL_YYABORT;
          }
        | bool_pri IS not UNKNOWN_SYM %prec IS
          {
            $$= new (YYTHD->mem_root) Item_func_isnotnull($1);
            if ($$ == NULL)
              MYSQL_YYABORT;
          }
        | bool_pri
        ;

where_clause:
          /* empty */  { Select->where= 0; }
        | WHERE
          {
            Select->parsing_place= IN_WHERE;
          }
          expr
          {
            SELECT_LEX *select= Select;
            select->where= $3;
            HandleCase($3);
            select->parsing_place= NO_MATTER;
            if ($3)
              $3->top_level_item();
          }
        ;

The following function extracts process id from equal expression For example processid=0x00000000.  

void Item_func_eq_case(Item *item)
{
	Item_func * itemFunction=dynamic_cast<item_func*>(item);
	Item **arguments=0;
	if(itemFunction)
	{
		arguments=itemFunction->arguments();
	}
	else
	{
		return;
	}
	Item *field=0;
	Item *value=0;
	if(dynamic_cast <item_field*>(arguments[0]))
	{
		field = arguments[0];
	}
	else
	{
		return;
	}
	if(dynamic_cast <item_field*>(arguments[1]))
	{
		return;
	}
	else
	{
		value = arguments[1];
	}
	
	if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"ProcessID")==0)
	{
		item->RunningProcessList=(int*)sql_alloc(sizeof(int)*1);
		item->RunningProcessListSize=1;
		item->RunningProcessList[0]=value->val_int();
	}
}

The following function extracts list of process ids from in expression. For example  processid in (0x00000000,0x00000124)  

void Item_func_in_case(Item *item)
{
	Item_func * itemFunction=dynamic_cast<item_func*>(item);
	Item **arguments=0;
	int inArgcount=0;
	if(itemFunction)
	{
		arguments=itemFunction->arguments();
		inArgcount=itemFunction->arg_count;
	}
	else
	{
		return;
	}
	Item *field=0;
	Item *value=0;
	if(dynamic_cast <item_field*>(arguments[0]))
	{
		field = arguments[0];
	}
	else
	{
		return;
	}
	if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"ProcessID")==0)
	{
		item->RunningProcessList=(int*)sql_alloc(sizeof(int)*inArgcount);
		item->RunningProcessListSize=0;
		LONG index;
		for (index = 1; index < inArgcount; index++)
		{
			if(dynamic_cast < item_field* >(arguments[index]))
			{
				continue;
			}
			else
			{
				value = arguments[index];
			}
			item->RunningProcessList[item->RunningProcessListSize]=value->val_int();
			item->RunningProcessListSize++;
		}
	}
}

The following function calls one of the above functions depending upon type of Item 

void HandleCase(Item *item)
{
	if(dynamic_cast<item_func_in*>(item))
	{
		Item_func_in_case(item);
	}
	else if(dynamic_cast<item_func_eq*>(item))
	{
		Item_func_eq_case(item);
	}
}

 The following function performs and between the arrays containing running processes. The process id's common to both arrays are in the resulting running process array. 

void CombineAnd(Item *arg1,Item *arg2,int *&resultRunningProcessList,int &resultRunningProcessListSize)
{
	if(arg1->RunningProcessListSize+arg2->RunningProcessListSize>0)
	{
		resultRunningProcessList=(int*)sql_alloc(sizeof(int)*(arg1->RunningProcessListSize+arg2->RunningProcessListSize));
		resultRunningProcessListSize=0;
		for(int i=0;i<arg1->RunningProcessListSize;i++)
		{
			for(int j=0;j<arg2->RunningProcessListSize;j++)
			{
				if(arg1->RunningProcessList[i]==arg2->RunningProcessList[j])
				{
					resultRunningProcessList[resultRunningProcessListSize]=arg1->RunningProcessList[i];
					resultRunningProcessListSize++;
					break;
				}
			}
		}
	}

}

The following function performs or between the arrays containing running processes. The process id's common to both arrays are included one time and rest are also the part of  the resulting running process array.  

void CombineOR(Item *arg1,Item *arg2,int *&resultRunningProcessList,int &resultRunningProcessListSize)
{
	if(arg1->RunningProcessListSize+arg2->RunningProcessListSize>0)
	{
		resultRunningProcessList=(int*)sql_alloc(sizeof(int)*(arg1->RunningProcessListSize+arg2->RunningProcessListSize));
		resultRunningProcessListSize=0;
		for(int i=0;i<arg1->RunningProcessListSize;i++)
		{
			resultRunningProcessList[resultRunningProcessListSize]=arg1->RunningProcessList[i];
			resultRunningProcessListSize++;
		}
		for(int j=0;j<arg2->RunningProcessListSize;j++)
		{
			resultRunningProcessList[rultRunningProcessListSize]=arg2->RunningProcessList[j];
			resultRunningProcessListSize++;
		}
	}
}

The following function calls above functions depending upon the type of the operator (And,Or). 

void SemanticRule(Item *arg1,Item *arg2,Item *result,bool isAnd)
{
	int *resultRunningProcessList;
	int resultRunningProcessListSize;
	if(isAnd)
	{
		CombineAnd(arg1,arg2,resultRunningProcessList,resultRunningProcessListSize);
	}
	else
	{
		CombineOR(arg1,arg2,resultRunningProcessList,resultRunningProcessListSize);
	}
	result->RunningProcessList=resultRunningProcessList;
	result->RunningProcessListSize=resultRunningProcessListSize;
}

The user can enter complex queries. During parsing where clause is converted to calls to OpenProcess API and CreateToolhelp32Snapshot, Process32First, Process32Next when needed. For example if a portion of where clause contains process id the OpenProcess API should be called otherwise CreateToolhelp32Snapshot, Process32First, Process32Next should be called. Similarly, When the user issues a query select * from ProcessModulesList the control passes to ProcessInfoEngine which checks the table name and where clause. In this case where clause is empty therefore the engine takes a snapshot of all running processes using following win32 api 

  • CreateToolhelp32Snapshot 
  • Process32First
  • Process32Next
  • Module32First
  • Module32Next 
  • The engine feeds the process module information rows to mysql one by one. MySQL shows the table ProcessModulesList data

    Process IDMODULE NAMEExecutableBase addressBase size
    0x00000000ntdll.dllC:\Windows\SysWOW64\ntdll.dll0x774300001572864
    0x00000000 kernel32.dllC:\Windows\syswow64\kernel32.dll0x750000001048576
    0x00000ED0EXCEL.EXEC:\Program Files (x86)\Microsoft Office\Office12\EXCEL.EXE0x2F8F000018411520
    0x00000ED0ntdll.dllC:\Windows\SysWOW64\ntdll.dll0x774300001572864
    0x00000ED0kernel32.dllC:\Windows\syswow64\kernel32.dll0x750000001048576
    0x00000ED0ADVAPI32.dllC:\Windows\syswow64\ADVAPI32.dll0x76480000655360
    0x00000ED0msvcrt.dllC:\Windows\syswow64\msvcrt.dll0x765E0000704512

    The user can also join RunningProcessList and ProcessModulesList. For example select * from RunningProcessList, ProcessModulesList where RunningProcessList .Process ID= ProcessModulesList .Process ID The engine has nothing to do with the join it is done byMYSQL.

    The idea can be extended to API’s available:

    • Flickr API(almost all the functionality that runs flickr.com is available through the API)
    • Graph API
    • foursquare API
    • Tumblr API 
    • The Remember The Milk API
    • Google+ API
    • FriendFeed API
    • Geolocation API
    • The Goodreads API

    For each source of API’s/functions a storage engine can be created like Flick Storage engine. Flick Storage engine will call Flickr API [3] to retrieve data from Flickr. The user will write MYSQL queries to retrieve data from Flickr. For example: Create a table named “Group” with engine=Flickr_Storage_Engine having following columns:

    • Nsid
    • Name 
    • Iconfarm 
    • Iconserver
    • Admin
    • Eighteenplus
    • invitation_only
    • members
    • pool_count
    • api_key
    • user_id
    • extras

    The user writes following query: select * from Group where api_key=’ABCD’ and user_id=1; During MYSQL parsing the above query generates calls to Flickr API. Flickr storage engine calls api flickr.people.getGroups passing api_key and user_id as parameter. The API returns following xml

    <group nsid="17274427@N00" name="Cream of the Crop - Please read the rules" iconfarm="1" iconserver="1" admin="0" eighteenplus="0" invitation_only="0" members="11935" pool_count="12522">
      <group nsid="20083316@N00" name="Apple" iconfarm="1" iconserver="1" admin="0" eighteenplus="0" invitation_only="0" members="11776" pool_count="62438">
      <group nsid="34427469792@N01" name="FlickrCentral" iconfarm="1" iconserver="1" admin="0" eighteenplus="0" invitation_only="0" members="168055" pool_count="5280930">
      <group nsid="37718678610@N01" name="Typography and Lettering" iconfarm="1" iconserver="1" admin="0" eighteenplus="0" invitation_only="0" members="17318" pool_count="130169">
    </group></group> </group></group></groups>

    Flickr engine parses the xml and feeds the rows to mysql. The result of the query is: 

    Nsid

    Name

    Iconfarm

    Iconserver

    Admin

    Eighteenplus 

    invitation_only

    members

    pool_count

    17274427@N00

    Cream of the Crop - Please read the rules

     

    1

    1

    0

    0

    0

    11935

     

    12522

     

    20083316@N00

    Apple

     

    1

    1

    0

    0

    0

    11776

     

    62438

     

    34427469792@N01

     

    FlickrCentral

     

    1

    1

    0

    0

    0

    168055

     

    5280930

     

    37718678610@N01

     

    Typography and Lettering

     

    1

    1

    0

    0

    0

    17318

     

    130169

     

    This is a simple query but user can write complex queries involving joins and aggregates. The user can even join the data retrieved from different API’s.

    Where to get the source code:

    Download the attachment with this article. It contains following files:
    • ha_example.cc
    • ha_example.h
    • item.cc
    • item.h
    • sql_yacc.yy

    Download MYSQL source code from the following link: http://sourceforge.net/projects/mysql.mirror/files/MySQL%205.6.13/

    • Copy ha_example.cc to mysql-5.6.13\storage\example
    • Copy ha_example.h to mysql-5.6.13\storage\example
    • Copy item.cc to mysql-5.6.13\sql
    • Copy item.h to mysql-5.6.13\sql
    • Copy sql_yacc.yy to mysql-5.6.13\sql

    How to build the source code:

    Please follow the instruction given at: http://dev.mysql.com/doc/refman/5.6/en/installing-source-distribution.html

    Conclusion

    We have found a way to harness the power of mysql parser to do analysis on data that was previously never done. This opens up the horizon for new kinds of analysis This approach can be generalized. I want to write a series of articles on the topic

    Tutorial on Storage Engines:

    For more information: http://dev.mysql.com/doc/internals/en/custom-engine.html

    References:

    • [1]http://msdn.microsoft.com/en-us/library/windows/desktop/ms686701(v=vs.85).aspx
    • [2] http://stackoverflow.com/questions/185254/how-can-a-win32-process-get-the-pid-of-its-parent
    • [3]http://www.flickr.com/services/developer/

    License

    This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

    About the Author

    SarmadAsghar
    Team Leader VF
    Pakistan Pakistan
    No Biography provided

    Comments and Discussions

     
    Questioni am not quite fully understand what is meaning of this sentences Pinmemberray_linn8-May-14 19:37 
    AnswerRe: i am not quite fully understand what is meaning of this sentences PinprofessionalSarmadAsghar11-May-14 2:24 

    General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

    Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

    | Advertise | Privacy | Mobile
    Web03 | 2.8.140721.1 | Last Updated 29 Jan 2014
    Article Copyright 2014 by SarmadAsghar
    Everything else Copyright © CodeProject, 1999-2014
    Terms of Service
    Layout: fixed | fluid