Introduction
Dealing with proxies can be difficult if you are unfamiliar with the protocols involved. However, RFCs 1919, 1928, 1929, 1961, 1945, and 2616 describe everything required, and RFC 1738 defines URLs. To demonstrate, I introduce a simple Whois client that can retrieve information in several ways: it can connect directly to the Whois server, work through an HTTP proxy, or work through a SOCKS version 4, 4a, or 5 proxy. Only TCP connections are covered here.
Background - HTTP Proxy
An HTTP proxy is nothing more than a server that (usually) disallows communication on all ports except 80 (HTTP) and 443 (HTTPS). When dealing with this type of proxy, you have to use standard HTTP commands to retrieve information. To retrieve information through the proxy, you send a request message of the following form (copied directly from RFC 2616):
Request = Request-Line ; Section 5.1
          *(( general-header ; Section 4.5
          | request-header ; Section 5.3
          | entity-header ) CRLF) ; Section 7.1
          CRLF
          [ message-body ] ; Section 4.3
////////////////////////////////////////////////////////////
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
////////////////////////////////////////////////////////////
Method = "OPTIONS" ; Section 9.2
       | "GET" ; Section 9.3
       | "HEAD" ; Section 9.4
       | "POST" ; Section 9.5
       | "PUT" ; Section 9.6
       | "DELETE" ; Section 9.7
       | "TRACE" ; Section 9.8
       | "CONNECT" ; Section 9.9
       | extension-method
extension-method = token
//////////////////////////////////////////////////////
message-header = field-name ":" [ field-value ]
field-name = token
field-value = *( field-content | LWS )
field-content = <the OCTETs making up the field-value
                and consisting of either *TEXT or combinations
                of token, separators, and quoted-string>
/////////////////////////////////////////////////////////////////////
message-body = entity-body
             | <entity-body encoded as per Transfer-Encoding>
/////////////////////////////////////////////////////////////////////
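A common request takes the form of:
GET http://www.apnic.net/apnic-bin/whois.pl?searchtext=215.13.20.155 HTTP/1.1\r\n
Host: www.apnic.net\r\n
Proxy-Authorization: Basic <Base64 encoding of username:password>\r\n
\r\n
Here each line, terminated by CRLF, is sent to the server in turn, and the final empty line marks the end of the headers. Because the request goes to a proxy, the request line carries the absolute URL rather than just the path. The Basic credentials are the Base64 encoding of the single string username:password, not of the two parts separately.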
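A request like this can also be assembled programmatically. The following is a minimal sketch and not the CHttpProxyClient class from the download: the helper names base64Encode and buildProxyGet are hypothetical, and it assumes the proxy expects the absolute URL in the request line and Basic credentials formed as the Base64 encoding of username:password.

```cpp
#include <cassert>
#include <string>

// Base64-encode a byte string (standard alphabet, '=' padding).
std::string base64Encode(const std::string& in)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    unsigned int val = 0;   // bit accumulator; only the low bits are used
    int bits = -6;          // number of pending bits minus 6
    for (unsigned char c : in) {
        val = (val << 8) | c;
        bits += 8;
        while (bits >= 0) {
            out.push_back(tbl[(val >> bits) & 0x3F]);
            bits -= 6;
        }
    }
    if (bits > -6)          // flush the last partial 6-bit group
        out.push_back(tbl[((val << 8) >> (bits + 8)) & 0x3F]);
    while (out.size() % 4)
        out.push_back('=');
    assert(out.size() % 4 == 0);
    return out;
}

// Build a GET request addressed to a proxy. The Proxy-Authorization
// header is added only when a user name is supplied.
std::string buildProxyGet(const std::string& absoluteUrl,
                          const std::string& host,
                          const std::string& user,
                          const std::string& password)
{
    std::string req = "GET " + absoluteUrl + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    if (!user.empty())
        req += "Proxy-Authorization: Basic "
             + base64Encode(user + ":" + password) + "\r\n";
    req += "\r\n";          // empty line ends the headers
    return req;
}
```

The returned string can be passed to send() in one call or line by line; the bytes on the wire are the same either way.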
Background - SOCKS Proxy
Dealing with SOCKS proxies is a bit more difficult, as there are three different versions: version 4, version 4a (an extension to 4), and version 5. Versions 4 and 4a use nearly the same message format, while version 5 uses its own format entirely. The first byte in each version is the version number: 0x04 for versions 4 and 4a, and 0x05 for version 5. The similarity ends there. The format for version 4 is:
VER|CMDCODE|DSTPORT|DSTIP|USERID|NULL
VER: Version 1 byte set to 0x04
CMDCODE: 0x01 for CONNECT 0x02 for BIND (1 byte)
DSTPORT: destination port (2 bytes)
DSTIP: 4 bytes of the destination ip address
USERID: Ascii codes of characters of username
(variable length, may be zero bytes for no username)
NULL: 1 byte of zero bits.
For example:
0x04 | 0x01 | 0x00 0x50 | 0x7F 0x00 0x00 0x01 | 0x00
This represents a connect command to port 80
of 127.0.0.1 with no username
0x04 | 0x01 | 0x00 0x50 | 0x7F 0x00 0x00 0x01 | 0x4F 0x77 0x6E 0x65 0x72 | 0x00
This represents a connect command to port 80
of 127.0.0.1 with username = Owner
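The version 4 layout maps directly onto a byte buffer. Below is a small sketch of how such a packet might be built; buildSocks4Connect is a hypothetical helper, not part of the article's CSocksProxyClient, and it assumes the IPv4 address is passed as a 32-bit value in host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Build a SOCKS4 CONNECT request: VER|CMDCODE|DSTPORT|DSTIP|USERID|NULL.
// ip is the IPv4 address in host byte order, e.g. 0x7F000001 for 127.0.0.1.
std::vector<std::uint8_t> buildSocks4Connect(std::uint32_t ip,
                                             std::uint16_t port,
                                             const std::string& userId)
{
    // The user ID is NULL terminated, so it must not contain a zero byte.
    assert(userId.find('\0') == std::string::npos);

    std::vector<std::uint8_t> pkt;
    pkt.push_back(0x04);                                    // VER
    pkt.push_back(0x01);                                    // CMDCODE: CONNECT
    pkt.push_back(static_cast<std::uint8_t>(port >> 8));    // DSTPORT, high byte first
    pkt.push_back(static_cast<std::uint8_t>(port & 0xFF));
    pkt.push_back(static_cast<std::uint8_t>(ip >> 24));     // DSTIP, high byte first
    pkt.push_back(static_cast<std::uint8_t>((ip >> 16) & 0xFF));
    pkt.push_back(static_cast<std::uint8_t>((ip >> 8) & 0xFF));
    pkt.push_back(static_cast<std::uint8_t>(ip & 0xFF));
    pkt.insert(pkt.end(), userId.begin(), userId.end());    // USERID (may be empty)
    pkt.push_back(0x00);                                    // terminating NULL
    return pkt;
}
```

Calling buildSocks4Connect(0x7F000001, 80, "Owner") yields the same bytes as the second example above.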
Version 4a is very similar. It uses the same format as version 4, except that when the client cannot determine the IP address for a given domain name, it sets DSTIP to 0x00 0x00 0x00 0x**, with the last byte set to anything other than 0x00. Then, after the NULL byte terminating the user ID, it sends the domain name, terminated with another NULL byte.
0x04 | 0x01 | 0x00 0x50 | 0x00 0x00 0x00 0x01 | 0x00 |
0x77 0x77 0x77 0x2E 0x61 0x2E 0x63 0x6F 0x6D | 0x00
This represents a connect command to port 80
of www.a.com with no username (DSTIP is the placeholder 0.0.0.1)
0x04 | 0x01 | 0x00 0x50 | 0x00 0x00 0x00 0x01 | 0x4F 0x77 0x6E 0x65 0x72 | 0x00 |
0x77 0x77 0x77 0x2E 0x61 0x2E 0x63 0x6F 0x6D | 0x00
This represents a connect command to port 80
of www.a.com with username = Owner (DSTIP is the placeholder 0.0.0.1)
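As a sketch of the 4a extension described in the paragraph above, the following hypothetical helper (buildSocks4aConnect is not a name from the article's code) fills DSTIP with the placeholder 0.0.0.1 and appends the NULL-terminated domain name after the user ID, leaving name resolution to the proxy.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Build a SOCKS4a CONNECT request for a host the client cannot resolve.
// Layout: VER|CMDCODE|DSTPORT|0.0.0.x|USERID|NULL|HOSTNAME|NULL
std::vector<std::uint8_t> buildSocks4aConnect(const std::string& hostName,
                                              std::uint16_t port,
                                              const std::string& userId)
{
    // Both strings are NULL terminated on the wire.
    assert(!hostName.empty());
    assert(hostName.find('\0') == std::string::npos);
    assert(userId.find('\0') == std::string::npos);

    std::vector<std::uint8_t> pkt = {
        0x04,                                       // VER
        0x01,                                       // CMDCODE: CONNECT
        static_cast<std::uint8_t>(port >> 8),       // DSTPORT, high byte first
        static_cast<std::uint8_t>(port & 0xFF),
        0x00, 0x00, 0x00, 0x01                      // placeholder IP 0.0.0.1
    };
    pkt.insert(pkt.end(), userId.begin(), userId.end());
    pkt.push_back(0x00);                            // end of USERID
    pkt.insert(pkt.end(), hostName.begin(), hostName.end());
    pkt.push_back(0x00);                            // end of host name
    return pkt;
}
```

Any non-zero value would do for the last DSTIP byte; 0x01 is used here only as a convenient choice.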
SOCKS version 5 uses its own protocol. The version byte is set to 0x05. The first packet the client sends lists the authentication method(s) it supports. The server replies with the method it has selected. The client then enters the authentication sub-stage. After authenticating, the client sends a request packet. The server sends a reply packet, after which (if successful) the client sends data to the proxy just as if it were sending it directly to the requested host. For each SOCKS version, the USERID is the user ID as known by the operating system. The first packet has the form:
VER|NMETHODS|METHODS
VER: 1 byte set to 0x05
NMETHODS: 1 byte set to the number of methods that follows
METHODS: one or more bytes containing the methods supported
0x00 : no authentication
0x01 : GSSAPI
0x02 : Username / Password
0x03-0x7F : IANA Assigned protocols
0x80-0xFE : Private methods
Server replies with:
VER|METHOD
VER: Version 1 byte set to 0x05
METHOD: 0x00-0xFE indicating the selected method
or 0xFF indicating no acceptable method
for example:
0x05|0x02|0x00 0x02
This indicates that the client accepts 2 authentication methods,
no authentication and username/password.
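The method negotiation can be sketched as two small helpers, one that builds the client's greeting and one that interprets the server's two-byte answer. Both names are hypothetical and are not taken from the article's CSocksProxyClient.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build the SOCKS5 greeting: VER|NMETHODS|METHODS.
std::vector<std::uint8_t> buildSocks5Greeting(
    const std::vector<std::uint8_t>& methods)
{
    // NMETHODS is a single byte, so 1 to 255 methods can be offered.
    assert(!methods.empty() && methods.size() <= 255);

    std::vector<std::uint8_t> pkt;
    pkt.push_back(0x05);                                        // VER
    pkt.push_back(static_cast<std::uint8_t>(methods.size()));   // NMETHODS
    pkt.insert(pkt.end(), methods.begin(), methods.end());      // METHODS
    return pkt;
}

// Interpret the server's VER|METHOD reply.
// Returns the selected method (0x00-0xFE), or -1 if the reply is
// malformed or the server answered 0xFF (no acceptable method).
int parseSocks5MethodReply(const std::vector<std::uint8_t>& reply)
{
    if (reply.size() != 2 || reply[0] != 0x05 || reply[1] == 0xFF)
        return -1;
    return reply[1];
}
```

A caller would offer, say, {0x00, 0x02}, then branch on the returned method to decide whether the username/password stage is needed.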
After this, the client enters the authentication stage, if needed. If no authentication is selected, the client is free to send the request packet. For Username / Password authentication:
The client sends a packet containing
the username and password which looks like:
VER|ULEN|USERID|PLEN|PASSWORD
VER: 1 byte set to 0x01, the version of the username/password
     sub-negotiation (RFC 1929), not the SOCKS version
ULEN: 1 byte indicating the number of characters in the username
USERID: Ascii numbers of the individual characters. Variable length.
PLEN: 1 byte indicating the password length
PASSWORD: Variable length ascii characters.
For example:
0x01 | 0x05 | 0x4F 0x77 0x6E 0x65 0x72 | 0x08 |
0x70 0x61 0x73 0x73 0x77 0x6F 0x72 0x64
This represents a 5 letter user id of Owner,
with password set to password.
The server answers with two bytes, VER|STATUS. A STATUS of 0x00 means success; any other value means failure, and the connection must be closed.
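A sketch of the username/password stage follows. The helper names are hypothetical. Note that RFC 1929 fixes the first byte of this sub-negotiation at 0x01 (the sub-negotiation version, not the SOCKS version), and that the server answers with a two-byte VER|STATUS reply in which STATUS 0x00 means success.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Build the RFC 1929 request: VER|ULEN|USERID|PLEN|PASSWORD.
std::vector<std::uint8_t> buildSocks5UserPassAuth(const std::string& user,
                                                  const std::string& password)
{
    // ULEN and PLEN are single bytes and must each be 1 to 255.
    assert(!user.empty() && user.size() <= 255);
    assert(!password.empty() && password.size() <= 255);

    std::vector<std::uint8_t> pkt;
    pkt.push_back(0x01);                                        // sub-negotiation version
    pkt.push_back(static_cast<std::uint8_t>(user.size()));      // ULEN
    pkt.insert(pkt.end(), user.begin(), user.end());            // USERID
    pkt.push_back(static_cast<std::uint8_t>(password.size()));  // PLEN
    pkt.insert(pkt.end(), password.begin(), password.end());    // PASSWORD
    return pkt;
}

// True when the server's VER|STATUS reply reports success.
bool socks5AuthSucceeded(const std::vector<std::uint8_t>& reply)
{
    return reply.size() == 2 && reply[0] == 0x01 && reply[1] == 0x00;
}
```

Unlike SOCKS4, neither string is NULL terminated here; the length bytes delimit them.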
After authentication, the next step is to send a request to the server. These take the form of:
VER|CMD|0x00|ADDRESSTYPE|DSTADDR|DSTPORT
VER: 1 byte with value 0x05
CMD: 0x01 for CONNECT, 0x02 for BIND, 0x03 for UDP Associate
0x00: is a NULL value reserved byte
ADDRESSTYPE: 0x01 for IPV4 address, 0x03 for Domain name, 0x04 for IPV6 address
DSTADDR: Variable length destination ip address or domain name;
         a domain name is preceded by 1 byte giving its length
         and is not NULL terminated
DSTPORT: 2 bytes indicating destination port value, in network byte order
For example:
0x05 | 0x01 | 0x00 | 0x03 | 0x09 | 0x77 0x77 0x77 0x2E
0x61 0x2E 0x63 0x6F 0x6D | 0x00 0x50
indicating a connect command using
the domain name of www.a.com (9 characters) to port 80
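The request packet for a domain name can be sketched as below. buildSocks5ConnectByName is a hypothetical helper. Per RFC 1928, a domain-name DSTADDR starts with one byte holding the name's length, so that byte is emitted before the name itself.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Build a SOCKS5 CONNECT request that addresses the target by domain name:
// VER|CMD|0x00|ADDRESSTYPE|LEN|NAME|DSTPORT
std::vector<std::uint8_t> buildSocks5ConnectByName(const std::string& host,
                                                   std::uint16_t port)
{
    // The length prefix is a single byte.
    assert(!host.empty() && host.size() <= 255);

    std::vector<std::uint8_t> pkt;
    pkt.push_back(0x05);                                    // VER
    pkt.push_back(0x01);                                    // CMD: CONNECT
    pkt.push_back(0x00);                                    // reserved
    pkt.push_back(0x03);                                    // ADDRESSTYPE: domain name
    pkt.push_back(static_cast<std::uint8_t>(host.size()));  // name length
    pkt.insert(pkt.end(), host.begin(), host.end());        // name, no terminator
    pkt.push_back(static_cast<std::uint8_t>(port >> 8));    // DSTPORT, high byte first
    pkt.push_back(static_cast<std::uint8_t>(port & 0xFF));
    return pkt;
}
```

For www.a.com and port 80 this produces 16 bytes: the four header bytes, the length byte 0x09, the nine name bytes, and 0x00 0x50.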
The next step is the server's response:
VER|REPLYCODE|0x00|ADDRESSTYPE|BNDADDR|BNDPORT
VER: version 1 byte set to 0x05
REPLYCODE:
0x00: succeeded
0x01: general SOCKS server failure
0x02: connection not allowed by ruleset
0x03: Network unreachable
0x04: Host unreachable
0x05: Connection refused
0x06: TTL expired
0x07: Command not supported
0x08: Address type not supported
0x00: NULL valued byte
ADDRESSTYPE: 0x01 for IPV4 address,
0x03 for Domain name,
0x04 for IPV6 address
BNDADDR: server bound address
BNDPORT: server bound port in network format
If the reply code is 0x00, transmission
may proceed and all data transmitted will be directed
to the destination server. In all other cases,
the connection must be closed.
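Because BNDADDR is variable length, a client has to look at ADDRESSTYPE before it knows how many bytes the reply occupies. The following sketch, with hypothetical helper names, maps the reply code to the text in the list above and computes the total reply length from the first five bytes. It assumes the address encoding of RFC 1928: 4 bytes for IPv4, 16 for IPv6, and a length-prefixed string for a domain name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Map a SOCKS5 REPLYCODE to a human-readable description.
std::string socks5ReplyText(std::uint8_t code)
{
    switch (code) {
    case 0x00: return "succeeded";
    case 0x01: return "general SOCKS server failure";
    case 0x02: return "connection not allowed by ruleset";
    case 0x03: return "Network unreachable";
    case 0x04: return "Host unreachable";
    case 0x05: return "Connection refused";
    case 0x06: return "TTL expired";
    case 0x07: return "Command not supported";
    case 0x08: return "Address type not supported";
    default:   return "unassigned reply code";
    }
}

// Given the first five reply bytes (VER, REPLYCODE, reserved, ADDRESSTYPE
// and the first byte of BNDADDR), return the total reply length in bytes,
// or 0 if VER or ADDRESSTYPE is not recognised.
std::size_t socks5ReplyLength(const std::vector<std::uint8_t>& head)
{
    assert(head.size() >= 5);   // caller must have read five bytes
    if (head[0] != 0x05)
        return 0;
    switch (head[3]) {
    case 0x01: return 4 + 4 + 2;             // IPv4 address + port
    case 0x03: return 4 + 1 + head[4] + 2;   // length byte + name + port
    case 0x04: return 4 + 16 + 2;            // IPv6 address + port
    default:   return 0;
    }
}
```

A client would read five bytes, call socks5ReplyLength, read the remainder, and only then start relaying data if the reply code was 0x00.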
Using the code
The code contains a command prompt Whois client. It supports a direct connection to the Whois server, a connection to certain servers through an HTTP proxy, and a connection through SOCKS proxies. The code contains two classes, one for a SOCKS connection, the other for an HTTP connection.
Using the Socks client is as simple as:
CSocksProxyClient socks;
socks.SetSocksVersion(sv);
// Credentials are optional.
if (username != NULL)
{
    socks.SetUserName(username);
}
if (password != NULL)
{
    socks.SetPassword(password);
}
socks.SetProxy(proxy, proxyport);
if (socks.Connect(host, port))
{
    // A Whois query is the name to look up, terminated by CRLF (RFC 3912).
    char msg[2048];
    snprintf(msg, sizeof(msg), "%s\r\n", domain);
    socks.Send(msg, (int)strlen(msg));
}
Using the HTTP proxy client is simply:
char msg[MAX_PATH];
CHttpProxyClient httpProxy;
httpProxy.SetHttpVersion(HTTP_11);
// Send Proxy-Authorization only when credentials were supplied.
bool bUseAuthorization = false;
if (username != NULL)
{
    httpProxy.SetUserName(username);
    bUseAuthorization = true;
}
if (password != NULL)
{
    httpProxy.SetPassword(password);
    bUseAuthorization = true;
}
if (!httpProxy.SetProxy(proxy, proxyport))
    return;
if (!httpProxy.Connect())
    return;
// Map the Whois server name to the URL of its web front end.
// snprintf keeps a long domain from overflowing msg.
host = strlwr(host);
if (strcmp(host, "whois.internic.net") == 0)
{
    snprintf(msg, sizeof(msg), "%s%s%s", "http://reports.internic"
        ".net/cgi/whois?whois_nic=", domain, "&type=domain");
}
else if (strcmp(host, "whois.register.com") == 0)
{
    snprintf(msg, sizeof(msg), "%s%s%s", "http://whois.register.com/"
        "cgi/whois?whois_nic=", domain, "&type=domain");
}
else if (strcmp(host, "whois.ripe.net") == 0)
{
    snprintf(msg, sizeof(msg), "%s%s", "http://www.ripe.net/perl/"
        "whois?form_type=simple&full_query"
        "_string=&searchtext=", domain);
}
else if (strcmp(host, "whois.arin.net") == 0)
{
    snprintf(msg, sizeof(msg), "%s%s", "http://ws.arin.net/"
        "cgi-bin/whois.pl?queryinput=", domain);
}
else if (strcmp(host, "whois.apnic.net") == 0)
{
    snprintf(msg, sizeof(msg), "%s%s", "http://www.apnic.net/"
        "apnic-bin/whois.pl?searchtext=", domain);
}
else
{
    // Unknown server: fall back to requesting the host itself.
    snprintf(msg, sizeof(msg), "http://%s", host);
}
if (!httpProxy.Request(HTTP_GET, msg, NULL, bUseAuthorization))
    return;
For the Whois portion, the process is merely a matter of getting a connection to the server and then sending the domain name or IP address to look up. If connected through an HTTP proxy, you have to translate the Whois query into an HTTP query.
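The translation from a Whois server name to an HTTP query can also be written as a table lookup. This is only a sketch of an alternative layout, not the code shipped with the article: the WhoisGateway struct and whoisToHttpUrl function are hypothetical names, the URLs are the ones used in the HTTP proxy snippet above, and the query is inserted verbatim without URL encoding.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// One row per Whois server that has a known web front end.
struct WhoisGateway {
    const char* host;     // Whois server name, lower case
    const char* prefix;   // URL text before the query
    const char* suffix;   // URL text after the query
};

// Translate a Whois server name and query into the URL to request
// through an HTTP proxy. Unknown servers map to "http://<host>".
std::string whoisToHttpUrl(std::string host, const std::string& query)
{
    static const WhoisGateway gateways[] = {
        {"whois.internic.net",
         "http://reports.internic.net/cgi/whois?whois_nic=", "&type=domain"},
        {"whois.register.com",
         "http://whois.register.com/cgi/whois?whois_nic=", "&type=domain"},
        {"whois.ripe.net",
         "http://www.ripe.net/perl/whois?form_type=simple"
         "&full_query_string=&searchtext=", ""},
        {"whois.arin.net",
         "http://ws.arin.net/cgi-bin/whois.pl?queryinput=", ""},
        {"whois.apnic.net",
         "http://www.apnic.net/apnic-bin/whois.pl?searchtext=", ""},
    };

    assert(!host.empty());
    // Server names are compared case-insensitively.
    std::transform(host.begin(), host.end(), host.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });

    for (const WhoisGateway& g : gateways)
        if (host == g.host)
            return std::string(g.prefix) + query + g.suffix;
    return "http://" + host;
}
```

Using std::string avoids the fixed-size buffer, and supporting another server becomes a one-line change to the table.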
Whois directly to the server looks like:
SOCKET socket_descriptor;
struct hostent *he;
struct sockaddr_in host_address;
// Resolve the Whois server name.
if( !( he = gethostbyname( host ) ) ){
    fprintf( stderr, "Could not resolve host.\n" );
    exit( 1 );
}
if( ( socket_descriptor = socket( AF_INET,
    SOCK_STREAM, 0 ) ) == INVALID_SOCKET ){
    fprintf( stderr, "Could not "
        "establish socket connection.\n" );
    exit( 1 );
}
host_address.sin_family = AF_INET;
host_address.sin_port = htons( port );
host_address.sin_addr = *( (struct in_addr *)he->h_addr );
memset( &( host_address.sin_zero ), '\0', 8 );
if( connect( socket_descriptor,
    (struct sockaddr *)&host_address,
    sizeof( host_address ) ) == SOCKET_ERROR ){
    fprintf( stderr, "Could not connect to host.\n" );
    exit( 1 );
}
// The query is the name to look up, terminated by CRLF (RFC 3912).
// snprintf keeps a long name from overflowing msg.
char msg[80];
snprintf( msg, sizeof( msg ), "%s\r\n", domain );
send( socket_descriptor, msg, (int)strlen( msg ), 0 );
That's all there is to it!
Note: The Whois code to do a Whois without a proxy server was copied from someone else (though I lost the reference). The code to do Whois through proxies is entirely mine.
Programming using MFC and ATL for almost 12 years now. Currently studying Operating System implementation as well as Image processing. Previously worked on DSP and the use of FFT for audio application. Programmed using ADO, ODBC, ATL, COM, MFC for shell interfacing, databasing tasks, Internet items, and customization programs.