Most personal computers have a dynamic IP address and are protected by a firewall. Therefore, instant data exchange between them (particularly screen sharing) requires a mediator that has a permanent IP address, that both parties can reach with outbound requests, and that can transfer data between the parties. Cloud computing solutions such as Microsoft Azure AppFabric Service Bus can play this role, and other options are available as well. Skype provides a ready-made infrastructure, free of charge, not only for data exchange but also for streaming images of a remote machine's screen. However, this screen sharing is currently passive: one Skype user can see the screen of another Skype user but cannot control the remote machine.
This article presents a way to automate Skype to achieve active screen sharing, allowing one Skype user to control the machine of another Skype user through Skype's built-in screen sharing.
Skype itself offers few options for automation. A programming API was announced on the Skype site [1] but is not yet available (at least for ordinary users). The only Skype API I found was the Skype4COM.dll in-process COM object. Skype4COM permits operations such as management of Skype user accounts, calls, etc. However, most Skype settings are not addressed, and screen sharing is left completely outside its scope. Clearly, other automation techniques must be combined with Skype4COM to achieve active screen sharing.
Skype automation in this article includes the following techniques working together:
- usage of Skype4COM.dll provided by the Skype company,
- control of Skype windows from outside of the process with Windows messages,
- activation of Skype application menu commands (to utilize this technique, the Skype setting Tools->Options...->General settings->Visual style of the window should be set to Classic Windows),
- emulation of user actions like mouse clicks and movements, and writing text to the appropriate Windows controls, and finally
- injection of foreign code into a running Skype process.
The main idea of the project is to use the Skype communication infrastructure for all data exchange between the parties. The screen image of the target machine (the machine whose screen is exposed) is transferred to the remote machine as video. On the remote machine, the Skype window containing the image is subclassed by code injected into the Skype process. The subclassing window procedure intercepts mouse movements and clicks, text input, etc., and generates commands that make the target machine actually perform these actions. The commands are short text messages containing the name of the action ("function") and the relevant parameters ("arguments"). For example, the command to perform a left mouse click carries the coordinates of the click point as parameters. Commands are transferred between the machines as Skype text messages.
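As an illustration of how such a command might be serialized, the sketch below builds a message in the command(param0, ..., paramN-1)integer format specified later in this article, for commands with integer arguments such as the mouse commands. The function names formatCommand and nextCounter are hypothetical, and wrapping the counter from 100 back to 1 is an assumption based on the stated 1 to 100 range; this is not the author's code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds a command message "name(arg0,arg1,...)counter" for commands whose
// arguments are integers (e.g. mouse coordinates). Hypothetical helper.
std::string formatCommand(const std::string& name,
                          const std::vector<int>& args,
                          int counter)
{
    std::string msg = name + "(";
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i > 0) msg += ',';          // comma-separated parameters
        msg += std::to_string(args[i]);
    }
    msg += ')';
    msg += std::to_string(counter);     // trailing integer distinguishes messages
    return msg;
}

// Advances the message counter, wrapping from 100 back to 1 (assumption)
// so that successive messages always carry different integers.
int nextCounter(int current)
{
    return current >= 100 ? 1 : current + 1;
}
```

For example, formatCommand("mv", {752, 170, 777, 70}, 33) produces mv(752,170,777,70)33. String commands such as s(...) would need their own formatter, since their argument is raw text.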
How Does It Work?
In our scenario, in order to get technical support, a Skype user shares the screen of his/her machine with another Skype user, referred to below as the adviser. Dedicated applications run on the adviser and user machines, namely AdviserSkypeDriver.exe and UserSkypeDriver.exe (despite their names, these applications have nothing to do with kernel-mode drivers). The SkypeDrivers share several components, such as
- WindowFinderNET, to find a window by its window class,
- HumanActionSimulation, to reproduce mouse clicks, mouse movements and text typing, and
- SkypeAutoHelper, to control Skype itself.
Both SkypeDrivers have a "Start / Attach to Skype" button. Pressing this button either starts Skype or attaches the driver to an already running instance of Skype, and then sets Skype to View->Compact View mode (this simplifies further actions). When a driver attaches to Skype for the first time, Skype displays a warning dialog box requesting permission; the driver presses its "Allow access" button automatically. When the attach operation succeeds, each driver enables its "Close Skype" button.
Note: Currently, the above operations rely partly on timeouts, so they take some time (and may even fail, alas); please be patient.
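Fixed timeouts of this kind could be made less fragile by polling for the expected state instead of sleeping for a fixed time. The following sketch shows a generic poll-with-deadline helper; waitUntil is a hypothetical name, and the condition it checks (for example, whether FindWindow returns a handle for the expected Skype dialog) is left to the caller. It illustrates an alternative technique, not what the article's drivers currently do.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Repeatedly evaluates `condition`, sleeping `interval` between attempts,
// until it returns true or `timeout` has elapsed.
// Returns true if the condition was satisfied before the deadline.
bool waitUntil(const std::function<bool()>& condition,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds interval = std::chrono::milliseconds(100))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (condition())
            return true;                              // expected state reached
        if (std::chrono::steady_clock::now() >= deadline)
            return false;                             // give up: report failure
        std::this_thread::sleep_for(interval);
    }
}
```

Compared with a single fixed sleep, this returns as soon as the window or dialog appears and still fails cleanly if it never does.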
Next, the adviser and the user each type the Skype handle of the other party into the only text box of their respective SkypeDriver forms. From this point on, the user ceases his/her activity, and the adviser takes control of the user machine.
When the adviser types the Skype handle of an available user, the "Call" button on the AdviserSkypeDriver main form becomes enabled. The adviser presses "Call" to start a call to the user. On the user side, UserSkypeDriver accepts the call from the adviser (using the API of its Skype4COM component) and starts screen sharing by automatically performing two successive left mouse clicks on the owner-drawn "Share" and "Show entire screen" buttons in the Skype main window (Fig. 2).
To ensure accurate hits, the target window is first made topmost, set to a predefined fixed size, and moved to the upper-left corner of the screen. As a result, each object in the target window has a fixed offset from the top-left corner of the screen and can be hit reliably by the emulated mouse.
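The reasoning above amounts to simple arithmetic: once the window sits at a known position with a known size, every button's screen position is the window origin plus a constant offset. In the sketch below, the offsets kShareOffset and kShowEntireScreenOffset are hypothetical values that would have to be measured for a particular Skype version and window size. On Windows, the positioning step itself could be done with SetWindowPos(hwnd, HWND_TOPMOST, 0, 0, width, height, flags); that call is deliberately left out of this portable sketch.

```cpp
struct Point { int x; int y; };
struct Rect { int left; int top; int width; int height; };

// Hypothetical offsets of the owner-drawn buttons, measured once relative
// to the top-left corner of the Skype main window at its fixed size.
constexpr Point kShareOffset{120, 340};
constexpr Point kShowEntireScreenOffset{130, 380};

// Converts a window-relative offset to absolute screen coordinates.
Point toScreen(const Rect& window, Point offset)
{
    return {window.left + offset.x, window.top + offset.y};
}

// Sanity check before clicking: the target point must lie inside the window,
// which fails if the fixed size was not actually applied.
bool contains(const Rect& r, Point p)
{
    return p.x >= r.left && p.x < r.left + r.width &&
           p.y >= r.top && p.y < r.top + r.height;
}
```

With the window moved to the screen origin (left = top = 0), toScreen simply returns the offset, which is exactly why the article pins the window to the upper-left corner.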
When the user's Skype is in screen sharing mode, the adviser moves the image of the user's screen into a separate window by pressing "Pop-out" and displays the image at its actual size by pressing "Actual size" (Fig. 3; in the future, these two operations could be automated by emulating mouse clicks, as is done on the user side).
Finally, the adviser presses the "Automate" button on the main form of the AdviserSkypeDriver application to inject SkypePlugin.dll into the Skype process (the injection technique is discussed in detail in [2, 3]).
SkypePlugin subclasses the outer (frame, of window class TLiveConversationWindow) and inner (view) windows containing the image of the remote screen. The new subclassing window procedures handle Windows messages related to mouse clicks and movements, key presses, character input, etc. The subclassing window procedure of the outer window also creates a managed object of the SkypeHandlerNET component wrapped in a COM Callable Wrapper (CCW). SkypeHandlerNET.dll is a managed assembly, so its objects must be wrapped in a CCW to be seen as COM objects by the unmanaged SkypePlugin.
The SkypeHandlerNET component is responsible for communication with the AdviserSkypeDriver application. This communication uses WCF with the fast named pipe (net.pipe) binding. The WCF service is hosted by the AdviserSkypeDriver application, whereas SkypeHandlerNET acts as a client (the usage of a managed object inside injected code is discussed in [4]).
The automated Skype window with the screen image is enlarged, and the suffix (Automated) is appended to its caption. From now on, the adviser's actions in the automated window, such as left and right mouse clicks (double click is not implemented yet) and text typing, are transmitted to the user machine as Skype text messages and actually performed there by UserSkypeDriver. To make a mouse action control the automated window itself (e.g., to move the image inside the window or press its owner-drawn buttons), the adviser should hold down the Shift key on the adviser machine while performing the action.
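The routing decision made by the subclassing window procedure can be sketched in portable C++ as follows. The message constants mirror the Win32 values of WM_LBUTTONUP, WM_CHAR and MK_SHIFT so that the example compiles without windows.h; treating the button-up message as the click and leaving out right clicks are simplifying assumptions, and routeMessage is a hypothetical name. In the real plug-in, a std::nullopt result would correspond to passing the message on to the original window procedure with CallWindowProc.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Same numeric values as the winuser.h definitions, redeclared for portability.
constexpr unsigned kWmChar = 0x0102;          // WM_CHAR
constexpr unsigned kWmLButtonUp = 0x0202;     // WM_LBUTTONUP
constexpr std::uintptr_t kMkShift = 0x0004;   // MK_SHIFT flag in wParam

// Returns the command text to send to the user machine, or std::nullopt if
// the message should be handled by the original window procedure.
std::optional<std::string> routeMessage(unsigned msg, std::uintptr_t wParam,
                                        std::intptr_t lParam)
{
    if (msg == kWmLButtonUp) {
        if (wParam & kMkShift)
            return std::nullopt;   // Shift held: control the window itself
        // Like GET_X_LPARAM / GET_Y_LPARAM: signed low and high words.
        const int x = static_cast<std::int16_t>(lParam & 0xFFFF);
        const int y = static_cast<std::int16_t>((lParam >> 16) & 0xFFFF);
        return "lck(" + std::to_string(x) + "," + std::to_string(y) + ")";
    }
    if (msg == kWmChar)
        return "s(" + std::string(1, static_cast<char>(wParam)) + ")";
    return std::nullopt;           // everything else: default handling
}
```

The returned text still lacks the trailing message counter and the translation from window coordinates to remote screen coordinates; both would be applied before the message is sent.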
Note: Coordinates of a point on the image are translated correctly to screen coordinates on the target machine only when one corner of the image coincides with the corresponding corner of the automated window (in practice, either the top-left or the bottom-left corner).
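The note above can be made concrete with a little arithmetic. Assuming the image is shown at actual size (1:1, as set with "Actual size") and that one of its corners coincides with the same corner of the window's client area, translating a client point to the remote screen is either the identity (top-left alignment) or a vertical shift by the difference between the client height and the remote screen height (bottom-left alignment). The function clientToRemote and its parameters are hypothetical.

```cpp
struct Point { int x; int y; };

enum class Anchor { TopLeft, BottomLeft };

// Translates a point in the automated window's client area to screen
// coordinates on the user machine. Assumes a 1:1 image whose `anchor`
// corner coincides with the same corner of the client area.
Point clientToRemote(Point client, Anchor anchor,
                     int clientHeight, int remoteScreenHeight)
{
    if (anchor == Anchor::TopLeft)
        return client;   // image origin == client origin
    // Bottom-left: the image's top edge sits at this client y coordinate
    // (negative when the image is taller than the client area).
    const int imageTop = clientHeight - remoteScreenHeight;
    return {client.x, client.y - imageTop};
}
```

If neither corner is aligned (for example, after the image has been dragged inside the window), the offset of the image is unknown to this calculation, which is why the translation is correct only in the two aligned cases.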
Text messages used to transfer commands have the following format:
command(param0, param1, ..., paramN-1)integer
where
- command is the name of the action to be performed on the remote side,
- (param0, param1, ..., paramN-1) are the action parameters, enclosed in parentheses and separated by commas, and
- integer (between 1 and 100) distinguishes successive messages (the Skype4COM component sometimes erroneously generates several notifications for the same received message).
For example, the message lck(759,498)49 instructs the receiving party to perform a left mouse click at the point with screen coordinates (759,498). The message mv(752,170,777,70)33 makes the receiving party simulate the sequence "press the left mouse button at (752,170), move the mouse to (777,70), release the left mouse button". The message s(a, b, c)71 makes it type the string between the parentheses ("a, b, c" in this case; here the commas are part of the string, not delimiters) at the current caret position on the receiving side.
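On the receiving side, UserSkypeDriver has to split each message into its parts and ignore repeated notifications. The sketch below is one possible way to do this, not the author's parser: parseCommand, intArgs and DuplicateFilter are hypothetical names, and treating a message whose integer equals that of the last processed message as a duplicate is an assumption based on the description of the trailing integer.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// One received command, e.g. "lck(759,498)49" -> {"lck", "759,498", 49}.
struct Command {
    std::string name;     // action name, e.g. "lck", "mv", "s"
    std::string rawArgs;  // text between the first '(' and the last ')'
    int counter = 0;      // trailing integer, 1..100
};

// Splits "name(args)counter"; returns std::nullopt for malformed text.
// Using rfind(')') keeps any ')' typed inside an s(...) string in rawArgs.
std::optional<Command> parseCommand(const std::string& text)
{
    const std::size_t open = text.find('(');
    const std::size_t close = text.rfind(')');
    if (open == std::string::npos || close == std::string::npos || close < open)
        return std::nullopt;
    const std::string digits = text.substr(close + 1);
    if (digits.empty() || digits.size() > 3 ||
        digits.find_first_not_of("0123456789") != std::string::npos)
        return std::nullopt;
    Command cmd;
    cmd.name = text.substr(0, open);
    cmd.rawArgs = text.substr(open + 1, close - open - 1);
    cmd.counter = std::stoi(digits);
    return cmd;
}

// Converts comma-separated numeric arguments (lck, mv) to integers.
// std::stoi throws std::invalid_argument on an empty or non-numeric field.
std::vector<int> intArgs(const std::string& rawArgs)
{
    std::vector<int> out;
    if (rawArgs.empty())
        return out;
    std::size_t start = 0;
    while (start <= rawArgs.size()) {
        std::size_t comma = rawArgs.find(',', start);
        if (comma == std::string::npos)
            comma = rawArgs.size();
        out.push_back(std::stoi(rawArgs.substr(start, comma - start)));
        start = comma + 1;
    }
    return out;
}

// Drops repeated notifications: a counter equal to the last accepted one
// is treated as the same message delivered again.
class DuplicateFilter {
public:
    bool accept(int counter)
    {
        if (counter == last_)
            return false;
        last_ = counter;
        return true;
    }
private:
    int last_ = 0;  // 0 never occurs, since counters run from 1 to 100
};
```

With the examples above, parseCommand("s(a, b, c)71") yields the name s, the raw argument text a, b, c and the counter 71; intArgs is applied only to coordinate commands, so the commas in an s(...) string are never split.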
To Run the Demo
The demo for this article consists of two folders: Adviser and User. Before anything else, copy Skype4COM.dll from the ..\Common folder of the source project into both the Adviser and User folders (this DLL was removed from the demo to reduce the size of the upload). On the adviser machine, place the contents of the Adviser folder in the folder containing Skype.exe (in a default Skype installation, this is the [Disk Letter]:\Program Files\Skype\Phone folder) and run the COMRegistration.cmd batch file once after installation. After that, AdviserSkypeDriver.exe is ready to work. On the user machine, place the contents of the User folder in a separate folder; UserSkypeDriver.exe is then ready to work without additional preparation (it registers Skype4COM.dll as an in-process COM server itself). The demo targets 32-bit platforms.
Discussion and Further Development
This article and its code are just a proof of concept for remote machine control with Skype screen sharing. Apart from bug fixing (surely there are plenty of bugs :) ), the code may be enhanced in many ways, some of which are listed below:
- Additional features may be added (e.g., mouse double click).
- The performance of text transmission may be improved. Currently, the adviser's commands to the user are generated as text messages by the code injected into Skype. These commands are then transferred with WCF to AdviserSkypeDriver, and from there they are sent back to the Skype process by calling the Skype4COM API. There is probably a way to send a text message to the user directly from the injected code.
- Investigating the Skype streaming infrastructure and its use for transferring proprietary data may prove fruitful.
- Various logs and reports may be generated and sent to a third Skype account (different from those of the adviser and the user), for example, for billing purposes in the case of commercial use.
- Complex commands combining simple commands (for example, several successive mouse clicks) may be added to achieve a specific goal (e.g., to open a particular log).
This article suggests a way to control a remote machine with automated Skype. The standard, free-of-charge streaming infrastructure of Skype is used for screen sharing, and Skype instant text messages serve as the medium for transferring commands. Dedicated applications for the user and the adviser, built on various automation techniques, are presented. This approach may be useful for customer support that combines active screen sharing with an audio conversation between the user and the adviser.
I would like to express my gratitude to telemagic for a very useful discussion and suggestions on the topic of this article.
References
[1] Introducing the SkypeKit Beta Program.
[2] Jeffrey Richter, Christophe Nasarre. Windows via C/C++. Fifth edition. Microsoft Press, 2008.
[3] Igor Ladnik. Automating Windows Applications. CodeProject.
[4] Igor Ladnik. Automating Windows Applications Using the WCF Equipped Injected Component. CodeProject.