Saturday, September 17, 2016

Emulating user interaction with browsers (click-bots)

There are several ways to do this, and they vary significantly in complexity and effectiveness.

  1. You can control/drive any browser via a browser extension published in the browser's extension store (e.g. the Chrome Web Store) and installed the usual way.
    • Problem 1: the extension needs to be approved and signed by the browser vendor
    • Problem 2: you need a separate extension for each browser you want to control
  2. Chrome specifically can be controlled/driven via its DevTools remote debugging protocol, which is also how Selenium does it with its ChromeDriver. Ref: Chrome Debugger Protocol
    • Problem: Only works with Chrome
  3. Set the browser into extension development mode by programmatically manipulating its configuration files, then add an unsigned extension for controlling/driving the browser. The extension can be added either by manipulating the browser configuration files and/or the Windows registry, or by scripting a hidden drag-and-drop of the extension file onto the browser window while it displays the appropriate extension installation page.
    • Problem: Although very doable, the implementation is on the hard side, especially if it has to cover both of the two major browsers out there (Chrome and FF).
  4. Drive/control the browser, or any other window really, via the Win32 SendInput() function. For this to work you need to defeat the operating system's protection against click-bots, which only lets a window take focus as a result of non-programmatic GUI interactions (i.e. actions performed directly by a real user, like clicking with a real mouse device). The repertoire of ways to do that is diminishing, but there are still ways to do it even on Windows 10 (e.g. DLL injection and launching a dialog from within the hijacked process). Interactions can be accomplished either via keystroke sequences sent (with the correct timing) to the browser window instance or via clicks at relative/absolute coordinates (the exact spot is hard to calculate given different window/screen sizes, but it is still doable). This can work in all cases and for all browsers, as it does not depend on other software or auxiliary APIs that may or may not be available.
    • Problem: Hardest of all four to do, mainly because of the requirement to defeat the window focus protection, but also because of the complications of correctly calculating coordinates where clicks (and not keystrokes) are absolutely needed.
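
To give a feel for the coordinate calculation mentioned in option 4, here is a minimal Python sketch of the arithmetic only. It assumes the click is sent with SendInput() using the MOUSEEVENTF_ABSOLUTE flag, in which case the coordinates are normalized to the range 0..65535 across the primary screen, and it assumes the window rectangle (left, top, right, bottom, in screen pixels) has already been obtained some other way, for example with GetWindowRect(). The function names click_point and to_absolute are made up for illustration; nothing here sends any input or deals with window focus.

```python
def click_point(window_rect, rel_x, rel_y):
    """Turn a position given as fractions (0.0..1.0) of the window size
    into a pixel position on the screen.

    window_rect is (left, top, right, bottom) in screen pixels, with
    right/bottom exclusive (assumed to come from e.g. GetWindowRect).
    """
    left, top, right, bottom = window_rect
    x = left + int(rel_x * (right - left))
    y = top + int(rel_y * (bottom - top))
    # Keep the point inside the window even when rel_x or rel_y is 1.0.
    return min(x, right - 1), min(y, bottom - 1)


def to_absolute(x_px, y_px, screen_w, screen_h):
    """Map a pixel position on the primary screen to the normalized
    0..65535 range expected with MOUSEEVENTF_ABSOLUTE (assumed usage)."""
    if screen_w < 2 or screen_h < 2:
        raise ValueError("screen is too small")
    if not (0 <= x_px < screen_w and 0 <= y_px < screen_h):
        raise ValueError("point is outside the primary screen")
    nx = round(x_px * 65535 / (screen_w - 1))
    ny = round(y_px * 65535 / (screen_h - 1))
    return nx, ny


# Example: aim at the centre of an 800x600 window placed at (100, 100)
# on a 1920x1080 primary screen.
px, py = click_point((100, 100, 900, 700), 0.5, 0.5)
dx, dy = to_absolute(px, py, 1920, 1080)
```

Expressing the target as a fraction of the window, instead of a fixed pixel offset, is one way to cope with different window sizes. It still breaks down when the page layout itself changes with the window size, which is part of why this option is the hardest.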