We have been using WebDriver (Selenium) for functional testing of web applications. I have personally been involved in using WebDriver on .NET to automate testing of several .NET web applications. But in my spare time, I’ve discovered another use for WebDriver, which is automating interactions with websites.
At its most basic, this is just glorified scraping. But I’ve been discovering that, in the years since I last tried this sort of thing, getting programs to interact with human-orientated websites has become much easier. Partly, this is due to WebDriver’s high-level library for matching on elements in a variety of ways. But it’s also because the web is genuinely a bit more semantic than it used to be. These days, if you are interacting with a well-written site, semantic CSS classes abound, and they make it very easy to pull out the information you need.
The two uses I’ve been playing around with are interacting with a phpBB-driven forum (my program can read posts and respond to simple commands) and exploring a university website to download lecture slides automatically for courses I’m interested in. This second use brought me a problem that doesn’t seem to have many good answers on the web, so I thought I would post about it here.
The problem is that of downloading files with WebDriver. The standard answer you get on Stack Overflow and similar sites is that WebDriver can’t download files (there is no standard for interacting with the browser’s save dialogs). You find suggestions that you set up your browser with a default action for the file types you are interested in, so that the save dialog can be bypassed. This isn’t very satisfactory, especially as the lecture slide files I was interested in didn’t have useful names, so I wanted to be able to use names I had picked up from the other elements on the page.
The other solution is: don’t use WebDriver. Just fire up a tool like wget, or use an HTTP library in your language of choice. The problem with this is that the university files are password-protected: you need to be logged in. It felt like I would be duplicating a lot of work if I logged in under WebDriver, navigated the site, and then had to repeat a bunch of it to download the file.
However, it’s actually very simple. You can easily pull out the cookies that WebDriver is using, and then just pass them along with your own request for the file. This means that you are using the same session as WebDriver, and that everything Just Works.
Here’s a code snippet for how I did this in .NET: