This section details a number of time saving workflows regarding internet based research. These came out of a current research
Recently we've been doing a lot of work with link verification. We need to know if links are alive or dead. Let's think about the straightforward way of doing this:
This will get you where you need to be, but in a mind numbing fashion. Let's allow technology and problem solving to do this for us.
Assuming you only need to know whether or not a webpage is live, you can use an online tool called "URLitor". This site allows you to paste in a list of URLs and it will return you a list (or spreadsheet) detailing which worked.
Using this you can move through the list much more quickly, without the need for repetitive actions.
This also can be automated to a certain extent. It would be best to run the URL validator first as described above, so you can create two categories.
What we want to do is open up lots of links—say 10-20—at once. This way you can just move from one to the next and do whatever you need to do. Any one of these online tools will do the job:
Sometimes you may find yourself needing to do lots of very similar searches. Say you need to find the website for each county in a state. Of course you should start by searching to find an existing list, but if that's not available, then you're looking at typing the search query, "BLANK county ca county website" into Google about 54 times.
If you're facing doing the same thing over an over with just one or two things changing, then you're looking at a prime candidate for automation.
Search engines can be utilized without typing anything into their search box. Search engines use what's called a "query string". Query strings are a way of using the URL of your browser to pass data to the search engine's software. For example, if you go to Google and search for "jackalopes", you'll get a results page full of info on the noble jackalope. Take a look at the address bar and you'll see something starting with,
https://www.google.com/search?q=jackalopes
This is the query string. Note that it comprises two parts, https://www.google.com/search?q=
and jackalopes
which is our actual query term. For fun, try replacing jackalopes
with something else like rexfidobuster
and see what happens.
Knowing this, we can understand each of our dozens of searches as just a URL, and as we've established, we can open lots of those at once.
As with any data job, you need to get everything nice and neat. We need to do a few things
I just did a search for list of CA counties
and found something that I was able to paste directly into a spreadsheet
To create the query string, we're going to make an Excel formula that will format the query to work in a query string and then append that to the search URL.
We're assuming you have a reasonable understanding of how Excel formulas work so we'll just give it to you. Assuming that your first county is on A2, the formula to create the query string would be,
=CONCATENATE("https://google.com/search?q=ca+",SUBSTITUTE(A2," ","+"))
Which gives you the string,
https://google.com/search?q=ca+Los+Angeles+County
The substitution is necessary to replace spaces with +
's since spaces are illegal characters.
This is still inefficient. You would have to click on the search result you want which is probably the first one. Probably the first one...Hey why not just do an "I'm Feeling Lucky" search? Remember I'm Feeling Lucky? It opens the first result automatically. Doing an IFL search is not much different. Just use this formula instead:
=CONCATENATE("https://google.com/search?btnI=&q=ca+",SUBSTITUTE(A2," ","+"))
Note that the only thing that's changed is that ?q=
is now ?btnI=&q=
.
Go ahead and fill the formula down and you have all the URLs you need. Use one of the aforementioned link openers and start copying.