URL Process Examples

URL Processes

This section details a number of time saving workflows regarding internet based research. These came out of a current research

Checking Lots of Links

Recently we've been doing a lot of work with link verification. We need to know if links are alive or dead. Let's think about the straightforward way of doing this:

  1. Have a spreadsheet of links
  2. Copy a link
  3. Paste into browser
  4. See if it works
  5. Mark accordingly on sheet
  6. Repeat

This will get you where you need to be, but in a mind numbing fashion. Let's allow technology and problem solving to do this for us.

If You Just Need to Know They Work

Assuming you only need to know whether or not a webpage is live, you can use an online tool called "URLitor". This site allows you to paste in a list of URLs and it will return you a list (or spreadsheet) detailing which worked.

Using this you can move through the list much more quickly, without the need for repetitive actions.

If You Need to Actually Visit Lots of Pages

This also can be automated to a certain extent. It would be best to run the URL validator first as described above, so you can create two categories.

What we want to do is open up lots of links—say 10-20—at once. This way you can just move from one to the next and do whatever you need to do. Any one of these online tools will do the job:

Doing Lots of the Same Search

Sometimes you may find yourself needing to do lots of very similar searches. Say you need to find the website for each county in a state. Of course you should start by searching to find an existing list, but if that's not available, then you're looking at typing the search query, "BLANK county ca county website" into Google about 54 times.

If you're facing doing the same thing over an over with just one or two things changing, then you're looking at a prime candidate for automation.

A Word on Search Engines

Search engines can be utilized without typing anything into their search box. Search engines use what's called a "query string". Query strings are a way of using the URL of your browser to pass data to the search engine's software. For example, if you go to Google and search for "jackalopes", you'll get a results page full of info on the noble jackalope. Take a look at the address bar and you'll see something starting with,

https://www.google.com/search?q=jackalopes

This is the query string. Note that it comprises two parts, https://www.google.com/search?q= and jackalopes which is our actual query term. For fun, try replacing jackalopes with something else like rexfidobuster and see what happens.

Knowing this, we can understand each of our dozens of searches as just a URL, and as we've established, we can open lots of those at once.

Preparing Your Query Strings

As with any data job, you need to get everything nice and neat. We need to do a few things

  1. Obtain a list of counties in California
  2. Paste them into Excel
  3. Create a search string column

Obtain List

I just did a search for list of CA counties and found something that I was able to paste directly into a spreadsheet

Create The Query String

To create the query string, we're going to make an Excel formula that will format the query to work in a query string and then append that to the search URL.

We're assuming you have a reasonable understanding of how Excel formulas work so we'll just give it to you. Assuming that your first county is on A2, the formula to create the query string would be,

=CONCATENATE("https://google.com/search?q=ca+",SUBSTITUTE(A2," ","+"))

Which gives you the string,

https://google.com/search?q=ca+Los+Angeles+County

The substitution is necessary to replace spaces with +'s since spaces are illegal characters.

Make It Better

This is still inefficient. You would have to click on the search result you want which is probably the first one. Probably the first one...Hey why not just do an "I'm Feeling Lucky" search? Remember I'm Feeling Lucky? It opens the first result automatically. Doing an IFL search is not much different. Just use this formula instead:

=CONCATENATE("https://google.com/search?btnI=&q=ca+",SUBSTITUTE(A2," ","+"))

Note that the only thing that's changed is that ?q= is now ?btnI=&q=.

Search On!

Go ahead and fill the formula down and you have all the URLs you need. Use one of the aforementioned link openers and start copying.