Maximum Weight Bipartite Matching implementation in Java

It’s been too long, but I’ve finally contributed to an open source project again. A number of years ago I implemented a polynomial-time algorithm for find the maximum weight matching of a bipartite graph, based on the Hungarian algorithm.

Well, the code wasn’t being used anymore, and my employer was kind enough to allow me to open source it. I ended up adding it to the jgrapht project. Feels good to be contributing to open source again. The jgrapht project is a very capable theoretical graph library, with a lot of handy algorithms, including some for finding spanning trees and shortest paths.

The jgrapht project is on github now here.

Unit testing – Build up vs tear down approaches

Suppose we had to unit test the following method:

public boolean result() {
   return a() && b() && c();
}

There are a number of different approaches that could be used to test this method. In this case, there are 2^3 = 8 possible test cases, as each of a(),b(),c() could return either true or false:

a | b | c | result
------------------
f   f   f   f
f   f   t   f
f   t   f   f
f   t   t   f
t   f   f   f
t   f   t   f
t   t   f   f
t   t   t   t

For a method like this, it’s not too much work to write a test case for all 8 possibilities. In real life though, we simply can’t test everything. We need to focus our testing to get the best coverage and catch the most likely errors.

Suppose we can’t test all 8 test cases. How should we test the method? What test cases should we focus on? One approach I often see is what I call a “build up” approach:

@Test
public test_result() {
        //assume a,b,c default to false
	assertFalse(result());
	setA(true);
	assertFalse(result());
	setB(true);
	assertFalse(result());
	setC(true);
	assertTrue(result());
}

I classify this as a “build up” approach because the test case builds the required conditions, one by one, step by step, and tests the method during the build up until the conditions are all met.

The problem is, I don’t think this is great test coverage. To see why, think about the whole purpose of the method: return true when a(),b(),and c() are all true, and false otherwise. With the tests above, how much confidence should a tester have that all three conditions a(),b(),and c() are required for result() to return true? All we really know is that setting c() to true caused result() to return true. We don’t have any insight if c() by itself was the sufficient condition, or if all of a(),b(),and c() were.

a | b | c | result
------------------
f   f   f   f *
f   f   t   f
f   t   f   f
f   t   t   f
t   f   f   f *
t   f   t   f
t   t   f   f *
t   t   t   t *

* = test cases covered

See how column c and the result column are identical for the test cases we covered? The method could be re-written as the following and still pass all tests:

public boolean result() {
   //this still passes the 4 test cases above
   return c();
}

We want to avoid the direct correlation between the result and one of the conditions.

My preferred approach is the “tear down” method. Instead of building up the conditions one by one, I instead build up all the conditions at the beginning, and then tear down each condition individually, and ensure I get the desired result:

@Test
public void test_result() {
	setA(true);
	setB(true);
	setC(true);
	assertTrue(result());

	//now take each condition away individually and test
	setA(false);
	assertFalse(result());
	
	setA(true);
	setB(false);
	assertFalse(result());

	setB(true);
	setC(false);
	assertFalse(result());
}

You can see with this approach, we still have the same number of tests (4), but we cover the arguably more important test cases:

a | b | c | result
------------------
f   f   f   f
f   f   t   f
f   t   f   f
f   t   t   f *
t   f   f   f
t   f   t   f *
t   t   f   f *
t   t   t   t *

* = test cases covered

There is now no direct correlation between and single condition (either a(), b() or c()) and the result. It would be much more difficult to re-write the result() method and have it pass the four test cases above. This is what we want; a unit test that documents a method’s intentions, and makes it difficult to make a mistake during any future changes, refactoring, or re-implementation.

A robust JavaScript regular expression for email addresses

When developing a full-scale web application, generally input validation is done with a combination of client-side and server-side code, maybe using a validation framework. For example, mailcheck.js looks pretty cool, as it can help validate and suggest common email typos by trying to verify if the domain exists.

Sometimes though, you just need a quick and easy way to validate input (in particular email addresses) client-side. I found a pretty robust regular expression at regular-expressions.info. It will match 99.99% (good enough for me) of email addresses out there. They’ve got a detailed explanation of why it’s so difficult to match email addresses if you’re interested. I used their regular expression and made a few tweaks to get it working in Javascript in-browser. Try it out with any email address you like. The code is below.

The code to do this:

<label for="emailTextBox">Email:</label>
<input type="text" style="width:40%;" id="emailTextBox"></input>
<input type="submit" value="Validate Email" id="emailInput" onclick="validate();"/>
<strong><p id="result"></p></strong>
<script>
function validate() {
   var email = document.getElementById("emailTextBox").value;
   var matched = email.match(/[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/g);
   if (matched == null)
     document.getElementById("result").innerHTML="Not a valid email address!";
   else
     document.getElementById("result").innerHTML="It's valid: " + matched;

   return true;
}
</script>

Questions and comments always welcome.

Adding Facebook Login to Google Docs Forms – Part 4

The final step in the process is to use the Facebook Javascript API to make calls to retrieve some information from the user’s public profile. Their API is documented here so I’ll just give a small example.

From Part 3 in the series, we’ll add a little Javascript to our statusChangeCallback function, which was called every time the user logs in or out.

In the block that handles successful login, we’ll show a popup with some of the user’s information. Here’s the code snippet:

    if (response.status === 'connected') {
      // Logged into your app and Facebook.

      //...[snip]

      FB.api('/me?fields=first_name,gender,age_range,timezone,locale,link',
             function(response) {
               alert('FYI Here is some info about you: ' + 
                     'Gender:' + response.gender +
                     ', Age Range: ' + response.age_range.min + ' - ' + 
                     response.age_range.max + 
                     ', Timezone: ' + response.timezone + 
                     ', Locale: ' + response.locale +
                    ', Your profile: ' + response.link);
             });

Depending on the user’s privacy settings, some of their information will be unavailable through the API.
You can try out the completed form and view all the final source code at ahokas.ca/form.

This pretty much wraps up the series on adding Facebook login to a Google form. The summary then:

  • You cannot modify the Google form to add a Facebook login widget directly to the form
  • The approach is to use an iFrame to wrap the Google form in another site
  • The site contains a bit of Javascript / HTML that allows the user to login, then displays the Google form to the user

As always, feel free to comment below if you have any questions.

Adding Facebook Login to Google Docs Forms – Part 3

Next up is actually adding the Facebook Login widget to the page. The code for this is fairly lengthy, so I’ll highlight the steps and basic strategy (if you just want to try it out, check out ahokas.ca/form)

  1. Register for a developer account at developers.facebook.com
  2. Create a Facebook app and get a unique app id for it
  3. Update your html file to look like the following, using your app id on the line “appId : ‘XXXXXXXXX’, ”
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Google form embedded</title>
    <script src="//ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
  <style>
    body {
      height: 1000px;
      padding: 0;
      margin: 0;
    }

    #login_header {
      background: #3EA100;
      color: white;
      width: 800px;
      font-size: 25px;
      line-height: 1.6em;
      font-weight: bold;
      text-align: center;
      padding: 10px;
      margin: 0 auto;
      border-radius: 25px;
      height: 100px;
    }
    </style>

  </head>
  <body>

<script>
  window.fbAsyncInit = function() {
    FB.init({
      appId      : 'XXXXXXXXXXXXXX',
      xfbml      : true,
      version    : 'v2.2'
    });
    FB.getLoginStatus(function(response) {
      statusChangeCallback(response);
    });
  };

  (function(d, s, id){
     var js, fjs = d.getElementsByTagName(s)[0];
     if (d.getElementById(id)) {return;}
     js = d.createElement(s); js.id = id;
     js.src = "//connect.facebook.net/en_US/sdk.js";
     fjs.parentNode.insertBefore(js, fjs);
   }(document, 'script', 'facebook-jssdk'));


  // This is called with the results from from FB.getLoginStatus().
  function statusChangeCallback(response) {
    if (response.status === 'connected') {
      // Logged into your app and Facebook.
      FB.api('/me?fields=first_name,gender,age_range,timezone,locale,link', function(response) {
        console.log('Successful login for: ' + response.name);
        document.getElementById('status').innerHTML =
        'Thanks for logging in, ' + response.first_name + '. Please complete our survey.';

        $("#survey").show();
      });
    } else if (response.status === 'not_authorized') {
      // The person is logged into Facebook, but not your app.
      document.getElementById('status').innerHTML = 'Please log ' +
        'into this app.';
      $("#survey").hide();
    } else {
      // The person is not logged into Facebook, so we're not sure if
      // they are logged into this app or not.
      document.getElementById('status').innerHTML = 'Please log ' +
        'into Facebook to complete our survey.';
      $("#survey").hide();
    }
  }

  // This function is called when someone finishes with the Login
  // Button
  function checkLoginState() {
    FB.getLoginStatus(function(response) {
      statusChangeCallback(response);
    });
  }


</script>
    <div id="login_header">
      <div class="fb-login-button" onlogin="checkLoginState();" scope="public_profile,email" data-max-rows="1" data-size="medium" data-show-faces="false" data-auto-logout-link="true"></div>
      <div id="status">
      </div>
    </div>
    <p>
    <div id="survey" style="display: none">
      <iframe style="margin: 10px 20%" src="https://docs.google.com/forms/d/1QK5j8aPC_PvoB0j2Nubkw-agWZ2QJKXbw6QUR_OOtCs/viewform?embedded=true" width="60%" height="600" frameborder="1" marginheight="0" marginwidth="0">Loading...</iframe>
    </div>
  </body>
</html>

The basic workflow is as follows:

  1. The Google Form starts in a hidden div so the end user can’t see it
  2. After successful Facebook login, the Javascript callback method displays the hidden div (and therefore the Google Form) using jQuery
  3. If the user logs out, the div and Google Form are hidden again

Want to see it in action? Try it out at ahokas.ca/form

Next up will be retrieving and storing information from the user’s public Facebook profile.

Adding Facebook Login to Google Docs Forms – Part 2

In my last article, I gave an outline of how we can enhance a Google form with a Facebook login. The steps were as follows:

  1. Create a Google form as usual
  2. Create a plain, empty html file on a web server.
  3. Embed the Google form into the empty html file (as an iframe)
  4. Within the html file, add a Facebook login widget via the Facebook Javascript SDK

The first three steps are pretty easy so we’ll tackle those all at once. I created a sample Google form and made it publicly available:

graeme’s cool form

Next, we need a plain html file:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Graeme's cool form embedded</title>
  </head>
  <body>
  </body>
</html>
</code>

Open up your favourite text editor, and paste the above into it, and save it to your computer as a .html file.

Now we just embed the Google form into the HTML file above. Google actually makes it really easy to do this. When editing the Google form, click File, Embed, and copy the URL that Google gives you. It should start with <iframe.

embedform

embedform2

Now just put that URL right after the <body> tag in the html file:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Graeme's cool form embedded</title>
  </head>
  <body>
<iframe src="https://docs.google.com/forms/d/1QK5j8aPC_PvoB0j2Nubkw-agWZ2QJKXbw6QUR_OOtCs/viewform?embedded=true" width="760" height="500" frameborder="0" marginheight="0" marginwidth="0">Loading...</iframe>
  </body>
</html>
</code>

That’s it! The Google form is now embedded in a website. Now, if you have your own web server, you can put that file up and try it out. If you don’t, no worries. Just open it up with a web browser on your local computer to see how it looks.

Depending on the size of your form, you may need to adjust the size of the “window” (iframe) that your form is displayed in. You’ll notice the default width (760) and height (500). Adjust those as you like to make your form fit nicely.

That’s it for now. Next up: adding Facebook login. As always, feel free to contact me.

Adding Facebook Login to Google Docs Forms – Part 1

Google Forms are a great, capable tool. They are quick to create, and are used for a number of purposes such as surveys and polls.

They are limited, however, in how much customization can be done. There are some add ons available for Google Docs, but customizing forms is mostly limited to cosmetic changes.

Recently, a potential client wanted to add a Facebook login to a Google form. The client uses Google Forms for marketing surveys, and in addition to collecting the survey data, the (enhanced) Google Form would also collect data available from the user’s Facebook profile page via the Facebook API. The Facebook login to the survey would allow a deeper insight into the marketer’s audience. Since Google Docs can use Google’s powerful Javascript-like Apps Script, the client had thought it might be possible to use the standard Facebook Javascript SDK and add a login widget to the Google form.

Unfortunately, as it turns out, this is not possible. Google’s Apps Script doesn’t provide a full web-browser equivalent to the developer of the Google Doc. Creating things like custom html, css, and executing arbitrary Javascript just isn’t possible with the Google Docs Apps Script API.

There is a work-around however. It’s a bit of work to set up, but does allow a Facebook login to be attached to a Google form (sort of):

1. Create a Google form as usual
2. Create a plain, empty html file on a web server.
3. Embed the Google form into the empty html file (as an iframe)
4. Within the html file, add a Facebook login widget via the Facebook Javascript SDK

With some styling work the Facebook login can be made to look decently integrated into the Google form.

Once a user logs into Facebook, they complete the form as usual. The html file (via some Javascript) could use the Facebook API to retrieve and then save the demographic information available from the user’s public Facebook profile. This information could be stored locally on the web server, or uploaded to a Google Spreadsheet via Google APIs.

In the coming blog series, I’ll give example code (and build a demo site) showing the above approach in action.

As always, feel free to contact me if you have any questions!

Website scraping with a Google Spreadsheet

In a recent project, a client was doing some market research. They had a (long!) list of website URLs, and needed to get all email addresses displayed anywhere on the page of each of the sites. I created a Google Spreadsheet to accomplish this. The requirements were as follows:

  • the client wanted to be able to add and remove URLs to a spreadsheet over the course of several days
  • as new URLs were added (typed or pasted) to the spreadsheet, the spreadsheet should scrape the email addresses from each site and put them (comma separated) into the corresponding second column of the sheet.
  • if a URL was removed from the spreadsheet, the corresponding email addresses should be removed as well
  • display only unique email addresses (no duplicates) per URL

The requirements were fairly straightforward, but making it responsive and easy to use for an end user was tricky. My approach:

  • Create a trigger to call a custom function that will run anytime the spreadsheet is edited (the on edit event)
  • The function looks at the modified cell (or each cell if a range of cells were modified, for example if someone pasted 10 rows into the spreadsheet) and perform an HTTP GET of each of the URLs
  • The function will then parse the HTML returned looking for email addresses, remove duplicates, and populate the corresponding cell in the spreadsheet with a comma separated list

It turns out parsing out email addressees from a string of characters isn’t completely straight forward either. I used a regex for email listed at http://www.regular-expressions.info/email.html, which will match 99.99% of all email addresses in existence today (good enough!).

I created the following two functions. The first gets called via the “on edit” event (anytime the spreadsheet is modified). It goes through the modified cells, and calls the second function, which performs the HTTP GET and parses all email addresses out of the response.

//triggered via the "on edit" event
function myOnEdit(e){

  var range = e.range;
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  
  // only update if first column is edited (but not header row)
  if (range.getColumn() > 1 || range.getRow() == 1)
    return 0;

  //loop through each of the updated cells
  for (var i = 1; i <= range.getNumRows(); i++) {
    var updatedCell = range.getCell(i,1);

    // get cell to update (beside the edited one)
    var cellToUpdate = sheet.getRange(updatedCell.getRow(), 2);
  
    if (updatedCell.getValue().valueOf() == "") {
      //cell was emptied out.. so delete the value
      cellToUpdate.setValue(null);
    } 
    else {
      //set value by scraping emails
      cellToUpdate.setValue(scrapeEmails(updatedCell.getValue()));
    }
  }
}

//SCRAPE URL for EMAILS
function scrapeEmails(url) {

   var response = UrlFetchApp.fetch(url).getContentText();

   // from http://www.regular-expressions.info/email.html
   // matches 99.99% of all email address formats used today

   var matched = response.match(/[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/g);
   
   if (matched == null)
     return null;
   
   //remove duplicate emails from the list
   var unique = matched.reduce(function(a,b){if(a.indexOf(b)<0)a.push(b);return a;},[]);
   
   return unique.join();
}

Here's a screenshot of the spreadsheet in use. Any URLs entered into column A automatically get scraped and column B populated.

scrapemails

Any questions about the above or if you need help getting it working for yourself, let me know! I'd be happy to help.

A simple shared TODO list with daily email summaries in Google Apps

Last time I gave an example of email notification workflows using Google Apps. This time we’ll extend the email sending capability of Google Apps with another real-life example: a shared Google docs TODO list.

Suppose you and your team use a simple shared Google spreadsheet to help manage tasks that need to be performed:

todolist

Team members add and remove tasks to the shared list throughout the day as they complete their work. At the beginning of each day, it would be nice to have a summary email in everyone’s inbox of what they need to work on each day. This would save each person the time of looking through the list of tasks for items that belong to them. Luckily, this is pretty easy to create with a little Google Apps Scripting.

function sendTodoReminderEmails() {

 var ss = SpreadsheetApp.getActiveSpreadsheet();
 var sheet = ss.getSheets()[0];

 // get all the data in spreadsheet
 var range = sheet.getDataRange();
 var values = range.getValues();

 var todo = {};
 
 // Column A is task, column B is person's email, assumes header row in table
 for (var row = 1; row < values.length; row++) {
   var task = values[row][0];
   var email = values[row][1];
   
   if (!todo.hasOwnProperty(email)) todo[email] = [];
   todo[email].push(task);
   
 }
 
 //send out summary emails
 for (var email in todo) {
   var emailBody = 'Hey! Here are your current tasks:\n\n';
   for (var task in todo[email]) {
     emailBody += todo[email][task] + "\n"
   }
   MailApp.sendEmail(email, 'Your daily todo list', emailBody);
 }

}


Add the above code by clicking "Tools", "Script Editor" in your Google Spreadsheet. Then, click "Resources", "Current Project's Triggers", to set up this function to run daily at 6am:

dailyemailtodotrigger

Now, every day in your inbox you'll get emails like this:

todoemailinbox

Cool!

How to set up Email notification workflow in Google Apps Spreadsheets

I often use Google Apps Spreadsheets for collaboration on a document with others. I created a simple custom function that will email me any time someone opens (or alternatively, edits) the spreadsheet, so I know when I should go in to check on changes made by others. My function emails me the name of the spreadsheet, the email address of the user (when possible), and a convenient link to spreadsheet as well. Here’s how:

1. From your Google Spreadsheet, click Tools, script editor.

2. Add the following code, changing the email address (YOUREMAIL@gmail.com) to the address that will receive the notifications:

function sendEmail() {

     //------------------------------------------------------------
     // set the email address you'd like to be notified at below
     var emailAddress = 'YOUREMAIL@gmail.com';
     //------------------------------------------------------------
     
     var editingUser = Session.getActiveUser().getEmail();

     var emailSubject = 'Hey! ('+editingUser+') edited the spreadsheet ' + 
                         SpreadsheetApp.getActiveSpreadsheet().getName();

     var emailBody = 'Link: ' + SpreadsheetApp.getActiveSpreadsheet().getUrl();

     MailApp.sendEmail(emailAddress, emailSubject, emailBody);
}

3. Click Save and give the script a name.
collab_email
4. Test the email function by clicking Run, sendEmail. It will ask you to authorize this function, as it will be sending emails as you. Click Continue, then Accept.
collab_auth_required

collab_requestforpermission
5. Now you need to create a trigger to bind the function to an event (for example, someone opening the spreadsheet). Click Resources, current project triggers.

6. Set up a new trigger. The “Event” should be “From Spreadsheet”, “On Open”. If you like, you can set up an email notification if the trigger fails. Since we tested the trigger function above, this isn’t really necessary.
set up trigger
7. Done! Now close down the spreadsheet, open it up again, and you should receive an email.

Enjoy!

PS: there are a couple caveats with this approach:

1. Google Apps only gives you access to the user who opens the Spreadsheet if you are a Google Apps for Business user and the user is in the same organization as you. If you give access to the spreadsheet to other gmail / Google users or anonymous users, for security purposes it will not be able to tell you who is opening or editing the document.

2. The method above uses installed triggers. Google Apps also has simple triggers, but these are not allowed to do things that require authorization, like send emails, and are therefore not suitable.