The Complete Works of William Shakespeare make an outstanding dataset for projects like this one, which looked at how much time various “couples” in Shakespeare spend talking to each other.
There are a number of reasons, of course, why the actual “results” are only somewhat interesting. The number of lines exchanged between two characters is not really an indicator of their compatibility or the strength of their relationship, as is demonstrated by the finding that Romeo and Juliet don’t spend all that much time together. You could alter your hypothesis, for example, and maybe look at the average number of lines per scene? Obviously characters who only have 3 scenes together are going to have fewer lines than those who have 5 or more.
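Here is a rough sketch of what that per-scene normalization might look like. This is not the project's code (which hasn't been released); the function name, the tuple layout, and the numbers are all made up for illustration.

```python
from collections import defaultdict

def lines_per_shared_scene(exchanges):
    """Average lines exchanged per scene that a couple shares.

    exchanges: iterable of (couple, scene_id, line_count) tuples.
    This layout is an assumption, not the original project's format.
    """
    total_lines = defaultdict(int)
    shared_scenes = defaultdict(set)
    for couple, scene_id, line_count in exchanges:
        total_lines[couple] += line_count
        shared_scenes[couple].add(scene_id)
    return {
        couple: total_lines[couple] / len(shared_scenes[couple])
        for couple in total_lines
    }

# Invented numbers, not taken from any play.
sample = [
    ("Couple A", "1.1", 30), ("Couple A", "2.2", 40), ("Couple A", "3.1", 20),
    ("Couple B", "1.2", 20), ("Couple B", "1.3", 25), ("Couple B", "2.1", 15),
    ("Couple B", "3.3", 20), ("Couple B", "4.1", 20),
]
averages = lines_per_shared_scene(sample)
print(averages)
```

In this invented sample, Couple B exchanges more lines overall (100 against 90), but Couple A comes out ahead per scene (30 against 20), which is exactly the kind of reversal a raw line count hides.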
I’m also disappointed that they didn’t do every play. Why, in such a finite dataset as this, don’t you do a complete analysis? Where is Much Ado About Nothing? I’d like to see them release the source code. It could be fun to play with.
The project also reminds me of the Bechdel Test for movies, which asks whether a film has at least two women who talk to each other about something other than a man. How cool would it be for scriptwriters to upload their drafts to a tool like this to see how they do?