Reproducible Research Course by Eric C. Anderson for (NOAA/SWFSC)


Managing 100s of pull requests for homeworks (2014-10-17)

OK, I have been working on a little R package called rrhw that will facilitate writing homework sets within .Rmd files. The students will insert their answers into these files, inside a submit_answer() function that will capture their expression and evaluate it. With this framework I will be able to record everyone’s answers into a data frame and use the same submit_answer() functions to print out the results in a nice web page, so that students can quickly see the range of answers that were given, as well as which ones give the right value and which the wrong, etc.

Students will be submitting these things as pull requests on branches that are named for each different homework assignment. So, with 50 pull requests coming in per assignment, I don’t want to handle that by hand on GitHub. It’s gotta be scriptable. I am hoping to be able to query GitHub to quickly assess:

  1. which of the students have filed pull requests for the homework set.
  2. what time they issued those pull requests, to make sure they are on time. Students should also be able to update their answers, so a corrected version might carry a different time, which could be a nightmare. It would probably be better to look at the time of the latest commit.
  3. whether they have committed more than the one file that I want them to submit.
     • And, if possible, send them an automatic message telling them to clean up their pull request.
  4. Grab all the branches of a certain name and process them.
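Items 2 and 3 could be roughed out in shell along these lines. This is only a sketch under assumptions: the helper names (is_on_time, only_expected_file, check_ref) are made up for illustration, the deadline is passed in as Unix epoch seconds, and the student's branch is assumed to have already been fetched into a local ref.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from rrhw or git-pulls.

# Succeeds if a commit time is at or before the deadline (both in epoch seconds).
is_on_time() {
    [ "$1" -le "$2" ]
}

# Reads a list of changed files (one per line) on stdin and succeeds only
# if it consists of exactly the one expected path.
only_expected_file() {
    expected=$1
    files=$(grep -v '^$')
    [ "$files" = "$expected" ]
}

# Report problems with one fetched ref: args are ref, deadline, expected file.
check_ref() {
    ref=$1; deadline=$2; expected=$3
    # %ct is the committer date of the latest commit, as epoch seconds.
    commit_time=$(git log -1 --format=%ct "$ref") || return 2
    is_on_time "$commit_time" "$deadline" || echo "$ref: late"
    git diff --name-only "HEAD...$ref" | only_expected_file "$expected" ||
        echo "$ref: should change only $expected"
}
```

Something like `check_ref "$ref" "$deadline" exercises/trial_homework.rmd` would then print nothing for a clean, on-time submission. Using the time of the latest commit rather than the time the pull request was opened means a corrected answer is judged by when it was actually committed.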

So, one possibility is git-pulls. It certainly will let me pull stuff down, but probably won’t let me automatically issue a comment on a student commit if it needs to be revised before I am willing to pull it.

Trying out git-pulls

Let’s get it and start playing with it:

sudo gem install git-pulls

Then:

2014-10-17 11:24 /rep-res-course/--% (master) git pulls update
Updating eriqande/rep-res-course
Checking for forks in need of fetching
  fetching aclemento/rep-res-course
remote: Counting objects: 72, done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 72 (delta 15), reused 64 (delta 7)
Unpacking objects: 100% (72/72), done.
From https://github.com/aclemento/rep-res-course
 * [new branch]      ex-test    -> refs/pr/aclemento/rep-res-course/ex-test
 * [new branch]      master     -> refs/pr/aclemento/rep-res-course/master
  fetching eca-home/rep-res-course
remote: Counting objects: 6, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 2), reused 5 (delta 2)
Unpacking objects: 100% (6/6), done.
From https://github.com/eca-home/rep-res-course
 * [new branch]      master     -> refs/pr/eca-home/rep-res-course/master
 * [new branch]      mock-homework-1 -> refs/pr/eca-home/rep-res-course/mock-homework-1
 * [new branch]      patch-3    -> refs/pr/eca-home/rep-res-course/patch-3
Open Pull Requests for eriqande/rep-res-course
5    10/16 Testing the trial homework.         aclemento:ex-test                          

OK, that seems to grab everything down and put it somewhere local. That is not so good if a student has committed a huge blob or something silly.

But, it does somehow get the stuff local, because when you run update again it doesn’t show anything new.

2014-10-17 11:25 /rep-res-course/--% (master) git pulls update
Updating eriqande/rep-res-course
Checking for forks in need of fetching
Open Pull Requests for eriqande/rep-res-course
5    10/16 Testing the trial homework.         aclemento:ex-test                                 

Apparently, it puts all the information into .git/pulls_cache.yml, which doesn’t get committed into your repository but is there nonetheless. That is cool.

2014-10-17 11:31 /.git/--% (GIT_DIR!) du -h pulls_cache.yml
400K  pulls_cache.yml

Info about the commit in the pull request

I can do this:

2014-10-17 12:13 /rep-res-course/--% (master) git-pulls show 5
Number   : 5
Label    : aclemento:ex-test
Creator  : aclemento
Created  : 2014-10-16 19:54:31 UTC

Title    : Testing the trial homework.



------------

cmd: git diff HEAD...e4370bd874543a888adfa99160921d9852c9d1de
 exercises/trial_homework.rmd | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

So, I could probably do this to pull out the file name and the number of files changed on one tab-separated line:

git-pulls show 5 | tail -n 2 | awk '{printf("%s\t", $1);} END {printf("\n")}' 

Note that if the diff has nothing in it, then that will screw things up, because the last two lines are no longer the file summary:

2014-10-17 12:41 /rep-res-course/--% (master) git-pulls show 2
Number   : 2
Label    : eca-home:patch-3
Creator  : eca-home
Created  : 2014-10-01 16:49:14 UTC

Title    : Silly change

Just making a pull request.

------------

cmd: git diff HEAD...a522320f97f9c5b7e2ed37147389537d5c93a1f3

So, I will have to make it tolerant of that. Although I think that might only occur if I have merged the pull request in and then not changed that part of the file since then.
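One way to make the parsing tolerant of an empty diff is to key on the "cmd: git diff" line instead of counting lines from the end. The following is a sketch only: summarize_pull is a hypothetical name, it assumes the output layout shown in the two transcripts above, and it assumes file names contain no spaces.

```shell
#!/bin/sh
# Reads `git-pulls show N` output on stdin and prints
# "<number of files changed><TAB><first file listed>".
# An empty diff section gives "0<TAB>" rather than garbage.
summarize_pull() {
    awk '
        /^cmd: git diff/            { in_diff = 1; next }
        in_diff && / \| /           { if (first == "") first = $1 }
        in_diff && /files? changed/ { n = $1 }
        END                         { printf("%d\t%s\n", n, first) }
    '
}
```

With this, `git-pulls show 5 | summarize_pull` would be expected to give a count of 1 followed by exercises/trial_homework.rmd, and `git-pulls show 2 | summarize_pull` a count of 0 with an empty file field.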

Checking out the pull requests

I tried this:

2014-10-17 12:48 /rep-res-course/--% (master) git-pulls checkout
Checking out all open pull requests for eriqande/rep-res-course
> ex-test into pull-ex-test
error: the requested upstream branch 'origin/ex-test' does not exist
hint: 
hint: If you are planning on basing your work on an upstream
hint: branch that already exists at the remote, you may need to
hint: run "git fetch" to retrieve it.
hint: 
hint: If you are planning to push out a new local branch that
hint: will track its remote counterpart, you may want to use
hint: "git push -u" to set the upstream config as you push.

That is not so great.

A Function

It is a few days later, but this seemed a good place to put this. Here is some R code showing a function I wrote that will grab data on open pull requests and put them in a data frame:

> open.ex.test <- rrhw::grab_open_pull_requests("^ex-test$")

Updating eriqande/rep-res-course
Checking for forks in need of fetching
Open Pull Requests for eriqande/rep-res-course
10   10/19 My homework from 19 Oct 2014        abfleishman:ex-test                               
9    10/19 All the answers                     dtsavage:ex-test                                  
Read 1 item
Read 1 item

> open.ex.test
  state pr_num pr_date        user  branch                                   commit num_files        files
6  open     10   10/19 abfleishman ex-test 9c986fc668765102a2210d17bb99b57ebaa63123         1 lectures....
7  open      9   10/19    dtsavage ex-test 42ceffd2340f9ee19cfd4e2f87ce0f47e378bb25         1 lectures....

Retake on checking stuff out

So, as long as I am using git-pulls I don’t have to define any extra remotes to the student forks. It turns out that they all go into a file like .git/refs/pr/abfleishman/rep-res-course/ex-test where abfleishman is the GitHub username of the student. I can check these out simply like this:

git checkout -b abfleishman-ex-test refs/pr/abfleishman/rep-res-course/ex-test 
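To do the same for every student at once, the refs under refs/pr/ could be listed and filtered by branch name. This is a sketch, not something git-pulls provides: plan_checkouts and checkout_all are hypothetical names, and it assumes refs have the five-part form refs/pr/USER/REPO/BRANCH with no slashes in the branch name.

```shell
#!/bin/sh
# Reads ref names on stdin; for each refs/pr/USER/REPO/BRANCH whose BRANCH
# matches $1, prints "USER-BRANCH<TAB>full ref name".
plan_checkouts() {
    awk -F/ -v want="$1" '
        $1 == "refs" && $2 == "pr" && NF == 5 && $5 == want {
            printf("%s-%s\t%s\n", $3, $5, $0)
        }
    '
}

# Creates one local branch per student for the given homework branch.
# Uses `git branch` rather than `git checkout -b` so the working tree
# is not switched once per student.
checkout_all() {
    git for-each-ref --format='%(refname)' refs/pr/ |
        plan_checkouts "$1" |
        while IFS="$(printf '\t')" read -r local_branch ref; do
            git branch "$local_branch" "$ref"
        done
}
```

Running `checkout_all ex-test` would then be expected to leave local branches such as abfleishman-ex-test and dtsavage-ex-test, each of which can be checked out and processed in turn.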

Notifying students

Check this out. I can email stuff to my noaa gmail account like this:

echo "gotta check this out " | mail -s "testing mail" eric.anderson@noaa.gov

So, I ought to be able to email things to students, like requests to delete extraneous files from homework branches. It would be better if I could submit a comment to the pull request from the command line directly to GitHub, but I am not sure how that would work. If I could figure out how GitHub creates the reply-to email for its notifications I would be able to do that, but I can’t figure that out. Oh well. It will probably be OK to just send email directly to the student.
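For the command-line comment idea, one avenue that might work is the GitHub REST API, which accepts comments on a pull request through the issue-comments endpoint. The following is an untested sketch under assumptions: the function names are hypothetical, GITHUB_TOKEN is assumed to hold a personal access token with permission to comment on the repository, and the message is assumed to be a single line.

```shell
#!/bin/sh
# Builds the issue-comments URL for OWNER/REPO ($1) and pull request number ($2).
comment_url() {
    printf 'https://api.github.com/repos/%s/issues/%s/comments\n' "$1" "$2"
}

# Wraps a single-line message in the JSON body the endpoint expects,
# escaping backslashes and double quotes.
comment_payload() {
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"body": "%s"}\n' "$escaped"
}

# Posts the comment: args are OWNER/REPO, PR number, message.
# GITHUB_TOKEN is an assumed environment variable, not something set up above.
post_comment() {
    curl -sS -X POST \
        -H "Authorization: token $GITHUB_TOKEN" \
        -H "Accept: application/vnd.github+json" \
        -d "$(comment_payload "$3")" \
        "$(comment_url "$1" "$2")"
}
```

A call like `post_comment eriqande/rep-res-course 10 "Please remove the extra files from this pull request."` would then show up as an ordinary comment on pull request 10, and GitHub would send the student its usual notification email.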

