Comments and thoughts on Homework #1 (Trial Homework)
Preliminaries
First off!
- Woo-hoo! Way to go everyone who got those in!
- Woo-hoo! Way to go everyone who is still working on it!
I’m pumped by how many people made their first pull request.
What does a pull request look like to me?
- Check it out!
- I get an email, and Gmail is GitHub-aware
- I can see the changes that you have made
- I can comment, etc.
- You can all do this too! Just go to https://github.com/eriqande/rep-res-course and find the pull requests button.
- In fact, if you aren’t sure how to do the homework or what the best answer is, feel free to browse what other people have done and get ideas.
- I don’t consider this cheating—especially if you view everyone’s responses with a scientific attitude. You’ll be learning about GitHub and reviewing lots of R code.
- Keep in mind that some suggested answers you see from other people might not be optimal.
- If you see that someone has made a mistake and want to let them know, just comment on their commit.
- Note: please keep your pull requests Open. That way my scripts can fetch your work easily.
- I will Close them when we are done with them. You can always Re-open them.
What if I want to change an answer?
- By all means, feel free. This is where GitHub really excels.
- Just make your changes, commit them, and push them up and the pull request should be automatically updated (I think…)
My responses database
Show it to them. View(ans)
General comments from what I saw
It is great to have everyone’s responses. Here are some comments that should be helpful to everyone.
Strive for economy of characters
When you are writing code, shorter is usually (but not always) going to be
- easier to read
- easier to debug
- easier to maintain
as long as it clearly expresses the intent of the program.
Along those lines (and intermixed with some of my OCD code-style ideas), here are some guidelines:
You don’t have to define intermediate variables. Sometimes it is helpful to break up a long calculation with some intermediates, but not always. So:
```r
# this is preferred
gnames > "github"

# this makes unnecessary variable assignments
a <- "github"
b <- a < gnames
b

# this also makes unnecessary variable assignments
y <- c("github")
x <- gnames > y
x
```
The important take-home is that an expression basically behaves like a variable anywhere in R.
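To make that concrete, here is a small sketch (using the course's `gnames` vector) in which the comparison expression is passed straight to other functions and used as an index, with no intermediate variable at all. The choice of `sum()`, `mean()`, and indexing is just for illustration.

```r
gnames <- c("cpetrik", "wildflowermt", "mad4mocha", "sjohnson216", "okisutch99",
            "sczTWilliams", "rbeas", "mtarjan", "aaronmams", "lslefebvre")

# the expression goes anywhere a variable could go:
sum(gnames > "github")      # how many names come after "github" (TRUE counts as 1): 8
gnames[gnames > "github"]   # which names those are
mean(gnames > "github")     # what fraction of the names that is: 0.8
```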
The elements of a character vector don’t have to be single characters, so you can say exactly what you mean!
```r
# this is preferred
gnames > "github"

# this is not so precise. It might work in a certain
# problem, but it is not general:
gnames > "g"
```
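Here is a quick sketch of where the shortcut breaks. The three names below are made up for illustration (they are not course participants). A name like "gail" comes after "g" but before "github", so the two comparisons disagree on it.

```r
# hypothetical names, chosen to expose the difference
test_names <- c("gail", "gizmo", "hank")

test_names > "github"   # FALSE  TRUE  TRUE
test_names > "g"        #  TRUE  TRUE  TRUE   ("gail" slips through)
```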
You don’t have to repeat the question in the answer:
```r
# here are some github names of people taking the course
gnames <- c("cpetrik", "wildflowermt", "mad4mocha", "sjohnson216", "okisutch99",
            "sczTWilliams", "rbeas", "mtarjan", "aaronmams", "lslefebvre")

# return a logical vector that gives TRUE for each name that comes after
# the word "github" alphabetically
submit_answer({
  gnames <- c("cpetrik", "wildflowermt", "mad4mocha", "sjohnson216", "okisutch99",
              "sczTWilliams", "rbeas", "mtarjan", "aaronmams", "lslefebvre")
  b <- c("github")
  gnames > b
})
```
If doing comparisons, put the variable on the left and the constant (if there is one) on the right:
```r
gnames > "github"   # eric prefers this
"github" < gnames   # rather than this
```
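The two forms are equivalent, so this really is a readability preference. The sketch below just checks that they return exactly the same logical vector. Putting the variable first lets the line read like the question being asked: "is each name greater than github?"

```r
gnames <- c("cpetrik", "wildflowermt", "mad4mocha", "sjohnson216", "okisutch99",
            "sczTWilliams", "rbeas", "mtarjan", "aaronmams", "lslefebvre")

# same result either way; only readability differs
identical(gnames > "github", "github" < gnames)   # TRUE
```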
- Some things aren’t necessary. They aren’t wrong, but they are not economical and make code harder to read. The top few from the last homework:
If it is already a vector, you don’t have to put `c()` around it to make it a vector:
```r
y <- c(gnames[x])   # gnames[x] is a vector already
y <- gnames[x]      # same thing as above, but preferred
```
The `c()` function is for catenating vectors (but beware of “growing vectors”; see below).
Logical vectors work as indexes just as they are. They don’t have to be wrapped in `which()`. The function `which(LL)` returns the indexes for which the logical vector `LL` is `TRUE`. Many people wrap their logical vectors in it. Don’t.
```r
gnames[which(gnames > "github")] <- "zzz"   # unnecessary which
gnames[gnames > "github"] <- "zzz"          # same thing and simpler
```
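A short sketch that checks both claims on the `gnames` vector: `c()` around an existing vector hands back the same vector, and indexing with `which(x)` picks the same elements as indexing with `x`. The last three lines (with the made-up vector `v`) show the one case where the two differ, which is worth knowing: `which()` silently drops `NA`s, while a logical index containing `NA` returns an `NA` in that position.

```r
gnames <- c("cpetrik", "wildflowermt", "mad4mocha", "sjohnson216", "okisutch99",
            "sczTWilliams", "rbeas", "mtarjan", "aaronmams", "lslefebvre")
x <- gnames > "github"

identical(c(gnames[x]), gnames[x])       # TRUE: c() adds nothing here
identical(gnames[which(x)], gnames[x])   # TRUE: which() adds nothing here

# the exception: NA in the logical vector
v <- c(1, NA, 3)
v[v > 2]          # NA  3
v[which(v > 2)]   # 3
```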
Also, if it is a logical vector, you need not coerce it to a logical—it already is:
```r
as.logical(gnames > "github")   # unnecessary coercion
gnames > "github"               # the > comparison operator returns a logical vector anyway
```
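If you are ever unsure what type an expression returns, ask R rather than coercing "just in case." This sketch checks the comparison's type and confirms that `as.logical()` leaves it unchanged.

```r
gnames <- c("cpetrik", "wildflowermt", "mad4mocha", "sjohnson216", "okisutch99",
            "sczTWilliams", "rbeas", "mtarjan", "aaronmams", "lslefebvre")

is.logical(gnames > "github")   # TRUE
class(gnames > "github")        # "logical"
identical(as.logical(gnames > "github"), gnames > "github")   # TRUE: coercion is a no-op
```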
Get comfortable with precedence
```r
isAfterGithub <- (gnames > "github")   # parentheses unnecessary
isAfterGithub <- gnames > "github"     # same as above but easier to read
gnames > "github"                      # best: no intermediate assignment when not needed
```
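A few illustrative cases of R's operator precedence (see `?Syntax` for the full table). Comparison binds more tightly than assignment, which is why the parentheses above are unnecessary. Arithmetic binds more tightly than comparison. The last two lines are the classic places where precedence surprises people and parentheses *are* needed.

```r
gnames <- c("cpetrik", "wildflowermt", "mad4mocha", "sjohnson216", "okisutch99",
            "sczTWilliams", "rbeas", "mtarjan", "aaronmams", "lslefebvre")

# comparison happens before assignment
isAfterGithub <- gnames > "github"

# arithmetic happens before comparison: this is (1:5 + 1) > 3
1:5 + 1 > 3   # FALSE FALSE  TRUE  TRUE  TRUE

# ':' binds more tightly than binary minus
n <- 3
1:n - 1       # 0 1 2, i.e. (1:n) - 1
1:(n - 1)     # 1 2

# '^' binds more tightly than unary minus
-2^2          # -4, i.e. -(2^2)
```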
Don’t use a for loop if the vectorized operation will get you there
This was one of the hardest things for me as a C programmer, and I suspect that Python programmers might find it difficult too.
Remember: R is a vectorized language. If you give it a vector, it wants to operate on each element of that vector. This means that you often don’t need a for loop for operations that would require one in C or Python.
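For example, each of the calls below operates on every element of `gnames` at once and returns one result per name. The particular functions (`nchar()`, `toupper()`, `substr()`) are just common base-R examples of this behavior.

```r
gnames <- c("cpetrik", "wildflowermt", "mad4mocha", "sjohnson216", "okisutch99",
            "sczTWilliams", "rbeas", "mtarjan", "aaronmams", "lslefebvre")

nchar(gnames)          # number of characters in each name
toupper(gnames)        # each name in upper case
substr(gnames, 1, 1)   # the first letter of each name
```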
```r
# this is concise and precise (and computationally efficient)
gnames > "github"

# this is how a C programmer thinks about it:
x <- c()                   # make an empty vector
for (name in gnames) {     # let name cycle over the values in gnames
  if (name > "github")     # test each value
    x <- c(x, TRUE)        # if it is TRUE, "grow" x with a TRUE
  else
    x <- c(x, FALSE)       # if it is FALSE, "grow" x with a FALSE
}
x                          # return x
```
The latter is clearly harder to write, harder to maintain, and easier to hide bugs in than the former.
- BUT, did you know that it is also orders of magnitude slower in R?
- Try this at home, comparing 10^5 numbers:
```r
x <- rnorm(n = 10^5, mean = 1.0, sd = 5)   # make 10^5 numbers

# test whether each one is greater than 2.
# the fast, vectorized way:
gt_fast <- x > 2

# the slower for-loop way:
gt_slow <- c()
for (i in seq_along(x)) {
  gt_slow <- c(gt_slow, x[i] > 2)
}

# see that you get the same result with either method:
all(gt_fast == gt_slow)
# but clearly the vectorized operation is faster
```
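To actually see the difference, wrap each approach in `system.time()`. This sketch also adds a middle option: a loop that fills in a preallocated result with `logical(n)` instead of "growing" it with `c()`, which avoids copying the whole vector on every pass. It uses `n = 10^4` (an arbitrary choice, smaller than above) so the growing loop finishes quickly; try 10^5 yourself. The timings depend on your machine, so no numbers are given here, but expect the vectorized version to be fastest and the growing loop slowest.

```r
n <- 10^4                                # smaller than above so the demo runs quickly
x <- rnorm(n = n, mean = 1.0, sd = 5)

# 1. vectorized
t_fast <- system.time(gt_fast <- x > 2)

# 2. loop that "grows" its result with c() on every pass
t_grow <- system.time({
  gt_grow <- c()
  for (i in seq_along(x)) {
    gt_grow <- c(gt_grow, x[i] > 2)
  }
})

# 3. loop that fills in a preallocated result
t_prealloc <- system.time({
  gt_prealloc <- logical(n)
  for (i in seq_along(x)) {
    gt_prealloc[i] <- x[i] > 2
  }
})

# all three give the same answer
identical(gt_fast, gt_grow) && identical(gt_fast, gt_prealloc)

# elapsed seconds for each approach
c(fast = t_fast[["elapsed"]], grow = t_grow[["elapsed"]], prealloc = t_prealloc[["elapsed"]])
```

If you truly need a loop, preallocating is the fix for "growing vectors"; but when a vectorized operation exists, it is still the better choice.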
The much-maligned “slowness” of R is sometimes attributable to not using vectorized operations.
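The C-style loop above builds its answer one `if`/`else` at a time. When you need an element-by-element `if`/`else` that produces something other than `TRUE`/`FALSE`, base R's `ifelse()` is the vectorized equivalent. In this sketch the labels "after" and "before" are arbitrary, chosen only for illustration.

```r
gnames <- c("cpetrik", "wildflowermt", "mad4mocha", "sjohnson216", "okisutch99",
            "sczTWilliams", "rbeas", "mtarjan", "aaronmams", "lslefebvre")

# ifelse() tests every element and picks a value for each one, with no loop
position <- ifelse(gnames > "github", "after", "before")
position
```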