library(tidyverse)
dir.create("outputs/201", recursive = TRUE, showWarnings = FALSE)
dir.create("intermediates/201", recursive = TRUE, showWarnings = FALSE)
depths <- read_tsv("stored_results/201/read-depths-by-base-pair-and-pair-type.tsv.gz")
d2 <- depths %>%
mutate(
fall_mispair_ppn = fall.aberrant.bam / fall.2Mb.bam,
spring_mispair_ppn = spring.aberrant.bam / spring.2Mb.bam
)
Now, plot those alongside one another.
quick_lines <- function(d2) {
ggplot() +
geom_line(data = d2 %>% filter(!is.na(spring_mispair_ppn)), mapping = aes(x = POS, y = spring_mispair_ppn), colour = "gold") +
geom_line(data = d2 %>% filter(!is.na(fall_mispair_ppn)), mapping = aes(x = POS, y = -fall_mispair_ppn), colour = "blue")
}
mp <- quick_lines(d2)
ggsave(mp, filename = "outputs/201/mispair_spring_v_fall.png", width = 15, height = 7)
mp
So, those are quite symmetrical. What we need to do here is filter the results down to cases where we have at least a read depth of X and a one of fall or spring prortion being greater than P while the absolute difference between then two is at least D:
filt_rd <- function(data, X, P, D) {
data %>%
filter((fall.2Mb.bam > X | spring.2Mb.bam > X) &
(spring_mispair_ppn > P | fall_mispair_ppn > P) &
(abs(spring_mispair_ppn - fall_mispair_ppn) > D))
}
First just try filtering:
filt1 <- filt_rd(d2, 35, 0.8, 0.5)
# that suggests a single interesting region:
filt1 %>% select(POS, starts_with("fall"), starts_with("spring"))
That is one spot where the falls have plenty of reads aligning, but the springs have very few (only 1 or two). However, all of the reads aligning in springs are mispaired.
It turns out that is the exception that proves the rule. That segment is a 22-bp deletion carried on the \(E\)-lineage haplotype—it isn’t an indication of an inversion of any sort.
dsplit <- depths %>%
mutate(
fall_split_ppn = fall.split_reads.bam / fall.2Mb.bam,
spring_split_ppn = spring.split_reads.bam / spring.2Mb.bam
)
And plot those all along the 2 Mb:
split_ppn <- ggplot() +
geom_line(data = dsplit %>% filter(!is.na(spring_split_ppn)), mapping = aes(x = POS, y = spring_split_ppn), colour = "gold") +
geom_line(data = dsplit %>% filter(!is.na(fall_split_ppn)), mapping = aes(x = POS, y = -fall_split_ppn), colour = "blue")
ggsave(split_ppn, filename = "outputs/201/split_read_ppn_spring_v_fall.png", width = 15, height = 7)
split_ppn
That looks a lot like the mispaired reads plot. So, let’s filter down in the same way and see if we find any that are really different between falls and springs.
filt_for_splits <- function(data, X, P, D) {
data %>%
filter((fall.2Mb.bam > X | spring.2Mb.bam > X) &
(spring_split_ppn > P | fall_split_ppn > P) &
(abs(spring_split_ppn - fall_split_ppn) > D))
}
filt2 <- filt_for_splits(dsplit, 35, 0.7, 0.3)
filt2
This indicates two regions worth following up on. One is at POS = 12,234,265. The other is a range from 12,332,752-12,332,787.
POS = 12,234,265 is in the middle of a deletion (from 21 to 27 bp in various individuals), and there are a handful of reads from springers that span that deletion and also have supplementary alignments (most on different chromosomes, and no clusters to other nearby locations on Chr. 28). Conclusion = this is not an indication of an inversion.
As for 12,332,752-12,332,787, looking at this in IGV, there are very few reads mapping there from spring run, and those that do have very low MAPQs. The supplemental alignments are not to a consistent location on Chr 28, so there seems no way these could indicate an inversion breakpoint in this region.
sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 3.6.2 (2019-12-12)
## os macOS Sierra 10.12.6
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz America/Denver
## date 2020-05-14
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date lib source
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
## backports 1.1.6 2020-04-05 [1] CRAN (R 3.6.2)
## broom 0.5.6 2020-04-20 [1] CRAN (R 3.6.2)
## cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.6.0)
## cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.0)
## colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
## crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
## DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.0)
## dbplyr 1.4.3 2020-04-19 [1] CRAN (R 3.6.2)
## digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.0)
## dplyr * 0.8.5 2020-03-07 [1] CRAN (R 3.6.0)
## ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.0)
## evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
## fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.0)
## farver 2.0.3 2020-01-16 [1] CRAN (R 3.6.0)
## forcats * 0.5.0 2020-03-01 [1] CRAN (R 3.6.0)
## fs 1.4.1 2020-04-04 [1] CRAN (R 3.6.2)
## generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
## ggplot2 * 3.3.0 2020-03-05 [1] CRAN (R 3.6.0)
## glue 1.4.0 2020-04-03 [1] CRAN (R 3.6.2)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
## haven 2.2.0 2019-11-08 [1] CRAN (R 3.6.0)
## hms 0.5.3 2020-01-08 [1] CRAN (R 3.6.0)
## htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.0)
## httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.0)
## jsonlite 1.6.1 2020-02-02 [1] CRAN (R 3.6.0)
## knitr 1.28 2020-02-06 [1] CRAN (R 3.6.0)
## labeling 0.3 2014-08-23 [1] CRAN (R 3.6.0)
## lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.2)
## lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.0)
## lubridate 1.7.8 2020-04-06 [1] CRAN (R 3.6.2)
## magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
## modelr 0.1.6 2020-02-22 [1] CRAN (R 3.6.0)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
## nlme 3.1-142 2019-11-07 [2] CRAN (R 3.6.2)
## pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
## purrr * 0.3.4 2020-04-17 [1] CRAN (R 3.6.2)
## R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.0)
## Rcpp 1.0.4 2020-03-17 [1] CRAN (R 3.6.0)
## readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
## readxl 1.3.1 2019-03-13 [1] CRAN (R 3.6.0)
## reprex 0.3.0 2019-05-16 [1] CRAN (R 3.6.0)
## rlang 0.4.5 2020-03-01 [1] CRAN (R 3.6.0)
## rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.0)
## rstudioapi 0.11 2020-02-07 [1] CRAN (R 3.6.0)
## rvest 0.3.5 2019-11-08 [1] CRAN (R 3.6.0)
## scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.0)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
## stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.0)
## stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
## tibble * 3.0.1 2020-04-20 [1] CRAN (R 3.6.2)
## tidyr * 1.0.2 2020-01-24 [1] CRAN (R 3.6.0)
## tidyselect 1.0.0 2020-01-27 [1] CRAN (R 3.6.0)
## tidyverse * 1.3.0 2019-11-21 [1] CRAN (R 3.6.0)
## vctrs 0.2.4 2020-03-10 [1] CRAN (R 3.6.0)
## withr 2.2.0 2020-04-20 [1] CRAN (R 3.6.2)
## xfun 0.13 2020-04-13 [1] CRAN (R 3.6.2)
## xml2 1.3.2 2020-04-23 [1] CRAN (R 3.6.2)
## yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.0)
##
## [1] /Users/eriq/Library/R/3.6/library
## [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
Running the code and rendering this notebook required approximately this much time on a Mac laptop of middling speed:
Sys.time() - start_time
## Time difference of 3.538503 mins