add new tracks to IGV server on the fly with php

use php files rather than xml files

if you want to serve bigWig (bw) and BED files to IGV, place a PHP file like the one shown below into the directory where you add these bw and bed files, then simply list the PHP file's URL in your igv_registry.txt file! (Your web server has to execute PHP for this to work.)

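<Global name="my_expts" version="1">
 <Category name="chip_seq">

<?php
 // __DIR__ is the directory containing this script
 $dir = __DIR__;
 $files = scandir($dir);
 foreach ($files as $flname) {
  // keep only bigWig (.bw) and BED (.bed) files, case-insensitively
  if (preg_match('@.+\.(bw|bed)$@i', $flname)) {
   // escape for XML attributes; URL-encode the file name inside the path
   $name = htmlspecialchars($flname, ENT_QUOTES);
   $url = htmlspecialchars(rawurlencode($flname), ENT_QUOTES);
 ?>
 <Resource name="<?php echo $name; ?>"
  path="http://your.server.net/igvdata/chipseq/<?php echo $url; ?>">
 </Resource>
 <?php
  }
 }
 ?>

</Category>
 </Global>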

when you add files to the ‘chipseq’ directory, they will show up in the IGV server menu automatically, with no XML to edit by hand
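If your web server can't run PHP, one alternative is to regenerate a static XML file whenever you add data and list that file in igv_registry.txt instead. Here is a minimal sketch in Python, assuming the same placeholder base URL and Global/Category names as the PHP example; `BASE_URL` and `build_registry_xml` are illustrative names, not part of IGV.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Placeholder base URL (same as the PHP example); point it at your own server.
BASE_URL = "http://your.server.net/igvdata/chipseq/"
TRACK_EXTENSIONS = (".bw", ".bed")


def build_registry_xml(filenames, base_url=BASE_URL):
    """Return an IGV <Global> XML string with one <Resource> per .bw/.bed file."""
    root = ET.Element("Global", name="my_expts", version="1")
    category = ET.SubElement(root, "Category", name="chip_seq")
    for fname in sorted(filenames):
        # case-insensitive extension test, like the @i flag in the PHP regex
        if fname.lower().endswith(TRACK_EXTENSIONS):
            # quote() URL-encodes the name; ElementTree escapes &, <, > and quotes
            ET.SubElement(category, "Resource", name=fname,
                          path=base_url + quote(fname))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # list the directory the script is run from
    print(build_registry_xml(os.listdir(".")))
```

Run it from the data directory and redirect the output to an .xml file; you would need to rerun it after each upload, which is exactly the chore the PHP version avoids.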

getting a UK license coming from CA/NY

As a licensed American driver, you are given one year of ‘grace’ before having to get a valid UK license. In my case, the year came and went all too quickly once I moved to Norwich (in the east of England): the first six months were spent gleefully exploring with our car-share, and the last six were somewhat waylaid by COVID and not worrying about driving too much. I will spare you the sob story, as we all went through the same tough stuff. The bottom line is, it’s July 2021 and I finally got my license. It was a challenge, but not so difficult, and here’s how I did it in case you’re in the same boat as me (I am a researcher doing a postdoc abroad):

step 1: use the first year (grace period) to get lots of time behind the wheel to learn about driving on the ‘wrong side’ of the road and to get comfortable with the stick shift on the opposite hand

I used a car share service called co-wheels, which was simple to set up. My wife and I, along with our dog Rosco, could go all over Norfolk and Suffolk doing day trips (£45 per day for the car) or quick trips (roughly £5 per hour). So we did, and I drove a lot. That taught me loads that would not have been possible by simply driving a few hours with an instructor. We also took overnight long-haul trips to visit family in the Midlands, and then a weeklong trip to Wales. The narrow hedgerows, blind spots, soft verges, stone walls, herds of sheep, and single-track roads all seemed daunting at the beginning, and then you just… get used to it. It makes sense that getting a license here requires jumping through some hoops: it’s more challenging to drive here!

step 2: get a provisional license and the DVSA book on driving, read through the book then take some practice theory tests on your phone via an app (pick an app, they’re all similar)

To get a provisional license, you simply request one through the government website and pay about £40. It doesn’t matter that you already have a license from the US: you are basically a teenager again. The ID then shows up at your door, and is practically useless (except when you really need it: to take your theory test, to do lessons, and to take your driving test). The driving manual from DVSA was helpful in preparing for the theory test, which in my opinion required more prep time than the actual driving test. It’s over 300 pages, but it’s a quick read, and there’s an index, so you can just dive into the practice theory tests and then flip to the relevant pages for things you can’t reason out and/or the ones that make no sense, like the difference between a zebra, pelican, toucan, and whatever the heck other kind of crosswalk.

step 3: pass your theory test

just take the tests on your phone until you get only one or two wrong per test, and then you are ready! When I took the test, there was also a section called hazard perception, which I had done no preparation for whatsoever, but I was able to pass it by the skin of my teeth. So… perhaps try a few hazard perception clips before you go; I suspect you can get demo ones from the same companies that supply the mock test apps.

step 4: schedule your driving test and get an instructor to do some lessons with

Once you know where your test will take place, a knowledgeable instructor should be able to show you some of the possible routes that you will drive during the test. This is really valuable: having seen the physical environment in advance, you will be a lot more relaxed on test day. Also, if you’re doing little weird things like not checking your mirrors enough (or not obviously enough for the examiner to notice), the instructor will tell you. Sixteen or more “minor” (driving) faults, or a single serious or dangerous fault, means a fail, so it’s best to get the instructor to point out your problems before the day comes. I watched some YouTube videos on how to properly sequence the mirror check, turn signal, and maneuver… which maybe sounds absurd, but for me, just seeing someone do it reinforced it and helped in the long run.

step 5: day of test

get a bit of driving and maneuvering in before you sit the test. My instructor was really good on the day of the test and gave me about an hour to reacquaint myself with the car (a Vauxhall Mokka) beforehand. Afterward, we had a quick look under the bonnet (hood) in case one of the “show me” questions involved it, and then just waited for the examiners to appear at the test center and call my name. The “tell me” questions are basic ones you just have to memorize (compared to the theory test they’re very easy, and there are only 14 of them, all clearly listed on the DVSA section of the GOV.UK website).

citation classics

a fun way to pass lockdown time is perusing these little nuggets, in which the author of a highly cited work gives personal background and opinions about the work itself

http://garfield.library.upenn.edu/classics.html

After only a few quick glances, I found Pat O’Farrell’s classic on 2-D protein gel electrophoresis. He is a person very important to me because of a fleeting interaction we had in which he told me to stick with Stavros Lomvardas during my first year of grad school at UCSF, when I was trying to decide where to go. Amazing… I had no idea what he had been researching so many years ago, or that it had been cited a bajillion times.

download only the SRAs you want, from cmd line

go to the NCBI SRA database site, click the record you are interested in and then click the tab ‘data access’ and copy-paste the https URL into a text file

place this text file of your SRA URLs (here called srr.list, with one URL per line and no spaces or trailing characters) into the directory where you wish to download the data

enter the command

while read -r url ; do wget "$url" ; done < srr.list

(or, equivalently, wget -i srr.list)

et voilà, go have a snack
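If the list came out of a text editor with stray whitespace or Windows line endings, a slightly more forgiving version of the same loop could look like this Python sketch. The file name srr.list matches the command above; skipping `#` lines and already-downloaded files are my own additions, and the function names are hypothetical.

```python
import urllib.request
from pathlib import Path


def read_url_list(text):
    """Return one URL per non-blank line, with surrounding whitespace (incl. CR) removed."""
    urls = []
    for line in text.splitlines():
        url = line.strip()
        # skip blank lines and '#' comment lines (an addition, not part of the wget loop)
        if url and not url.startswith("#"):
            urls.append(url)
    return urls


def download_all(list_path="srr.list", dest="."):
    """Fetch every URL in list_path into dest, skipping files that already exist."""
    for url in read_url_list(Path(list_path).read_text()):
        target = Path(dest) / url.rstrip("/").rsplit("/", 1)[-1]
        if target.exists():
            print(f"skipping {target}: already downloaded")
            continue
        print(f"fetching {url}")
        urllib.request.urlretrieve(url, target)
```

Calling `download_all("srr.list")` from the download directory does the same job as the wget loop, and rerunning it after an interruption only fetches what is missing (a partially written file would still need deleting by hand).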

understanding bisulfite sequencing — C-to-T vs G-to-A

Recently, I have been re-analyzing bisulfite sequencing data from a fungus with significant DNA methylation around its centromeres, but with limited mappability due to high transposon similarity. I ‘discovered’ that different bisulfite mapping software produced drastically different summary methylation statistics.

After about a week of comparing the output of various mapping tools, I was faced with another discovery: I didn’t quite grasp how bisulfite mapping software works at the conceptual level. What an embarrassment! A… sixth-year postdoc?… who doesn’t understand his own bread-and-butter methods. In any case, my initial realization stemmed from reading the Perl script of our lab’s ‘in-house’ (i.e. unpublished and poorly documented) bisulfite mapper, which works a lot like the published tools Bismark, BS-Seeker and others (though it is quite different from BSMAP, Pash, and others… perhaps more on that later). With software like Bismark, all sequencing reads and the genome itself are converted C-to-T in silico, mapping is carried out on the converted sequences, and the alignments are later matched back to the original reads by read ID to find the Cs that did not convert and were therefore methylated. However, a G-to-A converted genome is also generated. This has long bothered me… why do we need a G-to-A converted genome as well? My understanding was that we need C-to-T alone because of the directionality built into essentially all library preparation methods. Of course one needs directionality for RNA-seq, since you care very much about the strand that gave rise to a particular single-stranded RNA. But, to review, why directionality for bisulfite libraries? To make the libraries, adapters are ligated to the fragmented gDNA and the protolibraries are then denatured. That means the 5′ end of your protolibrary top strand will have the same adapter sequence as the 5′ end of your bottom strand. On our NextSeq flowcell, these adapters serve two critical functions: 1) as the sequence that anneals the library to the complementary oligos on the flowcell, and 2) as a sequencing primer binding site.
Following bridge amplification of the libraries on the flowcell, which generates a strand complementary to the originals you added, extension from the read 1 sequencing primer re-creates the “original top” and “original bottom” strands: such is the beauty of directionality in the bisulfite context. The sequence you get from read 1 extension exactly replicates the ‘original’ bisulfite-converted genomic DNA. On this strand, C converts to T when it’s not protected by methylation (or its derivatives), and that is why we map these reads to a C-to-T converted genome. OK…. so why the G-to-A converted genome too?? G-to-A seems relevant, but only because the template strand covalently bound to the flowcell represents your data in that transformation… ah! But what about paired-end libraries!!! Duh. You may have seen this coming; if so, my apologies. The punchline to this diatribe is: as long as you are using directional library prep methods (basically any normal kit will produce directional libraries, for instance from Tecan/NuGEN, NEB, Illumina, Swift, etc.), paired-end bisulfite sequencing requires the G-to-A converted read and genome sequence to map read 2. All of the unmethylated Cs in the ‘original strand’ (top or bottom) register as A on the complementary strand, the ‘complementary to original’ strand, which is read from the read 2 primer (in our case, built into the NextSeq reagents). The Bismark GitHub page has some documentation on this, but I had really not appreciated the importance of directionality in these libraries, how that translates to single-end mapping requiring only the C-to-T converted genome and reads, and likewise the relevance of G-to-A converted genomes for read 2 in paired-end libraries.
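A toy simulation makes the read 1 / read 2 asymmetry concrete. This Python sketch rests on simplifying assumptions: one short hypothetical original top strand with a single methylated C at index 1, no PCR or sequencing errors, and read 2 covering the whole fragment. It shows that read 1 differs from the genome only by C-to-T, that read 2 differs from the genomic bottom strand only by G-to-A, and that converting read and reference the same way in silico makes them identical, which is what lets converted reads align.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement, i.e. the other strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def bisulfite_convert(strand, methylated):
    """Unmethylated C -> T (via U and PCR); C at indices in `methylated` is protected."""
    return "".join("T" if base == "C" and i not in methylated else base
                   for i, base in enumerate(strand))


def c_to_t(seq):
    return seq.replace("C", "T")


def g_to_a(seq):
    return seq.replace("G", "A")


def mismatches(ref, read):
    """Set of (reference base, read base) pairs wherever the two differ."""
    return {(r, q) for r, q in zip(ref, read) if r != q}


def call_methylation(ref, read):
    """For each C in ref: True if the read still shows C (methylated), False if T."""
    return {i: read[i] == "C" for i, base in enumerate(ref) if base == "C"}


# Hypothetical original top strand; only the C at index 1 is methylated.
original_top = "ACGTACGTCC"
genomic_bottom = revcomp(original_top)
converted_top = bisulfite_convert(original_top, methylated={1})

read1 = converted_top             # read 1 re-creates the converted original top
read2 = revcomp(converted_top)    # read 2 reads the strand complementary to it

print("read 1:", read1, mismatches(original_top, read1))    # only ('C', 'T')
print("read 2:", read2, mismatches(genomic_bottom, read2))  # only ('G', 'A')
print("calls:", call_methylation(original_top, read1))

# Converting read and reference the same way removes every mismatch,
# so the converted sequences align; methylation is then called on the raw read.
assert c_to_t(read1) == c_to_t(original_top)
assert g_to_a(read2) == g_to_a(genomic_bottom)
```

Note that the methylated C at index 1 survives in read 1 and shows up as a G (not A) in read 2; full in silico conversion hides that difference during alignment, and the methylation call is made afterwards against the unconverted read.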

calculate mappability of a genome

I couldn’t find a simple explanation of how to run gem-mappability as was done in Catoni et al. 2017 EMBO (from the Paszkowski lab). I found enough information here to get me going. Below is the simplified version for those who simply want a bigWig or bedGraph of these values but don’t care about comparing a bajillion different k-mer sizes.

  1. download the compressed file (your specific version may be different)
  2. tar -xvjf GEM-binaries-Linux-x86_64-core_i3-20130406-045632.tbz2
  3. cd GEM-binaries-Linux-x86_64-core_i3-20130406-045632/
  4. copy all the executables in ./bin to a directory on your PATH, such as /usr/local/bin (e.g. sudo cp bin/* /usr/local/bin/)
  5. then run the following – change the names of your files accordingly:

gem-indexer -i /path_to_your_genome_fasta -o output_name_you_pick

gem-mappability -I output_name_you_pick.gem -l [you decide how big the k-mer should be] -o pick_a_gem_output_name

gem-2-wig -I output_name_you_pick.gem -i pick_a_gem_output_name.mappability -o your_wig

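then convert your wig to bigWig with wigToBigWig from Jim Kent’s UCSC tools, e.g. wigToBigWig your_wig.wig your_wig.sizes your_wig.bw (gem-2-wig writes the .sizes chromosome-length file alongside the wig)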
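To get a feel for what the mappability track encodes, here is a toy version in Python: each position gets 1 / (number of times the k-mer starting there occurs in the genome), so unique k-mers score 1 and repeated ones score lower. This is only a sketch and not GEM’s algorithm: it allows no mismatches, looks at the forward strand only, and the function names are hypothetical.

```python
from collections import Counter


def kmer_mappability(genome, k):
    """Toy mappability: 1 / exact-match count of the k-mer starting at each position.

    Forward strand only, no mismatches allowed (GEM is more thorough on both counts).
    """
    if not 1 <= k <= len(genome):
        raise ValueError("k must be between 1 and len(genome)")
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return [1.0 / counts[kmer] for kmer in kmers]


def to_bedgraph(chrom, values):
    """Collapse runs of equal values into bedGraph lines (0-based, half-open)."""
    lines, start = [], 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            lines.append(f"{chrom}\t{start}\t{i}\t{values[start]:g}")
            start = i
    return lines


if __name__ == "__main__":
    for line in to_bedgraph("chr1", kmer_mappability("ACGTACGTTT", 4)):
        print(line)
```

Each value is assigned to the k-mer’s start base, and the collapsed bedGraph could then go through bedGraphToBigWig. On a real genome, counting every k-mer in a Python dictionary gets memory-hungry quickly, which is one reason to use GEM itself.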

fastq trimming – Illumina truseq2 v truseq3

the 5′ ends (first 13 nt) of the two adapters are identical, with some similarity further 3′ as well

When I’m not sure which adapters were used to construct a sequencing library, but I know they were Illumina, I take the first ~100k reads (e.g. head -n 400000 reads.fastq > test.fastq, since each FASTQ record is four lines) and run trimmomatic with more-or-less default settings against each of the two Illumina TruSeq adapter files that ‘ship’ with that software. Then I compare the two trim logs: the adapter that trimmed off more crap is the one to proceed with. Something like this (these settings are for small RNA libraries, where the inserts are very short!):

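java -jar trimmomatic-0.36.jar SE -threads 5 -trimlog trimlog_truseq2 test.fastq out.truseq2.fastq ILLUMINACLIP:TruSeq2-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:1

java -jar trimmomatic-0.36.jar SE -threads 5 -trimlog trimlog_truseq3 test.fastq out.truseq3.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:1

awk '$NF > 0' trimlog_truseq2 | wc -l
awk '$NF > 0' trimlog_truseq3 | wc -l

($NF is the last column of the trim log, the number of bases trimmed from the 3′ end; indexing from the end keeps the count correct whether or not the read names contain spaces.)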
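The same comparison can be scripted if you test more than two adapter files. This is a minimal Python sketch under the same assumption as the awk line (the last trimlog column is the amount trimmed from the 3′ end); the function names and trimlog paths are hypothetical. Because that column also includes quality trimming, the comparison is only fair when both runs use identical LEADING/TRAILING/SLIDINGWINDOW settings, so the difference mostly reflects adapter clipping.

```python
def count_trimmed(trimlog_lines):
    """Count reads with > 0 bases trimmed from the 3' end (last column of a trimlog)."""
    n = 0
    for line in trimlog_lines:
        fields = line.split()
        # read names may contain spaces, so index from the end rather than by column number
        if fields and int(fields[-1]) > 0:
            n += 1
    return n


def compare_trimlogs(paths):
    """paths maps a label (e.g. adapter file) to its trimlog; returns (best label, counts)."""
    counts = {}
    for label, path in paths.items():
        with open(path) as fh:
            counts[label] = count_trimmed(fh)
    return max(counts, key=counts.get), counts
```

For example, `compare_trimlogs({"TruSeq2-SE.fa": "trimlog_truseq2", "TruSeq3-SE.fa": "trimlog_truseq3"})` would return the label of the adapter file that trimmed the most reads, along with the raw counts for a sanity check.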

genotyping in 96-well format

had to split the plates to get them to fit into the small thermocycler :-<

Generating Arabidopsis triple and quadruple mutants (and beyond) requires stamina. The mind-numbing process can be made less time-consuming by doing on-plate DNA extraction with an SDS-free extraction method, as put forth in a Ronan O’Malley et al. paper (“guide to the t-dna insertion collection”, approximately). They use steel balls and a paint-can shaker to disrupt the tissue, whereas I use a PCR plate with zirconia beads after deep-freezing the leaf samples (partially submerging the plate, on a metal plate rack, in ethanol + dry ice). I then add 300 µl TE with 100 mM NaCl (roughly per O’Malley), vortex a bit, and spin down the plate at 4000 rpm in an Eppendorf 5804. The trick post-spin is to take only the very top of the supernatant to avoid carrying over debris. I took ~50 µl here and added it to a new plate into which I’d crudely pipetman’d 200 µl cold isopropanol. Once done with this, I store the plate overnight in the freezer, spin it down at high speed as before the next day, and then decant as much isopropanol as possible. From here, I do a 70% EtOH wash and spin as before, then dry the samples out. Forcing air into the plate was the only thing that worked for me… I tilted the plate at one of our PCR vents and the remaining traces of alcohol slowly evaporated. It took a long time to dry. I resuspend in 50 µl EB, proceed with PCR using multichannel pipettes, and life is good.

NB: I use adhesive foil for PCR plates to cover the samples for all these steps. The foil dents heavily during the vortex, but doesn’t break open.

Once you’re ready to run the samples on a DNA gel, you can cast the gel with an appropriately sized comb to further streamline the process. I use a 26-well comb in Bio-Rad’s Sub-Cell electrophoresis get-up, then load WT in even positions and mutant in odd positions (something I realize the O’Malley paper also recommends!).
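With 12 columns per plate row and two reactions per plant, one row fits neatly on one 26-well gel. The Python sketch below writes out a lane map under my own assumptions, not the O’Malley protocol: lane 1 holds a ladder, the WT-allele PCR for plate column c goes in even lane 2c and the mutant-allele PCR in odd lane 2c + 1, and lane 26 is left spare. `gel_layout` is a hypothetical helper name.

```python
ROWS = "ABCDEFGH"  # rows of a 96-well plate


def gel_layout(plate_row):
    """Map lane number -> contents for one 26-well gel holding one plate row.

    Hypothetical layout: lane 1 ladder; for plate column c (1-12) the WT-allele
    PCR goes in even lane 2c and the mutant-allele PCR in odd lane 2c + 1;
    lane 26 is left spare.
    """
    if len(plate_row) != 1 or plate_row not in ROWS:
        raise ValueError(f"plate_row must be one of {ROWS}")
    lanes = {1: "ladder"}
    for col in range(1, 13):
        well = f"{plate_row}{col}"
        lanes[2 * col] = f"{well} WT"
        lanes[2 * col + 1] = f"{well} mut"
    lanes[26] = "spare"
    return lanes


if __name__ == "__main__":
    # one gel per plate row: 8 gels cover a full 96-well plate
    for row in ROWS:
        layout = gel_layout(row)
        print(row, ", ".join(f"{lane}:{layout[lane]}" for lane in sorted(layout)))
```

Printing the map and taping it next to the gel box makes it easy to load straight from the plate row with a multichannel pipette and to read the gel image afterwards.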

make a FreeNAS file server with old parts: I

When my lab departed for the John Innes Centre about a year ago, a large quantity of raw computer parts was left behind for the salvage heap. I wanted a dedicated place to back up my laptop and home computers remotely, and had heard about freely available network-attached storage (NAS) systems, but that was where my knowledge ended. Our lab had been using the somewhat pricey Synology NAS system, and I wanted to learn whether there was an open-source alternative.

FreeBSD-based FreeNAS fit the bill. There is a TON of info and support on how FreeNAS works and how to get it going. What I envisioned was taking some old HDDs lying around, reformatting them, and using them as the storage core of the system. I would boot from a USB drive on which I’d installed FreeNAS, and format things from there. The amazing thing is… it basically worked, and now I have a server for a lot of previously unprotected stuff. Booting off a USB drive loaded with the OS is actually the recommended way to go: see this simple instructional video from FreeNAS.

One detail worth noting is that FreeNAS storage is memory-intensive. I am not 100% clear on why, but I believe it has to do with the ZFS-based architecture (NB: ZFS does not stand for anything!). They recommend 1 GB of RAM for each TB of hard drive, with at least 4 GB of RAM as a base. Luckily I had a bunch of DDR3 RAM lying around and a capable Asus X99 motherboard with an i7 CPU. Once I got the thing running, it had 64 GB of RAM supporting 8 TB of storage; after about a year, I’ve used about 2 TB of it.
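The rule of thumb can be read two ways, and a quick calculation shows that either reading leaves plenty of headroom for this build. In this Python sketch, `additive=True` assumes the 4 GB base is added on top of 1 GB per TB, while `additive=False` assumes it is just a floor; which one FreeNAS meant is my assumption, and `recommended_ram_gb` is a hypothetical helper.

```python
def recommended_ram_gb(storage_tb, base_gb=4.0, gb_per_tb=1.0, additive=True):
    """RAM suggested by the 1 GB-per-TB rule of thumb.

    additive=True reads the rule as base + 1 GB per TB; additive=False reads it
    as "1 GB per TB, but never less than the base".
    """
    per_tb = gb_per_tb * storage_tb
    return base_gb + per_tb if additive else max(base_gb, per_tb)


if __name__ == "__main__":
    installed_gb, storage_tb = 64, 8  # the build described above
    for additive in (True, False):
        need = recommended_ram_gb(storage_tb, additive=additive)
        print(f"additive={additive}: need {need:g} GB, headroom {installed_gb - need:g} GB")
```

Under either reading, 8 TB calls for 8 to 12 GB, so 64 GB would still cover several times more disk before RAM became the limit.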

the web interface for FreeNAS with some basic system specs.