How to build a new frequency file for Weeder 2.0
Weeder computes significance scores by comparing the oligo frequencies in your input sequences to their background frequencies in the promoters of the genome you are studying. Thus, you need to:
- Prepare a multi-FASTA file containing the promoters of all the genes annotated in your genome of interest. "Promoters" should be the 1000 bp regions upstream of the transcription start sites (NOT the ATG codon) of the genes, taken from the same strand as each gene. Be careful: remove as many "redundant" promoters as possible, that is, those of "genes" annotated with the same TSS (differing only in alternative splicing). If your genes are annotated only with coding regions (i.e. from ATG codons), you can take the 1000 bp upstream of (but excluding) the ATG codon, but only if you are reasonably sure that in your genome of interest the ATG codon usually lies within the first exon of the gene. In that case the frequencies you get are a good approximation of those you would get by using the TSSs. In other words, in yeast (or similar species) this won't cause any problem; in human (or other mammals) it would be a major problem.
- Download the program source, w2frequency_maker.cpp (works under UNIX/Linux/Mac OS X), and save it anywhere you like.
- Compile the program: g++ w2frequency_maker.cpp -o w2frequency_maker -O3
- At this point you'll get an executable file called "w2frequency_maker"
- To run it the syntax is:
./w2frequency_maker promoter_multifasta_file species_code ds|ss
where "species_code" is a two-letter (uppercase) species code you'll use for your analyses. For example, in the frequencies that come with weeder2, HS is Homo sapiens, MM is Mus musculus, AT is Arabidopsis thaliana, and so on. The third parameter must be "ds" or "ss": use ds to compute double-strand frequencies and ss for single-strand frequencies.
- When the program has finished, three new files will have appeared in the directory you ran it from, called "XX_ds.6.freq", "XX_ds.8.freq" and "XX_ds.10.freq" (or "XX_ss.6.freq", "XX_ss.8.freq" and "XX_ss.10.freq"), where XX is the species code. Copy them into the "FreqFiles" folder of Weeder 2.0, and you're done. You can use the new frequencies by specifying the new "XX" species code on the command line.
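
The promoter preparation in the first step is the part most likely to go wrong (strand handling and redundant TSSs), so here is a minimal Python sketch of it. It is not part of Weeder: the function names, the in-memory genome dictionary, and the (name, chromosome, TSS, strand) gene tuples are all hypothetical, and it assumes 0-based TSS coordinates with the TSS base itself left out of the promoter. Adapt it to whatever annotation format you actually have.

```python
# Sketch only: helper names and input layout are assumptions, not Weeder code.
# Coordinates are assumed 0-based; the TSS base itself is excluded.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, tss, strand, length=1000):
    """Return up to `length` bases upstream of `tss`, read on the gene's strand.

    For '+' genes, upstream means lower coordinates. For '-' genes, upstream
    means higher coordinates, and the slice is reverse-complemented so the
    promoter is reported in the same orientation as the gene.
    """
    if strand == "+":
        start = max(0, tss - length)
        return chrom_seq[start:tss]
    if strand == "-":
        end = min(len(chrom_seq), tss + 1 + length)
        return reverse_complement(chrom_seq[tss + 1:end])
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def promoters_to_fasta(genome, genes, length=1000):
    """Build multi-FASTA text with one promoter per distinct TSS.

    genome: dict mapping chromosome name -> sequence string
    genes:  iterable of (name, chrom, tss, strand) tuples
    Entries sharing the same (chrom, tss, strand) are emitted only once,
    which drops "redundant" promoters from alternative transcripts.
    """
    seen = set()
    records = []
    for name, chrom, tss, strand in genes:
        key = (chrom, tss, strand)
        if key in seen:
            continue
        seen.add(key)
        seq = upstream_region(genome[chrom], tss, strand, length)
        if seq:  # skip genes with no upstream sequence at all
            records.append(f">{name}\n{seq}")
    return "\n".join(records) + "\n"
```

Write the returned text to a file and pass that file as the promoter_multifasta_file argument of w2frequency_maker. Promoters near a chromosome end come out shorter than 1000 bp in this sketch; whether to keep or discard them is up to you.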