
Request for help on sequence conservation among > 1 million repeats

I got an email from a colleague asking for help with an informatics issue, and I thought it might be useful to post it here. I have been thinking about starting a section of this blog called “Technical Help Requests” or something like that, so I guess this is a test.

Here is the request:

I have a list of sequences for a set of > 1 million short repeat elements in a large eukaryotic genome, and I need to find the ~60 bp region that is most conserved among these elements. While they are “repeat elements”, they can be fairly diverse in specific sequence, but I only need the subset that contains the (near-perfect) conserved sequence. What method or software would you recommend to find this region? All the ones I usually use can’t handle that many lines of input.
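To make the question concrete, here is one rough way the problem could be framed: count every 60 bp window (k-mer) across the elements and look for the windows shared by the most elements. This is only a minimal sketch, not a recommendation from the colleague or a tested pipeline. It assumes the sequences are in FASTA format, it only finds exact matches (the "near-perfect" case would need mismatch tolerance on top), it ignores reverse complements, and the function names `read_fasta` and `most_shared_kmers` are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield one uppercase sequence per FASTA record, streaming line by line."""
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if chunks:
                yield "".join(chunks).upper()
                chunks = []
        else:
            chunks.append(line)
    if chunks:
        yield "".join(chunks).upper()


def most_shared_kmers(
    seqs: Iterable[str], k: int = 60, top: int = 5
) -> List[Tuple[str, int]]:
    """Return the `top` k-mers present in the largest number of sequences.

    Each k-mer is counted at most once per sequence, so a count is the
    number of elements that contain that exact window.
    """
    counts: Counter = Counter()
    for seq in seqs:
        # Sequences shorter than k yield an empty set and are skipped.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    return counts.most_common(top)


if __name__ == "__main__":
    # Toy example with k=4; real use would stream a FASTA file with k=60.
    toy = ["AAACGTAA", "TTACGTTT", "GGACGTGG"]
    print(most_shared_kmers(toy, k=4, top=1))
```

For a real file, something like `most_shared_kmers(read_fasta(open("repeats.fa")), k=60)` would stream the input one record at a time (the filename is a placeholder). The table of distinct 60-mers can still grow large with a million diverse elements, so memory is the main thing to watch with this approach.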

6 thoughts on “Request for help on sequence conservation among > 1 million repeats”

  1. The goal is to find a >60 bp conserved region (or the most conserved region) among a known set of repeat elements. The conserved region itself is not defined in advance. Thanks!

  2. Thanks! We’re taking a look into kallisto now for our seq list. We also just found an R package “DECIPHER” that seems pretty powerful.
