Thresholds for the power and length of motifs are set by the user. Sets of motifs can be constructed using different matrices.
When the standard constant M = M(W) is chosen, a large number of the motifs obtained will be "noise", despite their formal statistical significance. This noise, which hinders further alignment construction by enlarging the search, is caused by the null hypothesis that the sequences being aligned are independent (indeed, the very desire to align sequences is evidence of their dependence!). Clearly, for dependent (similar) sequences the mean mismatch weight is larger than the value given by formula (2). The program lets the user account for this by increasing M (by setting the parameter "Minimum homology ratio" to a value in the interval from 0 to 1; 0.01 is recommended). Experiments demonstrate that this procedure filters out the majority of noise motifs.
Example in FASTA format:
>FOSB_HUMAN P53539 homo sapiens (human). fosb protein
MFQAFPGDYDSGSRCSSSPSAESQYLSSVDSFGSPPTAAASQECAGLGEMPGSFVPTVTA
ITTSQDLQWLVQPTLISSMAQSQGQPLASQPPVVDPYDMPGTSYSTPGMSGYSSGGASGS
GGPSTSGTTSGPGPARPARARPRRPREETLTPEEEEKRRVRRERNKLAAAKCRNRRRELT
DRLQAETDQLEEEKAELESEIAELQKEKERLEFVLVAHKPGCKIPYEEGPGPGPLAEVRD
LPGSAPAKEDGFSWLLPPPPPPPLPFQTSQDAPPNLTASLFTHSEVQVLGDPFPVVNPSY
TSSFVLTCPEVSAFAGAQRTSGSDQPSDPLNSPSLLAL
>FOSB_MOUSE P13346 mus musculus (mouse). fosb protein.
MFQAFPGDYDSGSRCSSSPSAESQYLSSVDSFGSPPTAAASQECAGLGEMPGSFVPTVTA
ITTSQDLQWLVQPTLISSMAQSQGQPLASQPPAVDPYDMPGTSYSTPGLSAYSTGGASGS
GGPSTSTTTSGPVSARPARARPRRPREETLTPEEEEKRRVRRERNKLAAAKCRNRRRELT
DRLQAETDQLEEEKAELESEIAELQKEKERLEFVLVAHKPGCKIPYEEGPGPGPLAEVRD
LPGSTSAKEDGFGWLLPPPPPPPLPFQSSRDAPPNLTASLFTHSEVQVLGDPFPVVSPSY
TSSFVLTCPEVSAFAGAQRTSGSEQPSDPLNSPSLLAL
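For readers preparing input by script, the following is a minimal Python sketch of how multi-record FASTA text can be split into (header, sequence) pairs. It is an illustration only, not the program's own reader; the function name `parse_fasta` and the two toy records are hypothetical.

```python
def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) pairs.

    Assumption: a record starts at a line beginning with '>' and runs
    until the next such line; sequence lines are simply concatenated.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input (hypothetical identifiers, shortened sequences).
sample = """>seq1 first toy record
MFQAFPGDYD
SGSRCSSSPS
>seq2 second toy record
MFQAFPGDYD
"""

fasta_records = parse_fasta(sample)
for name, seq in fasta_records:
    print(name, len(seq))
```

Each header is returned without the leading ">", and the sequence is returned as one string regardless of how it was wrapped in the file.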
DNA vs. PROTEIN: The program counts the A, C, G, T, U and N characters in each sequence. If these characters make up 80% or more of a sequence, the sequence is assumed to be DNA / RNA; otherwise it is assumed to be protein.
To exclude noise motifs from the graphic, the noise threshold must be equal to the upper bound. The text output contains only non-noise motifs. However, if the noise upper bound is 0, all motifs are considered to be non-noise.
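The 80% rule can be sketched in Python as follows. This is an illustration of the rule as stated, not the program's actual code: the function name `guess_sequence_type` is hypothetical, and converting to upper case and ignoring whitespace are assumptions.

```python
NUCLEOTIDE_CODES = set("ACGTUN")


def guess_sequence_type(sequence, threshold=0.8):
    """Return 'nucleotide' if at least `threshold` of the characters
    are A, C, G, T, U or N, and 'protein' otherwise.

    Assumptions: case is ignored and whitespace is not counted.
    """
    residues = [ch for ch in sequence.upper() if not ch.isspace()]
    if not residues:
        raise ValueError("empty sequence")
    hits = sum(ch in NUCLEOTIDE_CODES for ch in residues)
    return "nucleotide" if hits / len(residues) >= threshold else "protein"


print(guess_sequence_type("ACGTACGTNN"))  # all ten characters are nucleotide codes
print(guess_sequence_type("MFQAFPGDYD"))  # only A and G match, i.e. 2 of 10
```

Note that a short protein fragment rich in A, C, G, T and N would be classified as nucleotide under this rule.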
A value of 3 to 5 is recommended.
   A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
A 12
C  8 22
D 10  5 14
E 10  5 13 14
F  6  6  4  5 19
G 11  7 11 10  5 15
H  9  7 11 11  8  8 16
I  9  8  8  8 11  7  8 15
K  9  5 10 10  5  8 10  8 15
L  8  4  6  7 12  6  8 12  7 16
M  9  5  7  8 10  7  8 12 10 14 16
N 10  6 12 11  6 10 12  8 11  7  8 12
P 11  7  9  9  5  9 10  8  9  7  8  9 16
Q 10  5 12 12  5  9 13  8 11  8  9 11 10 14
R  8  6  9  9  6  7 12  8 13  7 10 10 10 11 16
S 11 10 10 10  7 11  9  9 10  7  8 11 11  9 10 12
T 11  8 10 10  7 10  9 10 10  8  9 10 10  9  9 11 13
V 10  8  8  8  9  9  8 14  8 12 12  8  9  8  8  9 10 14
W  4  2  3  3 10  3  7  5  7  8  6  6  4  5 12  8  5  4 27
Y  7 10  6  6 17  5 10  9  6  9  8  8  5  6  6  7  7  8 10 20
   A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
A  8
C  4 13
D  2  1 10
E  3  0  6  9
F  2  2  1  1 10
G  4  1  3  2  1 10
H  2  1  3  4  3  2 12
I  3  3  1  1  4  0  1  8
K  3  1  3  5  1  2  3  1  9
L  3  3  0  1  4  0  1  6  2  8
M  3  3  1  2  4  1  2  5  3  6  9
N  2  1  5  4  1  4  5  1  4  1  2 10
P  3  1  3  3  0  2  2  1  3  1  2  2 11
Q  3  1  4  6  1  2  4  1  5  2  4  4  3  9
R  3  1  2  4  1  2  4  1  6  2  3  4  2  5  9
S  5  3  4  4  2  4  3  2  4  2  3  5  3  4  3  8
T  4  3  3  3  2  2  2  3  3  3  3  4  3  3  3  5  9
V  4  3  1  2  3  1  1  7  2  5  5  1  2  2  1  2  4  8
W  1  2  0  1  5  2  2  1  1  2  3  0  0  2  1  1  2  1 15
Y  2  2  1  2  7  1  6  3  2  3  3  2  1  3  2  2  2  3  6 11
   A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
A 16
C  6 26
D  8  0 18
E  9  3 12 18
F  7  5  3  3 20
G  9  2  8  7  1 18
H  7  2  9  7  8  7 22
I  8  2  5  5 10  4  5 18
K  9  1  8 11  4  6 10  5 17
L  6  1  2  4 12  3  5 12  6 17
M  8  5  4  7  9  5  7 12  8 14 21
N  8  2 12  9  6  8 11  5 10  5  6 18
P  9  1  9  8  5  7  5  4  9  7  0  7 20
Q  9  3  9 12  3  7 11  3 11  5  9  9  6 19
R  8  4  6 10  4  7 10  4 13  6  6  8  6 12 20
S 10  2 10  8  5  8  7  5  8  5  5 11  9  9  9 16
T  9  4  8  9  5  6  7  7 10  5  7 10  8  9  8 12 17
V  9  5  5  6  8  4  6 14  6 12 10  4  5  6  5  5  8 17
W  4  1  4  2 13  3  6  6  4  9  9  4  2  2  6  4  0  5 25
Y  6  2  6  6 13  4  9  7  6  7  8  7  3  5  8  6  7  8 12 20
   C  S  T  P  A  G  N  D  E  Q  H  R  K  M  I  L  V  F  Y  W  X  *
C 12
S  0  2
T  0  2  2
P -3  0  0  8
A  0  1  1  0  2
G -2  0 -1 -2  0  7
N -2  1  0 -1  0  0  4
D -3  0  0 -1  0  0  2  5
E -3  0  0  0  0 -1  1  3  4
Q -2  0  0  0  0 -1  1  1  2  3
H -1  0  0 -1 -1 -1  1  0  0  1  6
R -2  0  0 -1 -1 -1  0  0  0  2  1  5
K -3  0  0 -1  0 -1  1  0  1  2  1  3  3
M -1 -1 -1 -2 -1 -4 -2 -3 -2 -1 -1 -2 -1  4
I -1 -2 -1 -3 -1 -4 -3 -4 -3 -2 -2 -2 -2  2  4
L -2 -2 -1 -2 -1 -4 -3 -4 -3 -2 -2 -2 -2  3  3  4
V  0 -1  0 -2  0 -3 -2 -3 -2 -2 -2 -2 -2  2  3  2  3
F -1 -3 -2 -4 -2 -5 -3 -4 -4 -3  0 -3 -3  2  1  2  0  7
Y  0 -2 -2 -3 -2 -4 -1 -3 -3 -2  2 -2 -2  0 -1  0 -1  5  8
W -1 -3 -4 -5 -4 -4 -4 -5 -4 -3 -1 -2 -4 -1 -2 -1 -3  4  4 14
X -3  0  0 -1  0 -1  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 -2 -4 -1
* -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8  1
   A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
A 10
C  0 10
D  0  0 10
E  0  0  0 10
F  0  0  0  0 10
G  0  0  0  0  0 10
H  0  0  0  0  0  0 10
I  0  0  0  0  0  0  0 10
K  0  0  0  0  0  0  0  0 10
L  0  0  0  0  0  0  0  0  0 10
M  0  0  0  0  0  0  0  0  0  0 10
N  0  0  0  0  0  0  0  0  0  0  0 10
P  0  0  0  0  0  0  0  0  0  0  0  0 10
Q  0  0  0  0  0  0  0  0  0  0  0  0  0 10
R  0  0  0  0  0  0  0  0  0  0  0  0  0  0 10
S  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 10
T  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 10
V  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 10
W  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 10
Y  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 10
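The matrices listed here are lower-triangular: each pair of residues appears once, and the score of (X, Y) equals the score of (Y, X). A minimal Python sketch of expanding such a listing into a symmetric lookup table is given below. It assumes one matrix row per line with a header line of column labels first; the function name `parse_triangular_matrix` is hypothetical, and the sample is the top-left 3 x 3 corner of the first matrix listed here.

```python
def parse_triangular_matrix(text):
    """Read a lower-triangular score matrix into a symmetric dict.

    Assumed layout: first line holds the column labels; row i then
    holds its own label followed by i + 1 integer scores.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    labels, rows = lines[0], lines[1:]
    if len(rows) != len(labels):
        raise ValueError("number of rows does not match number of labels")
    scores = {}
    for row_index, tokens in enumerate(rows):
        row_label, values = tokens[0], tokens[1:]
        if row_label != labels[row_index] or len(values) != row_index + 1:
            raise ValueError("malformed row for label %r" % row_label)
        for col_label, value in zip(labels, values):
            # Store both orientations so lookups do not depend on order.
            scores[(row_label, col_label)] = int(value)
            scores[(col_label, row_label)] = int(value)
    return scores


example = """
   A  C  D
A 12
C  8 22
D 10  5 14
"""

scores = parse_triangular_matrix(example)
print(scores[("A", "C")], scores[("C", "A")], scores[("D", "D")])
```

Storing both orientations doubles the table size but keeps each lookup to a single dictionary access.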