classify.py

Classifies sequences in an alignment according to one or two "rules"
specifying amino acid composition at defined positions. Rules are
written as nested, parenthesized pairs of expressions; an expression
consists of a position followed by a character (e.g., 3R).

The input file should contain aligned sequences in fasta format;
fasta-format sequences can also be supplied via stdin (e.g.,
'cat file.fasta | classify.py ...').

If only one rule is given, sequences satisfying rule 1 are scored as
false for rule 2. If two rules are given, sequences are scored for
both rules independently; in this case a sequence may satisfy
neither, one, or both rules.

Use extract.py to recover lists of sequences from input alignments.
For example:

  classify.py protease.fasta -r1="(63L)" -pick=a | extract.py protease.fasta -l=- -out=pro_63L.fasta

Build the rules by joining pairs of expressions with parentheses:

  1A 2B 3C 4D 5E 6F 7G
  (1A 2B) (3C 4D) (5E 6F) 7G
  ((1A 2B) (3C 4D)) ((5E 6F) 7G)
  (((1A 2B) (3C 4D)) ((5E 6F) 7G))

Use '&' and '|' (logical AND, OR) to join pairs.

  (((1A & 2B) | (3C & 4D)) | ((5E | 6F) & 7G))

Negate expressions with a leading ^

  (((1A & ^2B) | (3C & 4D)) | ((5E | ^6F) & 7G))

Examples (enclose all rules on the command line with single or
double quotes):

  1) classify.py protease.fasta -out=pro_63p.classified -r1="(63P)"
  2) ... -r1="(63P)" -r2="(63L)"
  3) ... -r1="(63P | 63S)" -r2="(63L | 63S)"
     In this case, both rules may be true (for a sequence with 63S).
  4) ... -r1="(((63L | 63P) | (93R & ^97K)) & 72L)"
  5) ... -r1="((63L | 63P) | ((93R & ^97K) & 72L))"

Note that examples 4 and 5 above give different results, because the
parentheses group the expressions differently.

-r1       Rule 1
-r2       Rule 2 (optional)
-n { [s] | i | 0 }
          Output format for name.
          s    print the name of the sequence
          i    print the index of the sequence
          0    print nothing
-r { [num] | ab | 0 }
          Output format of result.
          num  if r2 is not supplied, prints a 1 or 0 after the
               name according to the truth of r1; if r2 is
               supplied, prints a 1 or 0 for the result of each
               of the rules.
          ab   prints 'A' in the first column after the name if r1
               is true and 'B' in the next column if 1) r1 is false
               and no r2 is supplied, or 2) r2 is true.
          0    print nothing
-pick { [ab] | a | b }
          Selects which lines (each line corresponding to a
          sequence) to print.
          ab   Print all lines regardless of result.
          a    Print line if rule 1 is satisfied.
          b    Print line if rule 2 is satisfied.
-neg { [^] | ! }
          Character used to negate an expression.
          !    Requires protection with a backslash when used on
               the command line.
-out      Name of output file. Prints to stdout if unspecified.
-h        Prints documentation.
-v { [1] }
          Verbosity; use for debugging.
-version  Print version info and exit.

$Id: classify.py,v 1.1 2004/10/10 02:53:09 nghoffma Exp $
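
The rule syntax described above can be illustrated with a small, independent evaluator. This is only a sketch and not the code used by classify.py: the function names (tokenize, parse, evaluate, score) are hypothetical, positions are assumed to be 1-based, only the default '^' negation character is handled, and each parenthesized group is assumed to hold either a single expression or exactly one pair. The two demo rules are scaled-down analogues of examples 4 and 5, showing why the grouping matters.

```python
import re

# One token: a parenthesis, an operator, or an expression such as 63L or ^97K.
TOKEN = re.compile(r"\s*([()&|]|\^?\d+[A-Za-z*-])")


def tokenize(rule):
    """Split a rule string into tokens, rejecting anything unrecognized."""
    rule = rule.strip()
    tokens, pos = [], 0
    while pos < len(rule):
        match = TOKEN.match(rule, pos)
        if match is None:
            raise ValueError(f"cannot parse rule at offset {pos}: {rule[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(rule):
    """Build a nested tuple tree from a rule string.

    Assumed grammar: term := expression | '(' term [ ('&' | '|') term ] ')'
    """
    tokens = tokenize(rule)

    def term(i):
        if i >= len(tokens):
            raise ValueError("rule ended unexpectedly")
        tok = tokens[i]
        if tok == "(":
            node, i = term(i + 1)
            if i < len(tokens) and tokens[i] in ("&", "|"):
                op = tokens[i]
                right, i = term(i + 1)
                node = (op, node, right)
            if i >= len(tokens) or tokens[i] != ")":
                raise ValueError("expected ')' to close a pair")
            return node, i + 1
        if tok in (")", "&", "|"):
            raise ValueError(f"unexpected token {tok!r}")
        negated = tok.startswith("^")
        body = tok[1:] if negated else tok
        return ("match", int(body[:-1]), body[-1].upper(), negated), i + 1

    tree, end = term(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after the rule")
    return tree


def evaluate(node, seq):
    """Return True if the aligned sequence satisfies the parsed rule."""
    if node[0] == "match":
        _, position, residue, negated = node
        hit = seq[position - 1].upper() == residue  # assumption: 1-based positions
        return hit != negated
    op, left, right = node
    if op == "&":
        return evaluate(left, seq) and evaluate(right, seq)
    return evaluate(left, seq) or evaluate(right, seq)


def score(seq, r1, r2=None):
    """Return (rule 1 result, rule 2 result) for one aligned sequence.

    With no r2, rule 2 is taken as the complement of rule 1, following the
    description of the single-rule case for the 'ab' output format.
    """
    a = evaluate(parse(r1), seq)
    b = (not a) if r2 is None else evaluate(parse(r2), seq)
    return a, b


# Scaled-down analogues of examples 4 and 5 (short positions for readability).
rule4 = "(((1L | 1P) | (3R & ^4K)) & 2L)"
rule5 = "((1L | 1P) | ((3R & ^4K) & 2L))"
print(score("LAGK", rule4, rule5))  # (False, True)
```

The sequence LAGK has L at position 1 but not at position 2, so the first rule fails on its outer '& 2L' while the second rule is already satisfied by '(1L | 1P)'.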