This document illustrates some common formats used for sequences representation.
ID MMVASPHOS standard; RNA; EST; 140 BP. AC X97897; DE M.musculus mRNA for protein homologous to DE vasodilator-stimulated phosphoprotein SQ Sequence 140 BP; 25 A; 58 C; 39 G; 17 T; 1 other; ttctcccaga agctgactct atggngaccc cgagagagac tgagcagaac 60 ccccgcaccc ctgcacttcc aatcaggggc gccccgggag cactccccgt 120 ccgccctccg cgcagccatg 140 //
>MMVASPHOS ttctcccagaagctgactctatggngaccccgagagagactgagcagaacctggagccag ccccgcacccctgcacttccaatcaggggcgccccgggagcactccccgtggcgcgccgc ccgccctccgcgcagccatg
!!NA_SEQUENCE 1.0 (No documentation) dna1.txt Length: 88 Nov 22, 2001 14:38 Type: N Check: 3818 ..
1 TAGTCGTAGT CGGAGCGATG CTGACGATGA CGATGACGAT CGTAGCTGAT
51 CGATCGAGCT GATGCTGATC GAGCTAGCTG ATCGATCG
#sample1 TTCAAGAGAAACAGCGGCCAAGGAAAAGACTCGGCATGATTGTCCATAGCTTACAAAGCG #sample2 TTCAAGAGAAACAGCGGCTGGGGGAAAGACTCGTCCTGATTGCCTGTAGATGGTAAAGCG
LOCUS HUMHBV1 130 bp DNA PRI 17-JUN-1993 DEFINITION Human DNA/endogenous Hepatitis B virus (HBV) DNA, left host viral junction. ACCESSION M15770 BASE COUNT 32 a 43 c 29 g 26 t ORIGIN 1 agcgggcagt gcagctgctt ggacagcagg ggtgtttctt caacccaggc 61 ctcctgtcac aacaggccca ttcaattctg aacctgcaag ccaactccaa 121 cctcttttcc cagggggaac caaaaaccct //
; comment U03518 AACCTGCGGAAGGATCATTACCGAGTGCGGGTCCTTTGGGCCCAACCTCCCATCCGTGTC TATTGTACCCTGTTGCTTCGGCGGGCCCGCCGCTTGTCGGCCGCCGGGGGGGCGCCTCTG TGAGTTGATTGAATGCAATCAGTTAAAACTTTCAACAATGGATCTCTTGGTTCCGGC1
>P1;CCHU cytochrome c [validated] - human MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIW GEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKATNE*
ENTRY CCHU #type complete TITLE cytochrome c [validated] - human ACCESSIONS A31764; A05676; I55192; A00001 SUMMARY #length 105 #molecular-weight 11749 #checksum 3247 SEQUENCE 5 10 15 20 25 30 1 M G D V E K G K K I F I M K C S Q C H T V E K G G K H K T G 31 P N L H G L F G R K T G Q A P G Y S Y T A A N K N K G I I W 61 G E D T L M E Y L E N P K K Y I P G T K M I F V G I K K K E 91 E R A D L I A Y L K K A T N E ///
ttctcccagaagctgactctatggngaccccgagagagactgagcagaacctggagccag ccccgcacccctgcacttccaatcaggggcgccccgggagcactccccgtggcgcgccgc ccgccctccgcgcagccatg Warning: This format cannot handle more than one sequence per file.
ID 100K_RAT STANDARD; PRT; 149 AA. AC Q62671; DE 100 kDa protein (EC 6.3.2.-). SQ SEQUENCE 149 AA; 17004 MW; D06484B8BC29112E CRC64; MMSARGDFLN YALSLMRSHN DEHSDVLPVL DVCSLKHVAY VFQALIYWIK PQLERKRTRE LLELGIDNED SEHENDDDTS QSATLNDKDD ESLPAETGQN SITIRPPDDQ HLPTANTCIS RLYVPLYSSK QILKQKLLLA IKTKNFGFV //
Nicolas Joly ([email protected]), Institut Pasteur.