

Help Page


Are your data files arriving truncated?

An ongoing problem with this database is that a few people receive truncated data files. All of the data files in this database are text files, and unfortunately the end-of-line code differs between Mac, PC, and Unix. On Macs, it is <CR> (Carriage Return). In Unix, it's <LF> (Line Feed). On PCs, it's <CR><LF> (both carriage return and line feed). These are normally converted correctly by your browser. However, some browsers use the Content-Length information in the HTTP header to determine how much of the file to read before translating <EOL> characters. On a PC (the files are served from a Mac), each <CR> becomes a two-character <CR><LF>, so these browsers truncate the file by the number of lines in the file; i.e. if the file is 290 lines long, a PC will lose the last 290 characters.
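The arithmetic behind this can be illustrated with a minimal Python sketch. It assumes the behaviour described above: the client expands each <CR> to <CR><LF> but keeps only as many bytes as the Content-Length announced. The function name and the three-line sample file are made up for illustration.

```python
def serve_and_truncate(body: bytes) -> bytes:
    """Simulate a client that expands Mac <CR> line endings to PC <CR><LF>
    but still keeps only Content-Length bytes of the result."""
    content_length = len(body)                  # size the server advertises
    translated = body.replace(b"\r", b"\r\n")   # one extra byte per line
    return translated[:content_length]          # the tail of the file is cut off


# Hypothetical three-line file with Mac (<CR>) line endings.
original = b"LOCUS\rORIGIN\r//\r"
received = serve_and_truncate(original)

expected = original.replace(b"\r", b"\r\n")
lost = len(expected) - len(received)
print(lost)  # prints 3: one character lost per line in the file
```

In this sketch the last line of the file arrives as "/" rather than "//", which is the kind of damage to look for at the end of a suspect file.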

Until I can come up with a good solution, if you run into this problem you have two choices:

  1. First, try setting your preferences to explicitly treat .gb, .ct, .rnaml, and .pdb files as text files. In most cases, this seems to fix the problem. If you like, you can also set these file types to be opened directly in whatever program you want to use; for example, to open .gb files in BioEdit, .ct files in RNAVis, .rnaml files in s2s, and .pdb files in VMD.
  2. Or, just use another browser or another computer - I've only heard complaints about this problem from people using Mozilla or Explorer on PCs.


Data file types

Sequences of RNAs and proteins are provided in GenBank format. This file format can be imported by almost any sequence manipulation program.
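If you would rather read a .gb file with your own script, the following is a minimal Python sketch that pulls the sequence out of the ORIGIN block of a single GenBank record. The function name is hypothetical and the sample record is made up; real files carry many more header lines, which this sketch simply skips.

```python
def genbank_sequence(text: str) -> str:
    """Return the sequence from the ORIGIN block of one GenBank record."""
    residues = []
    in_origin = False
    for line in text.splitlines():      # accepts <CR>, <LF>, and <CR><LF>
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):     # end-of-record marker
            break
        elif in_origin:
            # Drop the leading position number; keep the sequence blocks.
            residues.extend(part for part in line.split() if not part.isdigit())
    return "".join(residues).upper()


# Made-up record for illustration only; not a real database entry.
sample = """LOCUS       EXAMPLE                   20 bp    RNA
ORIGIN
        1 gaagctgacc agacagtcgc
//
"""
print(genbank_sequence(sample))  # prints GAAGCTGACCAGACAGTCGC
```

Because str.splitlines() recognizes all three end-of-line conventions, the same function should work whichever platform the file was saved on.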

RNA secondary structures are given in Connect (.ct) format and RNA Markup Language (.rnaml) format.

For additional information about this Connect format see ____________. This format is used by a variety of RNA secondary structure analysis programs. These structures do not contain pseudoknots, as most of these programs cannot deal with pseudoknotted RNAs; the bases in helixes P4, P6, and P21 (when present) have been de-basepaired. The files contain the ENERGY= header line that is required of some of the programs that use CONNECT files even though it is not part of the formal structure of the CONNECT format.
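As a rough illustration of the layout, here is a minimal Python sketch of a .ct reader. It assumes the common layout in which the first line gives the number of bases followed by a title, and each following line gives the base index, the base, the previous index, the next index, the pairing partner (0 if unpaired), and the natural numbering. The function name, the eight-base hairpin, and its ENERGY value are invented for the example.

```python
def parse_ct(text: str):
    """Return (title, sequence, base pairs) from CONNECT-format text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    length = int(header[0])             # number of bases promised by the header
    title = " ".join(header[1:])
    sequence = []
    pairs = set()
    for ln in lines[1:length + 1]:
        fields = ln.split()
        index, base, partner = int(fields[0]), fields[1], int(fields[4])
        sequence.append(base)
        if partner:                     # 0 means the base is unpaired
            pairs.add((min(index, partner), max(index, partner)))
    if len(sequence) != length:
        raise ValueError("fewer bases than the header promises; file may be truncated")
    return title, "".join(sequence), sorted(pairs)


# Invented eight-base hairpin for illustration only.
sample_ct = """
    8  ENERGY = -1.0  example hairpin
    1 G    0    2    8    1
    2 G    1    3    7    2
    3 G    2    4    0    3
    4 A    3    5    0    4
    5 A    4    6    0    5
    6 A    5    7    0    6
    7 C    6    8    2    7
    8 C    7    0    1    8
"""
title, seq, pairs = parse_ct(sample_ct)
print(seq, pairs)  # prints GGGAAACC [(1, 8), (2, 7)]
```

The length check doubles as a simple test for the truncation problem described at the top of this page.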

For additional information about RNAML format, see the RNAML web site or Waugh, et al., 2002 RNA 8:707-717. See the Links Page for programs these can be viewed in. RNAML files contain all known basepairs, and so will contain pseudoknots.

Some of the secondary structures are also available in Adobe Acrobat (.pdf) format for viewing directly in your browser or Adobe Acrobat. Please note that these representations are not as up-to-date as the machine-readable files.

All three-dimensional structures and models are provided in Protein Data Bank (PDB) format. These can be viewed by most 3D structure viewing programs, including PyMol, VMD, RasMol, iMol, etc.
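PDB files are fixed-column text, so they can also be read with a short script. The following is a minimal Python sketch that reads ATOM and HETATM records by column position. The function names and the two sample atoms are hypothetical; the sample lines are generated in code so that each field lands in its standard column.

```python
def make_atom_line(serial, name, res, chain, resseq, x, y, z):
    """Build a synthetic ATOM record with fields in the standard PDB columns."""
    return (f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atoms(text):
    """Return one dict per ATOM/HETATM record, sliced by column position."""
    atoms = []
    for line in text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),        # columns 13-16
                "residue": line[17:20].strip(),     # columns 18-20
                "chain": line[21],                  # column 22
                "resseq": int(line[22:26]),         # columns 23-26
                "xyz": (float(line[30:38]),         # columns 31-54
                        float(line[38:46]),
                        float(line[46:54])),
            })
    return atoms


# Two invented phosphate atoms for illustration only.
sample_pdb = "\n".join([
    make_atom_line(1, "P", "G", "A", 1, 10.0, 20.0, 30.0),
    make_atom_line(2, "P", "A", "A", 2, 11.5, 21.5, 31.5),
    "END",
])
atoms = parse_atoms(sample_pdb)
print([(a["residue"], a["resseq"]) for a in atoms])  # prints [('G', 1), ('A', 2)]
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in real PDB files can run together without a space between them.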


A layout of the site

The menu bar at the top of each page (below the banner) stays in place for all of the pages on this site and contains links to the main areas of the site. Any time you're in the RNase P Database, you can click on any of the items on the menu to jump directly to that section.

The first item in this menu is a link to the site Home page, the main doorway to the site. This page is essentially a Table of Contents, and often also contains timely announcements and other useful information.

The News & Info page contains a list of newly added data, changes & updates, and other timely information. Routine users of the database will find this a very useful 'first stop', since each of the listings is linked directly to the appropriate place in the database.

The Sequences & Secondary Structures page is a list of organisms with information in the database, arranged in phylogenetic groups. Each of these phylogenetic groups has its own page, the Sequences & Secondary Structures sub-pages, that can be accessed by clicking on the name of the group. Clicking on a genus name takes you to the appropriate page and, if that page is a long one, to the place on that page where the data for that genus is.

Sequences & Secondary Structures sub-pages

Each main phylogenetic group has its own Sequences & Secondary Structures sub-page. Near the top of these pages, below the banner and menu bar, are links to the 'Prior page' and 'Next page' - click on these to browse the individual pages; the order of the pages is the same as on the main Sequences & Secondary Structures page. Each of the sequences & structures sub-pages contains a table with rows arranged something like this:

Organism                       RNA                                                  Protein
Genus/species          Strain  Seq ID   Seq citation ID  Structure files            Seq ID   Seq citation ID  Seq file
Bacterium bacteriales  faux    A123456  123456           .gb | .ct | .rnaml | .pdf  A123456  123456           rnpA.gb

Organism information

The first column of this section usually contains the names of organisms by genus & species. Species names are linked to the NCBI/Entrez Taxonomy record for that organism - click on the name to get more information about the organism or look for other publications or sequences from that organism. In instances where the organism a sequence comes from is unknown (e.g. 'natural populations' or contaminant sequences), clone designations are given.

The next column is the strain of that species. If that specific strain has an entry in the NCBI Taxonomy Browser, then the strain name is linked to that record.

RNA information

The first column of this section contains the GenBank accession number (Seq ID) for the RNA sequence. Clicking on this number will take you to the official sequence record via NCBI/Entrez. More often than not, the GenBank sequence record covers more than just the RNA sequence, usually the entire continuous sequence that was determined - in some cases, it might be the entire genome sequence of that organism! Be aware that the RNA-encoding region is very often not annotated in the GenBank data file.

The next column is the PubMed citation ID number for the RNA sequence. Clicking on this number will take you to the official citation record via NCBI/Entrez. More often than not, the citation record is a paper describing more than just this RNA sequence - in many cases, it might be the entire genome sequence of that organism, and the RNase P RNA sequence may not even be mentioned!

The next column contains a set of links for accessing the secondary structure files.

The .gb link will give you the extracted sequence of the RNA, in GenBank format.

The .ct link will give you the sequence and secondary structure of the RNA in CONNECT format. For additional information about this format see ____________. This format is used by a variety of RNA secondary structure analysis programs. These structures do not contain pseudoknots, as most of these programs cannot deal with pseudoknotted RNAs; helixes P4, P6, and P21 (when present) have been de-basepaired. The files contain the ENERGY= header line that is required of some of the programs that use CONNECT files even though it is not part of the formal structure of the CONNECT format.

The .rnaml link will give you the sequence and secondary structure of the RNA in RNA Markup Language (RNAML) format. For additional information about this format, see the RNAML web site or Waugh, et al., 2002 RNA 8:707-717. See the Links Page for programs these can be viewed in.

Some of the secondary structures are also available in .pdf format for viewing directly in your browser or Adobe Acrobat.

Protein information

The first column of this section contains the GenBank accession number (Seq ID) for the protein sequence. Clicking on this number will take you to the official sequence record via NCBI/Entrez. More often than not, the GenBank sequence record covers more than just the RNase P protein sequence, usually the entire continuous sequence that was determined - in some cases, it might be the entire genome sequence of that organism! In the NCBI/GenBank file there will be a link to the PubMed record for the citation for this sequence.

The next column is the PubMed citation ID number for the protein sequence. Clicking on this number will take you to the official citation record via NCBI/Entrez. More often than not, the citation record is a paper describing more than just this protein sequence - in many cases, it might be the entire genome sequence of that organism!

The last column contains a link to the amino acid sequence in GenBank (.gb) format.

On the Archaea and Nuclear pages, the protein information is divided into columns for each of the identified homologous proteins.

The Alignments page contains a variety of alignments from published data and the alignments used in this database. These alignments are in GenBank (.gb) format.

3D Structures Page

The 3D Structures page contains a variety of 3D structures from the literature. These are provided in Protein Data Bank (.pdb) format.

The Help page is a collection of additional helpful information on how to use this site, file format information, etc.

The Links page is a collection of links to web sites that will be useful to users of this database, including links to software for viewing and manipulating the sequence, alignment, and structure files provided here.


Last updated
James W. Brown
Department of Microbiology, NC State University
Raleigh, NC 27695 USA
james_brown@ncsu.edu
www.mbio.ncsu.edu/JWB