cogent3.core.sequence.DnaSequence
- class DnaSequence(moltype: MolType[Any], seq: str | bytes | ndarray[tuple[Any, ...], dtype[integer]] | SeqViewABC, *, name: str | None = None, info: dict[str, Any] | Info | None = None, annotation_offset: int = 0, annotation_db: AnnotationDbABC | None = None)
Holds the standard DNA sequence.
- Attributes:
- annotation_db: the annotation database for the collection
- annotation_offset: The offset between annotation coordinates and sequence coordinates.
- info
- moltype
- name
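As a rough illustration of what annotation_offset describes, the sketch below converts a feature span given in annotation (parent) coordinates into coordinates on the sequence itself. It is plain Python rather than cogent3 code: the helper name to_sequence_coords is hypothetical, and subtracting the offset from the annotation coordinates is an assumption about the convention, not something this page states.

```python
def to_sequence_coords(span, annotation_offset=0):
    """Shift a (start, stop) span from annotation coordinates to
    sequence coordinates.

    Assumption: the sequence begins at position annotation_offset in
    the annotation coordinate system, so the offset is subtracted.
    """
    start, stop = span
    return start - annotation_offset, stop - annotation_offset


# Hypothetical case: the sequence is a fragment that starts at position
# 100 of a chromosome, and a feature is annotated at chromosome span
# 120..150.
feature_on_fragment = to_sequence_coords((120, 150), annotation_offset=100)
print(feature_on_fragment)  # (20, 50)
```

With the default offset of 0 the two coordinate systems coincide and the span is returned unchanged.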
Methods
add_feature(*, biotype, name, spans[, ...]): add a feature to annotation_db
annotate_matches_to(pattern, biotype, name): Adds an annotation at sequence positions matching pattern.
can_match(other): Returns True if every pos in self could match same pos in other.
can_mispair(other): Returns True if any position in self could mispair with other.
can_pair(other): Returns True if self and other could pair.
complement(): Returns complement of self, using data from MolType.
copy([exclude_annotations, sliced]): returns a copy of self
copy_annotations(seq_db): copy annotations into attached annotation db
count(item): count() delegates to self._seq.
count_ambiguous(): Returns the number of ambiguous characters in the sequence.
count_degenerate(): Counts the degenerate bases in the specified sequence.
count_gaps(): Counts the gaps in the specified sequence.
count_kmers([k, use_hook]): return array of counts of all possible kmers of length k
count_variants(): Counts number of possible sequences matching the sequence, given any ambiguous characters in the sequence.
counts([motif_length, include_ambiguity, ...]): returns dict of counts of motifs
degap(): Deletes all gap characters from sequence.
diff(other): Returns number of differences between self and other.
disambiguate([method]): Returns a non-degenerate sequence from a degenerate one.
distance(other[, function]): Returns distance between self and other using function(i,j).
frac_diff(other): Returns fraction of positions where self and other differ.
frac_diff_gaps(other): Returns fraction of positions where self and other differ in gap states.
frac_diff_non_gaps(other): Returns fraction of non-gap positions where self differs from other.
frac_same(other): Returns fraction of positions where self and other are the same.
frac_same_gaps(other): Returns fraction of positions where self and other share gap states.
frac_same_non_gaps(other): Returns fraction of non-gap positions where self matches other.
frac_similar(other, similar_pairs): Returns fraction of positions where self[i] is similar to other[i].
from_rich_dict(data): create a Sequence object from a rich dict
gap_indices(): Returns array of the indices of all gaps in the sequence
gap_vector(): Returns vector of True or False according to which pos are gaps or missing.
get_drawable(*[, biotype, width, vertical]): make a figure from sequence features
get_drawables(*[, biotype]): returns a dict of drawables, keyed by type
get_features(*[, biotype, name, start, ...]): yields Feature instances
get_in_motif_size([motif_length, warn]): returns sequence as list of non-overlapping motifs
get_kmers(k[, strict]): return all overlapping k-mers
get_name(): Return the sequence name -- should just use name instead.
get_translation([gc, incomplete_ok, ...]): translate to amino acid sequence
get_type(): Return the sequence type as moltype label.
has_annotation_db(): returns True if self has annotation db
has_terminal_stop([gc, strict]): Return True if the sequence has a terminal stop codon.
is_annotated([biotype]): returns True if sequence parent name has any annotations
is_degenerate(): Returns True if sequence contains degenerate characters.
is_gapped(): Returns True if sequence contains gaps.
is_strict(): Returns True if sequence contains only monomers.
is_valid(): Returns True if sequence contains no items absent from alphabet.
iter_kmers(k[, strict]): generates all overlapping k-mers.
make_feature(feature, *args): return a Feature instance from feature data
matrix_distance(other, matrix): Returns distance between self and other using a score matrix.
must_pair(other): Returns True if all positions in self must pair with other.
mw([method, delta]): Returns the molecular weight of (one strand of) the sequence.
parent_coordinates([apply_offset]): returns seqid, start, stop, strand of this sequence on its parent
parse_out_gaps(): returns Map corresponding to gap locations and ungapped Sequence
rc(): Converts a nucleic acid sequence to its reverse complement.
replace_annotation_db(value[, check]): public interface to assigning the annotation_db
resolved_ambiguities(): Returns a list of sets of strings.
reverse_complement(): Converts a nucleic acid sequence to its reverse complement.
sample(*, n, with_replacement, motif_length, ...): Returns random sample of positions from self, e.g. to bootstrap.
shuffle(): returns a randomized copy of the Sequence object
sliding_windows(window, step[, start, end]): Generator function that yields new sequence objects of a given length at a given interval.
strand_symmetry([motif_length]): returns G-test for strand symmetry
strip_bad(): Removes any symbols not in the alphabet.
strip_bad_and_gaps(): Removes any symbols not in the alphabet, and any gaps.
strip_degenerate(): Removes degenerate bases by stripping them out of the sequence.
to_array([apply_transforms]): returns the numpy array
to_dna(): Returns copy of self as DNA.
to_fasta([make_seqlabel, block_size]): Return string of self in FASTA format, no trailing newline
to_html([wrap, limit, colors, font_size, ...]): returns html with embedded styles for sequence colouring
to_json(): returns a json formatted string
to_moltype(moltype): returns copy of self with moltype seq
to_phylip([name_len, label_len]): Return string of self in one line for PHYLIP, no newline.
to_rich_dict([exclude_annotations]): returns {'name': name, 'seq': sequence, 'moltype': moltype.label}
to_rna(): Returns copy of self as RNA.
trim_stop_codon([gc, strict]): Removes a terminal stop codon from the sequence
with_masked_annotations(biotypes[, ...]): returns a sequence with the regions annotated by biotypes replaced by mask_char if shadow is False; otherwise all other regions are masked.
with_termini_unknown(): Returns copy of sequence with terminal gaps remapped as missing.
write(filename[, format_name]): Write the sequence to a file.
gapped_by_map
gapped_by_map_motif_iter
gapped_by_map_segment_iter
to_dict
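Several of the methods listed above have simple definitions that a few lines of plain Python can illustrate. The sketch below is not cogent3's implementation and does not import the library. The helper names reverse_complement, overlapping_kmers and frac_diff are hypothetical stand-ins that mirror the summaries of rc(), iter_kmers(k) and frac_diff(other), and they handle only unambiguous, ungapped DNA strings. Comparing only up to the length of the shorter string in frac_diff is an assumption.

```python
# Watson-Crick pairs for the four canonical DNA bases only; ambiguity
# codes and gap characters are deliberately left out of this sketch.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def overlapping_kmers(seq, k):
    """Yield every k-mer, advancing one position at a time."""
    for i in range(len(seq) - k + 1):
        yield seq[i : i + k]


def frac_diff(a, b):
    """Fraction of compared positions at which a and b differ.

    Assumption: only positions up to the shorter length are compared.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / n


seq = "ATGGCC"
print(reverse_complement(seq))          # GGCCAT
print(list(overlapping_kmers(seq, 3)))  # ['ATG', 'TGG', 'GGC', 'GCC']
print(frac_diff("ATGC", "ATGA"))        # 0.25
```

The real methods operate on sequence objects and take the molecular type's ambiguity and gap characters into account, which this string-based sketch ignores.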