pyTEnrich class & functions¶

pyTEnrich.Analyser¶

class pyTEnrich.Analyser.Analyser(peak_vs_subfams, peak_vs_fams, db_container)¶

Bases: object

Launch the analysis and statistical enrichment on a Genomic_region_container. P-value adjustment is run on each container independently. This object holds all containers and performs the counting and statistics on the overlaps.

Parameters

peak_vs_subfams (Genomic_region_container) – The container object with all relevant information for the comparison peak_in_te to be done at the TE subfamily level. Contains overlap between groups, probabilities based on genome occupancy, etc…
peak_vs_fams (Genomic_region_container) – same as above, but with TE families instead of TE subfamilies.
db_container (Db_container) – contains loaded databases with associated summaries

counting(intersect_file)¶

Open and read the file produced by the bedtools intersection and increment the containers accordingly.

Parameters: intersect_file (str) – path to the intersection file

fdr(p_vals)¶: Compute adjusted p-values using the Benjamini-Hochberg procedure
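
As an illustration of the Benjamini-Hochberg adjustment named above, here is a minimal pure-Python sketch. The function name `bh_adjust` is hypothetical and this is not necessarily how `fdr` is implemented in pyTEnrich; it only shows the standard procedure (scale each sorted p-value by m / rank, then enforce monotonicity from the largest p-value down).

```python
def bh_adjust(p_vals):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Hypothetical helper for illustration; not the pyTEnrich implementation.
    """
    m = len(p_vals)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_vals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_vals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Adjusted values are capped at 1 and returned in the same order as the input, so they can be attached directly to the rows the raw p-values came from.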

get_pd_stats(genomic_container, bedname)¶

Main function to generate stats with a binomial model

Parameters

genomic_container (Genomic_region_container) – containers associated with a specific grouping of TEs (e.g. TE subfamily) to be used for counting and to make stats.
bedname (str) – name of the sample (linked to input bed file) for which we compute the stats.

Returns

object containing stats results, one line for each TE group

Return type

pd_stats (pandas.DataFrame)
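
To make the binomial model concrete, the sketch below computes an upper-tail p-value for one TE group. It assumes (this is not stated in the source) that the success probability is the fraction of the genome occupied by the group and that the test is one-sided; `binomial_enrichment_pvalue` and `group_bp` are hypothetical names and values. For large counts, `scipy.stats.binom.sf` would be the more robust choice.

```python
from math import comb


def binomial_enrichment_pvalue(n_overlaps, n_peaks, p_group):
    """P(X >= n_overlaps) for X ~ Binomial(n_peaks, p_group).

    Hypothetical helper; exact summation, suitable for small counts only.
    """
    return sum(
        comb(n_peaks, k) * p_group**k * (1.0 - p_group) ** (n_peaks - k)
        for k in range(n_overlaps, n_peaks + 1)
    )


# Assumed inputs: a TE group covering group_bp bases (made-up value) of a
# genome of size_genome bases (the Db_container default).
size_genome = 2861328253
group_bp = 30_000_000
p_group = group_bp / size_genome

# Probability of seeing 5 or more overlaps among 100 peaks by chance.
pval = binomial_enrichment_pvalue(5, 100, p_group)
```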

get_significance(pval)¶

Draw significance as stars according to input p-value

Parameters: pval (float) – input p-value
Returns: string containing the significance label (n.s or one or more stars)
Return type: significance (str)
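
A star-based significance label can be sketched as below. The thresholds (0.05, 0.01, 0.001) are the conventional ones and are an assumption here; the source does not state which cut-offs `get_significance` uses, and `significance_stars` is a hypothetical name.

```python
def significance_stars(pval):
    """Map a p-value to a significance label.

    Thresholds are conventional defaults, assumed for illustration only.
    """
    if pval < 0.001:
        return "***"
    if pval < 0.01:
        return "**"
    if pval < 0.05:
        return "*"
    return "n.s"
```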

single_task(bedname)¶

This function handles the whole analysis (subfam/fam) for a single bed file. It is the function launched by the multiprocessing unit.

Parameters: bedname (str) – sample name on which we compute stats for each TE grouping type

write_stats(out_dir)¶: Make enrichment statistics and write down results

pyTEnrich.Bedtools_launcher¶

class pyTEnrich.Bedtools_launcher.Bedtools_launcher(bedtools_options, db_container=None)¶

Bases: object

This class handles calls to bedtools to intersect the input bed files with the TE database

Parameters

bedtools_options – options used for bedtools intersection between TE and bed files
db_container – Db_container object with all database and input/output information

clean_up_temp()¶: Clean up temp files

intersect()¶: Make intersection using multi-intersect bedtools

reformat_intersect()¶: If only one bed file is provided, reformat intersection file

pyTEnrich.Db_container¶

class pyTEnrich.Db_container.Db_container(out_dir, in_dir=None, genome_subset=None, te_db=None, size_genome=2861328253, idx_sfam=7, idx_fam=6)¶

Bases: object

This class loads / initializes the databases used for enrichment analysis

Parameters

out_dir (str) – output directory
te_db (str) – path to transposable element database to be loaded (should be bed format)
genome_subset (str) – path to genome subset bed file
in_dir (str) – path to input directory where input bed files should be
size_genome (int) – genome size in bp - used to compute probabilities in Analyser
idx_sfam (int) – index indicating column with TE subfamily names in te_db file
idx_fam (int) – index indicating column with TE family names in te_db file

check_te_db()¶: Check that the TE database exists and has the expected shape

clean_up_temp()¶: Clean up temp files in output directory

compute_size_genome()¶: Here we use the genome subset file to compute the new genome size to consider for enrichment analysis
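
The idea of deriving a genome size from a subset bed file can be sketched as summing interval lengths. This is a minimal illustration, assuming a tab-separated bed file with non-overlapping intervals (for example after merging); `bed_total_bp` is a hypothetical name and not part of pyTEnrich.

```python
def bed_total_bp(lines):
    """Sum (end - start) over bed records given as text lines.

    Hypothetical helper; assumes intervals do not overlap each other.
    """
    total = 0
    for line in lines:
        # Skip comments and UCSC-style header lines.
        if line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # blank or malformed line
        # Bed coordinates are 0-based, half-open: length is end - start.
        total += int(fields[2]) - int(fields[1])
    return total


example = ["chr1\t0\t100\n", "chr1\t200\t350\n", "# comment\n"]
subset_size = bed_total_bp(example)
```

The same function accepts an open file handle, since iterating a text file yields its lines.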

get_names(list_beds)¶

Return the names, as a list of str, of the bed files

Parameters: list_beds (list) – list containing path to input bed files

handle_genome_subset()¶: If a genome subset is provided, this function launches the generation of new TE summaries

make_bed_summary()¶: Make a summary for input bed files (bp coverage, peak average size)

make_peak_subset()¶: Subset multiple bed files using a genome subset

make_single_peak_subset(bed)¶

Do subset for a single bed file

Parameters: bed – input bed file to be subsetted

make_te_subset()¶: Subset the TE database with the genome subset and make new TE genome occupancy summaries. Relies on a predefined perl script, utils/make_ref_TE.pl, to compute genome occupancy

make_te_summary()¶: Make subfam / fam summaries using utils/make_ref_file.pl

sort_genome_subset()¶: If not set to None, sort the genome subset with UNIX sort

pyTEnrich.Genomic_region_container¶

class pyTEnrich.Genomic_region_container.Genomic_region_container(peak_summary)¶

Bases: object

Contains all Genomic_regions objects and controls them - e.g. contains all subfams or all fams (one container per grouping type)

increment_peak_n_i(name1, name2)¶: Add one to the overlap on the bed side

increment_te_n_i(name1, name2)¶

Add one to the overlap between group name1 and group name2

Parameters

name1 – reference genomic region, it needs to be in summary
name2 – the other region to which we intersect - usually comes from input bed files

load_te_summary(summary_file)¶: Load TE summary information to the right Genomic_region object

pyTEnrich.Genomic_regions¶

class pyTEnrich.Genomic_regions.Genomic_regions(name, list_targets)¶

Bases: object

For a group of regions (TE family, subfamily or group of peaks from the same TF), define the number of overlaps, the name and the “targets”

Parameters

name – name of the transposon group (e.g. subfam name)
list_targets – list of targets for a given group

Returns

n_T – total number of elements in this transposon group
n_i – dictionary containing the number of intersections between this group and each other group, keyed by that group's name
name – this group’s name (e.g. TE family name)

increment_n_i(name_bed)¶

Add one to an intersection. The name corresponds to a bed file (e.g. TF name); n_i holds the observed number of intersections with that bed file

Parameters: name_bed – string with the name associated with the bed file (key to use in the n_i dictionary)
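
The per-group bookkeeping described here amounts to a dictionary of counters keyed by bed name. The class below is a simplified, hypothetical stand-in (`RegionGroup`), assuming that every target starts with a count of zero; the actual Genomic_regions initialisation may differ.

```python
class RegionGroup:
    """Simplified stand-in for a group of regions (illustration only)."""

    def __init__(self, name, list_targets):
        self.name = name  # e.g. TE subfamily name
        self.n_T = 0      # total number of elements in the group
        # One overlap counter per target (e.g. per input bed / TF name).
        self.n_i = {target: 0 for target in list_targets}

    def increment_n_i(self, name_bed):
        """Add one observed intersection with the bed file name_bed."""
        self.n_i[name_bed] += 1


# Made-up group and target names, for illustration.
group = RegionGroup("L1PA2", ["CTCF", "ZNF91"])
group.increment_n_i("CTCF")
group.increment_n_i("CTCF")
```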

pyTEnrich.funs¶

pyTEnrich.funs.basen_no_ext(my_str)¶

Return the name of the file without directory path nor extension

Parameters: my_str (str) – string to convert
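
A minimal sketch of this behaviour with the standard library is shown below. `basename_no_ext` is a hypothetical name, and the handling of double extensions (only the last one is removed) is an assumption, not something the source specifies.

```python
import os


def basename_no_ext(path):
    """Strip the directory part and the last extension from a path."""
    return os.path.splitext(os.path.basename(path))[0]


# Made-up input path, for illustration.
sample_name = basename_no_ext("/data/input/CTCF.bed")
```

With this approach a file named `peaks.bed.gz` yields `peaks.bed`, since `os.path.splitext` removes only the final extension.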

pyTEnrich.funs.create_dir(d)¶: Function to create directory

pyTEnrich.funs.logger(comment)¶

Function to print messages in the form Class::Name - Comments; it can be called from any class

Parameters: comment (str) – comment to add to the logger

pyTEnrich.funs.test_gz_file(filepath)¶: Detect gzip compressed file
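
One common way to detect gzip compression is to check the two-byte magic number (0x1f 0x8b) at the start of the file. The sketch below illustrates that approach; whether `test_gz_file` works this way is an assumption, and `looks_gzipped` is a hypothetical name.

```python
def looks_gzipped(filepath):
    """Return True if the file starts with the gzip magic bytes."""
    with open(filepath, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"
```

Checking the content rather than the `.gz` suffix also catches compressed files that were renamed.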