Convert Gene IDs between databases
convertID takes a phylomap with gene IDs from one database and converts them to the gene IDs used in another database. This function wraps functionality from the packages biomartr, stringr and dplyr.
Usage
convertID(
phylomap = phylomap,
mart = "ENSEMBL_MART_ENSEMBL",
dataset = NULL,
attributes = c("ensembl_gene_id", "ensembl_peptide_id"),
filters = "uniprot_gn_id",
split_uniprot_gene = TRUE
)
Arguments
- phylomap
a phylomap dataset, e.g. phylomapr::Homo_sapiens.PhyloMap, whose gene IDs are to be converted.
- mart
a character string specifying the BioMart mart to be used. Available marts can be listed with biomartr::getMarts().
- dataset
a character string specifying the dataset within the mart to be used, e.g. dataset = "hsapiens_gene_ensembl".
- attributes
a character vector specifying the attributes to retrieve from BioMart, e.g. attributes = c("ensembl_gene_id", "ensembl_peptide_id").
- filters
a character vector specifying the filter (query key) for the BioMart query, e.g. filters = "uniprot_gn_id".
- split_uniprot_gene
a logical value specifying whether UniProt gene IDs (e.g. sp|A0A061ACU2|PIEZ1_CAEEL) should be split on "|" via stringr::str_split(x, "[|]"), keeping the second element (here A0A061ACU2).
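The split_uniprot_gene step can be illustrated with a minimal sketch. It assumes UniProt identifiers of the form database|accession|entry name and keeps the second field, as described above. It uses base R strsplit() with fixed = TRUE instead of stringr::str_split(x, "[|]") so that it runs without extra packages; both split on a literal "|". This is an illustration, not the package's internal code.

```r
# Example UniProt identifiers of the form database|accession|entry name.
uniprot_ids <- c("sp|A0A061ACU2|PIEZ1_CAEEL", "sp|A0A024RBG1|NUD4B_HUMAN")

# Split each identifier on the literal "|" and keep the second field,
# i.e. the UniProt accession (assumed to be what the BioMart filter matches).
accessions <- vapply(
  strsplit(uniprot_ids, "|", fixed = TRUE),
  function(parts) parts[2],
  character(1)
)

print(accessions)
#> [1] "A0A061ACU2" "A0A024RBG1"
```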
Details
Different databases use different gene identifiers. With convertID, users can obtain the corresponding gene IDs from another database, e.g. Ensembl.
Note: for each converted gene, the lowest (oldest) phylostratum is chosen.
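The note on keeping the lowest phylostratum can be sketched in base R. This sketch assumes that several input gene IDs can map to the same converted gene ID and that the smallest phylostratum value is then kept. The mapping table is a hypothetical stand-in for a BioMart query result. Its column names mirror the filter and attribute names used above, and the IDs P00001 and ENSG_X are made up. convertID itself uses dplyr; merge() and aggregate() are used here only to keep the example self-contained.

```r
# Toy phylomap with hypothetical UniProt accessions.
phylomap <- data.frame(
  Phylostratum = c(1, 4, 2),
  GeneID = c("P00001", "P00002", "P00003")
)

# Hypothetical stand-in for a BioMart result:
# two accessions map to the same (made-up) Ensembl gene ID.
mapping <- data.frame(
  uniprot_gn_id = c("P00001", "P00002", "P00003"),
  ensembl_gene_id = c("ENSG_X", "ENSG_X", "ENSG_Y")
)

# Attach the converted ID to every phylomap row.
joined <- merge(phylomap, mapping, by.x = "GeneID", by.y = "uniprot_gn_id")

# Keep the lowest (oldest) phylostratum per converted gene ID.
converted <- aggregate(Phylostratum ~ ensembl_gene_id, data = joined, FUN = min)

# Restore the phylomap column layout: Phylostratum, GeneID.
names(converted)[names(converted) == "ensembl_gene_id"] <- "GeneID"
converted <- converted[, c("Phylostratum", "GeneID")]

print(converted)
```

Here ENSG_X keeps phylostratum 1 (the minimum of 1 and 4) and ENSG_Y keeps phylostratum 2.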
References
Lotharukpong JS et al. (2023) (unpublished)
Drost HG, Paszkowski J. Biomartr: genomic data retrieval with R. Bioinformatics (2017) 33(8): 1216-1217. doi:10.1093/bioinformatics/btw821.
Examples
# load the first 100 genes from the Homo_sapiens.PhyloMap.
phylomap_example <- head(phylomapr::Homo_sapiens.PhyloMap, 100)
# convert the phylomap from UniProt to Ensembl gene IDs.
converted_phylomap_example <- convertID(
phylomap = phylomap_example,
mart = "ENSEMBL_MART_ENSEMBL",
dataset = "hsapiens_gene_ensembl",
filters = "uniprot_gn_id"
)
#> Starting Gene ID conversion...
#> Starting BioMart query ...
#>
#>
#> Please cite: Drost HG, Paszkowski J. Biomartr: genomic data retrieval with R. Bioinformatics (2017) 33(8): 1216-1217. doi:10.1093/bioinformatics/btw821.
# Before conversion
head(phylomap_example)
#> # A tibble: 6 × 2
#> Phylostratum GeneID
#> <dbl> <chr>
#> 1 1 sp|A0A024RBG1|NUD4B_HUMAN
#> 2 1 sp|A0A075B6H7|KV37_HUMAN
#> 3 1 sp|A0A075B6H8|KVD42_HUMAN
#> 4 1 sp|A0A075B6H9|LV469_HUMAN
#> 5 1 sp|A0A075B6I0|LV861_HUMAN
#> 6 1 sp|A0A075B6I1|LV460_HUMAN
# After conversion
head(converted_phylomap_example)
#> # A tibble: 6 × 2
#> Phylostratum GeneID
#> <dbl> <chr>
#> 1 1 ENSG00000177144
#> 2 1 ENSG00000211623
#> 3 1 ENSG00000211632
#> 4 1 ENSG00000211633
#> 5 1 ENSG00000211637
#> 6 1 ENSG00000211638