multiMiR: one package for miRNA and target gene prediction


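Hi everyone! Anyone who has worked on miRNAs has run into the same two questions: given a miRNA, which genes does it target? And given a gene of interest, which miRNAs target it? Many databases already catalogue miRNA-target regulatory relationships, including TarBase, miRWalk, TargetScan, miRanda, RNAhybrid, TargetFinder, miRcode, miRDB, RNA22, starBase, miRTarBase, miRPathDB and ENCORI. Logging in to each of them and searching one by one takes a lot of time, and it is especially painful when a database only accepts a single gene or miRNA per query. Today we look at the multiMiR package, which bundles 14 miRNA-related databases so that a single package covers both directions of miRNA-target lookup. Let's take a look!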

▌multiMiR basics

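The multiMiR package was released in July 2014, and the current version was updated in April 2020. It integrates 14 external databases: 8 contain predicted miRNA-target interactions (DIANA-microT-CDS, ElMMo, MicroCosm, miRanda, miRDB, PicTar, PITA and TargetScan), 3 contain experimentally validated miRNA-target interactions (miRecords, miRTarBase and TarBase), and 3 contain drug/disease associations (miR2Disease, Pharmaco-miR and PhenomiR). This makes it very convenient for research on disease mechanisms, diagnosis and treatment that builds on miRNA-target regulation.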


The main function is get_multimir(); a few of its parameters are worth explaining:

    get_multimir(url = NULL, org = "hsa", mirna = NULL, target = NULL,
                 disease.drug = NULL, table = "validated", predicted.cutoff = NULL,
                 predicted.cutoff.type = "p", predicted.site = "conserved",
                 summary = FALSE, add.link = FALSE, use.tibble = FALSE, limit = NULL,
                 legacy.out = FALSE)

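org: the species. The default is human, i.e. org = "hsa", "human" or "Homo sapiens"; the other two supported species are mouse ("mmu", "mouse" or "Mus musculus") and rat ("rno", "rat" or "Rattus norvegicus").

mirna: the miRNA(s) to search for, NULL by default. Accepts a miRNA accession number (e.g. "MIMAT0000072"), a mature miRNA ID (e.g. "hsa-miR-199a-3p"), or a mix of both (e.g. c("MIMAT0000065", "hsa-miR-30a-5p")). It can be a single miRNA or a vector of miRNAs.

target: the target gene(s) to search for, NULL by default. Accepts gene symbols (e.g. c("TP53", "KRAS")), Entrez gene IDs (e.g. c(578, 3845)), Ensembl gene IDs (e.g. "ENSG00000171791"), or a mix of all three (e.g. c("TP53", 3845, "ENSG00000171791")). It can be a single gene or a vector of genes.

disease.drug: restricts the search to miRNA-target relationships associated with a disease or drug, NULL by default. Accepts disease(s) and/or drug(s) (e.g. c("bladder cancer", "cisplatin")).

table: which group of databases to search. The default, "validated", searches the experimentally validated databases ("mirecords", "mirtarbase" and "tarbase"). "predicted" searches the prediction databases ("diana_microt", "elmmo", "microcosm", "miranda", "mirdb", "pictar", "pita" and "targetscan"). "disease.drug" searches the disease/drug databases ("mir2disease", "pharmaco_mir" and "phenomir"). "all" searches every database above, and the name of a single database searches only that one.

predicted.cutoff and predicted.cutoff.type: together these set how much of each prediction database is searched. predicted.cutoff defaults to NULL; in that case, predicted.cutoff.type = "p" searches the top 20% of predictions by score, and predicted.cutoff.type = "n" searches the top 300,000 predictions.

predicted.site: for the prediction databases, whether to search conserved sites ("conserved"), non-conserved sites ("nonconserved"), or both ("all").

summary: whether to also return summary counts of the search results; FALSE by default.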
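To make the parameter list concrete, here is a minimal sketch of how a query could be assembled. It uses only argument names from the signature above. The helper run_multimir_query() is a hypothetical wrapper introduced for illustration, not part of multiMiR, and the real call needs the package plus network access to the multiMiR server, so the wrapper is only defined here and not executed.

```r
# Mixed identifier types in one vector are coerced to character by c(),
# so the Entrez ID 3845 is stored as the string "3845".
targets <- c("TP53", 3845, "ENSG00000171791")

# Arguments for a validated-interaction query; names follow get_multimir().
query_args <- list(
  org     = "hsa",
  target  = targets,
  table   = "validated",
  summary = TRUE
)

# Hypothetical wrapper (not part of multiMiR): runs the query only if the
# package is installed, otherwise returns NULL with a message.
run_multimir_query <- function(args) {
  if (!requireNamespace("multiMiR", quietly = TRUE)) {
    message("multiMiR is not installed; skipping the query.")
    return(NULL)
  }
  do.call(multiMiR::get_multimir, args)
}

print(class(targets))
print(names(query_args))
```

With the package installed and the server reachable, `result <- run_multimir_query(query_args)` would submit the query. Keeping the arguments in a list makes it easy to change one setting (for example table = "predicted") and rerun.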

Example 1: retrieve all validated target genes of a given miRNA

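Taking hsa-miR-182-5p as the example, we search for experimentally validated miRNA-target relationships. The result, example1, is an S4 object, and its data slot holds the table of search results.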
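A minimal sketch of what this query could look like and how the slots of an S4 result are read. The commented-out call is an assumed form built from the signature above. The class ToyQueryResult is a hypothetical stand-in defined only for this illustration (it is not multiMiR's own class), and its two rows are borrowed from the Example 5 output further down.

```r
library(methods)

# Assumed form of the query (needs multiMiR and network access, so it is
# left commented out here):
# example1 <- multiMiR::get_multimir(mirna = "hsa-miR-182-5p",
#                                    table = "validated", summary = TRUE)

# Toy stand-in for the result: a hypothetical S4 class with the two slots
# used in this article (data and summary).
setClass("ToyQueryResult",
         representation(data = "data.frame", summary = "data.frame"))

toy_result <- new(
  "ToyQueryResult",
  data = data.frame(
    database        = c("mirtarbase", "tarbase"),
    mature_mirna_id = c("hsa-miR-182-5p", "hsa-miR-182-5p"),
    target_symbol   = c("CUL5", "FOXN3"),
    stringsAsFactors = FALSE
  ),
  summary = data.frame(
    mature_mirna_id = "hsa-miR-182-5p",
    all.sum         = 2L,
    stringsAsFactors = FALSE
  )
)

print(isS4(toy_result))        # S4 objects expose their parts as slots
print(slotNames(toy_result))   # list the available slots
example1_result <- toy_result@data   # the @ operator extracts a slot
print(head(example1_result))
```

The same `@data` and `@summary` access pattern is what the later examples use on the real query results.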


Example 2: retrieve miRNA-target interactions associated with a given drug or disease

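As an example, we retrieve human miRNA-target interactions associated with cisplatin by setting the disease.drug parameter to "cisplatin" and the table parameter to "disease.drug". The search returns 45 records.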


    library(multiMiR)

    # Retrieve human miRNA-target interactions associated with cisplatin
    example2 <- get_multimir(org          = "hsa",
                             disease.drug = "cisplatin",
                             table        = "disease.drug",
                             summary      = TRUE)
    # Searching mir2disease ...
    # Searching pharmaco_mir ...
    # Searching phenomir ...

    table(example2@data$type)   # inspect the results
    # disease.drug
    #           45

    example2_result <- example2@data
    head(example2_result)
    #       database mature_mirna_acc mature_mirna_id target_symbol target_entrez
    # 1 pharmaco_mir     MIMAT0000772  hsa-miR-345-5p         ABCC1          4363
    # 2 pharmaco_mir     MIMAT0000720 hsa-miR-376c-3p          ALK7
    # 3 pharmaco_mir     MIMAT0000423 hsa-miR-125b-5p          BAK1           578
    # 4 pharmaco_mir                       hsa-miR-34          BCL2           596
    # 5 pharmaco_mir     MIMAT0000318 hsa-miR-200b-3p          BCL2           596
    # 6 pharmaco_mir     MIMAT0000617 hsa-miR-200c-3p          BCL2           596
    #    target_ensembl disease_drug paper_pubmedID         type
    # 1 ENSG00000103222    cisplatin       20099276 disease.drug
    # 2                    cisplatin       21224400 disease.drug
    # 3 ENSG00000030110    cisplatin       21823019 disease.drug
    # 4 ENSG00000171791    cisplatin       18803879 disease.drug
    # 5 ENSG00000171791    cisplatin       21993663 disease.drug
    # 6 ENSG00000171791    cisplatin       21993663 disease.drug

Example 3: retrieve the miRNAs that target a given gene

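As an example, we retrieve the miRNAs predicted to target the mouse gene Gnb1, restricting the search to the top 35% of predictions by score, and then visualize the results with an UpSet plot.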

    # Retrieve miRNAs predicted to target the mouse gene Gnb1
    example3 <- get_multimir(org                   = "mmu",        # species: hsa/mmu/rno
                             target                = "Gnb1",       # target gene of interest
                             table                 = "predicted",  # predicted miRNA-target pairs
                             summary               = TRUE,
                             predicted.cutoff      = 35,           # top 35% of predictions by score
                             predicted.cutoff.type = "p",          # cutoff is a percentage
                             predicted.site        = "all")        # conserved/nonconserved/all
    # Searching diana_microt ...
    # Searching elmmo ...
    # Searching microcosm ...
    # Searching miranda ...
    # Searching mirdb ...
    # Searching pictar ...
    # Searching pita ...
    # Searching targetscan ...

    table(example3@data$type)   # inspect the results
    # predicted
    #       716

    example3_result <- example3@data
    head(example3_result)
    #       database mature_mirna_acc   mature_mirna_id target_symbol target_entrez
    # 1 diana_microt     MIMAT0000663    mmu-miR-218-5p          Gnb1         14688
    # 2 diana_microt     MIMAT0017276    mmu-miR-493-5p          Gnb1         14688
    # 3 diana_microt     MIMAT0000656    mmu-miR-139-5p          Gnb1         14688
    # 4 diana_microt     MIMAT0014946 mmu-miR-3074-2-3p          Gnb1         14688
    # 5 diana_microt     MIMAT0000144    mmu-miR-132-3p          Gnb1         14688
    # 6 diana_microt     MIMAT0020608      mmu-miR-5101          Gnb1         14688
    #       target_ensembl score      type
    # 1 ENSMUSG00000029064 0.975 predicted
    # 2 ENSMUSG00000029064 0.964 predicted
    # 3 ENSMUSG00000029064  0.96 predicted
    # 4 ENSMUSG00000029064 0.921 predicted
    # 5 ENSMUSG00000029064  0.92 predicted
    # 6 ENSMUSG00000029064 0.918 predicted

    example3_sum <- example3@summary   # summary counts of the search
    head(example3_sum)
    apply(example3@summary[, 6:13], 2, sum)
    # diana_microt        elmmo    microcosm      miranda        mirdb       pictar
    #          105          108            5           49           59           18
    #         pita   targetscan
    #          175          197

With predictions from all 8 databases in hand, the next step is to draw an UpSet plot, which starts with preparing the UpSet input data.
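A minimal sketch of that preparation step, assuming a data frame with the database and mature_mirna_id columns seen in the output above. The toy rows below are hypothetical stand-ins for example3@data, and UpSetR is only one package that can draw such a matrix; its call is left as a comment because the plotting package is an assumption here.

```r
# Toy stand-in for example3@data (hypothetical rows; the real result has
# hundreds of entries across the 8 prediction databases).
toy_predicted <- data.frame(
  database = c("diana_microt", "diana_microt", "targetscan",
               "targetscan", "pita", "pita"),
  mature_mirna_id = c("mmu-miR-218-5p", "mmu-miR-132-3p", "mmu-miR-218-5p",
                      "mmu-miR-139-5p", "mmu-miR-218-5p", "mmu-miR-132-3p"),
  stringsAsFactors = FALSE
)

# Cross-tabulate miRNA x database, then reduce the counts to 0/1 membership:
# one row per miRNA, one column per database.
counts <- table(toy_predicted$mature_mirna_id, toy_predicted$database)
membership <- as.data.frame.matrix((counts > 0) * 1L)

print(membership)

# With the UpSetR package installed, the matrix could be plotted with:
# UpSetR::upset(membership, sets = colnames(membership))
```

A 0/1 membership table like this is the usual input for UpSet-style plots: each bar then counts the miRNAs predicted by one particular combination of databases.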

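Example 4: search all databases with a list of miRNAs or a list of genes

Taking target prediction for a known list of miRNAs as the example, the search across all 14 databases returns 24,806 experimentally validated records, 37,331 predicted records and 451 disease/drug-related records. The lists DE.miRNA.up and DE.entrez.dn come from the example data file bladder.rda, which is described in Example 5.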

    # DE.miRNA.up and DE.entrez.dn come from the example data (see Example 5)
    load(url("http://multimir.org/bladder.rda"))

    # Predict target genes for a known list of miRNAs
    example4 <- get_multimir(org                   = "hsa",
                             mirna                 = DE.miRNA.up,
                             table                 = "all",   # search all 14 databases
                             summary               = TRUE,
                             predicted.cutoff.type = "p",
                             predicted.cutoff      = 10,
                             use.tibble            = TRUE)
    # Searching mirecords ...
    # Searching mirtarbase ...
    # Searching tarbase ...
    # Searching diana_microt ...
    # Searching elmmo ...
    # Searching microcosm ...
    # Searching miranda ...
    # Searching mirdb ...
    # Searching pictar ...
    # Searching pita ...
    # Searching targetscan ...
    # Searching pharmaco_mir ...
    # Joining, by = c("database", "mature_mirna_acc", "mature_mirna_id", "target_symbol", "target_entrez", "target_ensembl", "type")
    # Joining, by = c("database", "mature_mirna_acc", "mature_mirna_id", "target_symbol", "target_entrez", "target_ensembl", "type")

    save(example4, file = "example4.Rdata")   # the query is slow, so save the result right away
    load(file = "example4.Rdata")

    table(example4@data$type)
    # disease.drug    predicted    validated
    #          451        37331        24806

    example4_result <- example4@data
    head(example4_result)
    example4_sum <- example4@summary
    head(example4_sum)
    apply(example4@summary[, 6:23], 2, sum)
    #  diana_microt         elmmo     microcosm   mir2disease       miranda         mirdb
    #          9015         18378           696           114          4048          2722
    #     mirecords    mirtarbase  pharmaco_mir      phenomir        pictar          pita
    #           115          4175             9           328            25          1656
    #       tarbase    targetscan   disease.sum predicted.sum validated.sum       all.sum
    #         20516           791            22         21626         22970         45847

    # Retrieve miRNAs for a known list of mRNAs
    example4_2 <- get_multimir(org                   = "hsa",
                               target                = DE.entrez.dn,
                               table                 = "all",   # search all 14 databases
                               summary               = TRUE,
                               predicted.cutoff.type = "p",
                               predicted.cutoff      = 10,
                               use.tibble            = TRUE)
    save(example4_2, file = "example4_2.Rdata")   # the query is slow, so save the result right away
    load(file = "example4_2.Rdata")

    table(example4_2@data$type)
    example4_result2 <- example4_2@data
    head(example4_result2)
    example4_sum2 <- example4_2@summary
    head(example4_sum2)
    # Sum every count column after the five identifier columns
    apply(example4_2@summary[, 6:ncol(example4_2@summary)], 2, sum)

Example 5: retrieve the interactions between a set of miRNAs and a set of genes

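This example uses the package's example data: DE.miRNA.up contains 9 up-regulated miRNAs and DE.entrez.dn contains 47 down-regulated genes. They are loaded with load(url("http://multimir.org/bladder.rda")). The search returns 160 predicted, 98 experimentally validated and 442 disease/drug-related miRNA-target records.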

    # Load the example data
    load(url("http://multimir.org/bladder.rda"))

    # Query the databases
    example5 <- get_multimir(org                   = "hsa",
                             mirna                 = DE.miRNA.up,
                             target                = DE.entrez.dn,
                             table                 = "all",
                             summary               = TRUE,
                             predicted.cutoff.type = "p",
                             predicted.cutoff      = 10,
                             use.tibble            = TRUE)
    save(example5, file = "example5.Rdata")
    load(file = "example5.Rdata")

    table(example5@data$type)
    # disease.drug    predicted    validated
    #          442          160           98

    example5_result <- example5@data
    head(example5_result)
    example5_sum <- example5@summary
    head(example5_sum)
    apply(example5@summary[, 6:19], 2, sum)
    #  diana_microt         elmmo   mir2disease       miranda         mirdb    mirtarbase
    #            27            92           114            13            11             8
    #      phenomir          pita       tarbase    targetscan   disease.sum predicted.sum
    #           328            14            90             3            16            88
    # validated.sum       all.sum
    #            91           198

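Next, we keep only the experimentally validated results. After de-duplication there are 85 miRNA-target pairs, involving 7 miRNAs and 41 mRNAs. The results can also be displayed per database.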

    # Keep only the experimentally validated results
    result <- select(example5, keytype = "type", keys = "validated",
                     columns = columns(example5))

    # De-duplicate, method 1
    unique_pairs <- result[!duplicated(result[, c("mature_mirna_id", "target_entrez")]), ]
    # De-duplicate, method 2
    unique_pairs_2 <- unique(data.frame(miRNA.ID      = as.character(result$mature_mirna_id),
                                        target.Entrez = as.character(result$target_entrez)))
    nrow(unique_pairs)     # 85 miRNA-target pairs
    nrow(unique_pairs_2)   # 85 miRNA-target pairs

    unique_miRNA <- unique(result$mature_mirna_id)
    length(unique_miRNA)   # 7 miRNAs
    unique_target <- unique(result$target_symbol)
    length(unique_target)  # 41 mRNAs

    # Display the results per database
    example5_split <- split(result, result$database)
    example5_split$mirtarbase
    # # A tibble: 8 x 13
    #   database   mature_mirna_acc mature_mirna_id target_symbol target_entrez
    # 1 mirtarbase MIMAT0000087     hsa-miR-30a-5p  FDX1          2230
    # 2 mirtarbase MIMAT0000259     hsa-miR-182-5p  CUL5          8065
    # 3 mirtarbase MIMAT0000418     hsa-miR-23b-3p  RRAS2         22800
    # 4 mirtarbase MIMAT0000087     hsa-miR-30a-5p  LIMCH1        22998
    # 5 mirtarbase MIMAT0000418     hsa-miR-23b-3p  SWAP70        23075
    # 6 mirtarbase MIMAT0000087     hsa-miR-30a-5p  PEG10         23089
    # 7 mirtarbase MIMAT0000245     hsa-miR-30d-5p  PEG10         23089
    # 8 mirtarbase MIMAT0000420     hsa-miR-30b-5p  PEG10         23089
    # # ... with 8 more variables: target_ensembl, experiment,
    # #   support_type, pubmed_id, type, score,
    # #   disease_drug, paper_pubmedID

    example5_split$tarbase
    # # A tibble: 90 x 13
    #    database mature_mirna_acc mature_mirna_id target_symbol target_entrez
    #  1 tarbase  MIMAT0000418     hsa-miR-23b-3p  LIMA1         51474
    #  2 tarbase  MIMAT0000449     hsa-miR-146a-5p LIMA1         51474
    #  3 tarbase  MIMAT0000259     hsa-miR-182-5p  FOXN3         1112
    #  4 tarbase  MIMAT0000449     hsa-miR-146a-5p FOXN3         1112
    #  5 tarbase  MIMAT0000418     hsa-miR-23b-3p  BCAT1         586
    #  6 tarbase  MIMAT0000449     hsa-miR-146a-5p BCAT1         586
    #  7 tarbase  MIMAT0000080     hsa-miR-24-3p   BCAT1         586
    #  8 tarbase  MIMAT0000087     hsa-miR-30a-5p  LIMCH1        22998
    #  9 tarbase  MIMAT0000259     hsa-miR-182-5p  LIMCH1        22998
    # 10 tarbase  MIMAT0000259     hsa-miR-182-5p  IPO5          3843
    # # ... with 80 more rows, and 8 more variables: target_ensembl,
    # #   experiment, support_type, pubmed_id, type,
    # #   score, disease_drug, paper_pubmedID
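The two de-duplication approaches and the per-database split can be checked against each other on a small table. This is a minimal sketch on toy data: the four rows are copied from the mirtarbase and tarbase output above, and the object names (toy_hits, pairs_a, pairs_b, by_db) are hypothetical.

```r
# Four records taken from the output above; the first two describe the same
# miRNA-target pair reported by two different databases.
toy_hits <- data.frame(
  database        = c("mirtarbase", "tarbase", "tarbase", "tarbase"),
  mature_mirna_id = c("hsa-miR-30a-5p", "hsa-miR-30a-5p",
                      "hsa-miR-182-5p", "hsa-miR-182-5p"),
  target_entrez   = c("22998", "22998", "22998", "3843"),
  stringsAsFactors = FALSE
)

pair_cols <- c("mature_mirna_id", "target_entrez")

# Method 1: keep the first full row of each miRNA-target pair.
pairs_a <- toy_hits[!duplicated(toy_hits[, pair_cols]), ]

# Method 2: keep only the pair columns and take the unique rows.
pairs_b <- unique(toy_hits[, pair_cols])

print(nrow(pairs_a))
print(nrow(pairs_b))

# Split the full table by source database, as done with example5_split.
by_db <- split(toy_hits, toy_hits$database)
print(names(by_db))
```

Both methods give the same number of pairs. Method 1 keeps the other columns of the first matching row, which is useful when the source database or the supporting experiment still matters downstream.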

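Next, we filter for the results related to bladder cancer, which leaves hsa-miR-23b-3p and hsa-miR-146a-5p. We then search again for experimentally validated target genes, taking hsa-miR-23b-3p as the example, and display the result as a Venn diagram. Three target genes are shared by all three validated databases: "NOTCH1", "PLAU" and "MET".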

    # Keep only the results related to bladder cancer
    mykeys <- keys(example5, keytype = "disease_drug")
    mykeys <- mykeys[grep("bladder", mykeys, ignore.case = TRUE)]
    result2 <- select(example5, keytype = "disease_drug", keys = mykeys,
                      columns = columns(example5))
    # De-duplicate on result2 itself
    unique_pairs2 <- result2[!duplicated(result2[, c("mature_mirna_id", "target_entrez")]), ]
    nrow(unique_pairs2)   # 2 miRNA-target records
    # [1] 2
    unique_pairs2
    # # A tibble: 2 x 13
    #   database    mature_mirna_acc mature_mirna_id target_symbol target_entrez
    # 1 mir2disease MIMAT0000418     hsa-miR-23b-3p  NA            NA
    # 2 phenomir    MIMAT0000449     hsa-miR-146a-5p NA            NA
    # # ... with 8 more variables: target_ensembl, experiment,
    # #   support_type, pubmed_id, type, score,
    # #   disease_drug, paper_pubmedID

    # Retrieve validated target genes, taking hsa-miR-23b-3p as the example
    example5_2 <- get_multimir(org     = "hsa",
                               mirna   = "hsa-miR-23b-3p",
                               table   = "validated",
                               summary = TRUE)
    apply(example5_2@summary[, 6:10], 2, sum)
    #     mirecords    mirtarbase       tarbase validated.sum       all.sum
    #             7           433          3660          3750          3750

    # Records supported by a luciferase assay
    example5_2_Lucifer <- example5_2@data[grep("Luciferase", example5_2@data[, "experiment"]), ]
    nrow(example5_2_Lucifer)   # number of luciferase-validated records

    # Draw the Venn diagram
    library(VennDiagram)
    library(tidyverse)
    db <- unique(example5_2@data$database)
    row1 <- example5_2@data %>% filter(database == db[1])
    row2 <- example5_2@data %>% filter(database == db[2])
    row3 <- example5_2@data %>% filter(database == db[3])
    venn.plot <- venn.diagram(
      x = list(target1 = unique(row1$target_symbol),
               target2 = unique(row2$target_symbol),
               target3 = unique(row3$target_symbol)),
      filename = "Venn1.tif",
      lty = "dotted", lwd = 0.5, col = "black",
      fill = c("dodgerblue", "goldenrod1", "darkorange1"),
      alpha = 0.60,
      cat.col = c("dodgerblue", "goldenrod1", "darkorange1"),
      cat.cex = 1, cat.fontface = "bold",
      margin = 0.05, cex = 1)

    # Get the intersection of the three target lists
    venn_list <- list(target1 = unique(row1$target_symbol),
                      target2 = unique(row2$target_symbol),
                      target3 = unique(row3$target_symbol))
    for (i in 1:length(venn_list)) {
      if (i == 1) {
        interGenes <- venn_list[[1]]
      } else {
        interGenes <- intersect(interGenes, venn_list[[i]])
      }
    }
    interGenes
    # [1] "NOTCH1" "PLAU"   "MET"
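The loop that accumulates the intersection can also be written with base R's Reduce(). A minimal sketch with toy gene sets: the names GENE_A, GENE_B and GENE_C are hypothetical placeholders, and the sets are constructed so that their overlap is the three genes reported above.

```r
# Toy stand-ins for the three per-database target lists.
toy_sets <- list(
  target1 = c("NOTCH1", "PLAU", "MET", "GENE_A"),
  target2 = c("MET", "NOTCH1", "PLAU", "GENE_B"),
  target3 = c("PLAU", "GENE_C", "NOTCH1", "MET")
)

# Reduce() applies intersect() pairwise from left to right:
# intersect(intersect(target1, target2), target3)
shared <- Reduce(intersect, toy_sets)
print(shared)
```

This works for any number of sets without changing the code, which is convenient when the number of databases that returned hits varies between queries.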


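That covers the multiMiR package. Developing and maintaining an R package is hard work, so please remember to cite the following paper when you use it!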

Ru Y, Kechris KJ, Tabakoff B, Hoffman P, Radcliffe RA, Bowler R, Mahaffey S, Rossi S, Calin GA, Bemis L, Theodorescu D. The multiMiR R package and database: integration of microRNA-target interactions along with their disease and drug associations. Nucleic Acids Res. 2014;42(17):e133. doi: 10.1093/nar/gku631. Epub 2014 Jul 24. PMID: 25063298; PMCID: PMC4176155.
