302–303 Nucleic Acids Research, 2000, Vol. 28, No. 1
© 2000 Oxford University Press
The Eukaryotic Promoter Database (EPD)
Rouaïda Cavin Périer, Viviane Praz, Thomas Junier, Claude Bonnard and Philipp Bucher*
Swiss Institute of Bioinformatics and Swiss Institute for Experimental Cancer Research, Ch. des Boveresses 155, 1066-Epalinges s/Lausanne, Switzerland
Received October 6, 1999; Accepted October 8, 1999
ABSTRACT
The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes a description of the initiation site mapping data, exhaustive cross-references to the EMBL nucleotide sequence database, SWISS-PROT, TRANSFAC and other databases, as well as bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. WWW-based interfaces have been developed that enable the user to view EPD entries in different formats, to select and extract promoter sequences according to a variety of criteria, and to navigate to related databases exploiting different cross-references. The EPD web site also features yearly updated base frequency matrices for major eukaryotic promoter elements. EPD can be accessed at http://www.epd.isb-sib.ch
DATABASE DESCRIPTION
The term promoter has two different meanings in biology: (i) a gene region immediately upstream of a transcription initiation site, and (ii) a cis-acting genetic element controlling the rate of transcription initiation of a gene. The Eukaryotic Promoter Database (EPD) is a database of promoters in the former sense. Information about promoters in the latter sense can be found in other databases such as TRANSFAC (1), ooTFD (2), TRRD (3), PlantCARE (4) and PLACE (5).
EPD was originally designed as a resource for comparative sequence analysis and, as such, has played an instrumental role in the characterization of eukaryotic transcription control elements (6,7), as well as in the development of eukaryotic promoter prediction algorithms (8). The main purpose of the database is to keep track of experimental data that define transcription initiation sites of eukaryotic genes. This type of functional information is linked to promoter sequences via machine-readable pointers to positions within sequences of the EMBL nucleotide sequence database (9).
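To illustrate how a positional pointer can be resolved into a promoter sequence, the following Python sketch extracts a window around an initiation site from a host sequence. The `PromoterPointer` record, its field names and the coordinate convention (a 1-based position on the forward strand, with the window given as a number of upstream bases plus a number of bases starting at the initiation site) are assumptions made for this example only; the actual pointer syntax is defined in the EPD user manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromoterPointer:
    """Hypothetical pointer to an initiation site in a host sequence entry."""
    sequence_id: str  # identifier of the host sequence entry (illustrative)
    strand: str       # "+" for the forward strand, "-" for the reverse strand
    tss: int          # 1-based forward-strand coordinate of the initiation site


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_window(sequence: str, pointer: PromoterPointer,
                   upstream: int, downstream: int) -> str:
    """Return `upstream` bases preceding the initiation site followed by
    `downstream` bases starting at the site, read 5'->3' on the promoter strand."""
    i = pointer.tss - 1  # 0-based index of the initiation site
    if pointer.strand == "+":
        start, end = i - upstream, i + downstream
    elif pointer.strand == "-":
        # On the reverse strand, upstream lies at higher forward coordinates.
        start, end = i - downstream + 1, i + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(sequence):
        raise ValueError("window extends beyond the sequence entry")
    window = sequence[start:end]
    return window if pointer.strand == "+" else reverse_complement(window)
```

Because only the pointer is stored, the same record can yield windows of any length from the host sequence, which is one reason a pointer-based design is convenient for comparative analysis.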
EPD is a rigorously selected, curated and quality-controlled database. In order to be included, a promoter must fulfill a number of conditions laid down in the user manual. Most importantly, the transcription start site must be mapped experimentally with an estimated precision of ±5 bp or higher. All information in EPD originates from a critical examination and independent interpretation of the experimental data presented in the cited research publications. Published conclusions and feature table annotations in EMBL entries are never blindly relied upon. At present, EPD is confined to promoters recognized by the RNA POL II system of higher eukaryotes (multicellular plants and animals). Note that this restriction does not a priori exclude viral promoters.
EPD is also a strictly non-redundant database. The general rule is that one entry corresponds to one transcription initiation site in a genome. Organisms are distinguished at the taxonomic level of the species. According to this policy, data from different literature sources pertaining to the same transcription initiation site are represented by the same entry. Likewise, promoters belonging to different alleles of the same gene, or to the same gene in different subspecies, are covered by the same entry regardless of whether they differ in sequence. The user manual provides more details about how certain non-trivial cases, such as promoters of tandemly repeated genes or retro-transposable elements, are handled.
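The non-redundancy rule can be pictured as a grouping step. The Python sketch below collapses individual literature reports into one entry per species and initiation site. The record fields (`species`, `site`, `reference`, `variant`) and the sample values in the usage note are hypothetical and do not reflect the EPD flat-file format or its curation procedure.

```python
def merge_reports(reports):
    """Group reports so that each (species, site) pair yields exactly one entry.

    Each report is a dict with the hypothetical keys 'species', 'site' and
    'reference', plus an optional 'variant' (allele or subspecies label).
    """
    entries = {}
    for report in reports:
        # Species-level key: alleles and subspecies share the same entry.
        key = (report["species"], report["site"])
        entry = entries.setdefault(key, {"references": [], "variants": []})
        if report["reference"] not in entry["references"]:
            entry["references"].append(report["reference"])
        variant = report.get("variant")
        if variant and variant not in entry["variants"]:
            entry["variants"].append(variant)
    return entries
```

For instance, two reports with `species="species A"` and `site="gene1 P1"` but different `reference` values end up in a single entry listing both references, while a report for `species="species B"` forms a separate entry.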
A comprehensive description of the contents and format of EPD has been published earlier (10). User interfaces and software support for local installations have been previously described (11).
RECENT DEVELOPMENTS
Database
The objective of exhaustive cross-referencing between EPD promoters and EMBL sequences is being given high priority at the moment, especially with regard to genomes that are complete (Caenorhabditis elegans) or at an advanced stage of sequencing (Arabidopsis, Drosophila, human). As a consequence, the number of EMBL cross-references has increased by >1000 since last year (Table 1). Moreover, the internal EPD cross-references have been revised. Until now, such links were only used to connect alternative promoters of the same gene. In future releases, promoters of different genes occurring at a short distance from each other (
*To whom correspondence should be addressed. Tel: +[1**********]; Fax: +[1**********]; Email: [email protected]
has been introduced and so far been populated with keywords imported from SWISS-PROT (12). This feature is intended to enhance the query capabilities of various access tools. Additional keywords relating to properties of the promoter rather than to properties of the corresponding gene product will be added in the near future.
Table 1. Database cross-references in EPD release 60
Database          Number of links
EPD internal      188
EMBL (9)          2978
TRANSFAC (1)      1700
SWISS-PROT (12)   1058
FlyBase (16)      116
MIM (17)          234
MGD (18)          126
MEDLINE           2393
Documentation
The user manual has been extensively revised. Bibliographic references have been added to the section explaining the representation of transcript mapping data. Some of them are accompanied by direct hyperlinks to figures in online journals exemplifying a particular technique. Several additional documents have recently been made available over the web. One contains a list of all ‘homology groups’ defined in EPD. Such groups consist of homologous promoters exhibiting significant sequence similarity in the –79 to +20 region among themselves. Another document presents the hierarchical promoter classification system of EPD.
Promoter element descriptions
Weight matrix descriptions of four major eukaryotic promoter elements (TATA-box, initiator, GC-box and CCAAT-box) have previously been derived from EPD release 17 (7). We have now decided to make updated versions of such matrices available on a yearly basis from the EPD web pages. The latest versions were produced from EPD release 60 using a Baum–Welch hidden Markov model training algorithm (program buildmodel of SAM release 1.3.3, Hughey & Krogh 1998, http://www.cse.ucsc.edu/research/compbio/sam.html).
ACCESS
FTP
The following files are available from ftp.epd.isb-sib.ch/pub/databases/epd
• Flat-files containing the EPD database in the new and in the old format.
• EPD user manual.
• Sequence libraries in EMBL and FASTA format containing promoter sequences from –499 to +100 relative to the transcription start site.
• A slightly reduced version of EPD in ASN.1 format designed for import into the GenBank–Entrez data environment (13), including a formal data description in ASN.1.
• Icarus scripts for indexing EPD by SRS (14).
WWW
The following services are offered at http://www.epd.isb-sib.ch
• Access to EPD entries by ID or accession number. The following formats are available: text only, HTML and HTML combined with a graphic representation of sequence objects by a Java applet (15).
• A page for downloading promoter sequence subsets defined in EPD.
• Access to EPD entries and corresponding promoter sequences via a query form.
Access to EPD via SRS is provided by the Swiss EMBNet node at http://www.ch.embnet.org/
SUPPLEMENTARY MATERIAL
Relevant URL links are available at NAR Online.
ACKNOWLEDGEMENT
EPD is funded in part by grant 31-54782.98 from the Swiss National Science Foundation.
REFERENCES
1. Heinemeyer, T., Chen, X., Karas, H., Kel, A.E., Kel, O.V., Liebich, I., Meinhardt, T., Reuter, I., Schacherer, F. and Wingender, E. (1999) Nucleic Acids Res., 27, 318–322. Updated article in this issue: Nucleic Acids Res. (2000), 28, 316–319.
2. Ghosh, D. (1999) Nucleic Acids Res., 27, 315–317. Updated article in this issue: Nucleic Acids Res. (2000), 28, 308–310.
3. Kolchanov, N.A., Ananko, E.A., Podkolodnaya, O.A., Ignatieva, E.V., Stepanenko, I.L., Kel-Margoulis, O.V., Kel, A.E., Merkulova, T.I., Goryachkovskaya, T.N., Busygina, T.V., Kolpakov, F.A., Podkolodny, N.L., Naumochkin, A.N. and Romashchenko, A.G. (1999) Nucleic Acids Res., 27, 303–306. Updated article in this issue: Nucleic Acids Res. (2000), 28, 298–301.
4. Rombauts, S., Dehais, P., Van Montagu, M. and Rouze, P. (1999) Nucleic Acids Res., 27, 295–296.
5. Higo, K., Ugawa, Y., Iwamoto, M. and Korenaga, T. (1999) Nucleic Acids Res., 27, 297–300.
6. Bucher, P. and Trifonov, E.N. (1986) Nucleic Acids Res., 22, 10009–10026.
7. Bucher, P. (1990) J. Mol. Biol., 212, 563–578.
8. Fickett, J.W. and Hatzigeorgiou, A.G. (1997) Genome Res., 7, 861–878.
9. Stoesser, G., Tuli, M.A., Lopez, R. and Sterk, P. (1999) Nucleic Acids Res., 27, 18–24. Updated article in this issue: Nucleic Acids Res. (2000), 28, 19–23.
10. Cavin Périer, R., Junier, T. and Bucher, P. (1998) Nucleic Acids Res., 26, 353–357.
11. Cavin Périer, R., Junier, T., Bonnard, C. and Bucher, P. (1999) Nucleic Acids Res., 27, 307–309.
12. Bairoch, A. and Apweiler, R. (1999) Nucleic Acids Res., 27, 49–54. Updated article in this issue: Nucleic Acids Res. (2000), 28, 45–48.
13. Benson, D.A., Boguski, M., Lipman, D.J. and Ostell, J. (1994) Nucleic Acids Res., 22, 3441–3444.
14. Etzold, T., Ulyanov, A. and Argos, P. (1996) Methods Enzymol., 266, 114–128.
15. Junier, T. and Bucher, P. (1998) In Silico Biol., 1, 13–20.
16. The Flybase Consortium (1999) Nucleic Acids Res., 27, 85–88.
17. Pearson, P., Francomano, C., Foster, P., Bocchini, C., Li, P. and McKusick, V. (1994) Nucleic Acids Res., 22, 3470–3473.
18. Blake, J.A., Richardson, J.E., Davisson, M.T. and Eppig, J.T. (1999) Nucleic Acids Res., 27, 95–98. Updated article in this issue: Nucleic Acids Res. (2000), 28, 108–111.