Commit 4ed3b3e

Merge pull request #11 from gephi/fix/add-datasets
Fix/add datasets
2 parents d929551 + 3d63104 commit 4ed3b3e

33 files changed, with 4888 additions and 37 deletions

gephi-desktop/docs/User_Manual/Datasets.md

Lines changed: 28 additions & 37 deletions
@@ -6,70 +6,61 @@ sidebar_position: 4

 The Gephi sample datasets below are available in various formats (GEXF, GDF, GML, NET, GraphML, DL, DOT). Feel free to add new datasets, but be sure to cite the original authors.

-**Supported graph formats are described [here](https://gephi.org/users/supported-graph-formats/).**
+**Supported graph formats are described [here](./Import).**

 Gephi can open zipped files directly.

 ## Web and Internet

-- [GEXF file](https://gephi.org/datasets/eurosis.gexf.zip). **EuroSiS web mapping study**: Mapping interactions between Science in Society actors on the Web of 12 European countries. Original report and data can be found [here](http://www.webatlas.fr/exhibition/eurosis/).
-- [GML file](https://gephi.org/datasets/internet_routers-22july06.gml.zip). **Internet**: a symmetrized snapshot of the structure of the Internet at the level of autonomous systems, reconstructed from BGP tables posted by the [University of Oregon Route Views Project](http://routeviews.org/). This snapshot was created by Mark Newman on July 22, 2006 and was not previously published.
+- [GEXF file](/datasets/eurosis.gexf.zip). **EuroSiS web mapping study**: Mapping interactions between Science in Society actors on the Web of 12 European countries. Original report and data can be found [here](http://www.webatlas.fr/exhibition/eurosis/).
+- [GML file](/datasets/internet_routers-22july06.gml.zip). **Internet**: a symmetrized snapshot of the structure of the Internet at the level of autonomous systems, reconstructed from BGP tables posted by the [University of Oregon Route Views Project](http://routeviews.org/). This snapshot was created by Mark Newman on July 22, 2006 and was not previously published.

 ## Social networks

-- [GML file](https://gephi.org/datasets/lesmiserables.gml.zip). **Les Miserables**: coappearance weighted network of characters in the novel Les Miserables. D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA (1993).
-- [GEXF file](https://gephi.org/datasets/ht2009_15min.gexf.gz). **Hypertext 2009 dynamic contact network**: contact network during the Hypertext 2009 conference. Source: [Sociopatterns.org](http://www.sociopatterns.org/datasets/hypertext-2009-dynamic-contact-network/).
+- [GML file](/datasets/lesmiserables.gml.zip). **Les Miserables**: coappearance weighted network of characters in the novel Les Miserables. D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA (1993).
+- [GEXF file](/datasets/ht2009_15min.gexf.gz). **Hypertext 2009 dynamic contact network**: contact network during the Hypertext 2009 conference. Source: [Sociopatterns.org](http://www.sociopatterns.org/datasets/hypertext-2009-dynamic-contact-network/).
 - [GEXF file](https://zenodo.org/record/4612153#.YFIuQi1Xaw4). **CLASS OF 1880/81**: friendship network of a German boys' school class from 1880/1881. It's based on the probably first ever primarily collected social network dataset, assembled by the primary school teacher Johannes Delitsch. The data was reanalyzed and compiled for the article: [Heidler, R., Gamper, M., Herz, A., Eßer, F. (2014): Relationship patterns in the 19th century: The friendship network in a German boys' school class from 1880 to 1881 revisited. Social Networks 13: 1--13.](http://www.sciencedirect.com/science/article/pii/S0378873313000865).
-- [GML file](https://gephi.org/datasets/karate.gml.zip). **Zachary's karate club**: social network of friendships between 34 members of a karate club at a US university in the 1970s. W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33, 452-473 (1977).
-- [GML file](https://gephi.org/datasets/netscience.gml.zip). **Coauthorships in network science**: coauthorship network of scientists working on network theory and experiment, as compiled by M. Newman in May 2006. A figure depicting the largest component of this network can be found here. M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
-- [GEXF file](https://gephi.org/datasets/cpan-authors.gexf.zip). **CPAN authors**: CPAN Explorer is a visualization project aiming at analyzing the relationships between the developers and the packages of the Perl language, known as the CPAN community. This snapshot was created by Linkfluence in July 2009. This file contains the network of developers, linked when they use the same Perl module. Original data can be found [here](http://cpan-explorer.org/).
-- [GEXF file](https://gephi.org/datasets/cpan-distributions.gexf.zip). **CPAN distributions**: CPAN Explorer is a visualization project aiming at analyzing the relationships between the developers and the packages of the Perl language, known as the CPAN community. This snapshot was created by Linkfluence in July 2009. This file contains the network of Perl modules dependencies. Orginal data can be found [here](http://cpan-explorer.org/).
-- [NET file](https://gephi.org/datasets/jazz.net.zip). **Jazz musicians network**: List of edges of the network of Jazz musicians. P.Gleiser and L. Danon , Adv. Complex Syst.6, 565 (2003).
-- [TGZ file](http://franck.lumberjaph.net/graphs.tgz). **Github open source developers**: See https://github.com/franckcuny/blog/blob/master/posts/2010-03-25-github-explorer.md/
-- [DL file](https://gephi.org/datasets/OClinks_w.dl.zip). **Online Social Network** 1899 nodes - Opsahl, T., Panzarasa, P., 2009. Clustering in weighted networks. Social Networks 31 (2), 155-163
-- [GEPHI file](https://gephi.org/datasets/hero-social-network.gephi). **The Marvel Social Network** Networks of super heroes, constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands. Collected by [Infochimps](http://www.infochimps.com/datasets/marvel-universe-social-graph) and transformed & enhanced by Kai Chang.
-- [GDF file](https://gephi.org/datasets/comic-hero-network.gdf.zip). **Comic and Hero Network** Same data as above, but this includes the comics the heroes appear in.
-- [DOT file](http://rankinfo.pkqs.net/twittercrawl.dot.gz). **Twitter mentions and retweets** of some part of the Twitter network. The file is updated from time to time.
+- [GML file](/datasets/karate.gml.zip). **Zachary's karate club**: social network of friendships between 34 members of a karate club at a US university in the 1970s. W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33, 452-473 (1977).
+- [GML file](/datasets/netscience.gml.zip). **Coauthorships in network science**: coauthorship network of scientists working on network theory and experiment, as compiled by M. Newman in May 2006. A figure depicting the largest component of this network can be found here. M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
+- [GEXF file](/datasets/cpan-authors.gexf.zip). **CPAN authors**: CPAN Explorer is a visualization project aiming at analyzing the relationships between the developers and the packages of the Perl language, known as the CPAN community. This snapshot was created by Linkfluence in July 2009. This file contains the network of developers, linked when they use the same Perl module. Original data can be found [here](http://cpan-explorer.org/).
+- [GEXF file](/datasets/cpan-distributions.gexf.zip). **CPAN distributions**: CPAN Explorer is a visualization project aiming at analyzing the relationships between the developers and the packages of the Perl language, known as the CPAN community. This snapshot was created by Linkfluence in July 2009. This file contains the network of Perl modules dependencies. Orginal data can be found [here](http://cpan-explorer.org/).
+- [NET file](/datasets/jazz.net.zip). **Jazz musicians network**: List of edges of the network of Jazz musicians. P.Gleiser and L. Danon , Adv. Complex Syst.6, 565 (2003).
+- [DL file](/datasets/OClinks_w.dl.zip). **Online Social Network** 1899 nodes - Opsahl, T., Panzarasa, P., 2009. Clustering in weighted networks. Social Networks 31 (2), 155-163
+- [GEPHI file](/datasets/hero-social-network.gephi). **The Marvel Social Network** Networks of super heroes, constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands. Collected by [Infochimps](http://www.infochimps.com/datasets/marvel-universe-social-graph) and transformed & enhanced by Kai Chang.
+- [GDF file](/datasets/comic-hero-network.gdf.zip). **Comic and Hero Network** Same data as above, but this includes the comics the heroes appear in.
 - [GEXF file](http://www.sociopatterns.org/datasets/primary-school-cumulative-networks/). **Contact networks** in a primary school, SocioPatterns team, 2011.
 - [GEXF file](https://github.com/mbingenheimer/ChineseBuddhism_SNA). **Historical Social Network of Chinese Buddhism 漢傳佛教歷史社會網絡** 17,000+ persons, 25,000+ connections.

 ## Biological networks

-- [GEXF](http://gephi.org/datasets/diseasome.gexf.zip). **Diseasome**: A network of disorders and disease genes linked by known disorder–gene associations, indicating the common genetic origin of many diseases. Genes associated with similar disorders show both higher likelihood of physical interactions between their products and higher expression profiling similarity for their transcripts, supporting the existence of distinct disease-specific functional modules. The original dataset can be found here: The Human Disease Network, Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L (2007), Proc Natl Acad Sci USA 104:8685-8690
-- [GEXF](http://gephi.org/datasets/celegans.gexf.zip). **C. Elegans neural network**: A directed, weighted network representing the neural network of C. Elegans. Data compiled by D. Watts and S. Strogatz and made available on the web here. Please cite D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998). Original experimental data taken from J. G. White, E. Southgate, J. N. Thompson, and S. Brenner, Phil. Trans. R. Soc. London 314, 1-340 (1986).
-- [GEXF](https://gephi.org/datasets/yeast.gexf.zip). **Yeast**: Protein-Protein interaction network in yeast. Original data can be found [here](http://vlado.fmf.uni-lj.si/pub/networks/data/bio/Yeast/Yeast.htm).
+- [GEXF](/datasets/diseasome.gexf.zip). **Diseasome**: A network of disorders and disease genes linked by known disorder–gene associations, indicating the common genetic origin of many diseases. Genes associated with similar disorders show both higher likelihood of physical interactions between their products and higher expression profiling similarity for their transcripts, supporting the existence of distinct disease-specific functional modules. The original dataset can be found here: The Human Disease Network, Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L (2007), Proc Natl Acad Sci USA 104:8685-8690
+- [GEXF](/datasets/celegans.gexf.zip). **C. Elegans neural network**: A directed, weighted network representing the neural network of C. Elegans. Data compiled by D. Watts and S. Strogatz and made available on the web here. Please cite D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998). Original experimental data taken from J. G. White, E. Southgate, J. N. Thompson, and S. Brenner, Phil. Trans. R. Soc. London 314, 1-340 (1986).
+- [GEXF](/datasets/yeast.gexf.zip). **Yeast**: Protein-Protein interaction network in yeast. Original data can be found [here](http://vlado.fmf.uni-lj.si/pub/networks/data/bio/Yeast/Yeast.htm).

 ## Infrastructure networks

-- [GML](https://gephi.org/datasets/power.gml.zip). **Power grid**: An undirected, unweighted network representing the topology of the Western States Power Grid of the United States. Data compiled by D. Watts and S. Strogatz and made available on the web here. Please cite D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998).
-- [GRAPHML](https://gephi.org/datasets/airlines.graphml.zip). **Airlines**: unknown source.
-- [NET](https://gephi.org/datasets/us-air97.net.zip). **US Air97**: North American Transportation Atlas Data (NORTAD). Original data can be found [here](http://vlado.fmf.uni-lj.si/pub/networks/pajek/data/gphs.htm).
+- [GML](/datasets/power.gml.zip). **Power grid**: An undirected, unweighted network representing the topology of the Western States Power Grid of the United States. Data compiled by D. Watts and S. Strogatz and made available on the web here. Please cite D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998).
+- [GRAPHML](/datasets/airlines.graphml.zip). **Airlines**: unknown source.
+- [NET](/datasets/us-air97.net.zip). **US Air97**: North American Transportation Atlas Data (NORTAD). Original data can be found [here](http://vlado.fmf.uni-lj.si/pub/networks/pajek/data/gphs.htm).

 ## Other networks

-- [GEXF](https://gephi.org/datasets/codeminer.gexf.zip). **Java code**: Source code structure of a Java program, by S.Heymann & J.Palmier, 2008.
-- [GEXF](https://gephi.org/datasets/photoviz-dynamic.gexf.zip). **Dynamic Java code**: Dynamic source code structure of a Java program by evolution of commits on the SVN, by S.Heymann & J.Bilcke, 2008.
-- [GML](https://gephi.org/datasets/word_adjacencies.gml.zip). **Word adjacencies**: adjacency network of common adjectives and nouns in the novel David Copperfield by Charles Dickens. Please cite M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
-- [NET](https://gephi.org/datasets/wordnet3.net.zip). **Wordnet English dictionary**: unknown source.
-- [DOT](https://gephi.org/datasets/hex.dot.zip). **Abstract mesh**: 331 nodes.
+- [GEXF](/datasets/codeminer.gexf.zip). **Java code**: Source code structure of a Java program, by S.Heymann & J.Palmier, 2008.
+- [GEXF](/datasets/photoviz-dynamic.gexf.zip). **Dynamic Java code**: Dynamic source code structure of a Java program by evolution of commits on the SVN, by S.Heymann & J.Bilcke, 2008.
+- [GML](/datasets/word_adjacencies.gml.zip). **Word adjacencies**: adjacency network of common adjectives and nouns in the novel David Copperfield by Charles Dickens. Please cite M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
+- [NET](/datasets/wordnet3.net.zip). **Wordnet English dictionary**: unknown source.
+- [DOT](/datasets/hex.dot.zip). **Abstract mesh**: 331 nodes.

 ## Sources

 Some of the above datasets are from:
 - [Mark Newman](http://www-personal.umich.edu/~mejn/netdata/)
-- [Alexandre Arenas](http://deim.urv.cat/~aarenas/data/welcome.htm)
-- [Albert-László Barabási](http://www.nd.edu/~networks/resources.htm)
-- [Vladimir Batagelj and Andrej Mrvar](http://vlado.fmf.uni-lj.si/pub/networks/pajek/data/gphs.htm)
+- Alexandre Arenas
+- Albert-László Barabási
+- Vladimir Batagelj and Andrej Mrvar
 - [Tore Opsahl](http://toreopsahl.com/datasets/)

 ## Other network data repositories
-
-- [Duncan Watts](http://cdg.columbia.edu/cdg/datasets)
-- [Kevin Chai](http://kevinchai.net/datasets/)
-- [Indiana University](http://iv.slis.indiana.edu/db/index.html)
-- [Trust networks dataset at trustlet.org](http://www.trustlet.org/wiki/Trust_network_datasets#Released_datasets)
-- [Data sets at CFinder.org](http://cfinder.org/data)
 - [SNAP data](http://snap.stanford.edu/data)
 - [UC Irvine Network Data Repository](http://networkdata.ics.uci.edu/)
-- [The Internet Topology Zoo](http://topology-zoo.org/)
-- [Yahoo! Graph Datasets](http://webscope.sandbox.yahoo.com/catalog.php?datatype=g)
+- [GephiDatasets](https://github.com/Ifeanyi55/GephiDatasets)
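
Most of the link changes in this diff follow one pattern: an absolute `https://gephi.org/datasets/...` (or `http://...`) link target becomes a site-relative `/datasets/...` path. The commit does not say how the edits were made; the following is only a minimal sketch of that rewrite, assuming a regex-based approach. The function name `relativize_dataset_links` is hypothetical, and the sketch does not cover the other changes in the diff (the `./Import` link, or the entries whose links were dropped).

```python
import re

# Matches the absolute dataset prefix seen on the removed lines of the diff,
# with either http or https, only where it starts a Markdown link target.
_ABSOLUTE_PREFIX = re.compile(r"\]\(https?://gephi\.org/datasets/")


def relativize_dataset_links(markdown: str) -> str:
    """Rewrite absolute gephi.org dataset links to site-relative paths.

    Hypothetical helper for illustration; not part of the Gephi repository.
    """
    return _ABSOLUTE_PREFIX.sub("](/datasets/", markdown)


if __name__ == "__main__":
    before = "- [GML file](https://gephi.org/datasets/karate.gml.zip). **Zachary's karate club**"
    print(relativize_dataset_links(before))
```

Links to other hosts, such as `http://snap.stanford.edu/data`, do not match the pattern and are left untouched, which is consistent with the context lines in the diff.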
