biojava/biojava-spark

BioJava-Spark

Algorithms built around BioJava that run on Apache Spark.

Starting up

Some initial instructions can be found in the mmtf-spark project:

https://github.com/sbl-sdsc/mmtf-spark

First, download and untar a Hadoop sequence file of the PDB (~7 GB download):

wget http://mmtf.rcsb.org/v1.0/hadoopfiles/full.tar
tar -xvf full.tar

Or you can get a reduced version containing only C-alpha atoms, phosphates and ligands (~800 MB download):

wget http://mmtf.rcsb.org/v1.0/hadoopfiles/reduced.tar
tar -xvf reduced.tar

Second, add the biojava-spark dependency to your pom.xml:

<dependency>
	<groupId>org.biojava</groupId>
	<artifactId>biojava-spark</artifactId>
	<version>0.2.1</version>
</dependency>

Extra BioJava examples

Do some simple quality filtering

float maxResolution = 3.0f;
float maxRfree = 0.3f;
StructureDataRDD structureData = new StructureDataRDD("/path/to/file")
			.filterResolution(maxResolution)
			.filterRfree(maxRfree);
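
To make the intent of the two filters concrete, here is a minimal plain-Java sketch of the same idea on an in-memory list. It does not use biojava-spark: the `Entry` class, its fields and the `filter` method are hypothetical stand-ins, and treating the limits as inclusive upper bounds is an assumption rather than documented library behaviour.

```java
import java.util.List;
import java.util.stream.Collectors;

public class QualityFilterSketch {

    // Hypothetical stand-in for one structure entry; not a biojava-spark type.
    public static final class Entry {
        public final String id;
        public final float resolution;
        public final float rFree;

        public Entry(String id, float resolution, float rFree) {
            this.id = id;
            this.resolution = resolution;
            this.rFree = rFree;
        }
    }

    // Keep entries whose resolution and R-free are both at or below the limits
    // (inclusive bounds are an assumption made for this illustration).
    public static List<Entry> filter(List<Entry> entries, float maxResolution, float maxRfree) {
        return entries.stream()
                .filter(e -> e.resolution <= maxResolution)
                .filter(e -> e.rFree <= maxRfree)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up entries, for illustration only.
        List<Entry> entries = List.of(
                new Entry("A", 1.8f, 0.22f),
                new Entry("B", 3.5f, 0.25f),
                new Entry("C", 2.4f, 0.35f));
        for (Entry e : filter(entries, 3.0f, 0.3f)) {
            System.out.println(e.id);
        }
    }
}
```

Chaining the two filters mirrors the fluent style of the snippet above: each step narrows the data set and returns something that can be filtered again.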

Summarising the elements in the PDB

Map<String, Long> elementCountMap = BiojavaSparkUtils.findAtoms(structureData).countByElement();
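
The result is a map from element symbol to the number of atoms of that element. As a rough illustration of the shape of that result, the following plain-Java sketch counts element symbols in a small list. It is not the library's implementation, and `ElementCountSketch` and its `countByElement` method are hypothetical names that only echo the call above.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ElementCountSketch {

    // Count how many times each element symbol occurs in the input list.
    public static Map<String, Long> countByElement(List<String> elements) {
        return elements.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up element symbols, for illustration only.
        Map<String, Long> counts = countByElement(List.of("C", "N", "C", "O", "C"));
        System.out.println("C atoms: " + counts.get("C"));
    }
}
```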

Finding inter-atomic contacts from the PDB

// 'cutoff' is the contact distance threshold; it must be defined before this call
// (the original snippet does not show its value).
Double mean = BiojavaSparkUtils.findContacts(structureData,
		new AtomSelectObject()
				.groupNameList(new String[] {"PRO","LYS"})
				.elementNameList(new String[] {"C"})
				.atomNameList(new String[] {"CA"}),
		cutoff)
		.getDistanceDistOfAtomInts("CA", "CA")
		.mean();
System.out.println("\nMean PRO-LYS CA-CA distance: " + mean);
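
For readers unfamiliar with contact analysis, the quantity being reported can be sketched in plain Java: collect the distances between atom pairs that lie within a cutoff, then average them. This sketch is a simplification under stated assumptions. It considers all pairs in a single coordinate list, applies no residue or atom-name selection, and uses the hypothetical names `ContactDistanceSketch`, `contactDistances` and `mean`; it does not describe how biojava-spark computes contacts.

```java
import java.util.ArrayList;
import java.util.List;

public class ContactDistanceSketch {

    // Return the distances of all unordered atom pairs that lie within the cutoff.
    // Each row of coords is one atom's {x, y, z} position.
    public static List<Double> contactDistances(double[][] coords, double cutoff) {
        List<Double> distances = new ArrayList<>();
        for (int i = 0; i < coords.length; i++) {
            for (int j = i + 1; j < coords.length; j++) {
                double dx = coords[i][0] - coords[j][0];
                double dy = coords[i][1] - coords[j][1];
                double dz = coords[i][2] - coords[j][2];
                double d = Math.sqrt(dx * dx + dy * dy + dz * dz);
                if (d <= cutoff) {
                    distances.add(d);
                }
            }
        }
        return distances;
    }

    // Arithmetic mean of the values; NaN when the list is empty.
    public static double mean(List<Double> values) {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        // Made-up coordinates, for illustration only.
        double[][] coords = { {0, 0, 0}, {3, 0, 0}, {0, 4, 0} };
        double cutoff = 4.5; // assumed example value
        System.out.println("Mean contact distance: " + mean(contactDistances(coords, cutoff)));
    }
}
```

The nested loop is quadratic in the number of atoms, which is acceptable for a toy example but is one reason a distributed framework such as Spark is attractive for archive-scale data.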
