TGAC - AllBio 2014

bmpvieira.com/allbio14

Who

Bruno Vieira | @bmpvieira

PhD Student @ QMUL

Bioinformatics and Population Genomics


Supervisor:
Yannick Wurm | @yannick__

© 2014 Bruno Vieira CC-BY 4.0

Before

2004-2009 FCUL

Master in Human Biology and Environment
Licentiate in Cell Biology and Biotechnology

2009-2013 CoBiG2

Bioinformatician and SysAdmin

2012-2013 eseb2013 Full Stack Web Developer - Built everything with
Node.js, Express.js, Bootstrap, MongoDB and Redis

2013 geeklist Full Stack Web Developer - Worked on integration
with LinkedIn API

 

What

Bionode.io - Modular and universal bioinformatics

Pipeable UNIX command line tools and JavaScript / Node.js APIs for bioinformatic analysis workflows on the server and browser.
Collaborates with BioJS - Represent biological data on the web

Dat - Build data pipelines

Provides a streaming interface between every file format and data storage backend. "git for data"

dat-data.com | @maxogden | @mafintosh

Why Bionode / Node.js?

Reusable, small and tested

JavaScript everywhere

JavaScript is fast

Package Manager that works

NPM


npm install bionode
npm install bionode -g
npm test
npm start
npm run test-browser
npm run build-docs
npm init
npm publish

CommonJS pattern

// awesome-lib/index.js
module.exports = function() {
  return "Small modules everywhere"
}

// myscript.js
var awesome = require('awesome-lib')
awesome() // returns "Small modules everywhere"
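
The same pattern applies to a small bioinformatics helper. The sketch below is only an illustration: the module name reverse-complement and its function are hypothetical and not part of Bionode.

```javascript
// reverse-complement/index.js (hypothetical module, for illustration only)
var COMPLEMENT = { A: 'T', T: 'A', G: 'C', C: 'G', N: 'N' }

// Return the reverse complement of a DNA sequence.
// Unknown characters are replaced with 'N'.
function reverseComplement(seq) {
  return seq
    .toUpperCase()
    .split('')
    .reverse()
    .map(function (base) { return COMPLEMENT[base] || 'N' })
    .join('')
}

// Export a single function, as in the awesome-lib example
module.exports = reverseComplement

// myscript.js would then load it like any other module:
//   var reverseComplement = require('reverse-complement')
//   reverseComplement('ATGC') // returns "GCAT"
```

One function per module keeps each piece easy to test and to reuse from both the server and the browser.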

Module counts

Benefit from other JS projects

dat

biojs

noflo

Streams

@substack: "It's turtles all the way down!"


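var through = require('through2')
var ncbi = require('bionode-ncbi')
var tool = require('tool-stream')
// dat.reads, dat.samples and dat.papers are assumed here to be
// writable streams that store each result set

// Pass-through object streams used as branching points
var fork1 = through.obj()
var fork2 = through.obj()

// Search SRA and store the reads metadata
ncbi
  .search('sra', 'Solenopsis invicta')
  .pipe(fork1)
  .pipe(dat.reads)

// Branch 1: look up the biosample of each SRA result
fork1
  .pipe(tool.extractProperty('expxml.Biosample.id'))
  .pipe(ncbi.search('biosample'))
  .pipe(dat.samples)

// Branch 2: keep only the uid of each SRA result
fork1
  .pipe(tool.extractProperty('uid'))
  .pipe(fork2)

// Follow the links from SRA to PubMed and store the papers
fork2
  .pipe(ncbi.link('sra', 'pubmed'))
  .pipe(ncbi.search('pubmed'))
  .pipe(dat.papers)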
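
The forking idea above can be tried without any network access. The following is a minimal sketch using only Node's built-in stream module; the helpers extractProperty and collect are written here for illustration (they are not the tool-stream or dat implementations), and the records are made up.

```javascript
var stream = require('stream')

// Transform that pulls one property out of each incoming object
function extractProperty(name) {
  return new stream.Transform({
    objectMode: true,
    transform: function (obj, enc, next) { next(null, obj[name]) }
  })
}

// Writable that collects every item into the given array
function collect(target) {
  return new stream.Writable({
    objectMode: true,
    write: function (item, enc, next) { target.push(item); next() }
  })
}

// Made-up records standing in for search results
var records = [
  { uid: 'uid-1', title: 'run A' },
  { uid: 'uid-2', title: 'run B' }
]

var fork = new stream.PassThrough({ objectMode: true })
var uids = []
var titles = []
var uidSink = collect(uids)
var titleSink = collect(titles)

// Piping the same stream twice sends every object down both branches
fork.pipe(extractProperty('uid')).pipe(uidSink)
fork.pipe(extractProperty('title')).pipe(titleSink)

uidSink.on('finish', function () { console.log(uids) })
titleSink.on('finish', function () { console.log(titles) })

// Feed the records into the fork and signal the end of the data
records.forEach(function (record) { fork.write(record) })
fork.end()
```

Each branch sees the full sequence of objects, so one search can feed several independent pipelines.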


Command Line Interface


# Subset a fasta file to a particular sequence

cat sequences.fasta |
bionode-fasta |
grep "contig123" |
bionode-fasta --write > contig123.fasta
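
The same parse, filter and write-back steps can be sketched in plain JavaScript. This is not the bionode-fasta implementation: parseFasta, toFasta and subsetFasta are hypothetical helpers, and unlike grep on the intermediate output they match on the header line only.

```javascript
// Parse FASTA text into an array of { id, seq } objects
function parseFasta(text) {
  var records = []
  text.split('\n').forEach(function (line) {
    line = line.trim()
    if (line === '') return
    if (line[0] === '>') {
      // Header line starts a new record
      records.push({ id: line.slice(1), seq: '' })
    } else if (records.length > 0) {
      // Sequence lines are appended to the current record
      records[records.length - 1].seq += line
    }
  })
  return records
}

// Write records back as FASTA text, one sequence line per record
function toFasta(records) {
  return records
    .map(function (r) { return '>' + r.id + '\n' + r.seq + '\n' })
    .join('')
}

// Keep only the records whose header contains the given pattern
function subsetFasta(text, pattern) {
  var matches = parseFasta(text).filter(function (r) {
    return r.id.indexOf(pattern) !== -1
  })
  return toFasta(matches)
}

// Made-up input for illustration
var example = '>contig122\nACGT\nTTGA\n>contig123 sample\nGGCC\nAATT\n'
console.log(subsetFasta(example, 'contig123'))
```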


# Find the reads datasets used for the Solenopsis invicta assembly

bionode-ncbi search assembly Solenopsis invicta |
tool-stream extractProperty uid                 |
bionode-ncbi link assembly bioproject           |
tool-stream extractProperty destUID             |
bionode-ncbi link bioproject sra                |
tool-stream extractProperty destUID             |
bionode-ncbi urls sra                           |
dat import --json
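
Each step in this pipeline passes JSON objects to the next one. The sketch below assumes one JSON object per line and shows how a property could be pulled out of each object, including a dotted path such as 'expxml.Biosample.id'. The functions getPath and extractFromLines are hypothetical helpers, not the tool-stream implementation, and the input is made up.

```javascript
// Look up a possibly nested property using a dotted path, e.g. 'a.b.c'
function getPath(obj, path) {
  return path.split('.').reduce(function (value, key) {
    return value == null ? undefined : value[key]
  }, obj)
}

// Parse newline-delimited JSON and extract one property from each object
function extractFromLines(ndjson, path) {
  return ndjson
    .split('\n')
    .filter(function (line) { return line.trim() !== '' })
    .map(function (line) { return getPath(JSON.parse(line), path) })
}

// Made-up input: two objects, one per line
var lines =
  '{"uid":"10","links":{"destUID":"20"}}\n' +
  '{"uid":"11","links":{"destUID":"21"}}\n'

console.log(extractFromLines(lines, 'uid'))
console.log(extractFromLines(lines, 'links.destUID'))
```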

Project status: available

  • Data access:
    • ncbi
  • Parsing
    • fasta
    • bbi
  • Wrangling
    • seq
  • Wrappers
    • sra
    • sam
    • bwa

Project status: down the line

  • Data access:
    • ebi
    • ensembl
  • Parsing
    • fastq
    • sam
    • vcf
    • gff
  • Wrangling
    • quality control/stats
  • Wrappers
    • blast
    • diginorm

Try

generalhenry.com/data-plumber

Install

Node

# OSX
brew install node
# Ubuntu
sudo apt-get install nodejs npm

Bionode

npm install bionode

Thanks!

Acknowledgements:

@yannick__
@maxogden
@mafintosh
@alanmrice