  1. Data Intensive Science, and Workflows

    I'm writing this on my way back from Stockholm, where I attended a workshop on the 4th Paradigm. This is the idea (so named by Jim Gray, I gather?) that data-intensive science is a distinct paradigm from the first three paradigms of scientific investigation -- theory, experiment, and simulation. I was …


  2. Trying out 'cram'

    I desperately need something to run and test things at the command line, both for course documentation (think "doctest" but with shell prompts) and for script testing (as part of scientific pipelines). At the 2011 testing-in-python BoF, Augie showed us cram, which is the Mercurial project's internal test code ripped …


  3. My new data analysis pipeline code

    First, I write a recipe file, 'metagenome.recipe', laying out my job description for, say, sequence trimming and assembly with Velvet:

    fasta_file soil-data.fa
    
    qc_filter min_length=50 remove_Ns=true
    
    graph_filter min_length=400
    
    velvet_assemble k=33 min_length=1000 scaffolding=True
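
To make the recipe layout concrete, here is a minimal sketch of how a file like this could be read, assuming each non-blank line is a step name followed by positional arguments and key=value parameters. The function name parse_recipe and the tuple layout are hypothetical, chosen for illustration; this is not the pipeline's actual parser, and values are left as strings rather than guessed at.

```python
import shlex


def parse_recipe(text):
    """Parse recipe text into a list of (step_name, args, params) tuples.

    Assumed format (inferred from the example recipe, not from the real
    pipeline code): one step per non-blank line, whitespace-separated,
    with key=value tokens treated as parameters.
    """
    steps = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines just separate steps
        name, *tokens = shlex.split(line)
        args, params = [], {}
        for tok in tokens:
            if "=" in tok:
                key, value = tok.split("=", 1)
                params[key] = value  # kept as a string; no type guessing
            else:
                args.append(tok)
        steps.append((name, args, params))
    return steps


RECIPE = """\
fasta_file soil-data.fa

qc_filter min_length=50 remove_Ns=true

graph_filter min_length=400

velvet_assemble k=33 min_length=1000 scaffolding=True
"""

steps = parse_recipe(RECIPE)
for name, args, params in steps:
    print(name, args, params)
```

Keeping the steps in an ordered list, rather than a dictionary keyed by step name, preserves the order in which the recipe lists them.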
    

    Then I specify machine parameters, e.g. 'bigmem.conf':

    [defaults]
    n_threads …

  4. A memory efficient way to remove low-abundance k-mers from large (metagenomic?) DNA data sets

    I've spent the last few weeks working on a simple solution to a challenging problem in DNA sequence assembly, and I think we've got a nice simple theoretical solution with an actual implementation. I'd be interested in comments!

    Introduction

    Briefly, the algorithmic challenge is this:

    We have a bunch of …

