1. Registration reminder for our two-week summer workshop on high-throughput sequencing data analysis!

    Our two-week summer workshop (announcement, direct link) is shaping up quite well, but the application deadline is today! So if you're interested, you should apply sometime before the end of the day. (We'll leave applications open as long as it's March 17th somewhere in the world.)

    Some updates and expansions …

  2. Advancing metagenome classification and comparison by MinHash fingerprinting of IMG/M data sets.

    This is our just-submitted proposal for the JGI-NERSC "Facilities Integrating Collaborations for User Science" call. Enjoy!


    1. Brief description: (Limit 1 page)

    Abstract: Sourmash is a command-line tool and Python library that calculates and compares MinHash signatures from sequence data. Sourmash "compare" and "gather" functionality enables comparison and characterization of signatures …
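    To make the MinHash idea concrete, here is a minimal bottom-k sketch in Python, assuming forward-strand DNA k-mers and a blake2b hash. It is not sourmash's API or its exact algorithm (sourmash has its own hashing and k-mer handling), and `bottom_k_sketch` and `jaccard_estimate` are hypothetical names. The estimate works because the `num` smallest hashes of the union of two sets can always be recovered from the two sketches, and the fraction of those hashes present in both sketches approximates the Jaccard similarity.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of a DNA string (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash64(kmer):
    """Map a k-mer to a 64-bit integer; blake2b is an illustrative choice."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k_sketch(seq, k=21, num=500):
    """Keep the `num` smallest distinct k-mer hashes (a bottom-k MinHash)."""
    return sorted({hash64(km) for km in kmers(seq, k)})[:num]


def jaccard_estimate(sketch_a, sketch_b, num=500):
    """Estimate |A & B| / |A | B| from two bottom-k sketches.

    The `num` smallest hashes of the union are all present in the union of
    the two sketches; the fraction of them found in both is the estimate.
    """
    a, b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(a | b)[:num]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in a and h in b)
    return shared / len(union_bottom)
```

    For example, comparing `bottom_k_sketch("AAAC", k=3)` with `bottom_k_sketch("AAAG", k=3)` compares {AAA, AAC} with {AAA, AAG} and gives 1/3; the answer is exact here because both sets are smaller than the sketch size.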

  3. Request for Compute Infrastructure to Support the Data Intensive Biology Summer Institute for Sequence Analysis at UC Davis

    Note: we were just awarded this allocation on Jetstream for DIBSI. Huzzah!


    Abstract:

    Large datasets have become routine in biology. However, performing a computational analysis of a large dataset can be overwhelming, especially for novices. From June 18 to July 21, 2017 (30 days), the Lab for Data Intensive Biology …

  4. Computational postdoc opening at UC Davis!

    We are currently soliciting applications for computational postdoctoral fellows to undertake exciting projects in computational biology/bioinformatics jointly supervised by Dr. Titus Brown (http://ivory.idyll.org/lab/) and Dr. Fereydoun Hormozdiari (http://www.hormozdiarilab.org/) at UC Davis.

    UC Davis is a world class research institution with a strong …

  5. Categorizing 400,000 microbial genome shotgun data sets from the SRA

    This is another blog post on MinHash sketches; see also:

  6. How I learned to stop worrying and love the coming archivability crisis in scientific software

    Note: This is the fifth post in a mini-series of blog posts inspired by the workshop Envisioning the Scientific Paper of the Future.

    This post was put together after the event and benefited greatly from conversations with Victoria Stodden, Yolanda Gil, Monya Baker, Gail Peretsman-Clement, and Kristin Antelman!


    Archivability is …

  7. Quickly searching all the microbial genomes, mark 2 - now with archaea, phage, fungi, and protists!

    This is an update to last week's blog post, "Efficiently searching MinHash Sketch collections".


    Last week, Thanksgiving travel and post-turkey somnolescence gave me some time to work more with our combined MinHash/SBT implementation. One of the main things the last post contained was a collection of MinHash signatures of …

  8. What is open science?

    Gabriella Coleman asked me for a short, general introduction to open science for a class, and I couldn't find anything that fit her needs. So I wrote up my own perspective. Feedback welcome!

    Some background: Science advances because we share ideas and methods

    Scientific progress relies on the sharing of …

  9. Increasing postdoc pay

    I just gave all of my postdocs a $10,000-a-year raise.

    My two current postdocs both got a $10k raise over their current salary, and the four postdocs coming on board over the next 6 months will start at $10k over the NIH base salary we pay them already. (This …

  10. An #openscienceprize entry: Integrating server-side annotation into the hypothes.is ecosystem

    Over the last few months, I've been playing with hypothes.is and thinking about how to use it to further my scientific work. This resulted in some brainstorming with Jon Udell and Maryann Martone about, well, lots of things. And now we're putting in an open science prize entry!

    tl …

  11. Is mybinder 95% of the way to next-gen computational science publishing, or only 90%?

    If you haven't seen mybinder.org, you should go check it out. It's a site that runs IPython/Jupyter Notebooks from GitHub for free, and I think it's a solution to publishing reproducible computational work.

    For a really basic example, take a look at my demo Software Carpentry lesson. Clicking …

  12. Why software development practice matters, Containerization version

    A while back, Kai Blin (via Nick Loman) asked Michael Barton:

    If we containerize all these things won't it just encourage worse software development practices; right now developers still need to consider someone other than themselves installing the software.

    and Michael Barton's response, transcribed, was:

    "It's a good point. Ultimately …

  13. Transcriptomic analysis with Docker containers and data volumes

    As part of our Docker hands-on workshop earlier this month, I learned a lot about building Dockerfiles, running Docker containers on remote hosts with docker-machine, and using data volumes to manage data in remotely hosted Docker containers.

    During and after the workshop, I put together Docker images (and, more importantly …

  14. Pubwication of software papers, and authorship on them

    Pubwication. Pubwication is what bwings us togethew today. Pubwication, that bwessed awwangement, that dweam within a dweam. And authorship, twue authorship, wiww fowwow you fowevah and evah. So tweasuwe youw authorship.

    Last week, our software paper on khmer 2.0 was published on F1000Research. We intend this paper to be …

  15. jclub: Bloom Filter Trie - a data structure for pan-genome storage

    Note: this is a blog post from the DIB Lab journal club.

    Jump to Questions and Comments.


    The paper:

    http://www.techfak.uni-bielefeld.de/~stoye/dropbox/wabi2015final.pdf

    "Bloom Filter Trie: a data structure for pan-genome storage."

    by Guillaume Holley, Roland Wittler, and Jens Stoye.

    Background

    • Pan Genome: Represents genes …

  16. A review of "Large-Scale Search of Transcriptomic Read Sets with Sequence Bloom Trees"

    (This is a review of Large-Scale Search of Transcriptomic Read Sets with Sequence Bloom Trees, Solomon and Kingsford, 2015.)

    In this paper, Solomon and Kingsford present Sequence Bloom Trees (SBTs). SBT provides an efficient method for indexing multiple sequencing datasets and finding in which datasets a query sequence is present …
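    A toy version of the SBT idea, under simplifying assumptions: every node holds a Bloom filter, an internal node's filter is the bitwise OR of its children's filters, and a query descends only into nodes where at least a fraction `theta` of its k-mers test positive. The filter size, hash count, and simple pairwise tree construction are illustrative choices, not the construction used in the paper, and all names here are hypothetical.

```python
import hashlib

M_BITS = 4096   # Bloom filter width in bits (illustrative)
N_HASHES = 3    # hash functions per k-mer (illustrative)


def kmers(seq, k):
    """All distinct overlapping k-mers of a DNA string (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def positions(item):
    """N_HASHES bit positions derived from index-prefixed blake2b digests."""
    for i in range(N_HASHES):
        d = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
        yield int.from_bytes(d, "big") % M_BITS


class Node:
    def __init__(self, bits, name=None, children=()):
        self.bits = bits                # Python int used as an M_BITS-wide bit array
        self.name = name                # dataset name for leaves, None for internal nodes
        self.children = list(children)

    def contains(self, item):
        return all((self.bits >> p) & 1 for p in positions(item))


def leaf(name, seq, k):
    """Leaf node: a Bloom filter holding every k-mer of one dataset."""
    bits = 0
    for km in kmers(seq, k):
        for p in positions(km):
            bits |= 1 << p
    return Node(bits, name=name)


def build_sbt(leaves):
    """Pair nodes level by level; a parent's filter is the OR of its children's."""
    level = list(leaves)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 1:
                nxt.append(pair[0])  # odd node out moves up unchanged
            else:
                nxt.append(Node(pair[0].bits | pair[1].bits, children=pair))
        level = nxt
    return level[0]


def query(node, query_kmers, theta=0.9):
    """Names of datasets whose filters contain >= theta of the query k-mers."""
    if not query_kmers:
        return []
    hits = sum(node.contains(km) for km in query_kmers)
    if hits / len(query_kmers) < theta:
        # A child's set bits are a subset of its parent's, so no descendant can score higher.
        return []
    if not node.children:
        return [node.name]
    found = []
    for child in node.children:
        found.extend(query(child, query_kmers, theta))
    return found
```

    With leaves built from runs of A, C, and G, querying with the k-mers of a stretch of C's descends only into the C leaf, unless a Bloom filter false positive lets another branch through. Pruning at internal nodes is what lets the tree skip most datasets for a typical query.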

  17. Taking grad students to PyCon

    I am still up at PyCon 2015 in Montreal, and most of my lab is here with me.

    On Saturday, I told Terry Peppers and some others that PyCon had been one of my (limited) lifelines to (limited) sanity during my early tenure-track years. Whenever I was in danger of …

  18. Some ideas for workshops and unconference models for data-intensive biology

    Here at the Lab for Data-Intensive Biology (TM) we are constantly trying to explore new ideas for advancing the practice of biological data sciences. Below are some ideas that originated with or were sharpened by conversations with Greg Wilson (Executive Director, Software Carpentry) and Tracy Teal (Project Lead, Data Carpentry …

  19. "Open Source, Open Science" Meeting Report - March 2015

    On March 19th and 20th, the Center for Open Science hosted a small meeting in Charlottesville, VA, convened by COS and co-organized by Kaitlin Thaney (Mozilla Science Lab) and Titus Brown (UC Davis). People working across the open science ecosystem attended, including publishers, infrastructure non-profits, public policy experts, community builders …

  20. How we develop software (2015 version)

    A colleague who is starting their own computational lab just asked me for some advice on how to run software projects, and I wrote up the following. Comments welcome!


    A brief summary of what we've converged on for our own needs is this:

    • everything's on github (you can have private …

  21. Lab for Data Intensive Biology at UC Davis joins Software Carpentry as an Affiliate

    We are pleased to announce that the Laboratory for Data Intensive Biology at UC Davis has joined the Software Carpentry Foundation as an Affiliate Member for three years, starting in January 2015.

    "We've been long-term supporters of Software Carpentry, and Affiliate status lets us support the Software Carpentry Foundation in …

  22. Letter of resignation


    Dear <chairs>,

    I am resigning my Assistant Professor position at Michigan State University effective January 2nd, 2015.

    Sincerely,

    CTB.


    Anticipated FAQ:

    • Why? I'm moving to UC Davis.
    • Do you have an employment contract with UC Davis?? Nope. But I'm starting there in January, anyway. Or that's the plan. And yes …

  23. Introducing the Moore Foundation's Data Driven Discovery (DDD) Investigators

    Note: the source data for this is available on github at https://github.com/ctb/dddi

    Today, the Moore Foundation announced that they have selected fourteen Moore Data Driven Discovery Investigators.

    In reverse alphabetical order, they are:


    Dr. Ethan White, University of Florida

    Proposal: Data-intensive forecasting and prediction for ecological …

  24. The Critical Assessment of Metagenome Interpretation and why I'm not a fan

    Update 3/29/15: the CAMI FAQ now includes information on reproducibility measures, and looks very promising. The data sets they are producing also seem fascinating.

    If you're into metagenomics, you may have heard of CAMI, the Critical Assessment of Metagenome Interpretation. I've spoken to several people about it in …

  25. Preprints and double publication - when is some exposure too much?

    Note to all: this is satire... As Marcia McNutt says below, please see Science Magazine's Contributors FAQ for more detailed information.


    Recently I had some conversations with Science Magazine about preprints, and when they're counted as double publication (see: Ingelfinger Rule). Now, Science has an enlightened preprint policy:

    ...we do …

  26. A first science fair

    So my daughter just participated in her first science fair, at the age of 6. ("Conclusion: science can be fun! and sticky!")

    Over dinner, my wife and I came up with some ideas for her next fair. She was having trouble dissolving sugar in ice water, so we suggested maybe …

  27. Imagine...

    Links, software, thoughts -- all solicited! Add 'em below or send 'em to me, t@idyll.org.

    ---

    Imagine... a rolling 48 hour hackathon, internationally teleconferenced, on reproducing analyses in preprints and papers. Each room of contributors could hack on things collaboratively while awake, then pass it on to others in overlapping …

  28. Notes for my PyCon 2014 talk: Instrument ALL the things: Studying data-intensive workflows in the cloud

    Resources:

  29. The Story Behind "Tackling soil diversity with the assembly of large, complex metagenomes"

    I'm pleased to announce the publication of "Tackling soil diversity with the assembly of large, complex metagenomes", by Adina Howe, Janet Jansson, Stephanie Malfatti, Susannah Tringe, James Tiedje, and myself. The paper is openly available on the PNAS Web site here (open access).

    External links:

  30. Install gplots in R 2.1X

    I've been using EBSeq for a few things lately, and have had trouble getting some of the dependencies installed -- in particular, gplots doesn't seem to be readily available for R 2.14, 2.15, etc. Judging by my Google searches, others have been having the same problems; see e.g …

  31. Will you join my committee?

    Dear <student>,

    I'd be happy to, but I do have a few conditions/requests based on prior experience with students!

    First, please schedule all of your meetings at least 2 months in advance :)

    Second, a condition for my signing off on your thesis will be that, for any paper for …

  32. Is "Scientific Data" ever-finer salami-slicing, or is it reducing time to data publication?

    I just read Scientific Data - ultimate salami slicing publishing, in which Pedro Beltrao argues that Nature's new journal is simply another venue for them to suck money out of scientists. Maybe. But I'm strongly considering sending a lot of stuff there, and I really think Pedro is missing something very …

  33. I've got a new job

    As the title says, I've got a new job.

    But it's not really that exciting a switch, sorry :)

    As of mid-August sometime, I will officially switch my appointment from 2/3 Computer Science and Engineering / 1/3 Microbiology and Molecular Genetics, to 2/3 Microbiology and Molecular Genetics, 1/3 …

  34. My 2013 PyCon talk: Awesome Big Data Algorithms

    Schedule link

    Description

    Random algorithms and probabilistic data structures are algorithmically efficient and can provide shockingly good practical results. I will give a practical introduction, with live demos and bad jokes, to this fascinating algorithmic niche. I will conclude with some discussions of how our group has applied this to …

  35. A mildly crazy idea: crowdsourced -omic analysis with data privacy sunset?

    Or, "can we crowdsource BGI?" ;)

    With all of the crazy need surrounding genomic analysis -- most of it on a shoestring budget -- I am thinking about a mildly crazy idea.

    What if I offered to computationally analyze people's non-model transcriptomic and metagenomic data for them, in exchange for (a) non-exclusive access …

  36. Thinking about software architecture for heterogeneous data integration

    I just left the NAS meeting on Integrating Environmental Health Data to Advance Discovery, where I was an invited speaker. It was a pretty interesting meeting, with presentations from speakers who worked on chemotoxicity data, pollution data, exposure data, and electronic health records, as well as a few "outsiders" from …

  37. Assembling the heck out of soil - paper posted

    We just posted yet another pre-submission paper to arXiv.org:

    Assembling large, complex environmental metagenomes

    Authors: Adina Chuang Howe, Janet Jansson, Stephanie A. Malfatti, Susannah Tringe, James M. Tiedje, and C. Titus Brown

    arXiv link

    Paper repository on github

    Abstract:

    The large volumes of sequencing data required to deeply sample …

  38. Assembly artifacts paper posted

    We just posted another pre-submission paper to arXiv.org:

    Illumina Sequencing Artifacts Revealed by Connectivity Analysis of Metagenomic Datasets

    Authors: Adina Chuang Howe, Jason Pell, Rosangela Canino-Koning, Rachel Mackelprang, Susannah Tringe, Janet Jansson, James M. Tiedje, and C. Titus Brown

    arXiv link

    Paper repository on github

    Abstract:

    Sequencing errors and …

  39. Anecdotal science

    I'm starting to notice that a lot of bioinformatics is anecdotal.

    People publish software that "works for them." But it's not clear what "works" means -- all too often either the exact parameters or the specific evaluation procedure is not provided (and yes, there's a double standard here where experimental methods …

  40. What biologists need to know about cyberinfrastructure

    I recently attended an NSF BIO directorate meeting about cyberinfrastructure needs. Here's a list of training & education challenges identified at that meeting:

    • development and adaptation of tools to archive data and metadata from diverse sources to enable data mining
    • integration of structured and unstructured data from heterogeneous data sources
    • discussion …

  41. The Beachcomber's Dilemma

    Here's a data analysis question for all you Big Data folk.

    A beachcomber is interested in obtaining up to 10 examples of every type of shell present on a beach. The shells are individually easy to find, but some types are really rare and some are really abundant. The beachcomber …

  42. Welcome to my new blog!

    I've just moved my blog over to Pelican, a static blog generator that takes in reStructuredText and spits out, well, this! I'm now using Disqus for commenting, too.

    The main motivations for the move (apart from slightly better theming) were to escape dynamic-blog-land in favor of static-blog-land, while enabling a …

  43. A call for open lab and reusable lab protocols

    About a year ago, I came across a really interesting Science paper entitled "Rapid and inefficient isolation of single cells from soil". In it, the authors -- from a well-known lab at UC Davis -- described how they used low-percentage agarose gels to extract thousands of individual cells from a soil sample …

  44. Some early experience in teaching using ipython notebook

    As part of the 2012 Analyzing Next-Generation Sequencing Data course, I've been trying out ipython notebook for the tutorials.

    In previous years, our tutorials all looked like this: Short read assembly with Velvet -- basically, reStructuredText files integrated with Sphinx. This had a lot of advantages, including Googleability and simplicity; but …

  45. DRAFT: A community-focused pre-publication data release and sharing policy for sequence data

    This is a draft proposal of a policy to encourage pre-publication data release and data sharing within a community. This policy is based on discussions at the Cephalopod Genomics Workshop (a Catalysis workshop sponsored by NESCent).

    Note, this is made available under a CC-BY-SA license permitting use and re-use with …

  46. A simple idea: standard but optional review criteria for bioinformatics papers

    Brad Chapman (@chapmanb on twitter) wrote and signed a nice review of my submission to the Bioinformatics Open Source Conference. In his review, he said

    My only small suggestion is to include some discussion about your
    reproducibility work during the talk: the Amazon AMI, documentation
    and reproducible ipython workflows. This …

  47. The Parable of the Mad Photocopier

    (I came across this fragmentary blog post that I wrote sometime in December. It's a fine example of a failed allegory. To what, I'll let you determine for yourself. Anyway, in case anyone wants to know what dreck doesn't make it out of my computer onto the Intarweb, well, here's …

  48. Paper draft: Scaling metagenome sequence assembly with probabilistic de Bruijn graphs

    (updated to point to http://arxiv.org/).

    Authors: Jason Pell, Arend Hintze, Rosangela Canino-Koning, Adina Howe, James M. Tiedje, C. Titus Brown

    Abstract:

    The memory requirements for de novo assembly of short-read shotgun sequencing data from complex microbial populations are an increasingly large practical barrier to environmental studies. Here we …
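    A rough illustration of the probabilistic de Bruijn graph idea, assuming the graph is never stored explicitly: k-mers go into a Bloom filter, and a node's edges are found by testing its eight possible one-base extensions (four on each side). This is a sketch, not the paper's implementation; it ignores reverse complements, uses illustrative filter parameters, and the function names are hypothetical. False positives can only add spurious nodes and edges, never remove real ones.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over a Python int bit array (illustrative sizes)."""

    def __init__(self, m_bits=1 << 16, n_hashes=4):
        self.m, self.h, self.bits = m_bits, n_hashes, 0

    def _positions(self, item):
        for i in range(self.h):
            d = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(d, "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))


def load_reads(reads, k, bf):
    """Insert every k-mer of every read; this filter is the whole graph."""
    for read in reads:
        for i in range(len(read) - k + 1):
            bf.add(read[i:i + k])


def neighbors(kmer, bf):
    """Implicit edges: one-base right and left extensions the filter reports present."""
    for base in "ACGT":
        right = kmer[1:] + base
        if right in bf:
            yield right
        left = base + kmer[:-1]
        if left in bf:
            yield left


def component_size(start, bf, limit=10000):
    """Breadth-first traversal from `start`; stops expanding once `limit` nodes are seen."""
    seen, frontier = {start}, [start]
    while frontier and len(seen) < limit:
        nxt = []
        for km in frontier:
            for nb in neighbors(km, bf):
                if nb not in seen:
                    seen.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return len(seen)
```

    Loading the single read `ACGTACGGTCA` with k=4 stores eight distinct k-mers that form one chain, so a traversal from `ACGT` reaches all of them; only a false positive could add more.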

  49. On mentoring

    One of the most important jobs a professor has is to pay it forward: that is, to teach, train, mentor, support, and open up opportunities for their students and postdocs. It's a job that is undervalued by those who focus on the short term -- the administrators and review committees that …

  50. Course: Analyzing Next-Generation Sequencing Data (2011 version)

    The second iteration of our bioinformatics summer course, Analyzing Next-Generation Sequencing Data, just finished. It was a great success, at least judging from the comments that people made to us personally; the evaluations aren't yet complete.

    The what: a two week course on analyzing next-gen sequencing data, using the Amazon …

  51. Trying out 'cram'

    I desperately need something to run and test things at the command line, both for course documentation (think "doctest" but with shell prompts) and for script testing (as part of scientific pipelines). At the 2011 testing-in-python BoF, Augie showed us cram, which is the mercurial project's internal test code ripped …

  52. The last five years

    Looking back, the last 5 years have, collectively, been rather overwhelming.

    Five years ago, I was a big-mouthed 7th year graduate student. The biggest change in my recent life was getting a cat (first) and getting married (second).

    Now, I'm the father of two (adorable) daughters. I have a minivan …

  53. CSE 891, Computational Science for Evolutionary Biologists

    Course page: http://ged.msu.edu/courses/2010-fall-cse-891/

    This course will introduce biologists to computational thinking, practical computational techniques, and research topics in computational evolution. The course will consist of three intensive hands-on 5-week modules: computational competence in UNIX; data mining and hypothesis generation using the Avida digital life …

  54. A memory efficient way to remove low-abundance k-mers from large (metagenomic?) DNA data sets

    I've spent the last few weeks working on a simple solution to a challenging problem in DNA sequence assembly, and I think we've got a nice simple theoretical solution with an actual implementation. I'd be interested in comments!

    Introduction

    Briefly, the algorithmic challenge is this:

    We have a bunch of …
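    One way to keep the counting step in fixed memory is a Count-Min sketch: several rows of small counters, each indexed by a different hash, with the minimum across rows as the estimate. Below is a minimal sketch under that assumption, not necessarily the approach this post describes; the table sizes, the 8-bit saturating counters, `cutoff`, and the helper names are all illustrative. Because hash collisions can only inflate counts, a k-mer is never wrongly called low-abundance (for cutoffs up to 255); the error runs the other way, occasionally letting a rare k-mer survive.

```python
import hashlib


class CountMinSketch:
    """Approximate k-mer counter in fixed memory: `depth` rows of `width`
    8-bit counters. Estimates can overcount (hash collisions) but never
    undercount, except that counters saturate at 255."""

    def __init__(self, width=1 << 14, depth=4):
        self.width = width
        self.rows = [bytearray(width) for _ in range(depth)]

    def _cells(self, item):
        for r in range(len(self.rows)):
            d = hashlib.blake2b(f"{r}:{item}".encode(), digest_size=8).digest()
            yield r, int.from_bytes(d, "big") % self.width

    def add(self, item):
        for r, c in self._cells(item):
            if self.rows[r][c] < 255:  # saturate instead of overflowing a byte
                self.rows[r][c] += 1

    def estimate(self, item):
        return min(self.rows[r][c] for r, c in self._cells(item))


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def trim_low_abundance(reads, k=20, cutoff=2, sketch=None):
    """Pass 1 counts every k-mer; pass 2 cuts each read just before its first
    k-mer with estimated count < cutoff. Reads left shorter than k are dropped."""
    if sketch is None:
        sketch = CountMinSketch()
    for read in reads:
        for km in kmers(read, k):
            sketch.add(km)
    kept = []
    for read in reads:
        end = len(read)
        for i, km in enumerate(kmers(read, k)):
            if sketch.estimate(km) < cutoff:
                end = i + k - 1  # k-mers 0..i-1 all pass and span read[:i+k-1]
                break
        if end >= k:
            kept.append(read[:end])
    return kept
```

    With reads `ACGGTCAT`, `ACGGTCAT`, and `ACGGTCAG` at k=4 and cutoff 2, only the final k-mer `TCAG` of the third read is seen once, so that read is cut to `ACGGTCA` and the other two pass unchanged.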

  55. Course announcement: Analyzing Next-Generation Sequencing Data

    Analyzing Next-Generation Sequencing Data

    May 31 - June 11th, 2010

    Kellogg Biological Station, Michigan State University

    CSE 891 s431 / MMG 890 s433, 2 cr

    Applications are due by midnight EST, April 9th, 2010.

    Course sponsor: Gene Expression in Disease and Development Focus Group at Michigan State University.

    Instructors: Dr. C. Titus …

  56. VMware: vmtn not defined error

    I've spent several hours in the past year trying to debug a frustrating error with the vmware Web UI for Linux. This UI relies on running iceweasel (aka Firefox) on the VM host machine. I think I finally solved the problem today, after much tooth gnashing.

    Briefly, if you get …

  57. What's with the goat?

    A new meme was born at PyCon 2010: The Testing Goat.

    Or, "Be Stubborn. Obey the Goat."

    The goat actually emerged from the Testing In Python Birds of a Feather session at PyCon, where Terry Peppers used slides full of goat in his introduction. This was apparently an overreaction to …

  58. Managing student expectations for open-source projects

    On the heels of my aggressive competence post, about (among other things) my failure to outline my expectations for students, I've started putting together a page to help manage student expectations for the pony-build project, which is participating in the Undergraduate Capstone Open-Source Projects (UCOSP) course this term.

    (Please comment …

  59. A Tale of a Bug

    or, "those python-dev people are awesome."

    My experience with the Python bug tracker has been pretty sparse and largely limited to some of the eternaissues like "make HTMLParser deal with even more broken HTML" that never really get resolved because they're not very important and don't have a champion. So …

  60. Lazyweb query: CloudStore (or KosmosFS)

    Does anyone have any experience with CloudStore, formerly known as KosmosFS? From http://en.wikipedia.org/wiki/CloudStore:

    CloudStore (KFS, previously Kosmosfs) is Kosmix's C++ implementation of
    Google File System. ... CloudStore supports incremental scalability,
    replication, checksumming for data integrity, client side fail-over and access
    from C++, Java and Python.
    

    The …

  61. A slightly more thoughtful post on diversity

    (Some more meanderings on the brouhaha about diversity in the Python world.)

    First, I've removed 'python' from the tags and made sure that neither Planet Python nor Advogato feed from this blog otherwise; I suspect by talking about politics and feelings in OSS I'm getting further from my normal target …

  62. Diversity in a Nutshell

    Since a few people have asked, here's a rough guide to the diversity discussion. No specifics allowed.

    1. diversity list created to (among other things) ponder an official diversity statement for Python. List is closed-archive but open for general subscription.

    2. Various diversity list discussions become heated. Some people (including …

  63. GHOP to run again; HELP.

    The contest formerly known as GHOP is going to run again this fall, and we need your help.

    Yes, you. YOU, over there in the corner. Stop avoiding this post!

    GHOP, for those of you who don't remember or weren't around 2 years ago, was the very successful pilot sister …

  64. Buggy Python code?

    I'm looking for examples of frustratingly simple-yet-wrong Python code, suitable for an undergrad class to debug. I'd prefer things that don't rely on tricky features of Python (like shared list references), but rather code where subtly bad logic or program flow leads to bad behavior.

    Comment below, or e-mail me …

  65. Success, at last!

    For only the second time (out of many tries) I managed to smoke some salmon and trout so that it was not overcooked and dry as a bone. Conclusion? I think my smoker thermometer is about 50 deg F off of the true "on grill" temperature, probably because it's about …

  66. Upgrading PlanetPlanet.

    OK Folks, I know that planet.python.org and planetpython.org underwent a merger, and during the merger a new, or patched, or somehow upgraded version of planet went into effect on both. However, I cannot find a link to the info post any more.

    I would like to put …

  67. Easily Accessible Web-Based Tools For Analyzing Next-Generation Sequencing Data From Agricultural Animals

    Just submitted this on Thursday:

    Next generation sequencers are beginning to impact agricultural biology. Over the next few years, next generation sequencing will produce incredibly large datasets that will address structural (e.g., SNPs, CNVs, indels, methylation, translocations) and functional (e.g., RNA expression, transcription factor binding sites) variation in …

  68. Python in the humanities?

    I'm writing some proposals to expand support for Python infrastructure (think cross-platform build and test farms a la Snakebite) and for the Mellon Foundation application, I'd like to find out how Python is being used in the humanities. I found NLTK, the Natural Language Toolkit; what else is big?

    thanks …

  69. Software testing in science

    As part of a CiSE submission I'm working on, I interviewed the lead developer on a scientific software package today. This software package is mainly used for evolutionary studies, and has a small but devoted following - ~6 developers and ~12 users locally, plus a few dozen users outside of MSU …

  70. Open Source is like a mistress

    Open source coding is like a not-so-demanding mistress: I work on it at night, surreptitiously, after my wife and daughter are asleep. twill and figleaf are like bastard children, who only get attention when I can spare it from my "real" family (my teaching, research or my actual family, depending …

  71. Twill lives!

    One of the advantages of this year's PyCon was that it was (again) held in Chicago, the home town of Leapfrog Online. Since they use twill quite a bit, and were bothered by some of the poor design decisions and bugginess, they were keen to get together with me to …

  72. Pursuing simplicity

    John Gall apparently said:

    A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with …

  73. Twitter Ho!

    OK, I'm going to try out twitter for the first time, in order to see if it works out at PyCon for keeping track of what's going on and letting people know what I'm up to. I guess you have to e-mail me to get in touch with me, though …

  74. titus!

    i asked two other friends of mine, [ ... ], for recommendations about model code and about their work environment. their feedback was extremely helpful and i thought you'd be interested to hear the opinions of other good programmers. i would also love to hear your thoughts on these (extracted and scattered) comments …

  75. Re Syndication to advogato

    Hey cdfrey, ta0kira, I syndicate to advogato because I wanted to syndicate my posts to PlanetPython as well as advogato, at a time when advogato didn't have the tools to do that. Also, this way I can use my own damn editor, on my own damn laptop, to write the …

  76. Goodbye, Paypal

    I'm going to stop using my Paypal account regularly as soon as I finish transferring some money around. They're too screwed up to be reliable, and their customer service is either incompetent or dishonest.

    Luckily, many places are now accepting credit cards or Google Checkout, with which I've had no …

  77. What's a good Python code base?

    A friend asks,

    i'm going to be recoding <x> from scratch starting next week, in python.
    what codebase would you recommend as good to model after?
    

    Any thoughts on a well-formed, reasonably sized (yet not huge), and simple Python code base?

    There have to be some examples somewhere! I'd suggest …

    read more

    There are comments.

  78. PyCon review process

    We're going through the PyCon '09 review process, and participating in the process has been pretty interesting. (I joined the Program Committee in large part because I was told to put up or shut up after I critiqued PyCon '08. Ahh, the open source world... where you're encouraged to go …

    read more

    There are comments.

  79. Missive from a Swing State

    I have a habit of occasionally sending odd e-mails to my postdoc lab mailing list, for reasons that I cannot adequately explain. Here's the latest one:

    Dear Bronner-Fraser Lab,
    
    I would like to thank you all for your private letters of support;
    between the blizzards of Colorado, the floods of …
    read more

    There are comments.

  80. Python is a ... little language?

    Here at MSU, we just had a 40th anniversary celebration of the Computer Science department. As it happens, Carl Page (Sr.) was a founding member of CSE at MSU, and so his son, Carl Page (Jr.) came and participated in a panel. In response to my question about what we …

    read more

    There are comments.

  81. Python for Intro CS?

    I'm surprised I haven't seen this on planetpython yet...

    ...an emerging consensus in the scripting community holds that Python is the right solution for freshman programming. Ruby would also be a defensible choice.

    (emphasis mine). Originally found via Lambda the Ultimate, and also passed on to me by Rich Enbody.

    In …

    read more

    There are comments.

  82. Helping Python

    Recently the question came up: suppose you wanted to give enthusiastic people some guidance on how to help work on Python. What suggestions do you have? Surely there's a Web page on this!

    Well, no: a few quick Google searches led me to discover that "contributing to Python" was answered …

    read more

    There are comments.

  83. Serving XML-RPC over HTTPS with Python

    We've been talking about how to manage pygr resources remotely via the existing XML-RPC interface, and for that HTTPS is a requirement. I offered to track down the code necessary for running an XML-RPC server over HTTPS. Here's what I found:

    It turns out that while the Python stdlib supports …
    read more

    There are comments.
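    The excerpt stops before the post's own solution. As a minimal sketch of one way to do this with today's Python 3 standard library (not the code the post tracked down), you can wrap the listening socket of xmlrpc.server.SimpleXMLRPCServer in an ssl.SSLContext. The helper make_https_server, the add function, and the certificate/key paths are hypothetical.

```python
import ssl
from xmlrpc.server import SimpleXMLRPCServer


def make_https_server(host, port, certfile, keyfile):
    """Return an XML-RPC server whose listening socket speaks TLS.

    certfile and keyfile are paths to a PEM certificate and private key
    (placeholders: supply your own).
    """
    # Load the certificate first, so a bad path fails before we bind a port.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)

    server = SimpleXMLRPCServer((host, port), logRequests=False)
    # Swap the plain listening socket for a TLS-wrapped one; every
    # connection accepted from it then performs the TLS handshake.
    server.socket = context.wrap_socket(server.socket, server_side=True)
    return server


def add(x, y):
    """Example function to expose over XML-RPC."""
    return x + y
```

    To serve, create the server with real certificate files, call register_function(add, "add") on it, then serve_forever(); a client that trusts the certificate can connect with xmlrpc.client.ServerProxy("https://host:port/"). With this design the handshake happens inside accept(), so one slow client can stall this single-threaded server.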

  84. Off to MSU - Woo hoo!

    On Thursday, May 15th, I finished my post-doc position at Caltech.

    On Friday, May 16th, I officially started as an Assistant Professor split between Computer Science & Engineering and Microbiology & Molecular Genetics at Michigan State University.

    On Friday evening and Saturday, we hung out down at the Caltech Marine Lab and …

    read more

    There are comments.

  85. E-mail is getting really unreliable

    I've been hit by a few different e-mail-related problems over the last few months, and it's becoming intensely frustrating.

    Some servers seem to randomly drop messages from me, for no obvious reason; at least, people don't get one message and then do get another, a day later. gmail may be …

    read more

    There are comments.

  86. Threading and subprocess

    I'm having a long-running discussion with some people about threading and why using threads with simple subprocess calls is almost certainly an overcomplicated (== BAD) use of threads. Everyone seems to think I'm wrong (at least, there's either deafening silence or straight out argument ;) and I think I finally figured out …

    read more

    There are comments.
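    To illustrate the argument (a sketch of my own, not code from the post): subprocess.Popen returns as soon as the child has started, so several children already run concurrently, and the parent can simply wait on them one after another with no threads involved. The run_all helper is hypothetical.

```python
import subprocess
import sys


def run_all(commands):
    """Start every command at once, then collect (returncode, stdout) pairs.

    Popen does not wait for the child, so all children run in parallel
    while we wait on them in order.
    """
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for cmd in commands
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()  # waits for this child; the others keep running
        results.append((proc.returncode, out))
    return results


if __name__ == "__main__":
    cmds = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(4)]
    for code, out in run_all(cmds):
        print(code, out.strip())
```

    A child with a lot of output may block on a full pipe until the loop reaches it, but because each child depends only on its own pipe, that delays it rather than deadlocking.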

  87. Some new terminology?

    In some discussions with a moderately new Python programmer who seems to value complexity over simplicity, I may have coined a new term:

    "Penis size" style of programming -- the (mistaken) belief that the
    more advanced programming language features you use, the more
    impressive your code will look.
    

    I think it's …

    read more

    There are comments.

  88. Google Highly Open Participation Contest -- another notch in the source code!

    Pavel Vinogradov <fastnix> has been keeping me updated on an issue he discovered while testing TCMalloc with Python as a Google Highly Open Participation (GHOP) task, task 105.

    Briefly, Pavel discovered a situation in which replacing the Python memory allocator with TCMalloc resulted in really bad performance. The latest is …

    read more

    There are comments.

  89. Google's AppEngine OneThousand

    At Google Campfire One, v 2.0 -- introducing AppEngine.

    IT'S FREEZING. The cider ran out. Brr.

    Deploying Web apps is annoyingly difficult. Technical hurdles, etc. Need machines. Blech. Costly.

    AppEngine solves all these problems. Runs web apps, handles app lifecycles, apps are run on Google infrastructure and can make use of …

    read more

    There are comments.

  90. Yahoo is bouncing my mail server's e-mail.

    On top of dreamhost dropping off the 'net just when I posted a bunch of screencasts... our socal-piggies meeting nearly got whacked because this month's organizer uses Yahoo, and most of the messages going through my mail server (which hosts the mailing list) were filed as "spam".

    Now Yahoo is …

    read more

    There are comments.

  91. PyCon Tip of the Day

    I'm at PyCon, and I have a tip for people: don't stay at the same hotel as everyone else.

    My hotel room Internet connection is great, perhaps in part because I'm not sharing it with the rest of the PyCon attendees! (Plus it's free -- the DoubleTree wireless Internet doesn't charge …

    read more

    There are comments.

  92. Where are the Zope tutorials?

    I've been a hater of Zope (2) for a while, but I hear great things about Zope 3. I also know that the Zope community contains some of the smartest folks in Python, so I'm sure Zope 3 is worth hearing about.

    So I thought to myself, maybe a 3 …

    read more

    There are comments.

  93. Useless 'net arguments

    The blognet is full of people posting their own opinions, and that's a good thing. What is a little less supportable is flawed argumentation.

    I recently spent some time discussing a post about software engineering; I was trying to figure out why the author thought what he did. The annoying …

    read more

    There are comments.

  94. Building test fixtures for PostgreSQL

    I'm having trouble with some tests of a PostgreSQL-based system. Briefly, I have a set of functional tests that

    • create a new database
    • populate it with a data model
    • run a Web server (in-process)
    • test the integrated Web server - database functionality

    The tests are now slow enough that I'm averse …

    read more

    There are comments.
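    One common way to cut fixture cost (a sketch, not the post's solution) is to create the database and data model once per test class and wrap each test in a transaction that is rolled back afterwards. To stay self-contained, this uses an in-memory sqlite3 database as a stand-in for PostgreSQL; the item table and the class names are hypothetical. The same shape should carry over to a PostgreSQL connection, with the caveat that the in-process Web server must share that connection (and its open transaction) for the rollback to cover its writes.

```python
import sqlite3
import unittest


class DatabaseTestCase(unittest.TestCase):
    """Build the schema once per class; undo each test's writes afterwards."""

    @classmethod
    def setUpClass(cls):
        # Expensive part, done once: create the database and load the model.
        cls.conn = sqlite3.connect(":memory:")
        cls.conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
        cls.conn.execute("INSERT INTO item (name) VALUES ('seed')")
        cls.conn.commit()

    @classmethod
    def tearDownClass(cls):
        cls.conn.close()

    def tearDown(self):
        # Cheap part, done per test: discard whatever the test wrote.
        self.conn.rollback()

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]


class ItemTests(DatabaseTestCase):
    def test_insert(self):
        self.conn.execute("INSERT INTO item (name) VALUES ('new')")
        self.assertEqual(self.count(), 2)

    def test_starts_clean(self):
        # Runs after test_insert (alphabetical order); its row is gone.
        self.assertEqual(self.count(), 1)
```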

  95. Dear Lazyweb: Config file guidelines?

    Tracy recently asked me if there were any good guidelines about how to write configuration files -- not coding-level guidelines, but guidelines on structure and content.

    I was unable to come up with anything: my Google-fu failed me, and my DevonThink database was silent (although it did have some nice testing …

    read more

    There are comments.

  96. Principles and Practices of Scientific Origonology

    heh, this applies to many fields, I think...

    Luis Ibanez
    
    This presentation is a satire of the current obsession with
    intellectual property, innovation and originality that plagues
    the field of medical image analysis. The presentation makes the
    point that most Journals and Conferences focus on Originality and
    despise Reproducibility and …
    read more

    There are comments.

  97. Conversions between classes in Python

    My future colleague, Bill Punch at MSU, is teaching Python to intro CS students. He asks (slightly edited):


    In C++, you can write multiple constructors, each one taking a different type and/or number of arguments. Let's say we are writing a RationalNumber class. I could write 2 constructors:

    class …
    read more

    There are comments.
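    Python has no constructor overloading, so the usual substitutes are default arguments for the simple cases plus classmethods as named alternative constructors. Here is a minimal sketch of a RationalNumber along those lines; from_string and the normalization details are my own illustrative choices, not taken from the post.

```python
from math import gcd


class RationalNumber:
    """A minimal rational number with Python-style 'multiple constructors'."""

    def __init__(self, numerator=0, denominator=1):
        # Defaults cover the C++-style (), (int), and (int, int) overloads.
        if denominator == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        g = gcd(numerator, denominator)
        if denominator < 0:  # keep the sign on the numerator
            g = -g
        self.numerator = numerator // g
        self.denominator = denominator // g

    @classmethod
    def from_string(cls, text):
        """Alternative constructor: parse '3/4' or '5'."""
        num, _, den = text.partition("/")
        return cls(int(num), int(den) if den else 1)

    def __eq__(self, other):
        if not isinstance(other, RationalNumber):
            return NotImplemented
        return (self.numerator, self.denominator) == (other.numerator, other.denominator)

    def __repr__(self):
        return f"RationalNumber({self.numerator}, {self.denominator})"
```

    The standard library's fractions.Fraction takes a similar approach: its single constructor accepts ints, strings such as "3/4", and floats, and it also offers Fraction.from_float.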

  98. Coming up with GHOP tasks at your Python Interest Group meetings

    If anyone is having a Python Interest Group meeting this month, please consider devoting 15-30 minutes to coming up with random task ideas for the Google Highly Open Participation Contest.

    Briefly,

    • tasks must involve Python and Open Source;
    • non-core pet projects are welcome;
    • building screencasts, updating documentation, and adding unit …
    read more

    There are comments.

  99. Two entertaining quotes

    I'm going through some of my saved up e-mail from the last few months, and found these two gems.

    Noah Gift on grokking threads, from the testing-in-python list:

    Trying to understand what massive pools of threads that spawn other
    massive spools of threads, that spawn other massive pools of threads …
    read more

    There are comments.

  100. On Academia

    So, I've been wanting to write a bit about why grad school and academia in general isn't necessarily a waste of time. Unfortunately I'm now a white male in a power position that depends on me sucking good people into grad school and then exploiting them, so this may end …

    read more

    There are comments.

  101. Projects for people new to Python

    Hi all,

    I need to find some useful projects for new, young contributors, especially in the area of 3rd party packages; we've been thinking about things like porting 3rd party packages to Py3K, adding tests to existing projects, and providing Windows binary eggs for various packages. Everything would be open …

    read more

    There are comments.

  102. Caption contest

    So I'm having a little caption contest for this picture of our daughter, Amarie Rose Brown, who is now nearly 6 weeks old (but only four weeks old in this picture).

    Captions so far:

    1. "We thought she'd be easier to find, this way."
    2. "Well, someone certainly got lucky."
    3. "Google recruiting …
    read more

    There are comments.

  103. Intercepting DNS errors

    charter.net, my home ISP, is intercepting name-not-found errors and replacing them with a host that answers with a Web search page. This leads to some "entertaining" problems when trying to ssh to a host that doesn't exist.

    Is this actually wrong or just sleazy?

    In fact, it actually breaks …

    read more

    There are comments.
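    A quick way to check for this kind of interception, as a sketch: look up a random name that should not exist and see whether it resolves anyway. The helper nxdomain_is_intercepted is hypothetical, and the assumption that a random 32-character hex name under .com is unregistered is mine; the resolver is passed in so the logic can be exercised without a network.

```python
import socket
import uuid


def nxdomain_is_intercepted(resolve=socket.gethostbyname, tld="com"):
    """Return True if a name that should not exist resolves anyway.

    Assumes a random 32-hex-character label under `tld` is unregistered.
    `resolve` is injectable so the check can be tested offline.
    """
    bogus = f"{uuid.uuid4().hex}.{tld}"
    try:
        resolve(bogus)
    except socket.gaierror:
        return False  # the honest answer: name not found
    return True  # something answered for a name that should not exist
```

    A False result while offline is not meaningful, since every lookup fails then.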

  104. Lesson of the day

    If you use sys.settrace to set a tracing function, and that function prints to sys.stdout, then don't ever trash sys.stdout, even briefly. You will raise an invisible exception and your trace function will be removed.

    (I don't know precisely what is supposed to happen when a trace …

    read more

    There are comments.
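    A small demonstration, assuming CPython 3: the sys.settrace documentation says that a trace function which raises an error is unset, just as if settrace(None) had been called. The names tracer, traced, and tracer_survives_broken_stdout are hypothetical; the sketch replaces sys.stdout with an object lacking write() and checks whether the tracer is still installed afterwards.

```python
import sys


def tracer(frame, event, arg):
    # Report each traced call on whatever sys.stdout is right now.
    print("trace:", event, frame.f_code.co_name)
    return None  # no per-line tracing inside the frame


def traced():
    return 42


def tracer_survives_broken_stdout():
    """Return True if the tracer is still installed after stdout was trashed."""
    saved = sys.stdout
    sys.settrace(tracer)
    try:
        sys.stdout = object()  # "trash" stdout: it has no write() method
        try:
            traced()  # the 'call' event makes print() fail inside tracer
        except Exception:
            pass  # the tracer's error may surface at this call site
        finally:
            sys.stdout = saved
        return sys.gettrace() is tracer
    finally:
        sys.settrace(None)  # leave tracing off either way
```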

  105. Yes, Software Is Hard

    I haven't seen Kyle Wilson's article, Software is Hard, making the rounds yet... worth reading!

    --titus


    Legacy Comments

    Posted by Noah Gift on 2007-10-03 at 08:15.

    Great article, at least it is a comfort that some people admit it is
    hard and that "software is hard" is becoming more …
    read more

    There are comments.

  106. Writing Code That Doesn't Suck

    Note: this is ultimately intended for the biology-in-python Wiki at http://bio.scipy.org/. I will release it under a CC license, so please feel free to use it for your own site! --titus

    Here are some prescriptions for writing Python code that other Python programmers will find more usable …

    read more

    There are comments.

  107. Wireless data uplink

    Dear lazyweb,

    does anyone have any thoughts on how to get a relatively inexpensive wireless data uplink to the Internet from a Mac OS X laptop? We're going to be spending some time in a location with decent cell coverage but no wi-fi, and I'd like to be able to …

    read more

    There are comments.

  108. A disappointing SoC experience

    I was the official mentor for a Google Summer of Code student this year -- Martin von Löwis was "technical mentor" -- and I found it to be a disappointing experience. At the beginning, I felt guilty about not being more on the ball about pushing the student to do more work …

    read more

    There are comments.