Recently we interviewed candidates for a staff position that involved building
bioinformatics data analysis pipelines. We came up with the following
interview questions, which seemed to work quite well for a first-round
interview, and I thought I'd share them:
Question 1
Scenario: you've been maintaining a data analysis pipeline that
involves running a shell script by hand. The shell script works
perfectly about 95% of the time, and breaks the remaining 5% of the
time because of a variety of small issues. This has been OK so far because you've
been asked to process only 1 data set a week, and the rest of your time is
spent on other tasks. But now the job has changed: you're spending
50% or more of your time on this pipeline and are expected to analyze 100 data sets
a month. How would you allocate your time and effort? Feel free to
fill in backstory from your own previous work experience.
Question 2
Scenario: you're running the same data analysis pipeline as above, and
after two months you suddenly get feedback from your boss's boss that
the results are now wrong. How do you approach this situation?
Bonus: Question 3
You are building a workflow or pipeline from a bunch of software packages
whose dependencies and installation requirements are mutually
incompatible. What approaches would you consider, what kinds of
questions would you ask about the workflow and pipeline to choose
between those approaches, and what are the drawbacks of each?
--titus