Recently we interviewed candidates for a staff position that involved building bioinformatics data analysis pipelines. We came up with the following interview questions, which seemed to work quite well for a first-round interview, & I thought I'd share them:
Question 1
Scenario: You've been maintaining a data analysis pipeline that involves running a shell script by hand. The shell script works perfectly about 95% of the time, and breaks the remaining 5% of the time because of many small issues. This has been OK so far because you've been asked to process one data set a week, and the rest of your time is spent on other tasks. But now the job has changed: you're spending 50% or more of your time on this pipeline and are expected to analyze 100 data sets a month. How would you allocate your time and effort? Feel free to fill in backstory from your own previous work experience.
Question 2
Scenario: You're running the same data analysis pipeline as above, and after two months you suddenly get feedback from your boss's boss that the results are now wrong. How do you approach this situation?
Bonus: Question 3
You are building a workflow or pipeline with a bunch of software packages whose dependencies and installation requirements are incompatible with each other. What approaches would you consider, what kinds of questions would you ask about the workflow and pipeline to choose between those approaches, and what are the drawbacks of each approach?