Join us for our weekly series of short talks: nf-core/bytesize.

Just 15 minutes + questions, we focus on topics about using and developing nf-core pipelines. These are recorded and made available at https://nf-co.re , helping to build an archive of training material. Got an idea for a talk? Let us know on the #bytesize Slack channel!

This week, Christopher Hakkaart (@christopher-hakkaart) will discuss the results of the Nextflow/nf-core community survey.

Video transcription **Note: The content has been edited for reader-friendliness**

0:01 (host) Thank you everyone for joining. Sorry for the slightly slow start. I think we’re up and running now. I’d like to welcome Chris Hakkaart to talk today. He’s going to tell us all about the community survey that we do every year for Nextflow and nf-core. There’s already been a bit of noise about the results and everything, and thank you to everyone who submitted. Chris is going to delve in and give us some juicy insights. Looking forward to hearing about that. Chris is working at Seqera Labs as a developer advocate for the Nextflow and nf-core communities. Thank you very much for speaking, Chris, and I’ll hand over to you.

0:36 Thank you. Hopefully everyone can now see my slides. Thank you for having me. Today I will be going through some of the results from the State of the Workflow survey. The information I have taken from the survey has a real community flavor: things relevant to Nextflow and the nf-core community in particular. As Phil already mentioned, there was a lot of noise on the different channels, on Twitter and on Slack, asking people to fill this in. It’s really important that people do. Part of this is because of the Chan Zuckerberg Initiative grants that we’ve got, which are really there to help us increase the reach of nf-core and Nextflow, making sure that people who want to come along to the community can join, get over the overheads of actually joining, and understand what’s going on. As part of that, of course, we have the mentorship program and we’re looking at an ambassador program, but there are a lot of other initiatives going on. The survey is really a way of helping to measure how well we’re doing with things like that.

1:43 You might be asking: another survey, didn’t I just do one? Yes, absolutely. These are the 2023 community survey results, and this is the third year of running the survey. From 2022 onwards, we started to ask extra questions about the nf-core community in particular. This year was the largest survey to date, with 502 responses. This is up by 31% from 2022, when we had slightly over 300, and more than double the number from 2021. Each year we’re getting more responses, which is fantastic.

2:23 Next, we looked at where people are coming from: the region or country where respondents told us they live. In 2022, participants listed 20 different countries, and this year it was 47. You can see that we’ve got a much larger geographic reach, with responses from more than twice as many countries as in 2022. What you might also notice is that most of the responses come from America and Europe. America is still number one, with the most responses coming from there, and Europe makes up the rest of the top six, as was especially the case in 2022. Special mentions go out to India, Belgium, and, I think, Serbia, which have all made the top 16 for the first time. When we look at the age of the survey participants, we see that most people are younger than 40, and 37% were under the age of 30. This is slightly up from 2022, so generally the community is trending a little bit younger.

3:28 We also asked participants which languages they were proficient in. This is slightly different to the question we asked in 2022, which was about your most proficient language; this time we asked for all of the languages that people are proficient in. English is still number one: 99% of people who responded to the survey are proficient in English. But you can see that people are proficient in a lot of other languages as well. This is really important information to help us decide how to prioritize things like translating documentation or running training in other languages. 99% is obviously just about everyone, but there is an important 1% who are not proficient in English, so having information like this really helps us prioritize our efforts. When we looked at gender, the respondents, who are Nextflow users, are still predominantly male, but female representation did increase slightly, up to 26%, which I think is about a 3% increase from 2022. A further 1% didn’t identify as either male or female.

4:28 When we look at how people define their roles, most people define themselves as bioinformaticians, which is the same as previous years. That share is down slightly from 2022, to about 70%, and the roughly 3% difference was spread across an increase in PIs/managers, software engineers, and data scientists. When we asked people what their interests were when using Nextflow, most people said genomics; however, transcriptomics, metagenomics, and proteomics were all still quite prevalent. These are all within the life sciences, but when we dig into the "other" group, you’ll see that Nextflow is being used in many fields outside of the life sciences as well. When we asked people which industry they belong to, most people said academia, but biotech startups, research institutions, and healthcare and clinical settings are also quite prevalent.

5:24 Moving on to how long people have been using workflow managers and Nextflow. First, we looked at years working with workflows. This is a little bit ambiguous: is this someone writing a bash script that straps a few tools together, or someone who has been using other workflow managers for some time? It’s a little bit hard to tell these apart, but you can see that people have been using workflows for some time, with 8% doing so for 10 years or more. When we asked how long you’ve been using Nextflow, most people are very new to Nextflow, with less than one year of experience, and of course very few have been with Nextflow from the start, at 6 to 10 years, remembering that Nextflow turned 10 earlier this year. When we asked if you are using other workflow managers, most people are using more than one, which is expected, and a little bit of variety definitely doesn’t hurt. When we asked about your preferences for using Nextflow, most people are running analyses with Nextflow themselves, others are running analyses for other people, and others have written their own custom in-house workflows. These options weren’t mutually exclusive, so you could tick multiple boxes, and most people have multiple roles when they’re using Nextflow. Importantly, about a quarter (24% on the slide, because it’s been rounded down) actually contribute to nf-core pipelines, and it’s really great to see that so many Nextflow users are joining the nf-core community and giving back.

6:58 When we asked people which workflows they actually run, not just develop, most people are running nf-core workflows, which is great: the workflows being developed as part of nf-core are getting used by a lot of Nextflow users. Of course, a lot of people are still developing their own workflows, and some are using workflows developed by others in their group or by outsourced developers. This is a meme that was on the Slack channel today, which I thought was quite nice, so I’ll give a shoutout to James for making it.

7:34 When we asked what you find useful when you are learning Nextflow, most people said the reference docs. This is a weighted average, so this graph can be a little bit misleading and difficult to understand. Roughly, with these weighted average graphs, when people respond positively, saying a resource is very useful, it drags the score up, and if they say it’s not useful, the score ends up near or below zero. When we looked at these numbers in more detail, 89% of people said they find the Nextflow docs very useful, and 73% find the nf-core docs very useful as well. It’s really important that the nf-core docs are well developed, and I think they are, which is really fantastic. If you look at the rest of these data points in detail, a lot of people said that they were indifferent, finding them neither useful nor not useful, probably because they weren’t using those particular methods for learning Nextflow. But digging into this a little bit more, you’ll find that people who did use these resources found them very useful. It’s just that not everyone is using all of them.
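To illustrate how a weighted-average score like the one described above behaves, here is a minimal Python sketch. The talk does not say which weights the survey graphs use, so the five response labels and the +2 to -2 weights below are assumptions for illustration only, and the example responses are made up, not survey data. It shows the point made in the talk: a resource most people rate as very useful scores high, while one most people are indifferent to stays close to zero even though nobody rated it as not useful.

```python
from collections import Counter

# Hypothetical weights for a Likert-style scale; the survey's actual
# weighting scheme is not stated in the talk.
WEIGHTS = {
    "very useful": 2,
    "somewhat useful": 1,
    "indifferent": 0,
    "not very useful": -1,
    "not useful": -2,
}


def weighted_average(responses: list[str]) -> float:
    """Average the weights of all responses; a positive score means 'useful' on balance."""
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to average")
    return sum(WEIGHTS[label] * n for label, n in counts.items()) / total


if __name__ == "__main__":
    # Made-up responses, not survey data.
    mostly_useful = ["very useful"] * 8 + ["indifferent"] * 2
    mostly_indifferent = ["indifferent"] * 8 + ["very useful"] * 2
    print(weighted_average(mostly_useful))       # 1.6
    print(weighted_average(mostly_indifferent))  # 0.4
```

Indifferent answers contribute zero but still count towards the total, which is why a resource that only some respondents use can look weak on the graph even when its users rate it highly.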

8:42 When we asked how you get help when you have a problem with Nextflow, most people reach out on the nf-core Slack. That probably reflects that many of the people who responded are nf-core developers or are trying to use an nf-core pipeline, something we’ve seen in the data already. What’s also interesting is the huge uptake of the Nextflow Slack, which is only about one year old, so it’s really good to see it being adopted so quickly and widely. When we asked if you had attended a training before, 36% answered no, but they would like to, while 32% said yes, at one of the community trainings run by nf-core. Earlier this year, in March, we had the Nextflow and nf-core online community training, and it is all still on YouTube, along with all the training material. If you belong to that 36%, all the training material is there, and if you’re part of that 32%, you might have attended this training already. It’s really great to see those community training events being used so heavily by the community and Nextflow users.

9:48 When we asked how you launch your workflows, most people, 77%, are launching from the command line, with a very small but important part of the community using things like Tower, as well as other in-house platforms. When we asked what infrastructure you’re running on, most people were using on-prem clusters, but there is an increasing migration to the cloud. That’s something we’ve seen over the last few years of the survey: people are quickly adopting the cloud, and the survey blog post goes into those details. When we asked what was important to you as a Nextflow user or developer, documentation was number one, but things like performance at scale, ease of installation, pipelines and data, portability, and community adoption were all important too. This is another weighted average, so anything above the line is important. In particular, people found that commercial support wasn’t overly important, and a graphical user interface wasn’t a priority.

10:55 63% of people reported that they felt frustrated with Nextflow, which I think is normal. In the qualitative part of the responses, people mentioned things like the Groovy language, debugging error messages, unclear documentation, and having a large cache. All of these things come up regularly, and we see them in the Nextflow Slack, but we do take them very seriously. This feedback helps us prioritize the features that need to be fixed or improved in Nextflow and nf-core. Looking at it the other way, when we asked what features people would like to see, they asked for support for languages other than Groovy, more obvious error messages, better documentation, and a way of removing intermediate files. This is the complete inverse of what people felt frustrated with, which makes a lot of sense. There were also requests for things like the ability to optimize resources, submitting job arrays, more regular community trainings, and the ability to write unit tests. I just want to reinforce that we hear you: this is all really important feedback, and the developers take it very seriously when they think about what features to add to Nextflow.

12:09 But the bottom line, and probably the most important thing, is that 99% of people are satisfied with Nextflow. People coming to use Nextflow are really happy. This is up from 98% in 2022, which is a really fantastic result. This is a picture from a great Australian movie called The Castle: as a summary, it’s just the vibe of it. Nextflow users are very satisfied. Nextflow has a growing user base with increased diversity, which is really important, and it is experiencing rapid adoption and growth; we see that in the number of Nextflow users with less than one year of experience. The nf-core community is especially valued: things like the community training and the outreach done through the nf-core Slack, for example, are incredibly important. The survey has been really helpful and will help guide the development of new features for Nextflow, as well as nf-core, in the future. That’s the end of the presentation. If there are any questions, I’ll be happy to answer them.

13:09 (host) Thanks very much, Chris. That’s really good. Just to open it up: anyone, feel free to unmute yourself and ask a question, or drop a question into the chat and I can relay it. I liked all the memes flashing up, by the way, Chris. I’m going to have to go back through the recording at half speed to catch some of those.

(speaker) I should have dwelled on those. They were there to break up all the green, I think.

(question) I can kick off, maybe. Could you go back a couple of slides to the things that people wanted to see? I thought it was quite interesting that nearly all of these are things that are being actively developed or coming out soon. Do you want to just mention a couple of those?

(answer) Yeah, absolutely. If anyone has ever dug around on the Nextflow GitHub, they’ll see that there are a lot of branches actively developing some of these particular features. Job arrays in particular have been requested for a while, and there’s been some really good development on that recently. I’m not sure when it will be available, but that is an example of a feature that will be coming out in the future. As for more regular community training, from a developer advocate perspective, we know how important training is, and there will be more training and more training resources in the near future. It’s something that we’re really prioritizing, and we see the value of having it available for the community. I’m not as up to date with what’s happening with unit tests, so I’m not sure if that’s under active development or not. But in terms of optimizing resources, there are features in Tower, for example, that already do that. If you’re using Tower, you can click the button there and have an optimized resources configuration file created for you.

(comment) For the unit tests, I was thinking about the nf-test framework, which is a nice way to write unit tests for Nextflow pipelines. It’s not from the core Nextflow team, it’s a community tool, but it’s being picked up and used in nf-core. At the moment, we’ve got an nf-test channel on the nf-core Slack if you want to chat about it, or just Google it and you’ll find it. It’s got really nice documentation. The optimized resources feature, I think, is not very well known, but if you’re a Nextflow Tower user, once you’ve run a workflow on Tower, it should come up with a little button saying optimize resources. That will build a customized config file for you based on what that run used, which should be optimized for future runs. It’s still very much a work in progress; it’s a preview feature. The UI and so on still has some improvements to be made, but it might be useful for those of you interested in that feature.

16:00 (question) There’s a question that has just come in on the chat: "An excellent presentation. Please make example custom config files for all possible tools used in the pipeline." I’m not totally clear what that question means, really. Do you have any thoughts about that one?

(answer) I guess I’d just expand on what Phil mentioned about the feature in Tower. After you have executed a run and it has run to completion, you’ll be given the option to go back and make this custom config file, which takes into account the resources used in your previous runs. It’ll come up with a new config file that says, hey, look, you requested 50 GB or 50 CPUs, but for this you actually need two, or this much memory, or whatever else. This covers all the tools that are part of the pipeline. It’s a really nice feature. I think if you go into the community showcase in Tower, you’ll be able to see it in action, which is probably the best way to see and understand it: just go and get hands-on.

17:25 (host) Any more questions?

(comment) Like I said, let me try to screen share the optimization feature, because it’s not very obvious. I’ll just steal the screen share super fast. This is Tower. This is the testing, actually… Okay, let’s go into verified pipelines. I’m going to existing runs here, which have already finished. If I take one here, you can see there’s a button saying optimization available. This is a demo pipeline, so it doesn’t have much in it. It has just one process, which is called Say Hello, and this is saying it only needs one CPU. You can imagine that if this were an nf-core pipeline, there’d be something like 100 different processes listed here, with CPUs and memory for each one. You can manually copy that out and stick it in a custom config file for the next run.
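The optimization feature discussed in this exchange builds a config file from what a previous run actually used. The Python sketch below shows the general idea under stated assumptions; it is not Tower’s implementation. The input format (a hypothetical mapping from process name to peak CPU usage as a percentage, in the style of the Nextflow trace `%cpu` field, and peak memory in MB) and the 20% memory headroom are illustrative choices. The output uses standard Nextflow config syntax: a `process` scope with `withName` selectors setting the `cpus` and `memory` directives.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding surprises."""
    return -(-a // b)


def optimized_config(usage: dict[str, tuple[int, int]], headroom_pct: int = 20) -> str:
    """Render a Nextflow config that sizes each process from observed usage.

    `usage` maps a process name to (peak CPU usage in percent, peak memory in MB).
    This input format and the default headroom are assumptions for illustration.
    """
    lines = ["process {"]
    for name in sorted(usage):
        cpu_pct, peak_mb = usage[name]
        # 100% CPU corresponds to one fully used core; always request at least one CPU.
        cpus = max(1, ceil_div(cpu_pct, 100))
        # Add headroom on top of the observed peak memory, rounded up to whole MB.
        memory_mb = max(1, ceil_div(peak_mb * (100 + headroom_pct), 100))
        lines.append(f"    withName: '{name}' {{")
        lines.append(f"        cpus = {cpus}")
        lines.append(f"        memory = '{memory_mb} MB'")
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up numbers for a demo pipeline with a single process, as in the Tower demo.
    print(optimized_config({"sayHello": (30, 50)}))
```

As in the demo, a generated block like this could be saved as a custom config file and passed to the next run with Nextflow’s `-c` option; the values should still be reviewed, since one run’s peak usage may not reflect other inputs.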

(host) Great. Right, thanks very much, Chris. It was a pleasure to have you here, and thank you everyone for joining. We’ll see you for the next nf-core bytesize talk soon.