Self-development plans

Bode-Museum, Berlin

Data science is a rapidly developing field, and I want to develop a strong set of skills to maximise my options in the future and to provide as much value to my employers as I can.

In this article about self-development, I’m going to focus on technical skills and leave out things like relationship-building, communicating results, and influence. Those skills are important, but they are out of scope here.

What do employers want?

Looking at job descriptions, you can see what kind of technical skills employers are looking for. I’ve read maybe 50 senior data scientist / machine learning engineer job descriptions. There are three broad areas that seem to come up frequently on job adverts:

  • Machine learning understanding - statistics, probability, and things like regression, clustering, classification.
  • Software engineering - can you write production-ready deployable code? Includes things like Docker and performance optimisation.
  • What’s hot in LLMs - are you familiar with the latest models, do you know about methods to improve LLM performance like prompt engineering, RAG, chain-of-thought, few-shot learning, structured outputs, and fine-tuning models?

And one that I’ve seen rarely but I think is under-appreciated:

  • Can you quickly build demos and put together a production plan, i.e. how to go from drawing board -> Streamlit -> deployment?

Getting the most employability bang for your time investment buck

I’m currently a senior data scientist, and I have a fair amount of knowledge about all three of these areas. But there’s always more to learn!

So how best to keep skills sharp? I’ve asked a few people, and they each suggest prioritising the areas above differently.

I think there are probably diminishing returns in each of the areas above. For example, you need to know the different approaches to classification, but there’s less value in knowing the more advanced maths behind them.

I think this means that developing skills in several areas is actually a coherent strategy. A mixed approach might seem indecisive or vague, but if there are diminishing returns, then you get the most bang for your buck in terms of employability from learning the key skills in each area.
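
To make the diminishing-returns argument concrete, here is a toy sketch. It assumes the value of an area grows with the square root of the hours invested, which is purely an illustrative assumption and not something I have measured. The area names and the `value` and `allocate` functions are hypothetical, and 100 hours is roughly the middle of 5-10 hours a week over three months.

```python
import math

# Hypothetical labels for the skill areas discussed above.
AREAS = ["ml_theory", "software_engineering", "llms", "rapid_demos"]


def value(hours):
    """Assumed concave value curve: each extra hour adds less than the last."""
    return math.sqrt(hours)


def allocate(total_hours, areas):
    """Give each hour to the area where it adds the most value right now."""
    hours = {area: 0 for area in areas}
    for _ in range(total_hours):
        best = max(areas, key=lambda a: value(hours[a] + 1) - value(hours[a]))
        hours[best] += 1
    return hours


mixed = allocate(100, AREAS)
total_mixed = sum(value(h) for h in mixed.values())  # hours spread across areas
total_single = value(100)                            # all hours in one area

print(mixed)
print(f"mixed: {total_mixed:.1f}, single area: {total_single:.1f}")
```

Under that assumption, the hours end up split evenly, and the mixed plan gives twice the total value of putting all 100 hours into one area. A different concave curve would change the numbers but, as long as the areas behave similarly, not the direction.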

Creating a plan

So what should I do? Let’s think of a three-month plan, assuming 5-10 hours of self-dev time per week. Here’s my plan.

  • Weeks 1-4: Code along to at least the first video in Karpathy’s Zero to Hero series
  • Weeks 5-7: ML theory - identify what maths would be most useful (maybe a single chapter from a textbook, like Simon Prince’s Understanding Deep Learning)
  • Weeks 8-10: Improve software engineering skills - set up a repo using a template, and use Streamlit to turn something from a local notebook into an app (I haven’t used Streamlit before and it’s a popular tool for making rapid prototypes)
  • Weeks 11-13: What’s hot in LLMs, including RAG, structured outputs, and fine-tuning a model, possibly using the LLM lectures from MIT’s Intro to Deep Learning course

Here are my planned outputs at the end of three months:

  • A complete Karpathy mini-GPT transformer model (GitHub repo)
  • Five maths exercises relevant to ML completed, with notes on this blog
  • A deployable Streamlit ML app
  • A practical example using RAG, structured outputs, prompt engineering, and fine-tuning LLMs

Let’s say week 1 is this week - week commencing 19 May 2025. I’m also going to be on leave in the first week of July. Taking that into account, here’s the initial plan:

  • Weeks 1–4: 19 May – 15 June 2025
  • Weeks 5–6: 16 June – 29 June 2025
  • Break: 30 June – 6 July 2025
  • Week 7: 7 July – 13 July 2025
  • Weeks 8–10: 14 July – 3 August 2025
  • Weeks 11–13: 4 August – 24 August 2025
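
Date ranges like these can be checked mechanically. The sketch below is a minimal example, assuming every week runs Monday to Sunday and that the leave week starting 30 June 2025 is skipped without using up a week number. `week_ranges` and `BLOCKS` are hypothetical names for illustration.

```python
from datetime import date, timedelta


def week_ranges(start, n_weeks, skip_weeks=()):
    """Map week number -> (monday, sunday) for n_weeks working weeks.

    Calendar weeks whose Monday is in skip_weeks count as leave and
    do not consume a week number.
    """
    ranges = {}
    monday = start
    week_no = 1
    while week_no <= n_weeks:
        if monday not in skip_weeks:
            ranges[week_no] = (monday, monday + timedelta(days=6))
            week_no += 1
        monday += timedelta(weeks=1)
    return ranges


# Week 1 starts on Monday 19 May 2025; the week of 30 June is leave.
plan = week_ranges(date(2025, 5, 19), 13, skip_weeks={date(2025, 6, 30)})

# The four sections of the plan, as (first week, last week).
BLOCKS = {
    "Weeks 1-4": (1, 4),
    "Weeks 5-7": (5, 7),
    "Weeks 8-10": (8, 10),
    "Weeks 11-13": (11, 13),
}

for label, (first, last) in BLOCKS.items():
    print(label, plan[first][0].isoformat(), "to", plan[last][1].isoformat())
```

If the dates slip, changing the start date or adding another Monday to `skip_weeks` regenerates the whole schedule.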

Accountability, motivation, and reality

Obviously this is a plan, and very likely something will derail these tightly defined dates. The sooner I learn these techniques the better, in part because LLMs are getting better at doing many of them without needing me! But I accept that these dates will probably shift, or that I might change the activities.

My plan is to write an update on this blog after each of the four sections listed above, so I aim to post four updates on how each stage goes. Here we go!



