100% code coverage is near-meaningless - but is there a good measure to use?

Is there some formal way(s) of quantifying potential flaws, or risk, and ensuring there's sufficient spread of tests to cover them? Perhaps using some kind of complexity measure? Or a risk assessment of some kind?

Experience tells me I need to be extra careful around certain things - user input, code generation, anything with a publicly exposed surface, third-party libraries/services, financial data, personal information (especially of minors), batch data manipulation/migration, and so on.

But is there any accepted means of formally measuring a system and ensuring that some level of test quality exists?

38 comments
  • But is there any accepted means of formally measuring a system and ensuring that some level of test quality exists?

    Formally? No, this is basically impossible by Rice's theorem, which says that any non-trivial property of a program's behaviour is undecidable in general. Even 100% test coverage doesn't guarantee that the program is good (the tests themselves could be flawed).

    This is just a natural limitation of Turing completeness. You can't decide these properties while also having full computational power. To decide such things, you need a less powerful model of computation (something not Turing complete) that can be analyzed more thoroughly and with more guarantees.

    • That makes sense, thank you. Yes, it's specifically "test quality" I'm looking to measure, as 100% coverage is effectively meaningless if the tests are poor.

      • Yeah, I'm afraid the only real way to "measure" that is to read through the tests and the code and make a good ol' human value judgement on the state of both. But it won't give you a number.

  • There are tools to detail the code coverage of your tests. I've worked with Istanbul in the past, and it's helped to point out parts of the code that could use more attention.

    https://istanbul.js.org/

    • I use coverage tools like nyc/c8, but I can easily get 100% coverage on buggy, exploitable, and unstable code. You can have two projects, both with 100% coverage, and one be a shit show and the other be rock solid - so I was wondering if there's a way to measure quality of tests, or to identify code that really needs extra attention (despite being 100%). Mutation testing has been suggested and that's really interesting, I'm going to give it a go tomorrow and see what it throws up!
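
      A minimal Python sketch of why coverage alone says little about test quality, and of the idea behind mutation testing. Everything here is hypothetical and hand-rolled for illustration: `free_shipping`, the mutant, and both suites are made up, and real mutation tools (Stryker, for example, in the JS world) generate and run the mutants automatically.

      ```python
      # Hypothetical function under test: free shipping for orders of 50 or more.
      def free_shipping(total):
          return total >= 50

      # Hand-written "mutant": the same function with >= swapped for >.
      def free_shipping_mutant(total):
          return total > 50

      def weak_suite(fn):
          """Executes every line of fn (100% coverage) but never probes the boundary."""
          return fn(100) is True and fn(10) is False

      def strong_suite(fn):
          """Same checks plus the boundary value, where the mutant behaves differently."""
          return weak_suite(fn) and fn(50) is True

      for name, suite in [("weak", weak_suite), ("strong", strong_suite)]:
          passes_original = suite(free_shipping)
          kills_mutant = not suite(free_shipping_mutant)
          print(f"{name}: passes original={passes_original}, kills mutant={kills_mutant}")
      ```

      Both suites report full coverage of `free_shipping`, but only the strong one fails when the comparison is mutated. The share of mutants a suite kills is the rough "test quality" signal that coverage can't give you.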

  • I am not sure what the common/agreed upon rules are. Seems like it depends on the team lead or manager to decide. Some orgs have better engineers, experience, systems and others don’t.

    I used to follow the 100% coverage rule because I was told to when I started out. I found myself chasing semicolons rather than null references. Luckily, I had a teammate with whom I argued a lot about what we did, do, and will do, and he helped me. (In a friendly manner, not like Dinesh and Gilfoyle from Silicon Valley.)

    Now, I start my tests by going over how the user will use it, i.e. the happy path. Then I move away from the happy path. That seems to cover most cases. It helps if you know the business too. (Think of a messaging system that is intentionally and strictly simple, versus one that has a lot of Unicode and language support… fucking emojis hurt me 'cause I forget they exist even though I use them all the time.)
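
    As a rough illustration of why emoji make good off-happy-path test inputs, here is a small Python sketch. The `fits_limit` helper and the 4-byte limit are hypothetical, chosen only to show that "length" depends on whether you count code points or encoded bytes.

    ```python
    # Hypothetical edge-case inputs for a message field.
    thumbs_up = "\U0001F44D"  # one emoji, one code point
    # One visible "family" emoji built from three emoji joined by zero-width joiners.
    family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

    def fits_limit(message, max_bytes):
        """Hypothetical check: does the UTF-8 encoded message fit in max_bytes?"""
        return len(message.encode("utf-8")) <= max_bytes

    # Code points vs. UTF-8 bytes for each input.
    print(len(thumbs_up), len(thumbs_up.encode("utf-8")))
    print(len(family), len(family.encode("utf-8")))
    print(fits_limit("ok", 4), fits_limit(family, 4))
    ```

    A test that only sends ASCII text through a length check like this never sees the case where one on-screen character takes 18 bytes.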

    Alas, no matter what, I always miss some test case or a very imaginative user will find a way to show me how wrong I am.

    In the end, I think the best approach, no matter how big or small the project you’re building, is to do many small PRs (with tests) to your team. This way, things are tested in increments, which helps prevent PR burnout. This is something I need to get better at myself.

  • Maybe the ratio of money spent on writing code to money spent on testing code?

  • I prefer to count and report the total number of tests run as part of each build. We get impressively large numbers, but there is no way to set a specific goal for the exact number; we can always go higher.
