When I read about these things I always think about some of the writing of @pluralistic on graceful failure modes. A product (system) is not defined by its success but by how well or how poorly it fails. I've been teaching students that not considering (poor) failure modes is a huge liability.

https://arstechnica.com/gadgets/2024/12/nightmare-zipcar-outage-is-a-warning-against-complete-app-dependency/

#failure #scalability #devops #management #governance

@koen_hufkens Had a discussion about this with a fellow veteran engineer this weekend, about how designers are neglecting the basic #UX concept that systems should fail to manual.

I have a touch-activated faucet in my new home which, when the batteries fail, just doesn't let you have water. Fail-to-manual is a much more sensible approach than being locked out of your car or left without water.

@pluralistic

@koen_hufkens @pluralistic Not only is this poor failure handling on the engineering side, it highlights a further issue: the attitude of "it doesn't matter, we'll refund the users if it happens" forgets that failures are likely to lead to consequential losses and even physical danger, or ignores that because of a click-through get-out clause.
@pluralistic Many of these problems originate from shifting failure modes from a focus on weak-link problems to strong-link problems, to increase profit.

Weak-link problems are defined by their worst performance, while strong-link problems are defined by their best performance. Problems don't reside strictly in either category, but when dealing with infrastructure (which isn't an easily replaced discretionary purchase) the focus should not deviate too far from a weak-link assumption.
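
To make the distinction concrete, here is a minimal sketch, assuming each component of a system gets a hypothetical reliability score between 0 and 1. The function names and the numbers are made up for illustration; nothing here is measured data.

```python
# Hypothetical reliability scores for the parts of a car-sharing service.
# The numbers are illustrative assumptions, not measurements.
component_scores = {
    "car": 0.99,
    "phone_app": 0.95,
    "backend_servers": 0.20,  # the part that had a bad day
}


def weak_link_score(scores):
    """Weak-link view: the system is only as good as its worst component."""
    return min(scores.values())


def strong_link_score(scores):
    """Strong-link view: the system is judged by its best component."""
    return max(scores.values())


print("weak-link view:", weak_link_score(component_scores))
print("strong-link view:", strong_link_score(component_scores))
```

Under the weak-link view the one failing component drags the whole score down, which is roughly what a user locked out of a rented car experiences. The strong-link view hides that failure entirely.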