Diff of The Seventy Maxims Of Effective Machine Learning Engineers at ccf8156

@@ -1,70 +1,70 @@
-* Preprocess, then train.
-* A training loop in motion outranks a perfect architecture that isn’t implemented.
-* A debugger with a stack trace outranks everyone else.
-* Regularization covers a multitude of overfitting sins.
-* Feature importance and data leakage should be easier to tell apart.
-* If increasing model complexity wasn’t your last resort, you failed to add enough layers.
-* If the accuracy is high enough, stakeholders will stop complaining about the compute costs.
-* Harsh critiques have their place—usually in the rejected pull requests.
-* Never turn your back on a deployed model.
-* Sometimes the only way out is through… through another epoch.
-* Every dataset is trainable—at least once.
-* A gentle learning rate turneth away divergence. Once the loss stabilizes, crank it up.
-* Do unto others’ hyperparameters as you would have them do unto yours.
-* “Innovative architecture” means never asking, “What’s the worst thing this could hallucinate?”
-* Only you can prevent vanishing gradients.
-* Your model is on the leaderboards: be sure it has dropout.
-* The longer training goes without overfitting, the bigger the validation-set disaster.
-* If the optimizer is leading from the front, watch for exploding gradients in the rear.
-* The field advances when you turn competitors into collaborators, but that’s not the same as your h-index advancing.
-* If you’re not willing to prune your own layers, you’re not willing to deploy.
-* Give a model a labeled dataset, and it trains for a day. Take its labels away and call it “self-supervised,” and it’ll generate new ones for you to validate tomorrow.
-* If you’re manually labeling data, somebody’s done something wrong.
-* Training loss and validation loss should be easier to tell apart.
-* Any sufficiently advanced algorithm is indistinguishable from a matrix multiplication.
-* If your model’s failure is covered by the SLA, you didn’t test enough edge cases.
-* “Fire-and-forget training” is fine, provided you never actually forget to monitor drift.
-* Don’t be afraid to be the first to try a random seed.
-* If the cost of cloud compute is high enough, you might get promoted for shutting down idle instances.
-* The enemy of my bias is my variance. No more. No less.
-* A little dropout goes a long way. The less you use, the further the gradients backpropagate.
-* Only overfitters prosper (temporarily).
-* Any model is production-ready if you can containerize it.
-* If you’re logging metrics, you’re being audited.
-* If you’re seeing NaN, you need a smaller learning rate.
-* That which does not break your model has made a suboptimal adversarial example.
-* When the loss plateaus, the wise call for more data.
-* There is no “overkill.” There is only “more epochs” and “CUDA out of memory.”
-* What’s trivial in Jupyter can still crash in production.
-* There’s a difference between spare GPUs and GPUs you’ve accidentally mined Ethereum on.
-* Not all NaN is a bug—sometimes it’s a feature.
-* “Do you have a checkpoint?” means “I can’t fix this training run.”
-* “They’ll never expect this activation function” means “I want to try something non-differentiable.”
-* If it’s a hack and it works, it’s still a hack and you’re lucky.
-* If it can parallelize inference, it can double as a space heater.
-* The size of the grant is inversely proportional to the reproducibility of the results.
-* Don’t try to save money by undersampling.
-* Don’t expect the data to cooperate in the creation of your dream benchmark.
-* If it ain’t overfit, it hasn’t been trained on enough epochs.
-* Every client is one missed deadline away from switching to AutoML, and every AutoML is one custom loss function away from becoming a client.
-* If it only works on the training set, it’s defective.
-* Let them see you tune the hyperparameters before you abandon the project.
-* The framework you’ve got is never the framework you want.
-* The data you’ve got is never the data you want.
-* It’s only too many layers if you can’t fit them in VRAM.
-* It’s only too many parameters if they’re multiplying NaNs.
-* Data engineers exist to format tables for people with real GPUs.
-* Reinforcement learning exists to burn through compute budgets on simulated environments.
-* The whiteboard is mightiest when it sketches architectures for more transformers.
-* “Two dropout layers is probably not going to be enough.”
-* A model’s inference time is directly proportional to the urgency of the demo.
-* Don’t bring BERT into a logistic regression.
-* Any tensor labeled “output” is dangerous at both ends.
-* The CTO knows how to do it by knowing who Googled it.
-* An ounce of precision is worth a pound of recall.
-* After the merge, be the one with the main branch, not the one with the conflicts.
-* Necessity is the mother of synthetic data.
-* If you can’t explain it, cite the arXiv paper.
-* Deploying with confidence intervals doesn’t mean you shouldn’t also deploy with a kill switch.
-* Sometimes SOTA is a function of who had the biggest TPU pod.
-* Failure is not an option—it is mandatory. The option is whether to let failure be the last epoch or a learning rate adjustment.
\ No newline at end of file
+*. Preprocess, then train.
+*. A training loop in motion outranks a perfect architecture that isn’t implemented.
+*. A debugger with a stack trace outranks everyone else.
+*. Regularization covers a multitude of overfitting sins.
+*. Feature importance and data leakage should be easier to tell apart.
+*. If increasing model complexity wasn’t your last resort, you failed to add enough layers.
+*. If the accuracy is high enough, stakeholders will stop complaining about the compute costs.
+*. Harsh critiques have their place—usually in the rejected pull requests.
+*. Never turn your back on a deployed model.
+*. Sometimes the only way out is through… through another epoch.
+*. Every dataset is trainable—at least once.
+*. A gentle learning rate turneth away divergence. Once the loss stabilizes, crank it up.
+*. Do unto others’ hyperparameters as you would have them do unto yours.
+*. “Innovative architecture” means never asking, “What’s the worst thing this could hallucinate?”
+*. Only you can prevent vanishing gradients.
+*. Your model is on the leaderboards: be sure it has dropout.
+*. The longer training goes without overfitting, the bigger the validation-set disaster.
+*. If the optimizer is leading from the front, watch for exploding gradients in the rear.
+*. The field advances when you turn competitors into collaborators, but that’s not the same as your h-index advancing.
+*. If you’re not willing to prune your own layers, you’re not willing to deploy.
+*. Give a model a labeled dataset, and it trains for a day. Take its labels away and call it “self-supervised,” and it’ll generate new ones for you to validate tomorrow.
+*. If you’re manually labeling data, somebody’s done something wrong.
+*. Training loss and validation loss should be easier to tell apart.
+*. Any sufficiently advanced algorithm is indistinguishable from a matrix multiplication.
+*. If your model’s failure is covered by the SLA, you didn’t test enough edge cases.
+*. “Fire-and-forget training” is fine, provided you never actually forget to monitor drift.
+*. Don’t be afraid to be the first to try a random seed.
+*. If the cost of cloud compute is high enough, you might get promoted for shutting down idle instances.
+*. The enemy of my bias is my variance. No more. No less.
+*. A little dropout goes a long way. The less you use, the further the gradients backpropagate.
+*. Only overfitters prosper (temporarily).
+*. Any model is production-ready if you can containerize it.
+*. If you’re logging metrics, you’re being audited.
+*. If you’re seeing NaN, you need a smaller learning rate.
+*. That which does not break your model has made a suboptimal adversarial example.
+*. When the loss plateaus, the wise call for more data.
+*. There is no “overkill.” There is only “more epochs” and “CUDA out of memory.”
+*. What’s trivial in Jupyter can still crash in production.
+*. There’s a difference between spare GPUs and GPUs you’ve accidentally mined Ethereum on.
+*. Not all NaN is a bug—sometimes it’s a feature.
+*. “Do you have a checkpoint?” means “I can’t fix this training run.”
+*. “They’ll never expect this activation function” means “I want to try something non-differentiable.”
+*. If it’s a hack and it works, it’s still a hack and you’re lucky.
+*. If it can parallelize inference, it can double as a space heater.
+*. The size of the grant is inversely proportional to the reproducibility of the results.
+*. Don’t try to save money by undersampling.
+*. Don’t expect the data to cooperate in the creation of your dream benchmark.
+*. If it ain’t overfit, it hasn’t been trained on enough epochs.
+*. Every client is one missed deadline away from switching to AutoML, and every AutoML is one custom loss function away from becoming a client.
+*. If it only works on the training set, it’s defective.
+*. Let them see you tune the hyperparameters before you abandon the project.
+*. The framework you’ve got is never the framework you want.
+*. The data you’ve got is never the data you want.
+*. It’s only too many layers if you can’t fit them in VRAM.
+*. It’s only too many parameters if they’re multiplying NaNs.
+*. Data engineers exist to format tables for people with real GPUs.
+*. Reinforcement learning exists to burn through compute budgets on simulated environments.
+*. The whiteboard is mightiest when it sketches architectures for more transformers.
+*. “Two dropout layers is probably not going to be enough.”
+*. A model’s inference time is directly proportional to the urgency of the demo.
+*. Don’t bring BERT into a logistic regression.
+*. Any tensor labeled “output” is dangerous at both ends.
+*. The CTO knows how to do it by knowing who Googled it.
+*. An ounce of precision is worth a pound of recall.
+*. After the merge, be the one with the main branch, not the one with the conflicts.
+*. Necessity is the mother of synthetic data.
+*. If you can’t explain it, cite the arXiv paper.
+*. Deploying with confidence intervals doesn’t mean you shouldn’t also deploy with a kill switch.
+*. Sometimes SOTA is a function of who had the biggest TPU pod.
+*. Failure is not an option—it is mandatory. The option is whether to let failure be the last epoch or a learning rate adjustment.
\ No newline at end of file