
Neural network model extraction

Natural language models can be "stolen" through a black-box API: the attacker sends queries to the API, then uses its outputs as labels to train a copy of the model. Krishna et al. [2] showed that this works against BERT-based APIs even when the queries are nonsensical sequences of random words, and that the extracted model can approach the victim's accuracy at a cost far below that of collecting real training data.
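
The core query-and-distill loop can be sketched in a few lines. Below, a toy PyTorch classifier stands in for the remote black-box API; the architecture, vocabulary size, and hyperparameters are illustrative assumptions, not the paper's setup (which extracted fine-tuned BERT models using task-specific query heuristics):

```python
# Sketch of model extraction: query a black box with random token
# sequences, then distill its output distributions into a local copy.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, EMBED_DIM, NUM_CLASSES = 1000, 64, 2  # illustrative sizes

class BagOfWordsClassifier(nn.Module):
    """Toy text classifier: mean-pooled embeddings + linear head."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.head = nn.Linear(EMBED_DIM, NUM_CLASSES)

    def forward(self, token_ids):
        return self.head(self.embed(token_ids).mean(dim=1))

victim = BagOfWordsClassifier()   # stand-in for the remote API
student = BagOfWordsClassifier()  # the attacker's local copy
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def victim_api(token_ids):
    """Black box: the attacker sees only the output probabilities."""
    with torch.no_grad():
        return F.softmax(victim(token_ids), dim=-1)

for step in range(500):
    # 1. Generate random "sentences" (uniformly sampled token ids).
    queries = torch.randint(0, VOCAB_SIZE, (32, 16))
    # 2. Query the victim for soft labels.
    soft_labels = victim_api(queries)
    # 3. Train the student to match the victim's output distribution
    #    (a distillation objective in the sense of [3]).
    loss = F.kl_div(F.log_softmax(student(queries), dim=-1),
                    soft_labels, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```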

Such an attack could be used, for example, to replicate a commercial service without paying the cost of data collection and training, or to obtain a local white-box surrogate of the victim, against which further attacks (such as crafting transferable adversarial examples) are much easier to mount.

References

  1. https://www.cleverhans.io/2020/04/06/stealing-bert.html
  2. K. Krishna, G. S. Tomar, A. P. Parikh, N. Papernot, and M. Iyyer, “Thieves on Sesame Street! Model Extraction of BERT-based APIs,” arXiv:1910.12366 [cs], Jan. 2020, Accessed: Apr. 08, 2020. [Online]. Available: https://arxiv.org/abs/1910.12366
  3. G. Hinton, O. Vinyals, and J. Dean, “Distilling the Knowledge in a Neural Network,” arXiv:1503.02531 [cs, stat], Mar. 2015, Accessed: Apr. 08, 2020. [Online]. Available: https://arxiv.org/abs/1503.02531
  4. F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart, “Stealing Machine Learning Models via Prediction APIs,” arXiv:1609.02943 [cs, stat], Oct. 2016, Accessed: Apr. 08, 2020. [Online]. Available: https://arxiv.org/abs/1609.02943