{"id":1062,"date":"2025-01-09T07:03:59","date_gmt":"2025-01-09T07:03:59","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/01\/09\/statistical-learnability-of-strategic-linear-classifiers-a-proof-walkthrough-e80db99d6c4e\/"},"modified":"2025-01-09T07:03:59","modified_gmt":"2025-01-09T07:03:59","slug":"statistical-learnability-of-strategic-linear-classifiers-a-proof-walkthrough-e80db99d6c4e","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/01\/09\/statistical-learnability-of-strategic-linear-classifiers-a-proof-walkthrough-e80db99d6c4e\/","title":{"rendered":"Statistical Learnability of Strategic Linear Classifiers: A Proof Walkthrough"},"content":{"rendered":"<p>    Statistical Learnability of Strategic Linear Classifiers: A Proof Walkthrough<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>\n<h4>With the help of an intricate geometric construction, we can prove that instance-wise cost functions quickly drive SVC to infinity.<\/h4>\n<p>In the <a href=\"https:\/\/towardsdatascience.com\/quantifying-the-complexity-and-learnability-of-strategic-classification-problems-fd04cbfdd4b9\">previous article in this series<\/a>, we examined the concept of <em>strategic VC dimension<\/em> (SVC) and its connection to the <em>Fundamental Theorem of Strategic Learning<\/em>. We will make use of both of those in this article, alongside the ideas of <em>achievable labelings<\/em> and <em>strategic shattering coefficients<\/em> that we explored in the lead-up to\u00a0them.<\/p>\n<p><a href=\"https:\/\/towardsdatascience.com\/quantifying-the-complexity-and-learnability-of-strategic-classification-problems-fd04cbfdd4b9\">Quantifying the Complexity and Learnability of Strategic Classification Problems<\/a><\/p>\n<p><strong>If you haven\u2019t read the <\/strong><a href=\"https:\/\/towardsdatascience.com\/extending-pac-learning-to-a-strategic-classification-setting-6c374935dde2\"><strong>first article in this series<\/strong><\/a><strong> yet, I encourage you to start from there before moving on to the aforementioned article on\u00a0SVC:<\/strong><\/p>\n<p><a href=\"https:\/\/towardsdatascience.com\/extending-pac-learning-to-a-strategic-classification-setting-6c374935dde2\">Extending PAC Learning to a Strategic Classification Setting<\/a><\/p>\n<p>With the context from the other articles in this series in place, <strong>a basic grasp of set theory and geometry will be all you\u2019ll need to understand the theorem and its\u00a0proof.<\/strong><\/p>\n<h4>How an Instance-Wise Cost Function Affects SVC: Stating the\u00a0Theorem<\/h4>\n<p>As we saw, SVC can be used as a tool to estimate the expressive power of a hypothesis class within a strategic classification context. Having carefully defined SVC as a generalization of the canonical VC dimension, we understand that the two have much in common. When, though, does SVC diverge from its canonical counterpart? Can we come up with a scenario in which the strategic aspect of a classification problem significantly increases its complexity?<strong> It turns out we can, with relative ease: linear classification.<\/strong><\/p>\n<p>Linear classification involves determining whether a data point should be positively or negatively classified based on a linear function applied to its features. Geometrically, we can imagine a linear classifier inducing a linear decision boundary in <em>d<\/em>-dimensional real space (\u211d<em>\u1d48 <\/em>). Anything on one side of the boundary is positively classified, and anything on the other side is negatively classified. In one-dimensional space, the decision boundary is a threshold (<a href=\"https:\/\/towardsdatascience.com\/quantifying-the-complexity-and-learnability-of-strategic-classification-problems-fd04cbfdd4b9#4322\">as we saw in the previous article<\/a>). In two-dimensional space, it\u2019s a line dividing the plane. In general, it\u2019s a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hyperplane\">hyperplane<\/a>.<\/p>\n<p><strong>In canonical binary classification, the VC dimension of the hypothesis class comprising all linear classifiers in \u211d<em>\u1d48 <\/em>is <em>d + 1<\/em><\/strong>, <strong>which is finite.<\/strong> For example, for <em>d<\/em> = 2 (linear classifiers in <strong>\u211d<\/strong>\u00b2), the VC dimension is 3. The <a href=\"https:\/\/www.cs.princeton.edu\/courses\/archive\/spring16\/cos511\/lec17.pdf\">Fundamental Theorem of Statistical Learning<\/a> dictates that <strong>canonical linear classification is therefore PAC learnable.\u207d\u00b9\u207e<\/strong><\/p>\n<p><strong>Intuitively, we might expect the same conclusion to hold for the strategic analog of the problem.<\/strong> After all, linear classifiers are some of the simplest classifiers there are, and reasoning about them can be rather natural.<strong>\u207d\u00b2\u207e<\/strong><\/p>\n<p><strong>However, that simplicity goes out the window as soon as we throw <em>instance-wise<\/em> cost functions into the mix. As we will\u00a0prove:<\/strong><\/p>\n<blockquote><p>\n<em>Given a strategic linear classification problem S\u1d1b\u0280\u1d00\u1d04\u27e8<\/em>H<em>, <\/em>R<em>, <\/em>c<em>\u27e9, <\/em><strong><em>there exists an<\/em><\/strong><em> <\/em><strong><em>instance-wise cost function<\/em><\/strong><em> <\/em><strong>c<em>(<\/em>z<em>; <\/em>x<em>)<\/em><\/strong><em> <\/em><strong><em>= \u2113<\/em>\u2093<em>(<\/em>z &#8211; x<em>)<\/em><\/strong><em> <\/em><strong><em>such that SVC(<\/em>H<em>, <\/em>R<em>, <\/em>c<em>) =\u00a0<\/em>\u221e.<\/strong>\n<\/p><\/blockquote>\n<p>In other words, using the <a href=\"https:\/\/towardsdatascience.com\/quantifying-the-complexity-and-learnability-of-strategic-classification-problems-fd04cbfdd4b9#8540\"><em>Fundamental Theorem of Strategic Learning<\/em><\/a>, we find that <strong>linear classification in a strategic setting equipped with an <em>instance-wise<\/em> cost function is not generally PAC learnable.<\/strong> Interestingly, it will not be PAC learnable even if we strip away as much complexity as we can. In this case, we will do so by <strong>focusing on strategic linear classification on the Cartesian plane ( <em>X <\/em>\u2286 \u211d\u00b2) with the homogeneous preference class (<em>R<\/em> = { 1\u00a0}).<\/strong><\/p>\n<p><strong>The more general conclusion will follow from the counterexample we will show under those simplifying conditions. <\/strong>If strategic linear classification is not PAC learnable in \u211d\u00b2, there is no way it could be PAC learnable in any higher dimension. Likewise, every other preference class we laid out in our setup is a strict generalization of the homogeneous preference class. If we could prove PAC learnability for any of those preference classes, we would also be able to do so for the simpler case where <em>R<\/em> = { 1\u00a0}.<\/p>\n<h4>From Labelings to Points on the Unit Circle: Proof\u00a0Setup<\/h4>\n<p>Based on the assumptions above, we begin by turning our attention to the special case <strong>S\u1d1b\u0280\u1d00\u1d04\u27e8<em>H\u2097<\/em>, { 1 }, <em>c<\/em>\u27e9, <\/strong>with <em>H\u2097 <\/em>being the hypothesis class comprising all linear classifiers in \u211d\u00b2. We then initialize <em>n<\/em> two-dimensional feature vectors at the origin: \u2200 <em>i<\/em> \u2264 <em>n\u00a0<\/em>. <em>x\u1d62<\/em> = (0, 0). Since we\u2019re using the homogeneous preference class, we have that \u2200 <em>i<\/em> \u2264 <em>n\u00a0<\/em>. <em>r\u1d62<\/em> = 1. <strong>The only difference between the data points will be in how our cost function behaves on each of them.<\/strong> This is where the crux of the proof lies, as we will soon\u00a0see.<\/p>\n<p>Before we discuss the cost function at length, though, <strong>we need to <em>geometrize <\/em>the <\/strong><a href=\"https:\/\/towardsdatascience.com\/quantifying-the-complexity-and-learnability-of-strategic-classification-problems-fd04cbfdd4b9#4322\"><strong>possible labelings<\/strong><\/a><strong> of our unlabeled data points.<\/strong> As we saw <a href=\"https:\/\/towardsdatascience.com\/quantifying-the-complexity-and-learnability-of-strategic-classification-problems-fd04cbfdd4b9\">last time<\/a>,<strong> a set of <em>n<\/em> unlabeled data points must have exactly 2<em>\u207f<\/em> possible labelings.<\/strong> Representing a set of labelings (<em>n<\/em>-tuples) geometrically in \u211d\u00b2 is relatively straightforward: we simply select an arbitrary point for each possible labeling. In particular, we will <strong>choose 2<em>\u207f<\/em> such representative points<\/strong> <strong><em>on the unit circle<\/em>,<\/strong> <strong>each assigned to a possible labeling.<\/strong> While the particular coordinates of the representative points themselves are unimportant, <strong>we <em>do <\/em>require that each such point be unique.<\/strong> We also require that no two points be origin-symmetric with respect to one\u00a0another.<\/p>\n<p>We will <strong>denote this set of representative points by <em>S<\/em>.<\/strong> Having selected our representative points, we use them to define the <strong>origin-symmetric set <em>S\u2019,<\/em><\/strong> i.e., <em>S\u2019<\/em> = { (-<em>x<\/em>, &#8211;<em>y<\/em>)\u00a0: (<em>x<\/em>, <em>y<\/em>) \u2208 <em>S <\/em>}. Note that <em>S<\/em> and <em>S\u2019 <\/em>are disjoint (<em>S<\/em> \u2229 <em>S\u2019<\/em> = \u2205) as a consequence of how we selected the points in\u00a0<em>S<\/em>.<\/p>\n<p>For a particular <em>x<\/em>\u1d62, <strong>we define <em>S\u1d62 <\/em>as the subset of S that includes only the points that represent labelings in which <em>x<\/em>\u1d62 is positively classified. <\/strong>Similarly, we derive the origin-symmetric <em>S\u1d62\u2019 <\/em>\u2282 <em>S\u2019<\/em> from <em>S\u1d62<\/em>. In the example below, the points above the <em>x<\/em>-axis are those representing labelings in which <em>x<\/em>\u1d62 is positively classified, i.e., <em>S\u1d62<\/em>. Those below the <em>x<\/em>-axis comprise their origin-symmetric set <em>S\u1d62\u2019<\/em> (with the numbering matching between symmetric pairs of points). Note that the selection of points above the <em>x<\/em>-axis is completely arbitrary.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/776\/1%2A-AenB5mfrqb6jxd2y6XDEA.png?ssl=1\"><figcaption><strong>Figure 1:<\/strong> An example of labeling geometrization for an arbitrary <em>x<\/em>\u1d62. Recall that we started by initializing all unlabeled data points as (0, 0). Points in <em>S\u1d62 are numbered in black. Their origin-symmetric counterparts in S\u1d62\u2019 are numbered in blue. <\/em>Image by the author, based on image by R. Sundaram, A. Vullikanti, H. Xu, F. Yao from <a href=\"https:\/\/arxiv.org\/abs\/2012.03310\"><strong>PAC-Learning for Strategic Classification<\/strong><\/a> (use under<a href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\"> CC-BY 4.0 license<\/a>).<\/figcaption><\/figure>\n<p><strong>We proceed to construct a convex polygon <em>G\u1d62<\/em>, with its vertices being the points in <em>S\u1d62<\/em> \u222a <em>S\u1d62\u2019<\/em>.<\/strong> The <em>G\u1d62<\/em> for each unlabeled data point will be key in designing an instance-wise cost function <em>c<\/em> with which we will always be able to achieve all possible labelings, thus showing that SVC(<em>H\u2097<\/em>, { 1 }, <em>c<\/em>) = \u221e. Towards this end, the convexity of <em>G\u1d62 <\/em>will prove critical, as will its origin symmetry (stemming from our choice of <em>S\u1d62\u2019\u00a0<\/em>).<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/776\/1%2A5HfGStJrf3PXZAgA9U4jVA.png?ssl=1\"><figcaption><strong>Figure 2: <\/strong>The convex, origin-symmetric polygon G\u1d62 derived from <em>S\u1d62 and S\u1d62\u2019 as shown in Figure 1. <\/em>Image by the author, based on image by R. Sundaram, A. Vullikanti, H. Xu, F. Yao from <a href=\"https:\/\/arxiv.org\/abs\/2012.03310\"><strong>PAC-Learning for Strategic Classification<\/strong><\/a> (use under<a href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\"> CC-BY 4.0 license<\/a>).<\/figcaption><\/figure>\n<h4>Turning Polygons into Preferences: Constructing the Cost\u00a0Function<\/h4>\n<p>For each of the <em>n<\/em> origin-initialized, unlabeled data points we started with, we now have a convex, origin-symmetric polygon that represents the labelings in which it is positively classified. <strong>Each <em>G\u1d62<\/em> can now be used to define the behavior of our instance-wise cost function <em>c<\/em> on its corresponding <em>x\u1d62<\/em>.<\/strong> We will use <em>G\u1d62 <\/em>to define a seminorm<strong>\u207d\u00b3\u207e<\/strong>:<\/p>\n<blockquote><p><strong><em>\u2225 <\/em>y <em>\u2225<\/em>\u0262\u1d62 = <em>inf { \u03b5 \u2208 <\/em>\u211d\u207a\u00a0<em>: <\/em>y<em> \u2208 \u03b5<\/em>G\u1d62\u00a0<em>}<\/em><\/strong><\/p><\/blockquote>\n<p>This definition implies that <strong>the distance between <em>x\u1d62<\/em> and some point <em>z<\/em> is less than 1 if and only if <em>z <\/em>lies within <em>G\u1d62.<\/em><\/strong>\u00a0I.e.:<\/p>\n<blockquote><p>\n<em>\u2225 <\/em>z &#8211; x\u1d62 <em>\u2225<\/em>\u0262\u1d62 <em>&lt; 1 \u21d4 <\/em>z <em>\u2208\u00a0<\/em>G\u1d62<\/p><\/blockquote>\n<p><strong>For the rest of the proof, it is sufficient that we understand this connection between \u2225 \u22c5 \u2225<em>\u0262\u1d62 <\/em>and a point being inside <em>G\u1d62.<\/em><\/strong><em> <\/em>(See Footnote (3) for a discussion of why \u2225 \u22c5 \u2225<em>\u0262\u1d62 <\/em>qualifies as a seminorm and for more details about its geometric interpretation.)<\/p>\n<p><strong>We thus define the instance-wise cost function\u00a0<em>c<\/em>:<\/strong><\/p>\n<blockquote><p>c<em>(<\/em>z<em>; <\/em>x\u1d62<em>) = \u2113<\/em>\u1d62<em>(<\/em>z<em> &#8211;\u00a0<\/em>x\u1d62<em>)<\/em>\n<\/p><\/blockquote>\n<p><strong>Where:<\/strong><\/p>\n<blockquote><p>\n<em>\u2113<\/em>\u1d62<em>(<\/em>z<em> &#8211; <\/em>x\u1d62<em>) = \u2225 <\/em>z<em> &#8211; <\/em>x\u1d62\u00a0<em>\u2225<\/em>\u0262\u1d62<\/p><\/blockquote>\n<p><strong>That is, for each unlabeled data point <em>x\u1d62<\/em>, <em>c<\/em> behaves as \u2225 \u22c5 \u2225<em>\u0262\u1d62 <\/em>would.<\/strong> Note that this behavior is different for each data point. This is because we constructed a unique <em>G\u1d62 <\/em>for every <em>x\u1d62<\/em>, and each \u2225 <strong>\u22c5<\/strong> \u2225<em>\u0262\u1d62<\/em> is derived from its corresponding polygon\u00a0<em>G\u1d62<\/em>.<\/p>\n<h4>Data Point Best Response as a Result of the Cost\u00a0Function<\/h4>\n<p>With the instance-wise cost function <em>c<\/em> in place, <strong>we may turn our attention to how our data points interact with linear classifiers.<\/strong> Recall that we have constrained our consideration to the homogeneous preference class, meaning that <em>r<\/em> = 1 for all of our points. I.e., <em>x\u1d62<\/em> stands to gain a reward of magnitude 1 for being positively classified. Given a linear classifier, <strong>each data point will thus be willing to incur any cost less than 1 to manipulate its feature vector to ensure it falls on the positive side of the decision boundary. <\/strong>This will guarantee it receives positive utility as a result of the manipulation.<\/p>\n<p><em>c<\/em> is designed so that a data point with feature vector <em>x\u1d62<\/em> has to pay \u2225 <em>z<\/em> &#8211; <em>x\u1d62<\/em> \u2225<em>\u0262\u1d62<\/em> to change its feature vector to <em>z<\/em>. As we saw,<strong> <\/strong>as long as <em>z<\/em> lies inside <em>G\u1d62<\/em>, this cost will be less than\u00a01.<\/p>\n<p><strong>Suppose we have a decision boundary that crosses <em>G\u1d62 <\/em><\/strong>(intersects it at two points) <strong>with <em>x\u1d62<\/em> falling on its negative half-plane<em>.<\/em> As illustrated in Figure 3 below,<\/strong> this creates a sub-polygon such that for any <em>z<\/em> within\u00a0it:<\/p>\n<ul>\n<li>The cost to move to <em>z<\/em> is less than 1: <em>c<\/em>(<em>z<\/em>; <em>x\u1d62<\/em>) = \u2225 <em>z<\/em>\u200a\u2014\u200a<em>x\u1d62<\/em> \u2225<em>\u0262\u1d62 <\/em>&lt;\u00a01<\/li>\n<li>The <a href=\"https:\/\/towardsdatascience.com\/extending-pac-learning-to-a-strategic-classification-setting-6c374935dde2#2a1e\">realized reward<\/a> for moving is precisely 1: \ud835\udd40(<em>h<\/em>(<em>z<\/em>) = 1) \u22c5 <em>r <\/em>=\u00a01<\/li>\n<\/ul>\n<p>Whereby the utility for data point <em>i<\/em>, \ud835\udd40(<em>h<\/em>(<em>z<\/em>) = 1) \u22c5 <em>r &#8211;<\/em> <em>c<\/em>(<em>z<\/em>; <em>x\u1d62<\/em>), is positive, in turn making any such <em>z<\/em> a better response than non-manipulation. In other words, <strong>the data point will always want to manipulate its feature vector into one that lies in this sub-polygon.<\/strong><\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/802\/1%2A8s638E5PYI66iPYIK6QahQ.png?ssl=1\"><figcaption><strong>Figure 3:<\/strong> An example of a linear classifier with a decision boundary that properly intersects <em>G\u1d62, with x\u1d62<\/em> falling on its negative half-plane<em>. The resulting \u201cGoldilocks zone\u201d between the decision boundary and the perimeter of G\u1d62 is shaded in green. Changing x\u1d62 to any z lying in this area confers <\/em><strong><em>positive utility.<\/em><\/strong><em> This is because the reward gained is 1 and the cost incurred is less than 1. <\/em>Image by the author, based on image by R. Sundaram, A. Vullikanti, H. Xu, F. Yao from <a href=\"https:\/\/arxiv.org\/abs\/2012.03310\"><strong>PAC-Learning for Strategic Classification<\/strong><\/a> (use under<a href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\"> CC-BY 4.0 license<\/a>).<\/figcaption><\/figure>\n<p><strong>Conversely, given a decision boundary that does <em>not <\/em>cross <em>G\u1d62<\/em>,<\/strong> no such sub-polygon exists. The cost of manipulating <em>x\u1d62<\/em> to cross the boundary will always be greater than 1, and thus not worth the reward. <strong>The data point best response will be the original feature vector, meaning it\u2019s best to stay\u00a0put.<\/strong><\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/769\/1%2AbqmURetZfDaHyMXN6xAVRw.png?ssl=1\"><figcaption><strong>Figure 4:<\/strong> An example of a polygon <em>G\u1d62 and a <\/em>linear classifier with a decision boundary that <strong>does not<\/strong> cross <em>it<\/em>. Note how<strong> there are no points that are both on the positive side of the decision boundary and inside <em>G\u1d62.<\/em><\/strong><em> Equivalently, there are no vectors into which the data point could manipulate its feature vector for positive utility. <\/em>Image by the author, based on image by R. Sundaram, A. Vullikanti, H. Xu, F. Yao from <a href=\"https:\/\/arxiv.org\/abs\/2012.03310\"><strong>PAC-Learning for Strategic Classification<\/strong><\/a> (use under<a href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\"> CC-BY 4.0 license<\/a>).<\/figcaption><\/figure>\n<h4>Isolating Representative Points Using Linear Classifiers<\/h4>\n<p>We now understand the strategic implications of whether or not a certain decision boundary crosses <em>G\u1d62<\/em>. Calling to mind the role of our points on the unit circle as representatives of possible labelings, <strong>we can demonstrate the connection between labelings where a data point is positively classified and linear classifiers.<\/strong><\/p>\n<p>Let \ud835\udcdb be an arbitrary labeling of our <em>n<\/em> data points, and let <em>s\u2097 <\/em>\u2208 <em>S<\/em> be its unique representative point on the unit circle. Let <em>x<\/em>\u1d62 be one of our unlabeled data points. We will explore the behavior of the data point with respect to a particular linear classifier, denoted <em>h\u2097<\/em>. <strong>We require that the decision boundary induced by <em>h\u2097<\/em> do the following:<\/strong><\/p>\n<ol>\n<li>Cross the unit\u00a0circle.<\/li>\n<li>Strictly separate <em>s\u2097<\/em> from all other points in <em>S<\/em> \u222a\u00a0<em>S\u2019<\/em>.<\/li>\n<li>Positively classify\u00a0<em>s\u2097<\/em>.<\/li>\n<\/ol>\n<p><strong>The structure of <em>S<\/em> \u222a <em>S\u2019<\/em> guarantees that such an <em>h\u2097<\/em> exists.\u207d\u2074\u207e<\/strong><\/p>\n<p>With <em>h\u2097<\/em> at our disposal, we may explore how our cost function <em>c<\/em> interacts with <em>h\u2097 <\/em>for <em>x\u1d62<\/em> depending on whether or not <em>x<\/em>\u1d62 should be positively classified under \ud835\udcdb. <strong>In fact, we will prove that a data point is positively classified by <em>h\u2097<\/em> if and only if it is positively labeled under\u00a0\ud835\udcdb.<\/strong><\/p>\n<p>Let us first consider the case in which<strong> we want <em>x\u1d62<\/em> to be <em>positively<\/em> labeled (see Figure 5).<\/strong> Recall that we defined <em>S\u1d62 <\/em>as \u201cthe subset of <em>S<\/em> that includes only the points that represent labelings in which <em>x\u1d62<\/em> is positively classified.\u201d We know, then, that <strong><em>s\u2097<\/em> \u2208 <em>S\u1d62<\/em>. <\/strong>In particular, <em>s\u2097<\/em> must be one of the vertices of <em>G\u1d62<\/em>. The fact that <em>h\u2097<\/em> strictly separates <em>s\u2097<\/em> from all other points in <em>S<\/em> \u222a <em>S\u2019<\/em> means that it is strictly separated from the other vertices of <em>G\u1d62<\/em>. <strong>Hence, <em>h\u2097<\/em> must cross <em>G\u1d62, <\/em>incentivizing the data point to manipulate its feature\u00a0vector.<\/strong><\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/860\/1%2AbGxEtuoR4M3jmO5jFIJFcw.png?ssl=1\"><figcaption><strong>Figure 5:<\/strong> If <em>x<\/em>\u1d62 should be positively labeled under \ud835\udcdb<strong>,<\/strong> <strong><em>h\u2097<\/em> crosses <em>G\u1d62. <\/em><\/strong><em>This incentivizes the data point to manipulate its feature vector (see <\/em><strong><em>Figure 3<\/em><\/strong><em>). <\/em>Image by R. Sundaram, A. Vullikanti, H. Xu, F. Yao from <a href=\"https:\/\/arxiv.org\/abs\/2012.03310\"><strong>PAC-Learning for Strategic Classification<\/strong><\/a> (use under<a href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\"> CC-BY 4.0 license<\/a>).<\/figcaption><\/figure>\n<p>We proceed to examine the case in which<strong> we want <em>x\u1d62<\/em> to be <em>negatively <\/em>labeled under \ud835\udcdb (see Figure 6). <\/strong>As a result of how we constructed <em>S\u1d62<\/em>, <strong><em>s\u2097<\/em> \u2209 <em>S\u1d62<\/em>. <\/strong>Additionally, having required that the origin-symmetric <em>S\u2019 <\/em>be disjoint from <em>S<\/em>, we know that <strong><em>s\u2097<\/em> \u2209 <em>S\u1d62\u2019<\/em>.<\/strong> It follows that <strong><em>s\u2097 <\/em>is not a vertex of <em>G\u1d62<\/em><\/strong>. Once again, <em>h\u2097<\/em> strictly separates <em>s\u2097<\/em> from all other points in <em>S<\/em> \u222a <em>S\u2019, <\/em>including all the vertices of <em>G\u1d62<\/em>. Because <em>G\u1d62 <\/em>is convex, we conclude that <strong>any point in <em>G\u1d62<\/em> is on the opposite side of <em>h\u2097<\/em> as <em>s\u2097<\/em>.<\/strong> In other words, <strong><em>h\u2097<\/em> does not cross <em>G\u1d62<\/em>.<\/strong> Consequently,<strong> the data point will choose to stay put<\/strong> rather than \u201coverpaying\u201d to manipulate its feature vector to cross\u00a0<em>h\u2097<\/em>.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/841\/1%2ADd-lxJIRAn5vDuEpUE9saw.png?ssl=1\"><figcaption><strong>Figure 6:<\/strong> If <em>x<\/em>\u1d62 should be positively labeled under \ud835\udcdb,<strong> <em>h\u2097<\/em> does not cross <em>G\u1d62. <\/em><\/strong><em>This incentivizes the data point <\/em><strong><em>not<\/em><\/strong><em> to manipulate its feature vector (see <\/em><strong><em>Figure 4<\/em><\/strong><em>). <\/em>Image by R. Sundaram, A. Vullikanti, H. Xu, F. Yao from <a href=\"https:\/\/arxiv.org\/abs\/2012.03310\"><strong>PAC-Learning for Strategic Classification<\/strong><\/a> (use under<a href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\"> CC-BY 4.0 license<\/a>).<\/figcaption><\/figure>\n<p><strong>In summary, our unlabeled data point <em>x\u1d62<\/em> will engage in manipulation to cross <em>h\u2097 <\/em>if and only if \ud835\udcdb dictates that the data point should be positively classified.<\/strong> In our strategic classification setting, this means that <strong><em>h\u2097 <\/em>positively classifies a data point if and only if that data point should be positively labeled according to\u00a0\ud835\udcdb.<\/strong><\/p>\n<h4>Putting It All Together: Inducing an Arbitrary Labeling<\/h4>\n<p>Using what we have seen so far, we are able to demonstrate that we can achieve any labeling of our <em>n<\/em> data points we want. <strong>Overlaying all of our data points and their respective polygons <\/strong>(see Figure 7),<strong> <\/strong>we can see that <strong>given a labeling \ud835\udcdb, we are able to achieve it<\/strong> with the help of a corresponding linear classifier <em>h\u2097<\/em>.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AGE6Ye2y0EECzktY3rNkSsw.png?ssl=1\"><figcaption><strong>Figure 7:<\/strong> Simplified visualization of overlaid data points with their corresponding cost function polygons. <strong><em>s\u2097 represents a labeling <\/em><\/strong>\ud835\udcdb <strong><em>in which data point i should be positively classified and data point j should be negatively classified.<\/em><\/strong><em> The unmanipulated feature vectors of the two overlap at (0, 0). However, <\/em><strong><em>data point i will be positively classified by h\u2097 <\/em><\/strong><em>because it will move to the positive side of the decision boundary induced by h\u2097 (since the boundary crosses G\u1d62). <\/em><strong><em>Data point j will stay on the negative side<\/em><\/strong><em> because the boundary does not cross G\u2c7c. <\/em>Image by the author, based on images by R. Sundaram, A. Vullikanti, H. Xu, F. Yao from <a href=\"https:\/\/arxiv.org\/abs\/2012.03310\"><strong>PAC-Learning for Strategic Classification<\/strong><\/a> (use under<a href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/\"> CC-BY 4.0 license<\/a>).<\/figcaption><\/figure>\n<p>Any data point <em>x\u1d62<\/em> that \ud835\udcdb rules should be positively classified will manipulate its feature vector and move to the positive side of the decision boundary created by <em>h\u2097 <\/em>(like the case in Figure 5). At the same time, any data point <em>x\u2c7c<\/em> that should be negatively classified will not be sufficiently incentivized to manipulate its feature vector, causing it to stay on the negative side of the decision boundary. <strong>Across all <em>n<\/em> data points, those that will be positively classified will be <em>exactly <\/em>the ones that \ud835\udcdb dictates should be positively classified. In other words, we can induce any labeling we\u00a0wish.<\/strong><\/p>\n<p>We therefore have a sample of <em>n <\/em>unlabeled, potentially-manipulated data points that is <strong>strategically shattered by <em>H\u2097<\/em>,<\/strong> the hypothesis class of all linear classifiers in \u211d\u00b2. Based on <a href=\"https:\/\/towardsdatascience.com\/quantifying-the-complexity-and-learnability-of-strategic-classification-problems-fd04cbfdd4b9#974a\">how we defined strategic shattering coefficients<\/a>, we find that <strong>\u03c3<em>\u2099<\/em>(<em>H\u2097, <\/em>{ 1 }<em>, c<\/em>) = 2<em>\u207f<\/em>.<\/strong> <strong>It follows that SVC(<em>H\u2097, <\/em>{ 1 }<em>, c<\/em>) =\u00a0\u221e.<\/strong><\/p>\n<h4>Conclusion<\/h4>\n<p>First, <strong>we posed linear classification<\/strong> as a problem that is canonically PAC learnable, but <strong>not generally strategic PAC learnable.<\/strong> Second, we simplified our strategic classification problem by limiting our consideration to <strong>the homogeneous preference class on the Cartesian plane.<\/strong> Given <em>n<\/em> origin-initialized data points, <strong>we mapped each of their 2<em>\u207f<\/em> possible labelings to a unique representative point<\/strong> on the unit circle. We then <strong>created a polygon for each data point <\/strong>using the representative points corresponding to the labelings under which it should be positively classified.<\/p>\n<p>Based on those polygons, <strong>we constructed an instance-wise cost function, <em>c<\/em>, <\/strong>for which the cost of feature manipulation is less than 1 only within each polygon. Next, we showed that <strong>for any labeling <\/strong>\ud835\udcdb<strong>, we could find a linear classifier that isolates its respective representative point.<\/strong> Such a classifier, paired with <em>c<\/em>, ensured that <strong>only the data points that are <em>supposed <\/em>to be positively classified according to <\/strong>\ud835\udcdb<strong> are incentivized to manipulate their feature vectors<\/strong> to cross the decision boundary and end up with a positive label. Finally, we explained why that means that <strong>the SVC of the problem is infinite.<\/strong><\/p>\n<h4>Acknowledgments<\/h4>\n<p><strong>Writing this series of articles has been a wonderful journey, and I\u2019m truly grateful to everyone who took the time to read them,<\/strong> especially those who reached out with feedback. This series wouldn\u2019t have been possible without the authors of <a href=\"https:\/\/arxiv.org\/abs\/2012.03310\">PAC-Learning for Strategic Classification<\/a>: <strong>Ravi Sundaram, Anil Vullikanti, Haifeng Xu,<\/strong> and<strong> Fan Yao.<\/strong> They set the stage beautifully for a deep dive into this fascinating meeting point between machine learning and game theory. Thank you to the <a href=\"https:\/\/medium.com\/u\/7e12c71dfa81\"><strong>TDS Editors<\/strong><\/a>, particularly <a href=\"https:\/\/medium.com\/u\/e6ad8abedec9\"><strong>Ben Huberman<\/strong><\/a> and <a href=\"https:\/\/medium.com\/u\/895063a310f4\"><strong>Ludovic Benistant<\/strong><\/a>, for their support and for giving me such a fantastic platform to share my writing. Lastly, a huge thank you to <strong>Prof. Inbal Talgam-Cohen<\/strong> for sowing the seeds that grew into this series in the seminar she taught last\u00a0winter.<\/p>\n<p>If you enjoyed these articles, please consider following me on <a href=\"https:\/\/jhyahav.medium.com\/\">Medium<\/a> and <a href=\"https:\/\/www.linkedin.com\/in\/jhyahav\/\">LinkedIn<\/a> to keep up with future articles and projects.<\/p>\n<h4>Footnotes<\/h4>\n<p><strong>(1)<\/strong> See a proof of this result, as well as a great interactive explanation of linear classification, <a href=\"https:\/\/mlweb.loria.fr\/book\/en\/VCdimhyperplane.html\"><strong>here<\/strong><\/a><strong>.<\/strong> What we refer to as a \u201clinear classifier\u201d throughout this article is technically an <em>affine<\/em> classifier.<\/p>\n<p><strong>(2)<\/strong> Indeed, the paper shows that restricting our consideration to <em>instance-invariant<\/em> cost functions yields a problem that <strong><em>is<\/em><\/strong><em> <\/em>PAC learnable, just like the in the canonical setting. For a refresher on the difference between instance-invariant and instance-wise cost functions, see <a href=\"https:\/\/towardsdatascience.com\/extending-pac-learning-to-a-strategic-classification-setting-6c374935dde2#d9f5\">the first article in this\u00a0series<\/a>.<\/p>\n<p><strong>(3) \u2225 \u22c5 \u2225<em>\u0262\u1d62 <\/em>as we defined it is the <\/strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Minkowski_functional\"><strong><em>Minkowski functional<\/em><\/strong><\/a><strong> of <em>G\u1d62. <\/em><\/strong>In this context, we view <em>G\u1d62 <\/em>as the set of points enclosed in the polygon we constructed. The Minkowski functional <strong>generalizes <\/strong>the notion of a seminorm. Recall that in <a href=\"https:\/\/towardsdatascience.com\/extending-pac-learning-to-a-strategic-classification-setting-6c374935dde2#d9f5\">the first article in this series<\/a>, <strong>we assumed our cost function is induced by seminorms.<\/strong> We would be remiss not to reconcile this assumption with the construction of \u2225 \u22c5\u00a0\u2225<em>\u0262\u1d62.<\/em><\/p>\n<p>Fortunately, it can be proven that <a href=\"https:\/\/proofwiki.org\/wiki\/Minkowski_Functional_of_Symmetric_Convex_Absorbing_Set_in_Real_Vector_Space_is_Seminorm\">the Minkowski functional of a set <em>A<\/em> is a seminorm under certain conditions<\/a> [2]:<\/p>\n<ul>\n<li>\n<em>A<\/em> is a subset of a vector space over\u00a0<strong>\u211d<\/strong>.<\/li>\n<li>\n<em>A<\/em> is\u00a0convex.<\/li>\n<li>\n<em>A<\/em> is symmetric.<\/li>\n<li>\n<em>A<\/em> is an absorbing set.<\/li>\n<\/ul>\n<p>Even more fortunately (or rather, by meticulous design),<em> <\/em><strong><em>G\u1d62 <\/em>fulfills<em> <\/em>all of these conditions:<\/strong><\/p>\n<ul>\n<li>\n<em>G\u1d62 <\/em>is a subset of \u211d\u00b2, which is a vector space over\u00a0<strong>\u211d.<\/strong>\n<\/li>\n<li>\n<em>G\u1d62 <\/em>is a convex set since it is bounded by a convex\u00a0polygon.<\/li>\n<li>\n<em>G\u1d62<\/em> is origin-symmetric.<em> <\/em><strong>(This is why we needed the <em>S\u1d62\u2019<\/em> in the first\u00a0place!)<\/strong>\n<\/li>\n<li>\n<em>G\u1d62 <\/em>is a convex subset of the topological vector space \u211d\u00b2 equipped with the standard topology induced by the Euclidean norm. <em>G\u1d62 <\/em>also contains the origin in its interior. <a href=\"https:\/\/proofwiki.org\/wiki\/Convex_Subset_of_Topological_Vector_Space_containing_Zero_Vector_in_Interior_is_Absorbing_Set\">Therefore, <em>G\u1d62<\/em> is an absorbing set<\/a>\u00a0[3].<\/li>\n<\/ul>\n<p><strong>\u2225 \u22c5 \u2225<em>\u0262\u1d62 <\/em>is therefore a seminorm for all <em>i<\/em>, enabling us to continue the proof without violating our assumptions.<\/strong><\/p>\n<p><strong>(4) <\/strong>Let <em>P<\/em> be the convex hull of (<em>S<\/em> \u222a <em>S\u2019<\/em>)<em> <\/em><em> s\u2097<\/em>. Note that <em>S<\/em> \u222a <em>S\u2019<\/em> is finite. It can be proven that the convex hull of a finite set in \u211d\u00b2 is compact [4]. As a singleton, { <em>s\u2097<\/em> } is closed. Clearly, (<em>S<\/em> \u222a <em>S\u2019<\/em>)<em> <\/em><em> s\u2097<\/em> and <em>s\u2097<\/em> are disjoint. <strong>It follows from the <\/strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Hyperplane_separation_theorem\"><strong>hyperplane separation theorem<\/strong><\/a><strong> that there exists a hyperplane that strictly separates the\u00a0two.<\/strong><\/p>\n<p><strong>Let us also show that the linear decision boundary induced by the aforementioned hyperplane must intersect the unit circle at two points.<\/strong> Recall that a line may intersect a circle at exactly zero, one, or two\u00a0points.<\/p>\n<ul>\n<li>Suppose for the sake of contradiction that the decision boundary and the unit circle are disjoint. Then all points in the unit circle must lie on the same side of the decision boundary, contradicting the assumption that <em>h\u2097 <\/em>separates (<em>S<\/em> \u222a <em>S\u2019<\/em>)<em> <\/em><em> s\u2097<\/em> from\u00a0<em>s\u2097.<\/em>\n<\/li>\n<li>Due to the fact that the separation is strict, <em>s\u2097<\/em> may not lie on the decision boundary, whereby the latter cannot be tangent to the unit\u00a0circle.<\/li>\n<li>The only remaining possibility is that the decision boundary crosses the unit\u00a0circle.<\/li>\n<\/ul>\n<p>As for ensuring <em>h\u2097<\/em> positively classifies <em>s\u2097,<\/em> it is completely up to us which side of the boundary we wish to classify positively.<\/p>\n<h4>References<\/h4>\n<p>[1] R. Sundaram, A. Vullikanti, H. Xu, F. Yao. <a href=\"https:\/\/arxiv.org\/abs\/2012.03310\">PAC-Learning for Strategic Classification<\/a> (2021), International Conference on Machine Learning.<\/p>\n<p>[2] ProofWiki contributors. <a href=\"https:\/\/proofwiki.org\/wiki\/Minkowski_Functional_of_Symmetric_Convex_Absorbing_Set_in_Real_Vector_Space_is_Seminorm\">Minkowski functional of symmetric convex absorbing set in real vector space is seminorm<\/a>.<\/p>\n<p>[3] ProofWiki contributors. <a href=\"https:\/\/proofwiki.org\/wiki\/Convex_Subset_of_Topological_Vector_Space_containing_Zero_Vector_in_Interior_is_Absorbing_Set\">Convex subset of topological vector space containing zero vector in interior is absorbing set<\/a>.<\/p>\n<p>[4] Math Stack Exchange contributors. <a href=\"https:\/\/math.stackexchange.com\/a\/749839\">Is convex hull of a finite set of points in R\u00b2\u00a0closed?<\/a><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/medium.com\/_\/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=e80db99d6c4e\" width=\"1\" height=\"1\" alt=\"\"><\/p>\n<hr>\n<p><a href=\"https:\/\/towardsdatascience.com\/statistical-learnability-of-strategic-linear-classifiers-a-proof-walkthrough-e80db99d6c4e\">Statistical Learnability of Strategic Linear Classifiers: A Proof Walkthrough<\/a> was originally published in <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a> on Medium, where people are continuing the conversation by highlighting and responding to this story.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Jonathan Yahav<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/medium.com\/m\/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fstatistical-learnability-of-strategic-linear-classifiers-a-proof-walkthrough-e80db99d6c4e\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Statistical Learnability of Strategic Linear Classifiers: A Proof Walkthrough With the help of an intricate geometric construction, we can prove that instance-wise cost functions quickly drive SVC to infinity. In the previous article in this series, we examined the concept of strategic VC dimension (SVC) and its connection to the Fundamental Theorem of Strategic Learning. [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,1176,1177,70,92,1178],"tags":[496,1179,1180],"class_list":["post-1062","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-classification-algorithms","category-game-theory","category-machine-learning","category-thoughts-and-theory","category-vc-dimension","tag-linear","tag-strategic","tag-svc"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1062"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1062"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1062\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1062"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1062"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1062"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}