{"id":3308,"date":"2025-04-24T07:03:16","date_gmt":"2025-04-24T07:03:16","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/04\/24\/data-science-from-school-to-work-part-iv\/"},"modified":"2025-04-24T07:03:16","modified_gmt":"2025-04-24T07:03:16","slug":"data-science-from-school-to-work-part-iv","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/04\/24\/data-science-from-school-to-work-part-iv\/","title":{"rendered":"Data Science: From School to Work, Part IV"},"content":{"rendered":"<p>    Data Science: From School to Work, Part IV<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>\n<h2 class=\"wp-block-heading\"><mdspan datatext=\"el1745436793378\" class=\"mdspan-comment\">Introduction<\/mdspan><\/h2>\n<p class=\"wp-block-paragraph\">Let\u2019s start with a simple example that will appeal to most of us. If you want to check if the blinkers of your car are working properly, you sit in the car, turn on the ignition and test a turn signal to see if the front and rear lights work. But if the lights don\u2019t work, it\u2019s hard to tell why. The bulbs may be dead, the battery may be dead, the turn signal switch may be faulty. In short, there\u2019s a lot to check. This is exactly what the tests are for. Every part of a function such as the blinker must be tested to find out what is going wrong. A test of the bulbs, a test of the battery, a test of the communication between the control unit and the indicators, and so on.<\/p>\n<p class=\"wp-block-paragraph\">To test all this, there are different types of tests, often presented in the form of a pyramid, from the fastest to the slowest and from the most isolating to the most integrated. This test pyramid can vary depending on the specifics of the project (database connection test, authentication test, etc.).<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/03\/image-206.png?ssl=1\" alt=\"\" class=\"wp-image-601881\"><figcaption class=\"wp-element-caption\">Tests pyramid | Image from author.<\/figcaption><\/figure>\n<h3 class=\"wp-block-heading\">The Base of the Pyramid: Unit Tests<\/h3>\n<p class=\"wp-block-paragraph\">Unit tests form the basis of the test pyramid, regardless of the type of project (and language). Their purpose is to test a unit of code, e.g. a method or a function. For a unit test to be truly considered as such, it must adhere to a basic rule: A unit test must not depend on functionalities outside the unit under test. They have the advantage of being fast and automatable.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Example:<\/strong> Consider a function that extracts even numbers from an iterable. To test this function, we\u2019d need to create several types of iterable with integers and check the output. But we\u2019d also need to check the behavior in the case of empty iterables, element types other than int, and so on.<\/p>\n<h3 class=\"wp-block-heading\">Intermediate Level: Integration and Functional Tests<\/h3>\n<p class=\"wp-block-paragraph\">Just above the unit tests are the integration tests. Their purpose is to detect errors that cannot be detected by unit tests. These tests check that the addition of a new feature does not cause problems when it is integrated into the application. The functional tests are similar but aim at testing one precise fonctionality (e.g an authentification process).<\/p>\n<p class=\"wp-block-paragraph\">In a project, especially in a team environment, many functions are developed by different developers. Integration\/functional tests ensure that all these features work well together. They are also run automatically, making them fast and reliable.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Example<\/strong>: Consider an application that displays a bank balance. When a withdrawal operation is carried out, the balance is modified. An integration test would be to check that with a balance initialized at 1000 euros, then a withdrawal of 500 euros, the balance changes to 500 euros.<\/p>\n<h3 class=\"wp-block-heading\">The Top of the Pyramid: End-to-End Tests<\/h3>\n<p class=\"wp-block-paragraph\">End-to-end (E2E) tests are tests at the top of the pyramid. They verify that the application functions as expected from end to end, i.e. from the user interface to the database or external services. They are generally long and complicated to set up, but there\u2019s no need for a lot of tests.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Example<\/strong>: Consider a forecasting application based on new data. This can be very complex, involving data retrieval, variable transformations, learning and so on. The aim of the End-to-End test is to check that, given the new data selected, the forecasts correspond to expectations.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">The Unit Tests with Doctest<\/h2>\n<p class=\"wp-block-paragraph\">A fast and simple way of making unit tests is to use <code>docstring<\/code>. Let\u2019s take the example of a script <code>calculate_stats.py<\/code> with two functions: <code>calculate_mean()<\/code> with a complete docstring, which was presented in <a href=\"https:\/\/towardsdatascience.com\/data-science-from-school-to-work-part-ii\/\">Python best practices<\/a>, and the function <code>calculate_std()<\/code><strong> <\/strong>with a usual one.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import math\nfrom typing import List\n\ndef calculate_mean(numbers: List[float]) -&gt; float:\n\u00a0 \u00a0 \"\"\"\n\u00a0 \u00a0 Calculate the mean of a list of numbers.\n\n\u00a0 \u00a0 Parameters\n\u00a0 \u00a0 ----------\n\u00a0 \u00a0 numbers : list of float\n\u00a0 \u00a0 \u00a0 \u00a0 A list of numerical values for which the mean is to be calculated.\n\n\u00a0 \u00a0 Returns\n\u00a0 \u00a0 -------\n\u00a0 \u00a0 float\n\u00a0 \u00a0 \u00a0 \u00a0 The mean of the input numbers.\n\n\u00a0 \u00a0 Raises\n\u00a0 \u00a0 ------\n\u00a0 \u00a0 ValueError\n\u00a0 \u00a0 \u00a0 \u00a0 If the input list is empty.\n\n\u00a0 \u00a0 Notes\n\u00a0 \u00a0 -----\n\u00a0 \u00a0 The mean is calculated as the sum of all elements divided by the number of elements.\n\n\u00a0 \u00a0 Examples\n\u00a0 \u00a0 --------\n\u00a0 \u00a0 &gt;&gt;&gt; calculate_mean([1.0, 2.0, 3.0, 4.0])\n\u00a0 \u00a0 2.5\n\u00a0 \u00a0 &gt;&gt;&gt; calculate_mean([])\n\u00a0 \u00a0 0\n\u00a0 \u00a0 \"\"\"\n\n\u00a0 \u00a0 if len(numbers) &gt; 0:\n\u00a0 \u00a0 \u00a0 \u00a0 return sum(numbers) \/ len(numbers)\n\u00a0 \u00a0 else:\n\u00a0 \u00a0 \u00a0 \u00a0 return 0\n\ndef calculate_std(numbers: List[float]) -&gt; float:\n\u00a0 \u00a0 \"\"\"\n\u00a0 \u00a0 Calculate the standard deviation of a list of numbers.\n\n\u00a0 \u00a0 Parameters\n\u00a0 \u00a0 ----------\n\u00a0 \u00a0 numbers : list of float\n\u00a0 \u00a0 \u00a0 \u00a0 A list of numerical values for which the mean is to be calculated.\n\n\u00a0 \u00a0 Returns\n\u00a0 \u00a0 -------\n\u00a0 \u00a0 float\n\u00a0 \u00a0 \u00a0 \u00a0 The std of the input numbers.\n\u00a0 \u00a0 \"\"\"\n\n\u00a0 \u00a0 if len(numbers) &gt; 0:\n\u00a0 \u00a0 \u00a0 \u00a0 m = calculate_mean(numbers)\n\u00a0 \u00a0 \u00a0 \u00a0 gap = [abs(x - m)**2 for x in numbers]\n\u00a0 \u00a0 \u00a0 \u00a0 return math.sqrt(sum(gap) \/ (len(numbers) - 1))\n\u00a0 \u00a0 else:\n\u00a0 \u00a0 \u00a0 \u00a0 return 0<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The test is included in the \u201cExamples\u201d section at the end of the docstring of the function <code>calculate_mean()<\/code>. A doctest follows the layout of a terminal: three chevrons at the beginning of a line with the command to be executed and the expected result just below. To run the tests, simply type the command<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">  python -m doctests calculate_stats.py -v<\/code><\/pre>\n<p class=\"wp-block-paragraph\">or if you use <a href=\"https:\/\/towardsdatascience.com\/data-scientist-from-school-to-work-part-i\/\">uv<\/a> (what I encourage)<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">uv run python -m doctest calculate_stats.py -v<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The <code>-v<\/code> argument permits to display the following output:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/03\/Capture-decran-2025-04-19-080303-2.png?ssl=1\" alt=\"\" class=\"wp-image-601889\"><\/figure>\n<p class=\"wp-block-paragraph\">As you can see that there have been two tests and no failures, and <code>doctest <\/code>has the intelligence to point out all the methods that don\u2019t have a test (as with <code>calculate_std()<\/code>).<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">The Unit Tests with Pytest<\/h2>\n<p class=\"wp-block-paragraph\">Using <code>doctest <\/code>is interesting, but quickly becomes limited. For a truly comprehensive testing process, we use a specific framework. There are two main frameworks for testing: <code>unittest <\/code>and <code><a href=\"https:\/\/towardsdatascience.com\/tag\/pytest\/\" title=\"Pytest\">Pytest<\/a><\/code>. The latter is generally considered simpler and more intuitive.<\/p>\n<p class=\"wp-block-paragraph\">To install the package, simply type:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">pip install pytest (in your virtual environment)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">or<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-shell-session\">uv add pytest<\/code><\/pre>\n<h3 class=\"wp-block-heading\">1 \u2013 Write your first test<\/h3>\n<p class=\"wp-block-paragraph\">Let\u2019s take the <code>calculate_stats.py<\/code><strong> <\/strong>script and write a test for the <code>calculate_mean() <\/code>function. To do this, we create a script <code>test_calculate_stats.py<\/code> containing the following lines:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-julia\">from calculate_stats import calculate_mean\n\ndef test_calculate_mean():\n    assert calculate_mean([1, 2, 3, 4, 5, 6]) == 3.5<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Tests are based on the assert command. This instruction is used with the following syntax:<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">assert expression1 [, expression2]<\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">The expression1 is the condition to be tested, and the optional expression2 is the error message if the condition is not verified.<\/p>\n<p class=\"wp-block-paragraph\">The <a href=\"https:\/\/towardsdatascience.com\/tag\/python\/\" title=\"Python\">Python<\/a> interpreter transforms each assert statement into:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">if __debug__:\n    if not expression1:\n        raise AssertionError(expression2)<\/code><\/pre>\n<h3 class=\"wp-block-heading\">2 \u2013 Run a test<\/h3>\n<p class=\"wp-block-paragraph\">To run the test, we use the following command:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-shell-session\">pytest (in your virtual environment)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">or<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-shell-session\">uv run pytest<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The result is as follows:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"238\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/03\/Capture-decran-2025-04-19-212843-1024x238.png?resize=1024%2C238&#038;ssl=1\" alt=\"\" class=\"wp-image-601895\"><\/figure>\n<h3 class=\"wp-block-heading\">3 \u2013 Analyse the output<\/h3>\n<p class=\"wp-block-paragraph\">One of the great advantages of <code>pytest <\/code>is the quality of its feedback. For each test, you get:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">A green dot (.) for success;<\/li>\n<li class=\"wp-block-list-item\">An F for a failure;<\/li>\n<li class=\"wp-block-list-item\">An E for an error;<\/li>\n<li class=\"wp-block-list-item\">An s for a skipped test (with the decorator <code>@pytest.mark.skip(reason=\"message\")<\/code>).<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">In the event of failure, pytest provides:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">The exact name of the failed test;<\/li>\n<li class=\"wp-block-list-item\">The problematic line of code;<\/li>\n<li class=\"wp-block-list-item\">Expected and obtained values;<\/li>\n<li class=\"wp-block-list-item\">A complete trace to facilitate debugging.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">For example, if we replace the == 3.5 with == 4, we obtain the following output:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"491\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/03\/Capture-decran-2025-04-19-213031-1024x491.png?resize=1024%2C491&#038;ssl=1\" alt=\"\" class=\"wp-image-601896\"><\/figure>\n<h3 class=\"wp-block-heading\">4 \u2013 Use parametrize<\/h3>\n<p class=\"wp-block-paragraph\">To test a function properly, you need to test it exhaustively. In other words, test it with different types of inputs and outputs. The problem is that very quickly you end up with a succession of assert and test functions that get longer and longer, which is not easy to read.<\/p>\n<p class=\"wp-block-paragraph\">To overcome this problem and test several data sets in a single unit test, we use the <code>parametrize<\/code>. The idea is to create a list containing all the datasets you wish to test in tuple form, then use the <code>@pytest.mark.parametrize<\/code> decorator. The last test can read write as follows<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from calculate_stats import calculate_mean\nimport pytest\n\ntestdata = [\n    ([1, 2, 3, 4, 5, 6], 3.5),\n    ([], 0),\n    ([1.2, 3.8, -1], 4 \/ 3),\n]\n\n@pytest.mark.parametrize(\"numbers, expected\", testdata)\ndef test_calculate_mean(numbers, expected):\n    assert calculate_mean(numbers) == expected<\/code><\/pre>\n<p class=\"wp-block-paragraph\">If you wish to add a test set, simply add a tuple to testdata.<\/p>\n<p class=\"wp-block-paragraph\">It is also advisable to create another type of test to check whether errors are raised, using the context with <code>pytest.raises(Exception)<\/code>:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">testdata_fail = [\n    1,\n    \"a\",\n]\n\n@pytest.mark.parametrize(\"numbers\", testdata_fail)\ndef test_calculate_mean_fail(numbers):\n    with pytest.raises(Exception):\n        calculate_mean(numbers)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">In this case, the test will be a success on the function returns an error with the <em>testdata_fail<\/em> data.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"230\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/03\/Capture-decran-2025-04-19-213442-1024x230.png?resize=1024%2C230&#038;ssl=1\" alt=\"\" class=\"wp-image-601897\"><\/figure>\n<h3 class=\"wp-block-heading\">5 \u2013 Use mocks<\/h3>\n<p class=\"wp-block-paragraph\">As mentioined in introduction, the purpose of a unit test is to test a single unit of code and, above all, it must not depend on external components. This is where <strong>mocks <\/strong>come in.<\/p>\n<p class=\"wp-block-paragraph\">Mocks simulate the behavior of a constant, a function or even a class. To create and use mocks, we\u2019ll use the <code>pytest-mock<\/code><strong> <\/strong>package. To install it:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-shell-session\">pip install pytest-mock (in your virtual environment)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">or<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-shell-session\">uv add pytest-mock<\/code><\/pre>\n<h4 class=\"wp-block-heading\">a) Mock a function<\/h4>\n<p class=\"wp-block-paragraph\">To illustrate the use of a mock, let\u2019s take our<strong> <\/strong><code>test_calculate_stats.py<\/code><strong> <\/strong>script and implement the test for the <code>calculate_std()<\/code> function. The problem is that it depends on the <code>calculate_mean()<\/code> function. So we\u2019re going to use the <code>mocker.patch<\/code> method to mock its behavior.<\/p>\n<p class=\"wp-block-paragraph\">The test for the <code>calculate_std()<\/code> function is written as follows<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def test_calculate_std(mocker):\n    mocker.patch(\"calculate_stats.calculate_mean\", return_value=0)\n\n    assert calculate_std([2, 2]) == 2\n    assert calculate_std([2, -2]) == 2<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Executing the pytest command yields<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"231\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/03\/Capture-decran-2025-04-19-213853-1024x231.png?resize=1024%2C231&#038;ssl=1\" alt=\"\" class=\"wp-image-601898\"><\/figure>\n<p class=\"wp-block-paragraph\"><strong>Explanation<\/strong>:<br \/>The <code>mocker.patch(\"calculate_stats.calculate_mean\", return_value=0)<\/code> line assigns the output <code>0<\/code> to the <code>calculate_mean()<\/code> return in <code>calculate_stats.py<\/code>. The calculation of the standard deviation of the series [2, 2] is distorted because we mock the behavior of <code>calculate_mean()<\/code> by always returning <code>0<\/code>. The calculation is correct if the mean of the series is really <code>0<\/code>, as shown by the second assertion.<\/p>\n<h4 class=\"wp-block-heading\">b) Mock a class<\/h4>\n<p class=\"wp-block-paragraph\">In a similar way, you can mock the behavior of a class and simulate its methods and\/or attributes. To do this, you need to implement a <code>Mock <\/code>class with the methods\/attributes to be modified.<\/p>\n<p class=\"wp-block-paragraph\">Consider a function, <code>need_pruning()<\/code>, which tests whether or not a decision tree should be pruned according to the minimum number of points in its leaves:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from sklearn.tree import BaseDecisionTree\n\n\ndef need_pruning(tree: BaseDecisionTree, max_point_per_node: int) -&gt; bool:\n    # Get the number of samples in each node\n    n_samples_per_node = tree.tree_.n_node_samples\n\n    # Identify which nodes are leaves.\n    is_leaves = (tree.tree_.children_left == -1) &amp; (tree.tree_.children_right == -1)\n\n    # Get the number of samples in leaf nodes\n    n_samples_leaf_nodes = n_samples_per_node[is_leaves]\n    return any(n_samples_leaf_nodes &lt; max_point_per_node)<\/code><\/pre>\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/towardsdatascience.com\/tag\/testing\/\" title=\"Testing\">Testing<\/a> this function can be complicated, since it depends on a class, <code>DecisionTree<\/code>, from the <code>scikit-learn<\/code> package. What\u2019s more, you\u2019d need data to train a <code>DecisionTree <\/code>before testing the function.<br \/>To get around all these difficulties, we need to mock the attributes of a <code>DecisionTree<\/code>\u2018s <code>tree_ <\/code>object.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from model import need_pruning\nfrom sklearn.tree import DecisionTreeRegressor\nimport numpy as np\n\n\nclass MockTree:\n    # Mock tree with two leaves with 5 points each.\n    @property\n    def n_node_samples(self):\n        return np.array([20, 10, 10, 5, 5])\n\n    @property\n    def children_left(self):\n        return np.array([1, 3, 4, -1, -1])\n\n    @property\n    def children_right(self):\n        return np.array([2, -1, -1, -1, -1])\n\n\ndef test_need_pruning(mocker):\n    new_model = DecisionTreeRegressor()\n    new_model.tree_ = MockTree()\n\n    assert need_pruning(new_model, 6)\n    assert not need_pruning(new_model, 2)<\/code><\/pre>\n<p class=\"wp-block-paragraph\"><strong>Explanation<\/strong>:<br \/>The <code>MockTree <\/code>class can be used to mock the <em>n_node_samples<\/em>, <em>children_left<\/em> and <em>children_right<\/em> attributes of a <code>tree_<\/code>object. In the test, we create a <code>DecisionTreeRegressor<\/code> object whose <code>tree_ <\/code>attribute is replaced by the <code>MockTree<\/code>. This controls the <em>n_node_samples<\/em>, <em>children_left <\/em>and <em>children_right <\/em>attributes required for the <code>need_pruning()<\/code> function.<\/p>\n<h3 class=\"wp-block-heading\">4 \u2013 Use fixtures<\/h3>\n<p class=\"wp-block-paragraph\">Let\u2019s complete the previous example by adding a function, <code>get_predictions()<\/code>, to retrieve the average of the variable of interest in each of the tree\u2019s leaves:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def get_predictions(tree: BaseDecisionTree) -&gt; np.ndarray:\n    # Identify which nodes are leaves.\n    is_leaves = (tree.tree_.children_left == -1) &amp; (tree.tree_.children_right == -1)\n\n    # Get the target mean in the leaves\n    values = tree.tree_.value.flatten()[is_leaves]\n    return values<\/code><\/pre>\n<p class=\"wp-block-paragraph\">One way of testing this function would be to repeat the first two lines of the <code>test_need_pruning()<\/code> test. But a simpler solution is to use the <code>pytest.fixture<\/code> decorator to create a fixture.<\/p>\n<p class=\"wp-block-paragraph\">To test this new function, we need the <code>MockTree <\/code>we created earlier. But, to avoid repeating code, we use a fixture. The test script then becomes:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from model import need_pruning, get_predictions\nfrom sklearn.tree import DecisionTreeRegressor\nimport numpy as np\nimport pytest\n\n\nclass MockTree:\n    @property\n    def n_node_samples(self):\n        return np.array([20, 10, 10, 5, 5])\n\n    @property\n    def children_left(self):\n        return np.array([1, 3, 4, -1, -1])\n\n    @property\n    def children_right(self):\n        return np.array([2, -1, -1, -1, -1])\n\n    @property\n    def value(self):\n        return np.array([[[5]], [[-2]], [[-8]], [[3]], [[-3]]])\n\n@pytest.fixture\ndef tree_regressor():\n    model = DecisionTreeRegressor()\n    model.tree_ = MockTree()\n    return model\n\n\ndef test_nedd_pruning(tree_regressor):\n    assert need_pruning(tree_regressor, 6)\n    assert not need_pruning(tree_regressor, 2)\n\n\ndef test_get_predictions(tree_regressor):\n    assert all(get_predictions(tree_regressor) == np.array([3, -3]))<\/code><\/pre>\n<p class=\"wp-block-paragraph\">In our case, the fixture allows us to have a <code>DecisionTreeRegressor <\/code>object whose <code>tree_<\/code> attribute is our <code>MockTree<\/code>.<\/p>\n<p class=\"wp-block-paragraph\">The advantage of a fixture is that it provides a fixed development environment for configuring a set of tests with the same context or dataset. This can be used to:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Prepare objects;<\/li>\n<li class=\"wp-block-list-item\">Start or stop services;<\/li>\n<li class=\"wp-block-list-item\">Initialize the database with a dataset;<\/li>\n<li class=\"wp-block-list-item\">Create test client for web project;<\/li>\n<li class=\"wp-block-list-item\">Configure mocks.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\">5 \u2013 Organize the tests directory<\/h3>\n<p class=\"wp-block-paragraph\"><code>pytest <\/code>will run tests on all files beginning with test_ or ending with _test. With this method, you can simply use the <code>pytest <\/code>command to run all the tests in your project.<\/p>\n<p class=\"wp-block-paragraph\">As with the rest of a Python project, the test directory must be structured. We recommend:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Break down your tests by package<\/li>\n<li class=\"wp-block-list-item\">Test no more than one module per script<\/li>\n<\/ul>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/03\/Capture-decran-2025-04-19-080933.png?ssl=1\" alt=\"\" class=\"wp-image-601899\"><figcaption class=\"wp-element-caption\">Image from author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">However, you can also run only the tests of a script by specifying the path of the .py script.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-shell-session\">pytest .testPackage1tests_module1.py  (in your virtual environment)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">or<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-shell-session\">uv run pytest .testPackage1tests_module1.py<\/code><\/pre>\n<h3 class=\"wp-block-heading\">6 \u2013 Analyze your test coverage<\/h3>\n<p class=\"wp-block-paragraph\">Once the tests have been written, it\u2019s worth looking at the test coverage rate. To do this, we install the following two packages: <code>coverage <\/code>and <code>pytest-cov<\/code> and run a coverage measure.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">pip install pytest-cov, coverage (in your virtual environment)\npytest --cov=your_main_directory <\/code><\/pre>\n<p class=\"wp-block-paragraph\">or<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">uv add pytest-mock, coverage\nuv run pytest --cov=your_main_directory<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The tool then measures coverage by counting the number of lines tested. The following output is obtained.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"511\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/03\/Capture-decran-2025-04-19-215119-1024x511.png?resize=1024%2C511&#038;ssl=1\" alt=\"\" class=\"wp-image-601900\"><\/figure>\n<p class=\"wp-block-paragraph\">The 92% obtained for the <code>calculate_stats.py<\/code><strong> <\/strong>script comes from the line where the squares of the deviations from the mean are calculated:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">gap = [abs(x - m)**2 for x in numbers]<\/code><\/pre>\n<p class=\"wp-block-paragraph\">To prevent certain scripts from being analyzed, you can specify exclusions in a <code>.coveragerc<\/code> configuration file at the root of the project. For example, to exclude the two test files, write<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-markup\">[run]\nomit = .test_*.py<\/code><\/pre>\n<p class=\"wp-block-paragraph\">And we get<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"472\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/03\/Capture-decran-2025-04-19-215145-1024x472.png?resize=1024%2C472&#038;ssl=1\" alt=\"\" class=\"wp-image-601901\"><\/figure>\n<p class=\"wp-block-paragraph\">Finally, for larger projects, you can manage an html report of the coverage analysis by typing<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">pytest --cov=your_main_directory --cov-report html  (in your virtual environment)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">or<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">uv run pytest --cov=your_main_directory --cov-report html<\/code><\/pre>\n<h3 class=\"wp-block-heading\">7 \u2013 Some usefull packages<\/h3>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<code>pytest-xdist<\/code>: Speed up test execution by using multiple CPUs<\/li>\n<li class=\"wp-block-list-item\">\n<code>pytest-randomly<\/code>: Randomly mix the order of test items. Reduces the risk of surprising inter-test dependencies.<\/li>\n<li class=\"wp-block-list-item\">\n<code>pytest-instafail<\/code>: Displays failures and errors immediately instead of waiting for all tests to complete.<\/li>\n<li class=\"wp-block-list-item\">\n<code>pytest-tldr<\/code>: The default pytest outputs are chatty. This plugin limits the output to only traces of failed tests.<\/li>\n<li class=\"wp-block-list-item\">\n<code>pytest-mlp<\/code>: Allows you to test Matplotlib results by comparing images.<\/li>\n<li class=\"wp-block-list-item\">\n<code>pytest-timeout<\/code>: Ends tests that take too long, probably due to infinite loops.<\/li>\n<li class=\"wp-block-list-item\">\n<code>freezegun<\/code>: Allows to mock datetime module with the decorator <code>@freeze_time()<\/code>.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">Special thanks to <a href=\"https:\/\/www.linkedin.com\/in\/banias\/?originalSubdomain=de\">Banias Baabe<\/a> for this list.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">Integration and fonctional tests<\/h2>\n<p class=\"wp-block-paragraph\">Now that the unit tests have been written, most of the work is done. Courage, we\u2019re almost there!<\/p>\n<p class=\"wp-block-paragraph\">As a reminder, unit tests aim to test a unit of code without it interacting with another function. This way we know that each function\/method does what it was developed for. It is time to test how they work together!<\/p>\n<h3 class=\"wp-block-heading\">1 \u2013 Integration tests<\/h3>\n<p class=\"wp-block-paragraph\">Integration tests are used to check the combinations of different code units, their interactions and the way in which subsystems are combined to form a common system.<\/p>\n<p class=\"wp-block-paragraph\">The way we write integration tests is no different from the way we write unit tests. To illustrate it we create a very simple <code>FastApi <\/code>application to get or to set the couple Login\/Password in a \u201cdatabase\u201d. To simplify the example, the database is just a <code>dict<\/code> named users. We create a <code>main.py<\/code> script with the following code<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from fastapi import FastAPI, HTTPException\n\napp = FastAPI()\n\nusers = {\"user_admin\": {\"Login\": \"admin\", \"Password\": \"admin123\"}}\n\n\n@app.get(\"\/users\/{user_id}\")\nasync def read_user(user_id: str):\n    if user_id not in users:\n        raise HTTPException(status_code=404, detail=\"Users not found\")\n    return users[user_id]\n\n\n@app.post(\"\/users\/{user_id}\")\nasync def create_user(user_id: str, user: dict):\n    if user_id in users:\n        raise HTTPException(status_code=400, detail=\"User already exists\")\n    users[user_id] = user\n    return user<\/code><\/pre>\n<p class=\"wp-block-paragraph\">To test a this application, you have to use <code>httpx<\/code> and <code>fastapi.testclient<\/code> packages to make requests to your endpoints and verify the responses. The script of tests is as follows:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from fastapi.testclient import TestClient\nfrom main import app\n\nclient = TestClient(app)\n\n\ndef test_read_user():\n    response = client.get(\"\/users\/user_admin\")\n    assert response.status_code == 200\n    assert response.json() == {\"Login\": \"admin\", \"Password\": \"admin123\"}\n\n\ndef test_read_user_not_found():\n    response = client.get(\"\/users\/new_user\")\n    assert response.status_code == 404\n    assert response.json() == {\"detail\": \"User not found\"}\n\n\ndef test_create_user():\n    new_user = {\"Login\": \"admin2\", \"Password\": \"123admin\"}\n    response = client.post(\"\/users\/new_user\", json=new_user)\n    assert response.status_code == 200\n    assert response.json() == new_user\n\n\ndef test_create_user_already_exists():\n    new_user = {\"Login\": \"duplicate_admin\", \"Password\": \"admin123\"}\n    response = client.post(\"\/users\/user_admin\", json=new_user)\n    assert response.status_code == 400\n    assert response.json() == {\"detail\": \"User already exists\"}\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\">In this example, the tests depend on the application created in the <code>main.py<\/code> script. These are therefore not unit tests. We test different scenarios to check whether the application works well.<\/p>\n<p class=\"wp-block-paragraph\">Integration tests determine whether independently developed code units work correctly when they are linked together. To implement an integration test, we need to:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">write a function that contains a scenario<\/li>\n<li class=\"wp-block-list-item\">add assertions to check the test case<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\">2 \u2013 Fonctional tests<\/h3>\n<p class=\"wp-block-paragraph\">Functional testing ensures that the application\u2019s functionality complies with the specification. They differ from integration tests and unit tests because you don\u2019t need to know the code to perform them. Indeed, a good knowledge of the functional specification will suffice.<\/p>\n<p class=\"wp-block-paragraph\">The project manager can write the all specifications of the application and developpers can write tests to perform this specifications.<\/p>\n<p class=\"wp-block-paragraph\">In our previous example of a FastApi application, one of the specifications is to be able to add a new user and then check that this new user is in the database. Thus, we test the fonctionallity \u201cadding a user\u201d with this test<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from fastapi.testclient import TestClient\nfrom main import app\n\nclient = TestClient(app)\n\n\ndef test_add_user():\n    new_user = {\"Login\": \"new_user\", \"Password\": \"new_password\"}\n    response = client.post(\"\/users\/new_user\", json=new_user)\n    assert response.status_code == 200\n    assert response.json() == new_user\n\n    # Check if the user was added to the database\n    response = client.get(\"\/users\/new_user\")\n    assert response.status_code == 200\n    assert response.json() == new_user\n<\/code><\/pre>\n<h2 class=\"wp-block-heading\">The End-to-End tests<\/h2>\n<p class=\"wp-block-paragraph\">The end is near! End-to-end (E2E) tests focus on simulating real-world scenarios, covering a range of flows from simple to complex. In essence, they can be thought of as foncntional tests with multiple steps.<\/p>\n<p class=\"wp-block-paragraph\">However, E2E tests are the most time-consuming to execute, as they require building, deploying, and launching a browser to interact with the application.<\/p>\n<p class=\"wp-block-paragraph\">When E2E tests fail, identifying the issue can be challenging due to the broad scope of the test, which encompasses the entire application. So you can now see why the testing pyramid has been designed in this way.<\/p>\n<p class=\"wp-block-paragraph\">E2E tests are also the most difficult to write and maintain, owing to their extensive scope and the fact that they involve the entire application.<\/p>\n<p class=\"wp-block-paragraph\">It\u2019s essential to understand that E2E testing is not a replacement for other testing methods, but rather a complementary approach. E2E tests should be used to validate specific aspects of the application, such as button functionality, form submissions, and workflow integrity.<\/p>\n<p class=\"wp-block-paragraph\">Ideally, tests should detect bugs as early as possible, closer to the base of the pyramid. E2E testing serves to verify that the overall workflow and key interactions function correctly, providing a final layer of assurance.<\/p>\n<p class=\"wp-block-paragraph\">In our last example, if the user database is connected to an authentication service, an E2E test would consist of creating a new user, selecting their username and password, and then testing authentication with that new user, all through the graphical interface.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n<p class=\"wp-block-paragraph\">To summarize, a balanced testing strategy is essential for any production project. By implementing a system of unit tests, integration tests, functional tests and E2E tests, you can ensure that your application meets the specifications. And, by following best practice and using the right testing tools, you can write more reliable, maintainable and efficient code and deliver high quality software to your users. Finally, it also simplifies future development and ensures that new features don\u2019t break the code.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\"><strong>References<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">1 \u2013 pytest documentation <a href=\"https:\/\/docs.pytest.org\/en\/stable\/\">https:\/\/docs.pytest.org\/en\/stable\/<\/a><\/p>\n<p class=\"wp-block-paragraph\">2 \u2013 An interresting blog <a href=\"https:\/\/realpython.com\/python-testing\/\">https:\/\/realpython.com\/python-testing\/<\/a> and <a href=\"https:\/\/realpython.com\/pytest-python-testing\/\">https:\/\/realpython.com\/pytest-python-testing\/<\/a><\/p>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/data-science-from-school-to-work-part-iv\/\">Data Science: From School to Work, Part IV<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Vincent Margot<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/data-science-from-school-to-work-part-iv\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data Science: From School to Work, Part IV Introduction Let\u2019s start with a simple example that will appeal to most of us. If you want to check if the blinkers of your car are working properly, you sit in the car, turn on the ignition and test a turn signal to see if the front [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,83,67,160,2452,157,1461],"tags":[1106,970,2453],"class_list":["post-3308","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-data-science","category-deep-dives","category-programming","category-pytest","category-python","category-testing","tag-test","tag-tests","tag-unit"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/3308"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=3308"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/3308\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=3308"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=3308"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=3308"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}