{"id":5457,"date":"2025-07-21T07:03:35","date_gmt":"2025-07-21T07:03:35","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/07\/21\/how_would_you_structure_a_project_data_frame_to\/"},"modified":"2025-07-21T07:03:35","modified_gmt":"2025-07-21T07:03:35","slug":"how_would_you_structure_a_project_data_frame_to","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/07\/21\/how_would_you_structure_a_project_data_frame_to\/","title":{"rendered":"How would you structure a project (data frame) to scrape and track listing changes over time?"},"content":{"rendered":"<p>How would you structure a project (data frame) to scrape and track listing changes over time?<\/p>\n<div>\n<!-- SC_OFF -->\n<div class=\"md\">\n<p>I\u2019m working on a project where I want to scrape data daily (e.g., real estate listings from a site like RentFaster or Zillow) and track how each listing changes over time. I want to be able to answer questions like:<\/p>\n<ul>\n<li>When did a listing first appear?<\/li>\n<li>How long did it stay up?<\/li>\n<li>What changed (e.g., price, description, status)?<\/li>\n<li>What\u2019s new today vs. yesterday?<\/li>\n<\/ul>\n<p>My rough mental model is:<\/p>\n<ol>\n<li>Scrape today\u2019s data into a CSV or database.<\/li>\n<li>Compare with previous days to find new\/removed\/updated listings.<\/li>\n<li>Over time, build a longitudinal dataset with per-listing history (kind of like slowly changing dimensions in data warehousing).<\/li>\n<\/ol>\n<p>I\u2019m curious how others would structure this kind of project:<\/p>\n<p>How would you handle ID tracking if listings don\u2019t always have persistent IDs? Would you use a single master table with change logs? Or snapshot tables per day? How would you set up comparisons (diffing rows, hashing)? 
Any Python or DB tools you\u2019d recommend for managing this type of historical tracking?<\/p>\n<p>I\u2019m open to best practices, war stories, or just seeing how others have solved this kind of problem. Thanks!<\/p>\n<\/div>\n<p><!-- SC_ON --> submitted by <a href=\"https:\/\/www.reddit.com\/user\/Proof_Wrap_2150\">\/u\/Proof_Wrap_2150<\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/1m49rai\/how_would_you_structure_a_project_data_frame_to\/\">[link]<\/a><\/span> <span><a href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/1m49rai\/how_would_you_structure_a_project_data_frame_to\/\">[comments]<\/a><\/span><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.reddit.com\/r\/datascience\/comments\/1m49rai\/how_would_you_structure_a_project_data_frame_to\/\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>How would you structure a project (data frame) to scrape and track listing changes over time? I\u2019m working on a project where I want to scrape data daily (e.g., real estate listings from a site like RentFaster or Zillow) and track how each listing changes over time. 
I want to be able to answer questions [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,99],"tags":[84,7,1259],"class_list":["post-5457","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-datascience","tag-data","tag-how","tag-would"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/5457"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=5457"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/5457\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=5457"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=5457"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=5457"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}