{
    "componentChunkName": "component---src-templates-portofolio-post-js",
    "path": "/data-modeling-postgresql",
    "result": {"data":{"markdownRemark":{"id":"9c3b4b16-4942-5452-88bb-273f29aa4b4c","html":"<h1>🚀 Data Modeling With PostgresSQL</h1>\n<p><figure class=\"gatsby-resp-image-figure\" style=\"\">\n    <span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 581px; \"\n    >\n      <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 121.484375%; position: relative; bottom: 0; left: 0; background-image: url('data:image/svg+xml,%3csvg%20xmlns=\\'http://www.w3.org/2000/svg\\'%20width=\\'400\\'%20height=\\'486\\'%20viewBox=\\'0%200%20400%20486\\'%20preserveAspectRatio=\\'none\\'%3e%3cpath%20d=\\'M0%20243v243h401V0H0v243M71%2073v4h24v-4l-1-3-1%201-2%201h-4c-1%200-2%200-2%202-1%201-1%202-3%201v-1c2%200%201-2%200-2l-2%201h-1c-2-2-4-1-5%201s-1%202-1-1c0-4-2-4-2%200m12%2040h29l-2%204-2%204h2c2%200%202%201%202%204l-1%204h-2l-1%201-1-1c1-2-4-1-5%200h-1l-2-1v1c1%201%201%201-1%201l-2-1h-1c-1%201-1%201-1-1s-1-3-3%200l-3%201h-1l-3%201-3-1-1-1h-4c-3%200-4%203-2%205l1-1c0-1%201-2%202-1v2l1%2015v14H14v100h64l-2%204-2%204h2c2%200%202%201%202%2011l-1%2011v1l1%201h-3l2%202%201%201h4c2-1%208-1%209%201%202%201%202%201%203-1%200-1%201-2%203-1h3l1%201c2%202%203%201%202%200v-2l1%201h1c2-1%206-1%206%201-1%201%201%202%202%200h1l1-2c0-2%200-2-1-1-2%201-2%201-2-1h-2c-2%201-5%201-5-1s0-2-2%200c-2%201-3%202-7%202-3-1-4%200-4%201-1%201-1%200-1-1%201-2%200-3-1-1h-7l-3-1c-1%201-1-1-1-11%200-9%200-11%202-11%201%200%201-1-1-5l-1-3h64v-23c0-21%201-28%202-24%201%203%204%200%204-3s-1-4-3-2c-3%204-3%200-3-24v-24H79v-15c0-14%200-15%202-14l1-1c-1-1%200-1%202-1l3%201h19l1%201%201-1v-1l1%201%204%201%204-1%203-1%203%201%201%201c0-1%201-2%203-2v2l2-1%2010-1h11v82l1%2083h13v8l1%209v-19h-14V130h-11c-10%200-11%200-11-2s0-2-1-1l-1%202v1l-3-1h-1l-3%201-3-1-1-2-1%201c0%204-2%202-2-2l1-5c2%200%202-1%201-4l-2-3a283%20283%200%2001-1-2c-32%200-45%201-29%201m190%2035h112v19H273v18h112v91H274v-45l-1-45v
91h56l-2%204-2%204h2c2%200%202%201%202%205v4h-6c-5%200-6%200-6%202v2l1-1%206-1h6v-6c0-4%201-5%202-5v-4l-2-3%2028-1h28V147h-56l-57%201m-115%2024v10l1-10v-9h50l50-1H158v10m-143%201v9h127v-19H15v10m243%200v9H158v14c0%2015%200%2017-3%2015l-2-1%201%202v1l-1%202%203%201%202-1v47h50l-2%204-2%204h2c2%200%202%201%202%2012v12h-7l-7%201-1%209%201%209%201-9v-8h14v-13c0-11%200-13%202-13v-4l-2-3%2025-1c13%200%202-1-25-1h-50v-24c0-22%201-29%202-23%201%201%202%202%206%202%206%200%206%200%205-3v-7c0-3%200-3-5-3s-5%200-5%203c-1%206-3%205-3-1v-5h99v31a1221%201221%200%20000-58m16%203v8h111v-16H274v8M15%20191v8h127v-16H15v8m144%200v8h99v-16h-99v8M15%20231v30h127v-43h-3c-1%200-2%200-1-1l2-1c2%201%202%200%202-8v-8H15v31m160-27v7l2-2c1-2%201-2%201%202s0%205%203%205%204-1%204-3%201-2%205-2c2%200%203-2%201-2v-2c1%200%201-1-1-2h-2l-1%201c-5%200-6%200-6%203%200%202%200%201-1-1-1-5-3-6-5-4m-48%202l-1%203h-2l-2%203c-1%202-1%202-1-1v-5l-2%202v3l-1-3-2-2v1l-1%201-1%201h2v2c1%201%202%202%201%203%200%202%200%202%203%202%202%200%203-1%203-2%200-3%202-2%202%200%201%202%205%202%205%200l1-1v2h4l1%201%201-2v-5l1-2v-1l-2%202h-1l-2-2v2l-4-2-2-2v2m108%20117v9h-56a1051%201051%200%20000%201h56v16h-56a1052%201052%200%20000%201h56v121h-56l-56%201h113v-79a3179%203179%200%2000-1-70m-95%20122c-2%204-1%206%202%207l5-1h1l8%201h7v-4c0-3%200-4-1-3v1h-7l-2%202c-1%202-3%203-3%201l1-1v-1h-5c-2-1-4%200-4%202l-1-2c0-3-1-3-1-2\\'%20fill=\\'%23d3d3d3\\'%20fill-rule=\\'evenodd\\'/%3e%3c/svg%3e'); background-size: cover; display: block;\"\n  ></span>\n  <img\n        class=\"gatsby-resp-image-image\"\n        alt=\"ERD project \"\n        title=\"ERD project SparkifyDB\"\n        src=\"/static/6819318229a6e279d7348ef625881538/92d15/sparkifydb.png\"\n        srcset=\"/static/6819318229a6e279d7348ef625881538/6f3f2/sparkifydb.png 256w,\n/static/6819318229a6e279d7348ef625881538/01e7c/sparkifydb.png 512w,\n/static/6819318229a6e279d7348ef625881538/92d15/sparkifydb.png 581w\"\n        sizes=\"(max-width: 581px) 100vw, 581px\"\n      
  style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        loading=\"lazy\"\n      />\n    </span>\n    <figcaption class=\"gatsby-resp-image-figcaption\">ERD project SparkifyDB</figcaption>\n  </figure></p>\n<h2><strong>Overview</strong></h2>\n<p>In this project, we model data with PostgreSQL and build an ETL pipeline using Python.\n<strong>Case Study</strong>: A startup in Indonesia wants to analyze the data it has been collecting on songs and user activity in its new music streaming app.\nCurrently, the startup collects data in JSON format, and the analytics team is particularly interested in understanding which songs users are listening to.</p>\n<h2><strong>Song Dataset</strong></h2>\n<p>The song dataset is a subset of the <a href=\"http://millionsongdataset.com/\">Million Song Dataset</a>.</p>\n<p>Sample record:</p>\n<div class=\"gatsby-highlight\" data-language=\"json\"><pre class=\"language-json\"><code class=\"language-json\"><span class=\"token punctuation\">{</span><span class=\"token property\">\"num_songs\"</span><span class=\"token operator\">:</span> <span class=\"token number\">1</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"artist_id\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"ARJIE2Y1187B994AB7\"</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"artist_latitude\"</span><span class=\"token operator\">:</span> <span class=\"token null keyword\">null</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"artist_longitude\"</span><span class=\"token operator\">:</span> <span class=\"token null keyword\">null</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"artist_location\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"\"</span><span class=\"token punctuation\">,</span> <span 
class=\"token property\">\"artist_name\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"Line Renaud\"</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"song_id\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"SOUPIRU12A6D4FA1E1\"</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"title\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"Der Kleine Dompfaff\"</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"duration\"</span><span class=\"token operator\">:</span> <span class=\"token number\">152.92036</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"year\"</span><span class=\"token operator\">:</span> <span class=\"token number\">0</span><span class=\"token punctuation\">}</span></code></pre></div>\n<h2><strong>Log Dataset</strong></h2>\n<p>Logs dataset is generated by <a href=\"https://github.com/Interana/eventsim\">Event Simulator</a></p>\n<p>Sample Record :</p>\n<div class=\"gatsby-highlight\" data-language=\"json\"><pre class=\"language-json\"><code class=\"language-json\"><span class=\"token punctuation\">{</span><span class=\"token property\">\"artist\"</span><span class=\"token operator\">:</span> <span class=\"token null keyword\">null</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"auth\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"Logged In\"</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"firstName\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"Walter\"</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"gender\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"M\"</span><span class=\"token punctuation\">,</span> <span class=\"token 
property\">\"itemInSession\"</span><span class=\"token operator\">:</span> <span class=\"token number\">0</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"lastName\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"Frye\"</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"length\"</span><span class=\"token operator\">:</span> <span class=\"token null keyword\">null</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"level\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"free\"</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"location\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"San Francisco-Oakland-Hayward, CA\"</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"method\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"GET\"</span><span class=\"token punctuation\">,</span><span class=\"token property\">\"page\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"Home\"</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"registration\"</span><span class=\"token operator\">:</span> <span class=\"token number\">1540919166796.0</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"sessionId\"</span><span class=\"token operator\">:</span> <span class=\"token number\">38</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"song\"</span><span class=\"token operator\">:</span> <span class=\"token null keyword\">null</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"status\"</span><span class=\"token operator\">:</span> <span class=\"token number\">200</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"ts\"</span><span 
class=\"token operator\">:</span> <span class=\"token number\">1541105830796</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"userAgent\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"\\\"Mozilla\\/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit\\/537.36 (KHTML, like Gecko) Chrome\\/36.0.1985.143 Safari\\/537.36\\\"\"</span><span class=\"token punctuation\">,</span> <span class=\"token property\">\"userId\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"39\"</span><span class=\"token punctuation\">}</span></code></pre></div>\n<h2>Schema</h2>\n<h4>Fact Table</h4>\n<p><strong>songplays</strong> - records in log data associated with song plays i.e. records with page <code class=\"language-text\">NextSong</code></p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">songplay_id, start_time, user_id, level, song_id, artist_id, session_id, location, user_agent</code></pre></div>\n<h4>Dimension Tables</h4>\n<p><strong>users</strong>  - users in the app</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">user_id, first_name, last_name, gender, level</code></pre></div>\n<p><strong>songs</strong>  - songs in music database</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">song_id, title, artist_id, year, duration</code></pre></div>\n<p><strong>artists</strong>  - artists in music database</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">artist_id, name, location, latitude, longitude</code></pre></div>\n<p><strong>time</strong>  - timestamps of records in  <strong>songplays</strong>  broken down into specific units</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code 
class=\"language-text\">start_time, hour, day, week, month, year, weekday</code></pre></div>\n<h2>Project Files</h2>\n<p><code class=\"language-text\">sql_queries.py</code> -> contains the SQL queries for dropping and creating the fact and dimension tables, as well as the insert query templates.</p>\n<p><code class=\"language-text\">create_tables.py</code> -> contains the code for setting up the database. Running this file creates <strong>sparkifydb</strong> along with the fact and dimension tables.</p>\n<p><code class=\"language-text\">etl.ipynb</code> -> a Jupyter notebook for analyzing the dataset before loading.</p>\n<p><code class=\"language-text\">etl.py</code> -> reads and processes <strong>song_data</strong> and <strong>log_data</strong>.</p>\n<p><code class=\"language-text\">test.ipynb</code> -> a notebook that connects to the PostgreSQL database and validates the loaded data.</p>\n<h2>Environment</h2>\n<p>Python 3.6 or above</p>\n<p>PostgreSQL 9.5 or above</p>\n<p>psycopg2 - PostgreSQL database adapter for Python</p>\n<h2>How to run</h2>\n<p>Run the driver program <code class=\"language-text\">main.py</code> as below:</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">python main.py</code></pre></div>\n<p>The <code class=\"language-text\">create_tables.py</code> and <code class=\"language-text\">etl.py</code> files can also be run independently as below:</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">python create_tables.py \npython etl.py </code></pre></div>\n<h2>Source</h2>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 728px; \"\n    >\n      <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 25.78125%; position: relative; bottom: 0; left: 0; background-image: 
url('data:image/svg+xml,%3csvg%20xmlns=\\'http://www.w3.org/2000/svg\\'%20width=\\'400\\'%20height=\\'103\\'%20viewBox=\\'0%200%20400%20103\\'%20preserveAspectRatio=\\'none\\'%3e%3cpath%20d=\\'M331%2024c0%202%200%203%201%202h7l8%201%207-1h1c1%201%201%200%201-2%200-3%200-4-1-2%200%202%200%202-2%201h-3c-1%201-1%201-1-1%200-3%200-3-3%200-2%201-3%202-4%201h-2c-1%201-2%201-2-1-1-2-1-2-2-1-2%202-2%202-3%200-2-1-2-1-2%203m-179%202c0%203%200%203%202%202%201-2%201-2%203%200%201%202%201%202%203%201h2c0%202%200%202%202%201%200-1%202-2%203-1v-2l1-1%201%203v3c1%200%205-1%204-2l1-1%201-2%202%201c2%202%202%202%203%201%200-2%202-3%202-1l1%201%201-2c0-2-1-3-2-2h-4c-2%201-10%200-11-1h-1c-2%202-10%202-10%200l-2-1c-2%200-2%201-2%203m194%2065c-38%200-44%200-44%202l44%201c43%200%2044%200%2044-2v-2l-44%201\\'%20fill=\\'%23d3d3d3\\'%20fill-rule=\\'evenodd\\'/%3e%3c/svg%3e'); background-size: cover; display: block;\"\n  ></span>\n  <img\n        class=\"gatsby-resp-image-image\"\n        alt=\"github badge\"\n        title=\"github badge\"\n        src=\"/static/77d2df311da6e01a694dd75c63f5ce50/cecac/github_badge.png\"\n        srcset=\"/static/77d2df311da6e01a694dd75c63f5ce50/6f3f2/github_badge.png 256w,\n/static/77d2df311da6e01a694dd75c63f5ce50/01e7c/github_badge.png 512w,\n/static/77d2df311da6e01a694dd75c63f5ce50/cecac/github_badge.png 728w\"\n        sizes=\"(max-width: 728px) 100vw, 728px\"\n        style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        loading=\"lazy\"\n      />\n    </span></p>\n<h2>Reference:</h2>\n<p><a href=\"http://initd.org/psycopg/docs/\">Psycopg</a></p>\n<p><a href=\"https://www.postgresql.org/docs/\">PostgreSQL Documentation</a></p>\n<p><a href=\"https://pandas.pydata.org/pandas-docs/stable/\">Pandas Documentation</a></p>","excerpt":"🚀 Data Modeling With PostgresSQL  Overview In this project, we create data modeling with postgres and build ETL pipeline using python.\nStudy Case…","frontmatter":{"date":"July 
27, 2021","slug":"/data-modeling-postgresql","title":"Data Modeling With PostgresSQL","description":"Data Modeling With PostgresSQL","featuredImage":{"childImageSharp":{"gatsbyImageData":{"layout":"fullWidth","backgroundColor":"#282828","images":{"fallback":{"src":"/static/6819318229a6e279d7348ef625881538/38dde/sparkifydb.png","srcSet":"/static/6819318229a6e279d7348ef625881538/38dde/sparkifydb.png 581w","sizes":"100vw"},"sources":[{"srcSet":"/static/6819318229a6e279d7348ef625881538/5502b/sparkifydb.webp 581w","type":"image/webp","sizes":"100vw"}]},"width":1,"height":1.2151462994836488}}}}}},"pageContext":{"id":"9c3b4b16-4942-5452-88bb-273f29aa4b4c","previous":{"id":"7059341d-f731-5c1e-b7e9-3e23979a2b05","frontmatter":{"slug":"/data-portofolio-traveloka","template":"portofolio-post","title":"Customer  Churn Analysis  & Segmentation  "}},"next":{"id":"340196e4-6110-5746-a9d2-1bcdd61d7363","frontmatter":{"slug":"/aws-ML-learning","template":"blog-post","title":"AWS Machine Learning Foundations Course"}}}},
    "staticQueryHashes": ["228695001","2744905544","4267595483"]}