Data Science: An Interesting New Discipline of the 21st Century


I watched the video "Data Science Learning Roadmap for 2021" by Harshit Tyagi

https://youtu.be/nM_wZIzKEhc&t=220

and read the accompanying article

https://towardsdatascience.com/data-science-learning-roadmap-for-2021-84f2ba09a44f

From these I compiled the topics below and searched for material on each one, as a guide for anyone who wants to study Data Science.

1 Programming

1.1 Data Structures (Python/R)

1.2 SQL scripting

https://tinyurl.com/2kx6c3wr
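
As a small illustration of topic 1.2, the sketch below runs a SQL script from Python using the standard-library sqlite3 module. The `sales` table, its columns, and its rows are made-up sample data, not something taken from the roadmap.

```python
import sqlite3

# In-memory database; the table and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 100.0), ("south", 50.0), ("north", 25.0)],
)

# Sum the amount per region, ordered by region name.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```

`totals` ends up as a list of `(region, total)` tuples, one per region.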

1.3 Conditionals, List/Dict comprehension

1.3.1 Conditional list comprehension

https://tinyurl.com/4hdjjpmm
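
A minimal sketch of topic 1.3.1, showing the two common places a condition can appear in a Python list comprehension. The `numbers` list is an arbitrary example.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Condition at the end: filter, keeping only the even numbers (squared).
even_squares = [n * n for n in numbers if n % 2 == 0]

# Conditional expression at the front: label every element, no filtering.
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]
```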

1.3.2 Conditional dictionary comprehension

https://tinyurl.com/mr3zpk3w
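
A minimal sketch of topic 1.3.2. The `scores` dictionary and the pass mark of 50 are assumptions chosen only for illustration.

```python
scores = {"ann": 82, "bob": 47, "cho": 65}

# Filter: keep only entries that reach the (arbitrary) pass mark of 50.
passed = {name: score for name, score in scores.items() if score >= 50}

# Transform: map every value through a conditional expression.
grades = {
    name: ("pass" if score >= 50 else "fail")
    for name, score in scores.items()
}
```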

1.4 Object oriented programming

https://tinyurl.com/3a4a73xc

1.5 Working with external libraries

1.6 Fundamental algorithms - searching, sorting, trees, graphs, etc.

1.6.1 Fundamental algorithms – searching

https://tinyurl.com/mrfce5nm
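
One classic searching algorithm for topic 1.6.1 is binary search. The sketch below is one possible implementation and assumes the input list is already sorted in ascending order.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1
```

For example, `binary_search([1, 3, 5, 7, 9], 7)` returns the index of 7, and a value that is not in the list gives -1.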

1.6.2 Fundamental algorithms - sorting

https://tinyurl.com/2rz767d5
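
For topic 1.6.2, merge sort is a common example of a divide-and-conquer sorting algorithm. This is a teaching sketch only; in everyday Python code the built-in `sorted()` is the usual choice.

```python
def merge_sort(items):
    """Return a new sorted list; the input list is left unchanged."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```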

1.6.3 Fundamental algorithms - trees

https://tinyurl.com/yh5by24f

1.6.4 Fundamental algorithms - graphs

https://tinyurl.com/2p8k4k5n
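
For topic 1.6.4, a breadth-first traversal is a typical first graph algorithm. The sketch assumes the graph is stored as a dictionary mapping each node to a list of its neighbours; the `graph` below is invented for the example.

```python
from collections import deque


def bfs_order(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


# Hypothetical directed graph as an adjacency list.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
order = bfs_order(graph, "a")
```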

1.7 Advanced: Functional programming
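
A short sketch of the functional style mentioned in topic 1.7, using Python's built-in `map` and `filter` together with `functools.reduce`. The input values are arbitrary.

```python
from functools import reduce

values = [1, 2, 3, 4, 5]

# map: apply a function to every element.
doubled = list(map(lambda v: v * 2, values))

# filter: keep only the elements that satisfy a predicate.
large = list(filter(lambda v: v > 4, doubled))

# reduce: fold the remaining elements into a single value.
total = reduce(lambda acc, v: acc + v, large, 0)
```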

--------------------------

2 Data Extraction and Wrangling

Profile: Data Analysts (any department)

 

Data Extraction

https://tinyurl.com/2p8pprux

Data Wrangling

https://tinyurl.com/y9ym8c36

2.1 Scripting - extracting data from websites, APIs, DBs

https://tinyurl.com/3cmmdfup

2.2 Data formatting (type conversion)

https://tinyurl.com/2p89przh
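
As an illustration of topic 2.2, the sketch below converts a record whose fields all arrive as strings (as they typically do from a CSV file or a web form) into proper Python types. The field names and values are hypothetical.

```python
from datetime import datetime

# Raw record: every value is a string.
raw = {"id": "42", "price": "19.90", "sold_on": "2023-02-11", "in_stock": "yes"}

clean = {
    "id": int(raw["id"]),
    "price": float(raw["price"]),
    "sold_on": datetime.strptime(raw["sold_on"], "%Y-%m-%d").date(),
    # Treat a few common spellings as True; anything else is False.
    "in_stock": raw["in_stock"].strip().lower() in {"yes", "true", "1"},
}
```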

2.3 Libraries - Pandas and NumPy

2.3.1 Data Extraction Pandas

https://tinyurl.com/bdpms9yz

2.3.2 Data Wrangling Pandas

https://tinyurl.com/ys52pk3v

2.3.3 Data Extraction Numpy

https://tinyurl.com/2z3nf33r

2.3.4 Data Wrangling NumPy

https://tinyurl.com/yu4texrj

2.4 Data transformation - joining, slicing, indexing

2.4.1 Data transformation - joining

https://tinyurl.com/bddtjhk6

2.4.2 Data transformation - slicing

https://tinyurl.com/2p8b6z5r

2.4.3 Data transformation - indexing

https://tinyurl.com/3wmc7cwd

2.5 Handling missing values - can use tools like Trifacta

https://tinyurl.com/2v2ufyct
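
One simple way to handle missing values (topic 2.5) is mean imputation: replace each missing entry with the mean of the observed ones. The sketch below uses plain Python rather than Trifacta, and the `ages` list is made up. Whether mean imputation is appropriate depends on the data.

```python
from statistics import mean

# None marks a missing value in this hypothetical column.
ages = [30, None, 20, None, 40]

observed = [a for a in ages if a is not None]
fill_value = mean(observed)

# Replace each missing entry with the mean of the observed values.
filled = [fill_value if a is None else a for a in ages]
```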

-----------------------------------

3 EDA, Business acumen and Storytelling

Profiles: Data Analyst, Business Analyst, Marketing Analyst, Data Product Manager

 

3 EDA (Exploratory Data Analysis)

https://tinyurl.com/ydt72z8e

3.1 Defining business-focused questions

3.2 Studying data distribution – outliers

https://tinyurl.com/yka8bcw2
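
For topic 3.2, a common rule of thumb flags values more than 1.5 times the interquartile range (IQR) outside the quartiles as outliers. The sketch assumes that rule and an invented sample; other rules and other quantile methods can give different cut-offs.

```python
from statistics import quantiles

values = [10, 12, 11, 13, 12, 11, 100]

# Quartiles (n=4 gives three cut points: Q1, median, Q3).
q1, _median, q3 = quantiles(values, n=4)
iqr = q3 - q1

# 1.5 * IQR fences around the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < lower or v > upper]
```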

3.3 Univariate and multivariate analysis

https://tinyurl.com/4x5t3ytz

3.4 Data Visualization

https://tinyurl.com/332r5fnj

3.4.1 Data Visualization – matplotlib

https://tinyurl.com/yv9v3z8n

3.4.2 Data Visualization – seaborn

https://tinyurl.com/mrh8nf4n

3.4.3 Data Visualization - plotly

https://tinyurl.com/4tvmux5e

3.5 Building dashboards - Excel/Tableau, Jupyter

3.5.1 Building dashboards - Excel

https://tinyurl.com/35p2z64v

3.5.2 Building dashboards - Tableau

https://tinyurl.com/6feedczw

3.5.3 Building dashboards - Jupyter

https://tinyurl.com/5bsvrkty

3.6 Writing concise and insightful reports

3.7 Business acumen

https://tinyurl.com/bdht7r7a

-----------------------------

4 Data Engineering

Profiles: Data Engineer, DevOps Engineer, Data Architect

 

4 Data Engineering

https://tinyurl.com/mw7cwzj5

4.1 Strong programming skills

4.2 Working with the CLI (Command Line Interface)

https://tinyurl.com/4zmbze77

4.3 Building ETL (Extract-Transform-Load) pipelines

https://tinyurl.com/bp6dn3sy

4.4 Data engineering tools

https://tinyurl.com/2ppayc8v

Using tools - Spark, Kafka, Airflow, etc.

4.4.1 Data engineering tool - Spark

https://tinyurl.com/ymfxvfkx

4.4.2 Data engineering tool - Kafka

https://tinyurl.com/2ua282bp

4.4.3 Data engineering tool - Airflow

https://tinyurl.com/2xe2s798

4.5 Cloud Services - AWS, GCP, Azure

4.5.1 Cloud Services

https://tinyurl.com/4atu83er

4.5.2 Cloud Services - AWS

https://tinyurl.com/3acnmb6u

4.5.3 Cloud Services - GCP (Google Cloud Platform)

https://tinyurl.com/3rw5c82s

4.5.4 Cloud Services - Azure

https://tinyurl.com/bxyk3yys

4.6 Algorithms - MapReduce, YARN

4.6.1 Algorithms - MapReduce

https://tinyurl.com/5n99jnf4
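
The MapReduce idea in topic 4.6.1 can be sketched on a single machine as a word count with three phases: map, shuffle (group by key), and reduce. This is a toy illustration in plain Python, not how Hadoop is actually invoked; the function names and documents are invented for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data big ideas", "data pipelines"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
```

In a real cluster the map and reduce phases run in parallel on many machines, and the framework handles the shuffle.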

4.6.2 Algorithms – YARN (Yet Another Resource Negotiator)

https://tinyurl.com/4yzvtub8

4.7 Deploying ML models in production

https://tinyurl.com/s8mzpnh8

------------------------------

5 Statistics and Mathematics

Profiles: Data Scientist, Quantitative Analyst

 

5.1 Descriptive - mean, median, mode, standard deviation, etc.

https://tinyurl.com/2p9sjkwc
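
The descriptive statistics named in topic 5.1 are all available in Python's standard-library statistics module. The data below is an arbitrary sample, and `pstdev` is the population standard deviation (use `stdev` for the sample version).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)        # arithmetic average
median = statistics.median(data)    # middle value of the sorted data
mode = statistics.mode(data)        # most frequent value
spread = statistics.pstdev(data)    # population standard deviation
```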

5.2 Inferential - hypothesis & A/B testing, CI, p-value

https://tinyurl.com/3f8uwb86

5.2.1 Inferential - hypothesis & A/B testing

https://tinyurl.com/bdzzay9c

5.2.2 Inferential - CI (Confidence Interval)

https://tinyurl.com/4jn3v2j5

5.2.3 Inferential - p-value

https://tinyurl.com/bdhbtvwj

5.3 Experiment Design

https://tinyurl.com/2v6wkhct

5.4 Probability - conditional, Bayes' theorem, etc.

5.4.1 Probability 

https://tinyurl.com/2p9a5vd9

5.4.2 Probability - conditional

https://tinyurl.com/2mu8p86c

5.4.3 Probability – Bayes’ theorem

https://tinyurl.com/mrbnmk64
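
A worked example of Bayes' theorem (topic 5.4.3): the probability of having a condition given a positive test. The prior, sensitivity, and false-positive rate below are invented numbers, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) by Bayes' theorem."""
    # Total probability of a positive test (the evidence).
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# 1% prevalence, 95% sensitivity, 5% false-positive rate (assumed values).
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
```

With these assumed numbers the posterior comes out well below the test's sensitivity, because the condition is rare to begin with.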

5.5 ANOVA, Chi-Square test

5.5.1 ANOVA (Analysis of Variance)

https://tinyurl.com/yu3rfuvb

5.5.2 Chi-Square test

https://tinyurl.com/2wr85m46

5.6 Sampling, data distributions, t-tests

5.6.1 Sampling

https://tinyurl.com/5n944jvt

5.6.2 Data distributions

https://tinyurl.com/4b98dcer

5.6.3 t-tests

https://tinyurl.com/3ycycrae

5.7 Linear Algebra

https://tinyurl.com/4f9kdyv8

5.8 Single and multivariate calculus

5.8.1 Single-variable calculus

https://tinyurl.com/48ke7nxs

5.8.2 Multivariate calculus

https://tinyurl.com/wrabwnn8

--------------------------------

6 Machine Learning

Profiles: ML Engineer, Data Scientist

 

6.1 Supervised - classification, regression

https://tinyurl.com/yc6tkscp

6.2 Unsupervised - clustering, dimensionality reduction

https://tinyurl.com/jay7yrfh

6.3 Reinforcement learning - TF-Agents, optimising rewards

https://tinyurl.com/4hjhebyj

6.3.1 Reinforcement learning - TF-Agents

https://tinyurl.com/yxez7zy9

6.3.2 Reinforcement learning - optimising rewards

https://tinyurl.com/2nmumnm2

6.4 Performance metrics - RMS, accuracy, confusion matrix, AUC-ROC, etc

6.4.1 Performance metrics - RMS (Root-Mean-Square)

https://tinyurl.com/yc8rypcw
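
In a regression setting the root-mean-square metric of topic 6.4.1 is usually applied to the prediction errors (often written RMSE). The sketch below assumes that reading; `rmse` is a hypothetical helper, not a library function.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root-mean-square of the differences between actual and predicted."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))
```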

6.4.2 Performance metrics – accuracy

https://tinyurl.com/25kvc8jm

6.4.3 Performance metrics - confusion matrix

https://tinyurl.com/3j6994bd
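
For a binary classifier, the confusion matrix of topic 6.4.3 counts true/false positives and negatives, and accuracy (topic 6.4.2) follows directly from those counts. The labels below are made up, and `confusion_counts` is a hypothetical helper.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count the four confusion-matrix cells for binary labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
# Accuracy: share of all predictions that were correct.
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
```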

6.4.4 Performance metrics - AUC-ROC

(Area Under the Curve - Receiver Operating Characteristic)

https://tinyurl.com/bdftnwvx

6.5 Hyperparameter tuning

https://tinyurl.com/3zp97ask

6.6 Statistical ML - KNN, Decision trees, bagging, boosting

6.7 Ensemble Models - Random forests, voting classifiers, AdaBoost

-------------------------

Harshit Tyagi profile

https://www.linkedin.com/in/tyagiharshit?originalSubdomain=in

Harshit Tyagi is a Data Science Engineer from India. He holds a bachelor's degree in computing from
Bharati Vidyapeeth's College Of Engineering, New Delhi, India.

He has done research work with teams from Yale University, MIT, and UCLA.

He has also written and taught extensively.

Harshit Tyagi Blog

https://muckrack.com/harshit-tyagi/articles

Harshit Tyagi Twitter

https://twitter.com/dswharshit

 

** Other interesting Data Science roadmaps

https://www.geeksforgeeks.org/how-to-become-data-scientist-a-complete-roadmap/

https://www.onlinemanipal.com/blogs/data-science-roadmap

https://github.com/MrMimic/data-scientist-roadmap

** Related links

Machine Learning

https://www.gotoknow.org/posts/711453

Learning Data Science on YouTube

https://www.gotoknow.org/posts/711726

 

 

Keywords (Tags): #Data Science #Harshit Tyagi #machine learning
Post number: 711665. Written 11 February 2023, 20:02. Edited 18 February 2023, 20:37. License: All rights reserved.

