Text Classification of Data Roles

A predictive model that recommends a suitable data role based on a student's current skills and past projects.

Photo by Franki Chamaki on Unsplash

Data Role Recommendation

This project combines text extraction and text classification. The model takes input as free text, extracts the skills it mentions, and uses them as predictors of a suitable data role. The classifier is a Bidirectional LSTM network built with the Functional API.
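
To make the extraction step concrete, here is a minimal sketch of how skills might be pulled out of free text before classification. It assumes a simple keyword match against a fixed vocabulary; the `SKILL_VOCAB` list, the `extract_skills` helper, and the sample profile are all hypothetical and are not taken from the project itself, which may extract skills differently.

```python
import re

# Hypothetical skill vocabulary for illustration; the real project
# would use its own (likely much larger) list.
SKILL_VOCAB = [
    "python",
    "sql",
    "machine learning",
    "data warehouse",
    "tableau",
    "reporting",
]


def extract_skills(text, vocab=SKILL_VOCAB):
    """Return the vocabulary skills mentioned in `text`, in vocabulary order."""
    lowered = text.lower()
    found = []
    for skill in vocab:
        # Word boundaries keep "sql" from matching inside "nosql".
        pattern = r"\b" + re.escape(skill) + r"\b"
        if re.search(pattern, lowered):
            found.append(skill)
    return found


# Made-up input text, standing in for a student's profile.
profile = "Built reporting dashboards in Tableau and wrote SQL queries; learning Python."
skills = extract_skills(profile)
print(skills)
```

The resulting list of skills would then be tokenized and passed to the classifier. Keyword matching is only one option; it is easy to inspect but misses synonyms and misspellings.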



Data Distribution

- Most of the job postings in our data were posted about 30 days ago. Few vacancies were posted recently.

- Since this data consists only of data professions, the most common words are data, product, business, analytic, and Python.

- Most of the data-profession vacancies posted on Indeed are located in Jakarta.

- Each data role has variations in its job title, mostly from adding Senior, Junior, or Lead, and sometimes the business function the role works in.

- Besides the 3 common data roles that we analyzed, Business Intelligence also appears frequently. There are also Big Data Engineers, whose job titles use different vocabulary but whose job descriptions are mostly similar to the others.

- The stand-alone title, without a seniority modifier (Senior, Junior), is the most common title for each data role.


- For Data Analyst, reporting and insight are the words that differentiate the role from the others.

- For Data Engineer, data warehouse, data pipeline, and infrastructure are the words that differentiate the role from the others.

- For Data Scientist, machine learning and model are the words that differentiate the role from the others.
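
One way to surface role-specific words like those listed above is to compare how often a word appears in one role's postings versus all other postings. The sketch below is an assumption about how such a comparison could be done, not the method used in this project. The `postings` list is toy, made-up data, and `distinguishing_words` is a hypothetical helper.

```python
from collections import Counter

# Toy, made-up postings for illustration only; not the project's data.
postings = [
    ("Data Analyst", "build reporting dashboards and share insight with business"),
    ("Data Analyst", "reporting and insight for product team using data"),
    ("Data Engineer", "maintain data pipeline and data warehouse infrastructure"),
    ("Data Engineer", "design pipeline and warehouse for business data"),
    ("Data Scientist", "train machine learning model on product data"),
    ("Data Scientist", "deploy model and evaluate machine learning experiments"),
]


def distinguishing_words(postings, role, top_n=3):
    """Rank words by how much more often they occur in `role` postings than elsewhere.

    Score = share of the role's postings containing the word
            minus share of all other postings containing it.
    """
    in_role, elsewhere = Counter(), Counter()
    n_role = n_else = 0
    for label, text in postings:
        words = set(text.lower().split())  # count each word once per posting
        if label == role:
            in_role.update(words)
            n_role += 1
        else:
            elsewhere.update(words)
            n_else += 1
    if n_role == 0:
        return []
    scores = {}
    for word, count in in_role.items():
        else_share = elsewhere[word] / n_else if n_else else 0.0
        scores[word] = count / n_role - else_share
    # Highest score first; ties broken alphabetically so the output is stable.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]


for role in ("Data Analyst", "Data Engineer", "Data Scientist"):
    print(role, distinguishing_words(postings, role))
```

Words shared by every role, such as "data", score low because they are just as common outside the role. A TF-IDF weighting would be a common alternative to this simple difference of shares.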

Other projects: