You can be fully remote for this role as long as you have excellent English.
Start Date - Jan 2021
· 5+ years of strong experience in data analysis and data mining with large sets of structured and unstructured data, including data acquisition, data validation, data modeling, and data visualization.
· Experienced in data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, and advanced data processing (preferably with HDInsight).
· Experienced in dimensional and relational data modeling, Star Join Schema/Snowflake modeling, fact and dimension tables, and conceptual, logical, and physical data modeling.
· Extensively used SQL, NumPy, pandas, scikit-learn, Spark, and Hive for data analysis and model building.
· Collaborated with cross-functional teams in support of business case development and identified modeling method(s) to provide business solutions.
· Interpreted business problems and provided solutions using data analysis, data mining, optimization tools, machine learning techniques, and statistics.
· Led discussions with users to gather business process requirements and data requirements in order to develop a variety of conceptual, logical, and physical data models.
· Developed Spark/Scala and Python code in the Hadoop/Hive environment.
· Pre-processed and cleaned data in preparation for feature engineering, and applied data imputation techniques to missing values in the dataset using Python.
Additional relevant experience:
· Experienced with SQL Server Integration Services (SSIS), Reporting Services (SSRS), and Analysis Services (SSAS).
· Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications; executed machine learning use cases with Spark ML and MLlib.
· Expertise in normalization (to 3NF) and denormalization techniques for optimum performance in relational and dimensional database environments.
· Extensive experience in ER modeling, dimensional modeling (star schema, snowflake schema), data warehousing, and OLAP tools.
· Expertise in database programming (SQL, PL/SQL), XML, DB2, Informix, Teradata, database tuning, and query optimization.
· Expertise in performing data analysis and data profiling using complex SQL on various source systems, including SAP and Teradata.
Database Design Tools and Data Modeling: Fact & Dimensions tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball.
Databases: SQL Server 2017.
Languages: SQL, Spark, Python.