We are seeking a highly skilled and experienced Data Scientist with expertise in Natural Language Processing (NLP) and classification model sets to join our dynamic team. As a Data Scientist, you will play a crucial role in developing and implementing advanced algorithms and models to extract insights from complex textual data and solve challenging business problems. You will have the opportunity to work on cutting-edge projects and contribute to the development of innovative solutions.
- Design, develop, and implement NLP algorithms and techniques for text preprocessing, feature extraction, sentiment analysis, topic modeling, named entity recognition, document classification, and other related tasks.
- Develop robust classification models and frameworks using state-of-the-art machine learning and deep learning techniques for various applications, including document categorization, text classification, sentiment analysis, and recommendation systems.
- Help define workflows and data stores that move data out of unstructured stores and into formats usable for Data Science.
- Collaborate with cross-functional teams, including product owners, software developers, and domain experts, to understand business requirements and develop end-to-end solutions.
- Perform exploratory data analysis and visualization to gain insights into textual data, identify patterns, and inform feature engineering and model development.
- Evaluate and compare the performance of different models, including but not limited to NLP, generative, and classification models, and propose enhancements or modifications to improve their accuracy, efficiency, and scalability.
- Stay up-to-date with the latest advancements in machine learning methodologies, techniques, and frameworks, and apply them to solve complex business problems.
- Communicate findings, insights, and technical concepts effectively to both technical and non-technical stakeholders through reports, presentations, and visualizations.
- Support implementation of analytics tools and methodologies within our engineering tech stack.
- Master's or Ph.D. degree in Computer Science, Data Science, Statistics, or a related field.
- Strong background and expertise in Natural Language Processing (NLP) techniques, including text preprocessing, feature extraction, sentiment analysis, topic modeling, named entity recognition, and document classification.
- Proven experience in designing and implementing classification models and algorithms, such as Naïve Bayes, Logistic Regression, Support Vector Machines (SVM), Random Forests, Gradient Boosting, and Neural Networks.
- Proficiency in programming languages and frameworks such as Python, Java, or Spark, and libraries such as NLTK, spaCy, scikit-learn, TensorFlow, or PyTorch.
- Experience with data manipulation, analysis, and visualization using tools such as pandas, NumPy, and Matplotlib.
- Strong understanding of statistical analysis and machine learning principles, and ability to apply them to real-world legal problems.
- Solid knowledge of software development practices, version control systems, and agile methodologies.
- Excellent problem-solving skills, analytical thinking, and attention to detail.
- Effective communication skills and ability to collaborate in a team-oriented environment.
- Proven track record of delivering high-quality results on time and effectively managing high-profile projects and priorities.
- Experience with true big data (exabytes and higher) processing practices.
- Knowledge of cloud computing platforms such as AWS, Azure, or GCP.
- Ability to mentor and educate technical groups on Data Science deployment and best practices.
30-Day Goals: Understanding our Data and defining Standard Fields to Fuel Settlement Prediction
- Conduct an in-depth analysis of the unstructured JSON data corpus in our Filevine Core dataset to understand its characteristics, key attributes, and potential challenges, and help define a Standard Fields approach for leveraging unstructured data.
- Develop a data preprocessing pipeline to clean, normalize, and transform the JSON data into a structured format suitable for NLP and text classification tasks on specific data sets using AWS SageMaker.
60-Day Goals: Adding Value to our Data by Creating Standard Fields to Fuel Settlement Prediction
- Develop and fine-tune NLP models for tasks such as named entity recognition, topic modeling, and text categorization using the unstructured data from Filevine Core for standard fields (global).
90-Day Goals: Adding Value to our Data by Creating Standard Fields to Fuel Settlement Prediction
- Continue to develop and fine-tune NLP models for tasks such as named entity recognition, topic modeling, and text categorization using the unstructured data from Filevine Core for standard fields (local).
6-Month Goal: Standard fields are being used to generate a Settlement Prediction/Amount in Beta testing
- Have Standard Fields (Local and Global) available for DS/Analytics use from Filevine Core data.
- Create a model that leverages standard fields (features) to predict settlement likelihood and settlement amount.
$150,000 - $230,000 a year
The base salary range represents the low and high end of the salary range for this position. The total compensation package for this position will be determined by each individual’s location, qualifications, education, work experience, skills and performance. We believe in the importance of pay equity - the range listed is just one component of Filevine’s total compensation package for employees. Other rewards may include commissions, stock options, a paid time off policy, as well as a comprehensive benefits package, including medical, dental, and vision.
Cool Company Benefits:
- A dynamic, rapidly growing company, focused on helping organizations thrive
- Medical, Dental, & Vision Insurance (for full-time employees)
- Competitive & Fair Pay
- Maternity & paternity leave (for full-time employees)
- Short & long-term disability
- Ergonomic and height-adjustable workstations for onsite employees
- Opportunity to learn from a dedicated leadership team
- Weekly Taco Lunches in the summer/fall/spring for onsite employees
- Centrally located open office building in Sugar House
- Flexible hybrid work schedules depending on the department, with some departments (R&D) offering fully remote positions in the United States
- Top-of-the-line company swag