
ETL, Statistics, and Dashboards

This project involved a statistical analysis of the output of a random number generator, using custom Python implementations of the Chi-square (CS), Kolmogorov-Smirnov (KS), and Anderson-Darling (AD) tests, each written from first principles. The analysis focused on the frequency patterns of 8-bit segment pairs within 64-bit numbers, using both an overall distribution approach and a conditional probability approach that groups pairs by their initial segment. The cleaned data was stored in structured Excel sheets as the basis for visualization. Results were presented in a BI dashboard built with Plotly Dash and deployed on Heroku, with interactive plots that make the data easier to explore and interpret.
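The pair-frequency idea can be sketched with the standard library alone. This is a minimal illustration, not the project's code: it assumes that "pairs" means adjacent 8-bit segments (seven per 64-bit number) and that the null hypothesis is a uniform distribution over all 256 × 256 possible pairs. The function names and the use of Python's `random` module as the generator under test are assumptions made for the example.

```python
import random
from collections import Counter


def byte_segments(value):
    """Split a 64-bit integer into eight 8-bit segments, most significant first."""
    return [(value >> shift) & 0xFF for shift in range(56, -8, -8)]


def adjacent_pairs(value):
    """Return the seven adjacent (first, second) segment pairs of a 64-bit integer."""
    segments = byte_segments(value)
    return list(zip(segments, segments[1:]))


def chi_square_uniform(counts, n_categories):
    """Chi-square statistic of observed counts against a uniform expectation.

    Categories that never occurred are absent from `counts`; each of them
    contributes (0 - E)^2 / E = E to the statistic.
    """
    total = sum(counts.values())
    expected = total / n_categories
    observed_part = sum((obs - expected) ** 2 / expected for obs in counts.values())
    empty_categories = n_categories - len(counts)
    return observed_part + empty_categories * expected


if __name__ == "__main__":
    rng = random.Random(0)  # stand-in generator, chosen only for the example
    pair_counts = Counter()
    for _ in range(50_000):
        pair_counts.update(adjacent_pairs(rng.getrandbits(64)))
    statistic = chi_square_uniform(pair_counts, 256 * 256)
    print(f"chi-square = {statistic:.1f}, degrees of freedom = {256 * 256 - 1}")
```

For the conditional approach, the same statistic could be computed separately for each value of the first segment, with 256 categories per group instead of 65,536.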

Deep and Shallow ML

The project tackled the challenge of predicting the distance to the nearest perfect square (either previous or next) for a given number x using both shallow and deep supervised machine learning models. The dataset covered the integers from 1 to 100,000, recording their closest square numbers along with additional mathematical features such as digital root and polarity. Two supervised approaches were compared: shallow learning with various estimators, among which XGBoost was the top performer, and deep learning implemented with the PyTorch library. XGBoost outperformed the deep learning model, which suggests it is well suited to this specific prediction task.
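The target and one of the features can be computed exactly, which is useful for building the dataset. The sketch below is an assumption about how such rows might be generated, not the project's code; the function names and the row layout are illustrative, and the "polarity" feature is left out because the text does not define it.

```python
import math


def nearest_square_distance(x):
    """Distance from x to the closest perfect square (previous or next)."""
    root = math.isqrt(x)            # floor of the square root, exact for integers
    previous_square = root * root
    next_square = (root + 1) ** 2
    return min(x - previous_square, next_square - x)


def digital_root(x):
    """Repeated digit sum of a non-negative integer, via the mod-9 identity."""
    return 0 if x == 0 else 1 + (x - 1) % 9


# One row per integer: (x, target distance, digital root).
rows = [(x, nearest_square_distance(x), digital_root(x)) for x in range(1, 100_001)]
```

Rows like these could then be split into training and test sets and passed to any regressor.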


Network Graph


This project developed code to map and analyze Wikipedia-linked relationships around a given person, such as "Leonardo da Vinci," by extracting and following all links that share a specific element (e.g., "Mona Lisa"). The code follows links recursively, based on the shared criteria, up to a user-defined search depth and saves the extracted entities to a "nodes list" CSV file. To identify interconnections, a second script assesses shared links between entities and constructs an adjacency matrix that reflects these relationships and enables the creation of a network graph (G). Another component scrapes Wikidata to categorize entities as persons (verified by the presence of a birth date) or non-persons, enriching each node with attributes such as birth date and additional descriptive information. The network is visualized with Plotly Dash Cytoscape and Gephi, with centrality measures applied to reveal the most influential entities. Additionally, a geographical plot of nationalities shows the global distribution of persons in the network.
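The adjacency-matrix step can be illustrated with a small sketch. It assumes that each entity comes with the set of links found on its page and that the edge weight between two entities is the number of links they share; the function names and the toy data are hypothetical and are not taken from the project's scrape. Degree centrality is used here only as one simple example of a centrality measure.

```python
def shared_link_adjacency(links_by_entity):
    """Build a symmetric matrix whose (i, j) entry counts links shared by entities i and j."""
    names = sorted(links_by_entity)
    matrix = [[0] * len(names) for _ in names]
    for i, first in enumerate(names):
        for j in range(i + 1, len(names)):
            shared = len(links_by_entity[first] & links_by_entity[names[j]])
            matrix[i][j] = matrix[j][i] = shared
    return names, matrix


def degree_centrality(matrix):
    """Fraction of the other nodes that each node is connected to."""
    n = len(matrix)
    if n < 2:
        return [0.0] * n
    return [sum(1 for weight in row if weight > 0) / (n - 1) for row in matrix]


# Toy input for illustration only.
toy_links = {
    "Leonardo da Vinci": {"Mona Lisa", "Florence", "Renaissance"},
    "Michelangelo": {"Florence", "Renaissance", "Sistine Chapel"},
    "Louvre": {"Mona Lisa", "Paris"},
}

names, matrix = shared_link_adjacency(toy_links)
centrality = degree_centrality(matrix)
```

A matrix in this form can be loaded into a graph library or exported for a tool such as Gephi.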

Analysis and Power BI

Used Python and Power BI to create a dashboard that helps stakeholders track the revenue of a bicycle company.
