Overview
This Jupyter notebook contains a Python script for merging and processing CSV files related to shipping and logistics data. The script reads multiple CSV files from different directories, merges them, and exports the result to a single CSV file.
Dependencies
The script relies on the following Python modules:
- pandas (third-party; must be installed)
- glob (part of the Python standard library)
File Structure
The notebook expects several source directories, each holding CSV files with a similar column layout. Every file matching *.csv in a directory is read.
Key Functions and Operations
Reading CSV Files
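import glob

import pandas as pd


def read_csv_files(directory):
    # Collect every CSV file directly inside the directory.
    all_files = glob.glob(f"{directory}/*.csv")
    li = []
    for filename in all_files:
        # delimiter='\n' is kept from the notebook; skiprows drops
        # rows 0-3 and 9 (0-based) of each file before parsing.
        df = pd.read_csv(filename, delimiter='\n', skiprows=[0, 1, 2, 3, 9])
        li.append(df)
    # Stack all frames vertically and renumber the rows from 0.
    return pd.concat(li, axis=0, ignore_index=True, sort=True)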
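The reading step can fail in a confusing way when a directory contains no CSV files, because pd.concat raises a ValueError on an empty list. The following is a minimal sketch of a more defensive reader, not the notebook's own code. The name read_csv_dir, the demo column names, and the use of the default comma delimiter are assumptions made for illustration; options such as skiprows can be passed through as keyword arguments.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_dir(directory, **read_csv_kwargs):
    """Read every *.csv in `directory` and stack the rows into one DataFrame."""
    # Sort so the row order does not depend on filesystem listing order.
    paths = sorted(Path(directory).glob("*.csv"))
    if not paths:
        # Fail early with a clear message instead of pd.concat's ValueError.
        raise FileNotFoundError(f"no CSV files found in {directory}")
    frames = [pd.read_csv(p, **read_csv_kwargs) for p in paths]
    return pd.concat(frames, axis=0, ignore_index=True)


# Demo on throwaway files; the column names here are made up.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("carrier,price\nA,10\n")
    Path(tmp, "b.csv").write_text("carrier,price\nB,20\nC,30\n")
    demo = read_csv_dir(tmp)

print(demo)
```

Sorting the paths is a design choice: glob returns files in an arbitrary order, so without it the merged row order may differ between runs or machines.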
Merging DataFrames
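# f, f1, f2, f3 are the DataFrames loaded earlier in the notebook
# (assumed to come from read_csv_files, one per source directory).
merged = [f, f1, f2, f3]
# Stack the frames vertically; sort=False keeps columns in order of first appearance.
result = pd.concat(merged, sort=False)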
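Two details of this concat call are easy to miss: columns that exist in only one frame are kept and filled with NaN, and the original row labels are preserved, so index values repeat. The sketch below illustrates both with two tiny stand-in frames; the column names and values are invented and are not taken from the notebook's data.

```python
import pandas as pd

# Two small frames standing in for f and f1; column names are invented.
f = pd.DataFrame({"route": ["X"], "price": [100]})
f1 = pd.DataFrame({"route": ["Y"], "carrier": ["Acme"]})

# Same call shape as in the notebook: union of columns, original row labels kept.
result = pd.concat([f, f1], sort=False)
print(result.columns.tolist())  # columns in order of first appearance
print(result.index.tolist())    # row labels repeat because each frame starts at 0

# Passing ignore_index=True renumbers the rows instead.
reindexed = pd.concat([f, f1], sort=False, ignore_index=True)
print(reindexed.index.tolist())
```

If the merged frame is later filtered or joined by row label, the renumbered variant is usually the safer choice.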
Exporting Result
result.to_csv('inlandCarrier.csv')
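By default, to_csv also writes the DataFrame index as an unnamed first column. Whether that is wanted in inlandCarrier.csv is not stated in the notebook, so the following sketch only shows the difference, using invented data and temporary file names.

```python
import tempfile
from pathlib import Path

import pandas as pd

result = pd.DataFrame({"route": ["X", "Y"], "price": [100, 120]})  # invented data

with tempfile.TemporaryDirectory() as tmp:
    with_index = Path(tmp, "with_index.csv")
    without_index = Path(tmp, "without_index.csv")

    result.to_csv(with_index)                   # default: index is the first column
    result.to_csv(without_index, index=False)   # data columns only

    header_with = with_index.read_text().splitlines()[0]
    header_without = without_index.read_text().splitlines()[0]

print(header_with)
print(header_without)
```

The first header starts with an empty field for the index column, while the second lists only the data columns.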
Usage
- Ensure all required CSV files are in their respective directories.
- Run the notebook cells sequentially.
- The script will read all CSV files, merge them, and export the result to 'inlandCarrier.csv'.
Additional Notes
- The notebook includes operations on a file named 'Documents.csv', which contains JSON data. These operations are incomplete in the current version.
- Some file paths and naming conventions are hardcoded and may need adjustment for different environments.
- The notebook contains print statements and DataFrame displays, likely used for debugging and data inspection.
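The layout of 'Documents.csv' is not described here, so the following is only a sketch of one common case: a CSV column whose cells hold JSON objects serialized as strings. The column name payload and the sample rows are hypothetical. The approach decodes each cell with json.loads and flattens the results with pd.json_normalize.

```python
import io
import json

import pandas as pd

# Stand-in for Documents.csv: one column ("payload", a made-up name)
# whose cells are JSON objects serialized as strings.
raw = io.StringIO(
    'id,payload\n'
    '1,"{""carrier"": ""Acme"", ""price"": 10}"\n'
    '2,"{""carrier"": ""Bolt"", ""price"": 12}"\n'
)
docs = pd.read_csv(raw)

# Decode each JSON string into a dict, then flatten the dicts into columns.
parsed = docs["payload"].apply(json.loads)
flat = pd.json_normalize(parsed.tolist())

# Put the flattened columns next to the remaining original columns.
expanded = pd.concat([docs.drop(columns="payload"), flat], axis=1)
print(expanded)
```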
Potential Improvements
- Parameterize file paths and directory names for better flexibility.
- Add error handling for file reading and merging operations.
- Implement logging instead of print statements for better debugging.
- Complete the processing of 'Documents.csv' if required for the overall workflow.
- Add data validation steps to ensure data integrity after merging.
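Several of these improvements can be combined in one function. The sketch below is one possible shape under stated assumptions, not the notebook's implementation: merge_directories, its parameters, the logger name, and the demo data are all hypothetical, and files are read with default pd.read_csv settings rather than the notebook's delimiter and skiprows arguments.

```python
import logging
import tempfile
from pathlib import Path

import pandas as pd

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("merge_csv")  # logger name is arbitrary


def merge_directories(directories, output_path, required_columns=()):
    """Read all CSVs under each directory, merge them, validate, and export."""
    frames = []
    for directory in directories:
        for path in sorted(Path(directory).glob("*.csv")):
            try:
                frames.append(pd.read_csv(path))
            except (pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
                # Skip unreadable files but leave a trace in the log.
                log.warning("skipping %s: %s", path, exc)
    if not frames:
        raise ValueError("no readable CSV files found")

    result = pd.concat(frames, ignore_index=True, sort=False)

    # Minimal validation: the expected columns must exist after merging.
    missing = [c for c in required_columns if c not in result.columns]
    if missing:
        raise ValueError(f"merged data is missing columns: {missing}")

    result.to_csv(output_path, index=False)
    log.info("wrote %d rows to %s", len(result), output_path)
    return result


# Demo with temporary directories and invented data.
with tempfile.TemporaryDirectory() as tmp:
    for name, body in [("d1", "carrier,price\nA,10\n"), ("d2", "carrier,price\nB,20\n")]:
        Path(tmp, name).mkdir()
        Path(tmp, name, "data.csv").write_text(body)
    Path(tmp, "d2", "empty.csv").write_text("")  # triggers EmptyDataError

    merged = merge_directories(
        [Path(tmp, "d1"), Path(tmp, "d2")],
        Path(tmp, "merged.csv"),
        required_columns=["carrier", "price"],
    )
```

Passing the directories and output path as arguments removes the hardcoded paths, and the column check gives an early, explicit failure if one source directory has an unexpected layout.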
Conclusion
This notebook provides a foundation for merging multiple CSV files containing shipping and logistics data. It can be expanded and refined to fit into a larger data processing pipeline for analyzing shipping routes, prices, or other logistics-related information.