The Wikipedia Scraper is a tool that extracts, cleans, and processes music metadata from Wikipedia pages. It is part of the music_db repository and lives in the code/wikipedia directory. Follow these instructions to set up and run the scraper.
Before you start, ensure you have Python and Git installed on your system. You'll need Git to clone the repository and access the Wikipedia Scraper. If you're unsure whether you have Git or need to install it, refer to the Git documentation.

## Getting Started
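You can quickly confirm both prerequisites from a terminal (a minimal check, assuming the Python interpreter is available as `python3` on your system):

```shell
# Each command prints a version string and fails if the tool is missing
git --version
python3 --version
```

If either command reports "command not found", install the missing tool before continuing.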
Start by cloning the music_db repository to your local machine. Open a terminal or command prompt and run the following command:
```shell
git clone https://github.com/edin-dal/music_db
```
Change into the directory containing the Wikipedia Scraper script:
```shell
cd music_db/code/wikipedia
```
Before running the script, you need to ensure it has the necessary execution permissions. Grant execution permissions by running:
```shell
chmod +x ./scrape_clean.sh
```
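To double-check that the permission change took effect, you can test the executable bit directly (a quick sanity check, not part of the original steps):

```shell
# Prints "ok" only if the shell is allowed to execute the script
[ -x ./scrape_clean.sh ] && echo "ok"
```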
Now, you're ready to run the scraper. Execute the script by running:
```shell
./scrape_clean.sh
```
The script will begin processing. This may take some time.
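Because the run can be lengthy, it may help to keep a log of the script's output. One common approach (the log file name here is just an example) is to pipe the output through `tee`:

```shell
# Stream the scraper's output to the terminal while also saving it to a file
./scrape_clean.sh 2>&1 | tee scrape.log
```

You can then review `scrape.log` afterwards if anything looks off.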
Upon successful completion, the script will create an output folder within the music_db/code/wikipedia directory. This folder contains the final cleaned and processed data extracted from Wikipedia.

## Troubleshooting
If you encounter permission errors while running ./scrape_clean.sh, ensure that you've set the execution permissions as described above. Also make sure you are in the correct directory (music_db/code/wikipedia) before running the script; running it from a different directory may cause path-related errors.
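Both checks above can be run directly from the shell (a quick diagnostic sketch; the expected path suffix assumes the directory layout described earlier):

```shell
# Where am I? The path should end in code/wikipedia.
pwd
# Does the script have the execute bit? The mode string should contain an 'x'.
ls -l scrape_clean.sh
```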
For questions, issues, or support regarding the Wikipedia Scraper, please open an issue in the GitHub repository, and I'll get back to you as soon as possible.