
Commit 9edb19a

Merge pull request #100 from julianpollmann/update-docs
Update docs
2 parents 8825c52 + a685347 commit 9edb19a


2 files changed (+35, -40 lines)


.github/workflows/CI_build.yml

Lines changed: 10 additions & 10 deletions
```diff
@@ -15,7 +15,7 @@ jobs:
 - name: Set up Python
 uses: actions/setup-python@v5
 with:
-python-version: "3.10"
+python-version: "3.13"
 - name: Python info
 run: |
 which python
```
```diff
@@ -38,12 +38,12 @@ jobs:
 - name: Check whether import statements are used consistently
 shell: bash -l {0}
 run: poetry run isort --check-only --diff --conda-env spec2vec-dev .
-- name: SonarQube Scan
-if: github.repository == 'iomega/spec2vec'
-uses: SonarSource/sonarqube-scan-action@master
-env:
-GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
+# - name: SonarQube Scan
+# if: github.repository == 'iomega/spec2vec'
+# uses: SonarSource/sonarqube-scan-action@master
+# env:
+# GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+# SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
 
 build_pypi:
 name: Pypi and documentation build / python-${{ matrix.python-version }} / ${{ matrix.os }}
```
```diff
@@ -53,10 +53,10 @@ jobs:
 fail-fast: false
 matrix:
 os: ['ubuntu-latest', 'macos-latest', 'windows-latest']
-python-version: ['3.10']
+python-version: ['3.10', '3.11', '3.12', '3.13']
 exclude:
 # already tested in first_check job
-- python-version: "3.10"
+- python-version: "3.13"
 os: ubuntu-latest
 steps:
 - uses: actions/checkout@v4
```
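With the widened matrix, `build_pypi` runs 3 operating systems x 4 Python versions minus the one excluded combination (Python 3.13 on ubuntu-latest, which the `first_check` job already covers), so 11 jobs instead of the previous 2. A minimal Python sketch of that expansion, assuming GitHub Actions' rule that an `exclude` entry removes every combination matching all of its listed keys; `expand_matrix` is a hypothetical helper for illustration, not part of any Actions tooling:

```python
from itertools import product

# Values copied from the build_pypi matrix in this commit.
oses = ["ubuntu-latest", "macos-latest", "windows-latest"]
python_versions = ["3.10", "3.11", "3.12", "3.13"]
# Each exclude entry drops every combination that matches all of its keys.
excludes = [{"python-version": "3.13", "os": "ubuntu-latest"}]


def expand_matrix(oses, versions, excludes):
    """Return the (os, python-version) pairs left after applying the excludes."""
    jobs = []
    for os_name, version in product(oses, versions):
        combo = {"os": os_name, "python-version": version}
        if any(all(combo[key] == value for key, value in ex.items()) for ex in excludes):
            continue  # combination is excluded, no job is created for it
        jobs.append((os_name, version))
    return jobs


jobs = expand_matrix(oses, python_versions, excludes)
print(len(jobs))  # 11 (3 x 4 combinations minus 1 excluded)
```

The same helper applied to the old matrix (`['3.10']` with 3.10 excluded on ubuntu-latest) yields 2 jobs.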
```diff
@@ -123,7 +123,7 @@ jobs:
 activate-environment: spec2vec-build
 auto-update-conda: true
 environment-file: conda/environment-build.yml
-python-version: "3.10"
+python-version: "3.13"
 - name: Show conda config
 shell: bash -l {0}
 run: |
```

README.rst

Lines changed: 25 additions & 30 deletions
```diff
@@ -88,7 +88,7 @@ For more extensive documentation `see our readthedocs <https://spec2vec.readthed
 
 Versions
 ========
-Since version `0.5.0` Spec2Vec uses `gensim >= 4.0.0` which should make it faster and more future proof. Model trained with older versions should still be importable without any issues. If you had scripts that used additional gensim code, however, those might occationally need some adaptation, see also the `gensim documentation on how to migrate your code <https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4>`_.
+Since version `0.9.0` Spec2Vec uses `gensim >= 4.4.0` which should make it faster and more future proof. Model trained with older versions should still be importable without any issues. If you had scripts that used additional gensim code, however, those might occationally need some adaptation, see also the `gensim documentation on how to migrate your code <https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4>`_.
 
 
 Installation
```
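The README now states a floor of `gensim >= 4.4.0`. Comparing version strings as plain text gets such checks wrong ("4.10.0" sorts before "4.4.0"), so a check should compare numeric components. A rough sketch using the standard library's `importlib.metadata`; `parse_release`, `at_least`, and `gensim_is_recent_enough` are hypothetical helpers that ignore pre-release and post-release suffixes (for full PEP 440 semantics, `packaging.version.Version` is the usual tool):

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    """Return the leading numeric release, e.g. '4.4.0rc1' -> (4, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(installed: str, minimum: str) -> bool:
    """Compare numerically, padding with zeros so '4.4' equals '4.4.0'."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def gensim_is_recent_enough(minimum: str = "4.4.0") -> bool:
    """Return False if gensim is not installed or is older than `minimum`."""
    try:
        return at_least(version("gensim"), minimum)
    except PackageNotFoundError:
        return False


print("4.10.0" >= "4.4.0")          # False: string comparison misorders versions
print(at_least("4.10.0", "4.4.0"))  # True
```

Because suffixes are dropped, this sketch would accept a `4.4.0rc1` pre-release as satisfying `>= 4.4.0`.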
```diff
@@ -97,14 +97,14 @@ Installation
 
 Prerequisites:
 
-- Python 3.7, 3.8, or 3.9
+- Python 3.10, 3.11, 3.12 or 3.13
 - Recommended: Anaconda
 
 We recommend installing spec2vec from Anaconda Cloud with
 
 .. code-block:: console
 
-conda create --name spec2vec python=3.8
+conda create --name spec2vec python=3.13
 conda activate spec2vec
 conda install --channel bioconda --channel conda-forge spec2vec
 
```
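The prerequisites now list Python 3.10 through 3.13 (previously 3.7 to 3.9). A setup or CI script could fail early on an unlisted interpreter; this small sketch simply hard-codes the versions named in the README, and `SUPPORTED` and `is_supported` are illustrative names, not part of spec2vec:

```python
import sys

# Python versions listed under "Prerequisites" in the updated README (assumed exhaustive).
SUPPORTED = {(3, 10), (3, 11), (3, 12), (3, 13)}


def is_supported(version_info=None) -> bool:
    """Return True if the given (or running) interpreter's (major, minor) is in SUPPORTED."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED


if not is_supported():
    major, minor = sys.version_info[:2]
    print(f"Python {major}.{minor} is not among the versions listed for spec2vec: {sorted(SUPPORTED)}")
```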
```diff
@@ -124,38 +124,32 @@ dataset.
 
 .. code-block:: python
 
-import os
-import matchms.filtering as msfilters
+from matchms import SpectrumProcessor
+from matchms.filtering.default_pipelines import DEFAULT_FILTERS
 from matchms.importing import load_from_mgf
 from spec2vec import SpectrumDocument
 from spec2vec.model_building import train_new_word2vec_model
 
-def spectrum_processing(s):
-"""This is how one would typically design a desired pre- and post-
-processing pipeline."""
-s = msfilters.default_filters(s)
-s = msfilters.add_parent_mass(s)
-s = msfilters.normalize_intensities(s)
-s = msfilters.reduce_to_number_of_peaks(s, n_required=10, ratio_desired=0.5, n_max=500)
-s = msfilters.select_by_mz(s, mz_from=0, mz_to=1000)
-s = msfilters.require_minimum_number_of_peaks(s, n_required=10)
-return s
-
-# Load data from MGF file and apply filters
-spectrums = [spectrum_processing(s) for s in load_from_mgf("reference_spectrums.mgf")]
-
-# Omit spectrums that didn't qualify for analysis
-spectrums = [s for s in spectrums if s is not None]
+# Load spectra from MGF
+spectra = list(load_from_mgf("reference_spectrums.mgf"))
+
+# Add some default filters. You can add more filters functions like require min. number of peaks
+processor = SpectrumProcessor(DEFAULT_FILTERS)
+
+# Apply filter pipeline
+spectra_cleaned, _ = processor.process_spectra(spectra)
+spectra_cleaned = [s for s in spectra_cleaned if s is not None]
 
 # Create spectrum documents
-reference_documents = [SpectrumDocument(s, n_decimals=2, loss_mz_from=10.0, loss_mz_to=200.0) for s in spectrums]
+reference_documents = [SpectrumDocument(s, n_decimals=2) for s in spectra_cleaned]
 
+# Train your reference model
 model_file = "references.model"
 model = train_new_word2vec_model(reference_documents, iterations=[10, 20, 30], filename=model_file,
 workers=2, progress_logger=True)
 
 Once a word2vec model has been trained, spec2vec allows to calculate the similarities
-between mass spectrums based on this model. In cases where the word2vec model was
+between mass spectra based on this model. In cases where the word2vec model was
 trained on data different than the data it is applied for, a number of peaks ("words")
 might be unknown to the model (if they weren't part of the training dataset). To
 account for those cases it is important to specify the ``allowed_missing_percentage``,
```
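The updated example hands filtering to `SpectrumProcessor(DEFAULT_FILTERS)` and then turns each spectrum into a `SpectrumDocument`. The toy sketch below illustrates the two ideas on plain `(mz, intensity)` lists instead of matchms objects: filters run in sequence and a spectrum that any filter rejects (returns `None`) is dropped, which is what the `s is not None` step guards against; and peaks become "words" with m/z rounded to `n_decimals`, assuming spec2vec's `peak@<m/z>` word format. All function names here are hypothetical stand-ins, not matchms or spec2vec API:

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy representation: a spectrum is a list of (mz, intensity) peaks.
# The README example uses matchms Spectrum objects and matchms filters instead.
Peaks = List[Tuple[float, float]]
Filter = Callable[[Peaks], Optional[Peaks]]


def toy_require_min_peaks(peaks: Peaks, n_required: int = 3) -> Optional[Peaks]:
    """Reject a spectrum (return None) if it has fewer than n_required peaks."""
    return peaks if len(peaks) >= n_required else None


def toy_normalize(peaks: Peaks) -> Peaks:
    """Scale intensities so that the highest peak becomes 1.0."""
    top = max(intensity for _, intensity in peaks)
    return [(mz, intensity / top) for mz, intensity in peaks]


def run_pipeline(spectra: List[Peaks], filters: List[Filter]) -> List[Peaks]:
    """Apply filters in order; drop a spectrum as soon as a filter returns None."""
    kept = []
    for peaks in spectra:
        result: Optional[Peaks] = peaks
        for apply_filter in filters:
            result = apply_filter(result)
            if result is None:
                break
        if result is not None:
            kept.append(result)
    return kept


def to_words(peaks: Peaks, n_decimals: int = 2) -> List[str]:
    """Turn each peak position into a 'word', rounding m/z to n_decimals."""
    return [f"peak@{mz:.{n_decimals}f}" for mz, _ in peaks]


spectra = [
    [(100.1234, 50.0), (150.5, 100.0), (201.0, 25.0)],
    [(90.0, 10.0)],  # only one peak, so toy_require_min_peaks rejects it
]
cleaned = run_pipeline(spectra, [toy_require_min_peaks, toy_normalize])
print(len(cleaned))          # 1
print(to_words(cleaned[0]))  # ['peak@100.12', 'peak@150.50', 'peak@201.00']
```

The minimum-peaks check runs before normalization here so that an empty spectrum never reaches `max()`; the comment in the new README suggests adding such a check to `DEFAULT_FILTERS` yourself if you need it.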
```diff
@@ -167,11 +161,12 @@ as in the example below.
 from matchms import calculate_scores
 from spec2vec import Spec2Vec
 
-# query_spectrums loaded from files using https://matchms.readthedocs.io/en/latest/api/matchms.importing.load_from_mgf.html
-query_spectrums = [spectrum_processing(s) for s in load_from_mgf("query_spectrums.mgf")]
+# query_spectra loaded from files using https://matchms.readthedocs.io/en/latest/api/matchms.importing.load_from_mgf.html
+query_spectra = list(load_from_mgf("query_spectrums.mgf"))
+query_spectra_cleaned, _ = processor.process_spectra(query_spectra)
 
-# Omit spectrums that didn't qualify for analysis
-query_spectrums = [s for s in query_spectrums if s is not None]
+# Omit spectra that didn't qualify for analysis
+query_spectra_cleaned = [s for s in query_spectra_cleaned if s is not None]
 
 # Import pre-trained word2vec model (see code example above)
 model_file = "references.model"
```
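The README asks for an explicit `allowed_missing_percentage` because query spectra can contain peak words the model never saw. The sketch below shows one way such a coverage check can be computed, assuming (our reading of how spec2vec describes the parameter) that the missing share is measured on intensity-weighted words, with each weight raised to `intensity_weighting_power`. It uses plain lists and a set standing in for the model vocabulary; `missing_percentage` and `is_covered` are hypothetical names, not spec2vec's own code:

```python
from typing import List, Set


def missing_percentage(words: List[str], weights: List[float], vocabulary: Set[str],
                       intensity_weighting_power: float = 0.5) -> float:
    """Percentage of the intensity-weighted document not covered by `vocabulary`.

    Each weight (a peak intensity, assumed normalized to at most 1.0) is raised
    to `intensity_weighting_power` before summing.
    """
    raised = [w ** intensity_weighting_power for w in weights]
    total = sum(raised)
    if total == 0:
        return 0.0
    missing = sum(r for word, r in zip(words, raised) if word not in vocabulary)
    return 100.0 * missing / total


def is_covered(words, weights, vocabulary, allowed_missing_percentage=5.0,
               intensity_weighting_power=0.5) -> bool:
    """True if the weighted missing share stays within allowed_missing_percentage."""
    share = missing_percentage(words, weights, vocabulary, intensity_weighting_power)
    return share <= allowed_missing_percentage


words = ["peak@100.12", "peak@150.50", "peak@201.00"]
weights = [0.25, 1.0, 0.25]
vocabulary = {"peak@100.12", "peak@150.50"}  # stand-in for the model's known words

print(missing_percentage(words, weights, vocabulary))  # 25.0
print(is_covered(words, weights, vocabulary))          # False (25.0 > 5.0)
```

With weighting, a missing low-intensity peak costs less than a missing base peak; with `intensity_weighting_power=0` every word counts equally (one of three missing gives about 33%).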
```diff
@@ -181,11 +176,11 @@ as in the example below.
 spec2vec_similarity = Spec2Vec(model=model, intensity_weighting_power=0.5,
 allowed_missing_percentage=5.0)
 
-# Calculate scores on all combinations of reference spectrums and queries
-scores = calculate_scores(reference_documents, query_spectrums, spec2vec_similarity)
+# Calculate scores on all combinations of reference spectra and queries
+scores = calculate_scores(reference_documents, query_spectra_cleaned, spec2vec_similarity)
 
 # Find the highest scores for a query spectrum of interest
-best_matches = scores.scores_by_query(query_documents[0], sort=True)[:10]
+best_matches = scores.scores_by_query(query_spectra_cleaned[0], sort=True)[:10]
 
 # Return highest scores
 print([x[1] for x in best_matches])
```
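Spec2Vec scores compare spectrum embeddings: each spectrum's vector is a sum of its words' word2vec vectors, weighted by peak intensity raised to `intensity_weighting_power`, and the score is the cosine similarity of two such vectors. A standard-library sketch of that idea, plus the top-10 ranking that `scores_by_query(..., sort=True)[:10]` performs in the example; the 2-D vectors and the names `embed`, `cosine`, and `top_k` are made up for illustration, whereas real vectors come from the trained model:

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def embed(words: List[str], weights: List[float], word_vectors: Dict[str, Vector],
          intensity_weighting_power: float = 0.5) -> Vector:
    """Sum the vectors of known words, each scaled by weight ** intensity_weighting_power."""
    dim = len(next(iter(word_vectors.values())))
    vec = [0.0] * dim
    for word, weight in zip(words, weights):
        if word in word_vectors:  # unknown words contribute nothing
            scale = weight ** intensity_weighting_power
            vec = [v + scale * x for v, x in zip(vec, word_vectors[word])]
    return vec


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def top_k(query: Vector, references: Dict[str, Vector], k: int = 10) -> List[Tuple[str, float]]:
    """Score every reference against the query and return the k best (name, score) pairs."""
    scored = [(name, cosine(query, vec)) for name, vec in references.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 2-D word vectors; a trained word2vec model would supply real ones.
word_vectors = {"peak@100.12": [1.0, 0.0], "peak@150.50": [0.0, 1.0]}
references = {
    "ref_a": embed(["peak@100.12", "peak@150.50"], [1.0, 1.0], word_vectors),
    "ref_b": embed(["peak@150.50"], [1.0], word_vectors),
}
query = embed(["peak@100.12"], [1.0], word_vectors)
best_matches = top_k(query, references, k=10)
print([(name, round(score, 4)) for name, score in best_matches])  # [('ref_a', 0.7071), ('ref_b', 0.0)]
```

Each entry is a `(name, score)` pair, so `x[1]` picks out the score, mirroring how the README's last line reads the matches.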
