diff --git a/.gitignore b/.gitignore
index bc161a0..eab0528 100644
--- a/.gitignore
+++ b/.gitignore
@@ -13,3 +13,8 @@ __pycache__/
 unit_tests/
 ruff.toml
 */scratch/
+csblast-2.2.3/
+outputs/
+pdb100_2021Mar03/
+RFAA_paper_weights.pt
+SE3nv-20240131.sif
diff --git a/README.md b/README.md
index 830234b..7d071ab 100644
--- a/README.md
+++ b/README.md
@@ -20,21 +20,45 @@ RFAA is not accurate for all cases, but produces useful error estimates to allow
 
 ### Setup/Installation
 
-1. Clone the package
+1. Install Mamba
+```
+wget "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
+bash Mambaforge-$(uname)-$(uname -m).sh # accept all terms and install to the default location
+rm Mambaforge-$(uname)-$(uname -m).sh # (optionally) remove installer after using it
+source ~/.bashrc # alternatively, one can restart their shell session to achieve the same result
+```
+2. Clone the package
 ```
 git clone https://github.com/baker-laboratory/RoseTTAFold-All-Atom
 cd RoseTTAFold-All-Atom
 ```
-2. Download the container used to run RFAA.
+3. Create Mamba environment
 ```
-wget http://files.ipd.uw.edu/pub/RF-All-Atom/containers/SE3nv-20240131.sif
+mamba env create -f environment.yaml
+conda activate RFAA # NOTE: one still needs to use `conda` to (de)activate environments
+
+cd rf2aa/SE3Transformer/
+pip3 install --no-cache-dir -r requirements.txt
+python3 setup.py install
+cd ../../
 ```
-3. Download the model weights.
+4. Configure signalp6 after downloading a licensed copy of it from https://services.healthtech.dtu.dk/services/SignalP-6.0/
+```
+# NOTE: (current) version 6.0h is used in this example, which was downloaded to the current working directory using `wget`
+signalp6-register signalp-6.0h.fast.tar.gz
+
+# NOTE: once registration is complete, one must rename the "distilled" model weights
+mv $CONDA_PREFIX/lib/python3.10/site-packages/signalp/model_weights/distilled_model_signalp6.pt $CONDA_PREFIX/lib/python3.10/site-packages/signalp/model_weights/ensemble_model_signalp6.pt
+```
+5. Install input preparation dependencies
+```
+bash install_dependencies.sh
+```
+6. Download the model weights.
 ```
 wget http://files.ipd.uw.edu/pub/RF-All-Atom/weights/RFAA_paper_weights.pt
-
 ```
-4. Download sequence databases for MSA and template generation.
+7. Download sequence databases for MSA and template generation.
 ```
 # uniref30 [46G]
 wget http://wwwuser.gwdg.de/~compbiol/uniclust/2020_06/UniRef30_2020_06_hhsuite.tar.gz
@@ -56,11 +80,9 @@ tar xfz pdb100_2021Mar03.tar.gz
 
 We use a library called Hydra to compose config files for predictions. The actual script that runs the model is in `rf2aa/run_inference.py` and default parameters that were used to train the model are in `rf2aa/config/inference/base.yaml`. We highly suggest using the default parameters since those are closest to the training task for RFAA but we have found that increasing loader_params.MAXCYCLE=10 (default set to 4) gives better results for hard cases (as noted in the paper).
 
-We use a container system called apptainers which have very simple syntax. Instead of developing a local conda environment, users can use the apptainer to run the model which has all the dependencies already packaged.
-
 The general way to run the model is as follows:
 ```
-SE3nv-20240131.sif -m rf2aa.run_inference --config-name {your inference config}
+python -m rf2aa.run_inference --config-name {your inference config}
 ```
 The main inputs into the model are split into:
 - protein inputs (protein_inputs)
@@ -90,7 +112,7 @@ When specifying the fasta file for your protein, you might notice that it is nes
 
 Now to predict the sample monomer structure, run:
 ```
-SE3nv-20240131.sif -m rf2aa.run_inference --config-name protein
+python -m rf2aa.run_inference --config-name protein
 ```
 
 
@@ -118,7 +140,7 @@ This repo currently does not support making RNA MSAs or pairing protein MSAs wit
 
 Now, predict the example protein/NA complex.
 ```
-SE3nv-20240131.sif -m rf2aa.run_inference --config-name nucleic_acid
+python -m rf2aa.run_inference --config-name nucleic_acid
 ```
 
 ### Predicting Protein Small Molecule Complexes
@@ -127,23 +149,24 @@ Here is an example (from `rf2aa/config/inference/protein_sm.yaml`):
 ```
 defaults:
   - base
-
-job_name: 7qxr
+job_name: "3fap"
 
 protein_inputs:
-  A:
-    fasta_file: examples/protein/7qxr.fasta
+  A:
+    fasta_file: examples/protein/3fap_A.fasta
+  B:
+    fasta_file: examples/protein/3fap_B.fasta
 
 sm_inputs:
-  B:
-    input: examples/small_molecule/NSW_ideal.sdf
+  C:
+    input: examples/small_molecule/ARD_ideal.sdf
     input_type: "sdf"
 ```
 Small molecule inputs are provided as sdf files or smiles strings and users are **required** to provide both an input and an input_type field for every small molecule that they want to provide. Metal ions can also be provided as sdf files or smiles strings.
 
 To predict the example:
 ```
-SE3nv-20240131.sif -m rf2aa.run_inference --config-name protein_sm
+python -m rf2aa.run_inference --config-name protein_sm
 ```
 
 ### Predicting Higher Order Complexes
@@ -172,7 +195,7 @@ sm_inputs:
 ```
 And to run:
 ```
-SE3nv-20240131.sif -m rf2aa.run_inference --config-name protein_na_sm
+python -m rf2aa.run_inference --config-name protein_na_sm
 ```
 
 ### Predicting Covalently Modified Proteins
diff --git a/environment.yaml b/environment.yaml
new file mode 100644
index 0000000..8882e34
--- /dev/null
+++ b/environment.yaml
@@ -0,0 +1,329 @@
+name: RFAA
+channels:
+  - predector
+  - pyg
+  - bioconda
+  - pytorch
+  - nvidia
+  - biocore
+  - conda-forge
+dependencies:
+  - _libgcc_mutex=0.1=conda_forge
+  - _openmp_mutex=4.5=2_kmp_llvm
+  - absl-py=2.1.0=pyhd8ed1ab_0
+  - aiohttp=3.9.3=py310h2372a71_0
+  - aiosignal=1.3.1=pyhd8ed1ab_0
+  - alsa-lib=1.2.8=h166bdaf_0
+  - asttokens=2.4.1=pyhd8ed1ab_0
+  - astunparse=1.6.3=pyhd8ed1ab_0
+  - async-timeout=4.0.3=pyhd8ed1ab_0
+  - attr=2.5.1=h166bdaf_1
+  - attrs=23.2.0=pyh71513ae_0
+  - blas=2.121=mkl
+  - blas-devel=3.9.0=21_linux64_mkl
+  - blast-legacy=2.2.26=2
+  - blinker=1.7.0=pyhd8ed1ab_0
+  - brotli=1.1.0=hd590300_1
+  - brotli-bin=1.1.0=hd590300_1
+  - brotli-python=1.1.0=py310hc6cd4ac_1
+  - bzip2=1.0.8=hd590300_5
+  - c-ares=1.27.0=hd590300_0
+  - ca-certificates=2024.2.2=hbcca054_0
+  - cached-property=1.5.2=hd8ed1ab_1
+  - cached_property=1.5.2=pyha770c72_1
+  - cachetools=5.3.3=pyhd8ed1ab_0
+  - cairo=1.16.0=ha61ee94_1014
+  - certifi=2024.2.2=pyhd8ed1ab_0
+  - cffi=1.16.0=py310h2fee648_0
+  - charset-normalizer=3.3.2=pyhd8ed1ab_0
+  - click=8.1.7=unix_pyh707e725_0
+  - colorama=0.4.6=pyhd8ed1ab_0
+  - contourpy=1.2.0=py310hd41b1e2_0
+  - cryptography=42.0.2=py310hb8475ec_0
+  - cuda-cudart=11.8.89=0
+  - cuda-cupti=11.8.87=0
+  - cuda-libraries=11.8.0=0
+  - cuda-nvrtc=11.8.89=0
+  - cuda-nvtx=11.8.86=0
+  - cuda-runtime=11.8.0=0
+  - cuda-version=11.8=h70ddcb2_3
+  - cudatoolkit=11.8.0=h4ba93d1_13
+  - cudnn=8.8.0.121=hcdd5f01_4
+  - cycler=0.12.1=pyhd8ed1ab_0
+  - dbus=1.13.6=h5008d03_3
+  - deepdiff=6.7.1=pyhd8ed1ab_0
+  - dgl=1.1.2=cuda112py310hc641c19_2
+  - executing=2.0.1=pyhd8ed1ab_0
+  - expat=2.6.1=h59595ed_0
+  - ffmpeg=4.3=hf484d3e_0
+  - fftw=3.3.10=nompi_hc118613_108
+  - filelock=3.13.1=pyhd8ed1ab_0
+  - flatbuffers=22.12.06=hcb278e6_2
+  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
+  - font-ttf-inconsolata=3.000=h77eed37_0
+  - font-ttf-source-code-pro=2.038=h77eed37_0
+  - font-ttf-ubuntu=0.83=h77eed37_1
+  - fontconfig=2.14.2=h14ed4e7_0
+  - fonts-conda-ecosystem=1=0
+  - fonts-conda-forge=1=0
+  - fonttools=4.49.0=py310h2372a71_0
+  - freetype=2.12.1=h267a509_2
+  - frozenlist=1.4.1=py310h2372a71_0
+  - fsspec=2024.2.0=pyhca7485f_0
+  - gast=0.4.0=pyh9f0ad1d_0
+  - gettext=0.21.1=h27087fc_0
+  - giflib=5.2.1=h0b41bf4_3
+  - glib=2.78.4=hfc55251_4
+  - glib-tools=2.78.4=hfc55251_4
+  - gmp=6.3.0=h59595ed_0
+  - gmpy2=2.1.2=py310h3ec546c_1
+  - gnutls=3.6.13=h85f3911_1
+  - google-auth=2.28.2=pyhca7485f_0
+  - google-auth-oauthlib=0.4.6=pyhd8ed1ab_0
+  - google-pasta=0.2.0=pyh8c360ce_0
+  - graphite2=1.3.13=h58526e2_1001
+  - grpcio=1.51.1=py310h4a5735c_1
+  - gst-plugins-base=1.22.0=h4243ec0_2
+  - gstreamer=1.22.0=h25f0c4b_2
+  - gstreamer-orc=0.4.38=hd590300_0
+  - gzip=1.13=hd590300_0
+  - h5py=3.9.0=nompi_py310hcca72df_101
+  - harfbuzz=6.0.0=h8e241bc_0
+  - hdf5=1.14.1=nompi_h4f84152_100
+  - hhsuite=3.3.0=py310pl5321h068649b_10
+  - icecream=2.1.3=pyhd8ed1ab_0
+  - icu=70.1=h27087fc_0
+  - idna=3.6=pyhd8ed1ab_0
+  - importlib-metadata=7.0.2=pyha770c72_0
+  - jack=1.9.22=h11f4161_0
+  - jinja2=3.1.3=pyhd8ed1ab_0
+  - joblib=1.3.2=pyhd8ed1ab_0
+  - jpeg=9e=h0b41bf4_3
+  - keras=2.11.0=pyhd8ed1ab_0
+  - keras-preprocessing=1.1.2=pyhd8ed1ab_0
+  - keyutils=1.6.1=h166bdaf_0
+  - kiwisolver=1.4.5=py310hd41b1e2_1
+  - krb5=1.20.1=h81ceb04_0
+  - lame=3.100=h166bdaf_1003
+  - lcms2=2.15=hfd0df8a_0
+  - ld_impl_linux-64=2.40=h41732ed_0
+  - lerc=4.0.0=h27087fc_0
+  - libabseil=20220623.0=cxx17_h05df665_6
+  - libaec=1.1.2=h59595ed_1
+  - libblas=3.9.0=21_linux64_mkl
+  - libbrotlicommon=1.1.0=hd590300_1
+  - libbrotlidec=1.1.0=hd590300_1
+  - libbrotlienc=1.1.0=hd590300_1
+  - libcap=2.67=he9d0100_0
+  - libcblas=3.9.0=21_linux64_mkl
+  - libclang=15.0.7=default_hb11cfb5_4
+  - libclang13=15.0.7=default_ha2b6cf4_4
+  - libcublas=11.11.3.6=0
+  - libcufft=10.9.0.58=0
+  - libcufile=1.9.0.20=0
+  - libcups=2.3.3=h36d4200_3
+  - libcurand=10.3.5.119=0
+  - libcurl=8.1.2=h409715c_0
+  - libcusolver=11.4.1.48=0
+  - libcusparse=11.7.5.86=0
+  - libdb=6.2.32=h9c3ff4c_0
+  - libdeflate=1.17=h0b41bf4_0
+  - libedit=3.1.20191231=he28a2e2_2
+  - libev=4.33=hd590300_2
+  - libevent=2.1.10=h28343ad_4
+  - libexpat=2.6.1=h59595ed_0
+  - libffi=3.4.2=h7f98852_5
+  - libflac=1.4.3=h59595ed_0
+  - libgcc-ng=13.2.0=h807b86a_5
+  - libgcrypt=1.10.3=hd590300_0
+  - libgfortran-ng=13.2.0=h69a702a_5
+  - libgfortran5=13.2.0=ha4646dd_5
+  - libglib=2.78.4=hf2295e7_4
+  - libgomp=13.2.0=h807b86a_5
+  - libgpg-error=1.48=h71f35ed_0
+  - libgrpc=1.51.1=h4fad500_1
+  - libhwloc=2.9.1=hd6dc26d_0
+  - libiconv=1.17=hd590300_2
+  - liblapack=3.9.0=21_linux64_mkl
+  - liblapacke=3.9.0=21_linux64_mkl
+  - libllvm15=15.0.7=hadd5161_1
+  - libnghttp2=1.58.0=h47da74e_0
+  - libnpp=11.8.0.86=0
+  - libnsl=2.0.1=hd590300_0
+  - libnvjpeg=11.9.0.86=0
+  - libogg=1.3.4=h7f98852_1
+  - libopus=1.3.1=h7f98852_1
+  - libpng=1.6.43=h2797004_0
+  - libpq=15.3=hbcd7760_1
+  - libprotobuf=3.21.12=hfc55251_2
+  - libsndfile=1.2.2=hc60ed4a_1
+  - libsqlite=3.45.1=h2797004_0
+  - libssh2=1.11.0=h0841786_0
+  - libstdcxx-ng=13.2.0=h7e041cc_5
+  - libsystemd0=253=h8c4010b_1
+  - libtiff=4.5.0=h6adf6a1_2
+  - libtool=2.4.7=h27087fc_0
+  - libudev1=253=h0b41bf4_1
+  - libuuid=2.38.1=h0b41bf4_0
+  - libuv=1.48.0=hd590300_0
+  - libvorbis=1.3.7=h9c3ff4c_0
+  - libwebp-base=1.3.2=hd590300_0
+  - libxcb=1.13=h7f98852_1004
+  - libxcrypt=4.4.36=hd590300_1
+  - libxkbcommon=1.5.0=h79f4944_1
+  - libxml2=2.10.3=hca2bb57_4
+  - libzlib=1.2.13=hd590300_5
+  - llvm-openmp=17.0.6=h4dfa4b3_0
+  - lz4-c=1.9.4=hcb278e6_0
+  - markdown=3.5.2=pyhd8ed1ab_0
+  - markupsafe=2.1.5=py310h2372a71_0
+  - matplotlib=3.8.3=py310hff52083_0
+  - matplotlib-base=3.8.3=py310h62c0568_0
+  - metis=5.1.1=h59595ed_2
+  - mkl=2024.0.0=ha957f24_49657
+  - mkl-devel=2024.0.0=ha770c72_49657
+  - mkl-include=2024.0.0=ha957f24_49657
+  - mpc=1.3.1=hfe3b2da_0
+  - mpfr=4.2.1=h9458935_0
+  - mpg123=1.32.4=h59595ed_0
+  - mpmath=1.3.0=pyhd8ed1ab_0
+  - multidict=6.0.5=py310h2372a71_0
+  - munkres=1.1.4=pyh9f0ad1d_0
+  - mysql-common=8.0.33=hf1915f5_6
+  - mysql-libs=8.0.33=hca2cd23_6
+  - nccl=2.20.5.1=h6103f9b_0
+  - ncurses=6.4=h59595ed_2
+  - nettle=3.6=he412f7d_0
+  - networkx=3.2.1=pyhd8ed1ab_0
+  - nspr=4.35=h27087fc_0
+  - nss=3.98=h1d7d5a4_0
+  - numpy=1.26.4=py310hb13e2d6_0
+  - oauthlib=3.2.2=pyhd8ed1ab_0
+  - openbabel=3.1.1=py310heaf86c6_5
+  - openh264=2.1.1=h780b84a_0
+  - openjpeg=2.5.0=hfec8fc6_2
+  - openssl=3.1.5=hd590300_0
+  - opt_einsum=3.3.0=pyhc1e730c_2
+  - ordered-set=4.1.0=pyhd8ed1ab_0
+  - orjson=3.9.15=py310hcb5633a_0
+  - packaging=23.2=pyhd8ed1ab_0
+  - pandas=2.2.1=py310hcc13569_0
+  - pcre2=10.43=hcad00b1_0
+  - perl=5.32.1=7_hd590300_perl5
+  - pillow=9.4.0=py310h023d228_1
+  - pip=24.0=pyhd8ed1ab_0
+  - pixman=0.43.2=h59595ed_0
+  - ply=3.11=py_1
+  - protobuf=4.21.12=py310heca2aa9_0
+  - psipred=4.01=1
+  - psutil=5.9.8=py310h2372a71_0
+  - pthread-stubs=0.4=h36c2ea0_1001
+  - pulseaudio=16.1=hcb278e6_3
+  - pulseaudio-client=16.1=h5195f5e_3
+  - pulseaudio-daemon=16.1=ha8d29e2_3
+  - pyasn1=0.5.1=pyhd8ed1ab_0
+  - pyasn1-modules=0.3.0=pyhd8ed1ab_0
+  - pycparser=2.21=pyhd8ed1ab_0
+  - pyg=2.5.0=py310_torch_2.0.0_cu118
+  - pygments=2.17.2=pyhd8ed1ab_0
+  - pyjwt=2.8.0=pyhd8ed1ab_1
+  - pyopenssl=24.0.0=pyhd8ed1ab_0
+  - pyparsing=3.1.2=pyhd8ed1ab_0
+  - pyqt=5.15.9=py310h04931ad_5
+  - pyqt5-sip=12.12.2=py310hc6cd4ac_5
+  - pysocks=1.7.1=pyha2e5f31_6
+  - python=3.10.13=hd12c33a_0_cpython
+  - python-dateutil=2.9.0=pyhd8ed1ab_0
+  - python-flatbuffers=24.3.6=pyh59ac667_0
+  - python-tzdata=2024.1=pyhd8ed1ab_0
+  - python_abi=3.10=4_cp310
+  - pytorch=2.0.1=py3.10_cuda11.8_cudnn8.7.0_0
+  - pytorch-cuda=11.8=h7e8668a_5
+  - pytorch-mutex=1.0=cuda
+  - pytz=2024.1=pyhd8ed1ab_0
+  - pyu2f=0.1.5=pyhd8ed1ab_0
+  - qt-main=5.15.8=h5d23da1_6
+  - re2=2023.02.01=hcb278e6_0
+  - readline=8.2=h8228510_1
+  - requests=2.31.0=pyhd8ed1ab_0
+  - requests-oauthlib=1.3.1=pyhd8ed1ab_0
+  - rsa=4.9=pyhd8ed1ab_0
+  - scikit-learn=1.4.1.post1=py310h1fdf081_0
+  - scipy=1.12.0=py310hb13e2d6_2
+  - setuptools=69.1.1=pyhd8ed1ab_0
+  - signalp6=6.0g=1
+  - sip=6.7.12=py310hc6cd4ac_0
+  - six=1.16.0=pyh6c4a22f_0
+  - snappy=1.1.10=h9fff704_0
+  - sympy=1.12=pypyh9d50eac_103
+  - tbb=2021.9.0=hf52228f_0
+  - tensorboard=2.11.2=pyhd8ed1ab_0
+  - tensorboard-data-server=0.6.1=py310h600f1e7_4
+  - tensorboard-plugin-wit=1.8.1=pyhd8ed1ab_0
+  - tensorflow=2.11.0=cuda112py310he87a039_0
+  - tensorflow-base=2.11.0=cuda112py310h52da4a5_0
+  - tensorflow-estimator=2.11.0=cuda112py310h37add04_0
+  - termcolor=2.4.0=pyhd8ed1ab_0
+  - threadpoolctl=3.3.0=pyhc1e730c_0
+  - tk=8.6.13=noxft_h4845f30_101
+  - toml=0.10.2=pyhd8ed1ab_0
+  - tomli=2.0.1=pyhd8ed1ab_0
+  - torchaudio=2.0.2=py310_cu118
+  - torchtriton=2.0.0=py310
+  - torchvision=0.15.2=py310_cu118
+  - tornado=6.4=py310h2372a71_0
+  - tqdm=4.66.2=pyhd8ed1ab_0
+  - typing-extensions=4.10.0=hd8ed1ab_0
+  - typing_extensions=4.10.0=pyha770c72_0
+  - tzdata=2024a=h0c530f3_0
+  - unicodedata2=15.1.0=py310h2372a71_0
+  - unzip=6.0=h7f98852_3
+  - urllib3=2.2.1=pyhd8ed1ab_0
+  - werkzeug=3.0.1=pyhd8ed1ab_0
+  - wheel=0.42.0=pyhd8ed1ab_0
+  - wrapt=1.16.0=py310h2372a71_0
+  - xcb-util=0.4.0=h516909a_0
+  - xcb-util-image=0.4.0=h166bdaf_0
+  - xcb-util-keysyms=0.4.0=h516909a_0
+  - xcb-util-renderutil=0.3.9=h166bdaf_0
+  - xcb-util-wm=0.4.1=h516909a_0
+  - xkeyboard-config=2.38=h0b41bf4_0
+  - xorg-kbproto=1.0.7=h7f98852_1002
+  - xorg-libice=1.1.1=hd590300_0
+  - xorg-libsm=1.2.4=h7391055_0
+  - xorg-libx11=1.8.4=h0b41bf4_0
+  - xorg-libxau=1.0.11=hd590300_0
+  - xorg-libxdmcp=1.1.3=h7f98852_0
+  - xorg-libxext=1.3.4=h0b41bf4_2
+  - xorg-libxrender=0.9.10=h7f98852_1003
+  - xorg-renderproto=0.11.1=h7f98852_1002
+  - xorg-xextproto=7.3.0=h0b41bf4_1003
+  - xorg-xproto=7.0.31=h7f98852_1007
+  - xz=5.2.6=h166bdaf_0
+  - yarl=1.9.4=py310h2372a71_0
+  - zip=3.0=hd590300_3
+  - zipp=3.17.0=pyhd8ed1ab_0
+  - zlib=1.2.13=hd590300_5
+  - zstd=1.5.5=hfc55251_0
+  - pip:
+    - antlr4-python3-runtime==4.9.3
+    - assertpy==1.1
+    - configparser==6.0.1
+    - git+https://github.com/NVIDIA/dllogger.git@0540a43971f4a8a16693a9de9de73c1072020769
+    - docker-pycreds==0.4.0
+    - e3nn==0.3.3
+    - gitdb==4.0.11
+    - gitpython==3.1.42
+    - hydra-core==1.3.2
+    - omegaconf==2.3.0
+    - opt-einsum-fx==0.1.4
+    - pathtools==0.1.2
+    - promise==2.3
+    - pynvml==11.0.0
+    - pyrsistent==0.20.0
+    - pyyaml==6.0.1
+    - sentry-sdk==1.41.0
+    - shortuuid==1.0.12
+    - smmap==5.0.1
+    - subprocess32==3.5.4
+    - wandb==0.12.0
diff --git a/input_prep/make_ss.sh b/input_prep/make_ss.sh
new file mode 100644
index 0000000..bb792f4
--- /dev/null
+++ b/input_prep/make_ss.sh
@@ -0,0 +1,30 @@
+#!/bin/bash
+# From: https://github.com/RosettaCommons/RoseTTAFold
+
+DATADIR="$CONDA_PREFIX/share/psipred_4.01/data"
+echo $DATADIR
+
+i_a3m="$1"
+o_ss="$2"
+
+ID=$(basename $i_a3m .a3m).tmp
+
+$PIPE_DIR/csblast-2.2.3/bin/csbuild -i $i_a3m -I a3m -D $PIPE_DIR/csblast-2.2.3/data/K4000.crf -o $ID.chk -O chk
+
+head -n 2 $i_a3m > $ID.fasta
+echo $ID.chk > $ID.pn
+echo $ID.fasta > $ID.sn
+
+makemat -P $ID
+psipred $ID.mtx $DATADIR/weights.dat $DATADIR/weights.dat2 $DATADIR/weights.dat3 > $ID.ss
+psipass2 $DATADIR/weights_p2.dat 1 1.0 1.0 $i_a3m.csb.hhblits.ss2 $ID.ss > $ID.horiz
+
+(
+echo ">ss_pred"
+grep "^Pred" $ID.horiz | awk '{print $2}'
+echo ">ss_conf"
+grep "^Conf" $ID.horiz | awk '{print $2}'
+) | awk '{if(substr($1,1,1)==">") {print "\n"$1} else {printf "%s", $1}} END {print ""}' | sed "1d" > $o_ss
+
+rm ${i_a3m}.csb.hhblits.ss2
+rm $ID.*
\ No newline at end of file
diff --git a/install_dependencies.sh b/install_dependencies.sh
new file mode 100644
index 0000000..1ff56e6
--- /dev/null
+++ b/install_dependencies.sh
@@ -0,0 +1,22 @@
+#!/bin/bash
+# From: https://github.com/RosettaCommons/RoseTTAFold
+
+# install external program not supported by conda installation
+case "$(uname -s)" in
+    Linux*)     platform=linux;;
+    Darwin*)    platform=macosx;;
+    *)          echo "unsupported OS type. exiting"; exit 1
+esac
+echo "Installing dependencies for ${platform}..."
+
+# the cs-blast platform description includes the width of memory addresses
+# we expect a 64-bit operating system
+if [[ ${platform} == "linux" ]]; then
+    platform=${platform}64
+fi
+
+# download cs-blast
+echo "Downloading cs-blast ..."
+wget http://wwwuser.gwdg.de/~compbiol/data/csblast/releases/csblast-2.2.3_${platform}.tar.gz -O csblast-2.2.3.tar.gz
+mkdir -p csblast-2.2.3
+tar xf csblast-2.2.3.tar.gz -C csblast-2.2.3 --strip-components=1
diff --git a/make_msa.sh b/make_msa.sh
index 9fbc874..6da7bc4 100755
--- a/make_msa.sh
+++ b/make_msa.sh
@@ -8,9 +8,12 @@ out_dir="$2"
 CPU="$3"
 MEM="$4"
 
-# pipe_dir
-PIPE_DIR="$5"
-DB_TEMPL="$6"
+# template database
+DB_TEMPL="$5"
+
+# current script directory (i.e., pipe directory)
+SCRIPT=`realpath -s $0`
+export PIPE_DIR=`dirname $SCRIPT`
 
 # sequence databases
 DB_UR30="$PIPE_DIR/uniclust/UniRef30_2021_06"
@@ -109,6 +112,7 @@ then
 fi
 
 echo "Running PSIPRED"
+mkdir -p $out_dir/log
 $PIPE_DIR/input_prep/make_ss.sh $out_dir/t000_.msa0.a3m $out_dir/t000_.ss2 > $out_dir/log/make_ss.stdout 2> $out_dir/log/make_ss.stderr
 
 if [ ! -s $out_dir/t000_.hhr ]
diff --git a/rf2aa/config/inference/base.yaml b/rf2aa/config/inference/base.yaml
index c6662ce..db93bcd 100644
--- a/rf2aa/config/inference/base.yaml
+++ b/rf2aa/config/inference/base.yaml
@@ -3,7 +3,7 @@ output_path: ""
 checkpoint_path: RFAA_paper_weights.pt
 database_params:
   sequencedb: ""
-  hhdb: "pdb100_2022Apr19/pdb100_2022Apr19"
+  hhdb: "pdb100_2021Mar03/pdb100_2021Mar03"
   command: make_msa.sh
   num_cpus: 4
   mem: 64
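
The output-formatting step at the end of `input_prep/make_ss.sh` packs a lot into one `grep`/`awk`/`sed` chain. As a reading aid, here is a minimal Python sketch of what that chain produces, assuming PSIPRED's `.horiz` file carries each chunk of predicted secondary structure on a line starting with `Pred` and the matching confidence digits on a line starting with `Conf`. The function name `horiz_to_ss` and the `SAMPLE` text are hypothetical and hand-written for illustration; this is not part of the patch and not how the script itself is implemented.

```python
"""Python sketch of the final stage of input_prep/make_ss.sh (illustrative only)."""


def horiz_to_ss(horiz_text: str) -> str:
    """Collapse PSIPRED .horiz text into the two-record layout make_ss.sh writes.

    Mirrors `grep "^Pred"` / `grep "^Conf"` piped to `awk '{print $2}'`,
    followed by the awk/sed step that joins each record onto one line.
    """
    pred_chunks = []
    conf_chunks = []
    for line in horiz_text.splitlines():
        fields = line.split()  # whitespace split, like awk's default FS
        if len(fields) < 2:
            continue  # awk would emit an empty $2 here, which adds nothing
        # startswith() on the raw line matches grep's "^" anchor, so the
        # indented "  AA:" sequence lines are ignored, as in the script.
        if line.startswith("Pred"):
            pred_chunks.append(fields[1])
        elif line.startswith("Conf"):
            conf_chunks.append(fields[1])
    return (
        ">ss_pred\n" + "".join(pred_chunks) + "\n"
        + ">ss_conf\n" + "".join(conf_chunks) + "\n"
    )


# Hypothetical, hand-written input imitating the block layout of a .horiz file.
SAMPLE = """\
Conf: 9887
Pred: CHHE
  AA: MKTA

Conf: 76
Pred: EC
  AA: YI
"""

if __name__ == "__main__":
    print(horiz_to_ss(SAMPLE), end="")
```

For `SAMPLE`, this prints `>ss_pred`, `CHHEEC`, `>ss_conf` and `988776` on four lines, which matches the layout the shell pipeline writes to `$o_ss` once `sed "1d"` has dropped the leading empty line produced by the inner `awk`.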